thaitextaug.word2vec

Modules

class thaitextaug.word2vec.Word2VecAug(model: str, tokenize: object, type: str = 'file')
augment(sentence: str, n_sent: int = 1, p: float = 0.7) → List[Tuple[str]]
Parameters
  • sentence (str) – input sentence

  • n_sent (int) – maximum number of augmented sentences to return

  • p (float) – probability

Returns

list of augmented sentences

Return type

List[Tuple[str]]

modify_sent(sent, p=0.7) → List[List[str]]
Parameters
  • sent (str) – input sentence

  • p (float) – probability

Return type

List[List[str]]
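
The return types above suggest a two-step shape: modify_sent yields one list of candidate words per token, and augment combines those lists into whole sentences. The sketch below illustrates that shape only; it is not the library's implementation. The table TOY_NEIGHBOURS, the functions toy_modify_sent and toy_augment, the whitespace tokenizer, and the reading of p as a minimum similarity score are all assumptions made for illustration.

```python
from itertools import islice, product
from typing import Dict, List, Tuple

# Hypothetical neighbour table standing in for a trained word2vec model:
# word -> [(similar word, similarity score)].
TOY_NEIGHBOURS: Dict[str, List[Tuple[str, float]]] = {
    "good": [("great", 0.9), ("fine", 0.6)],
    "movie": [("film", 0.8)],
}


def toy_modify_sent(tokens: List[str], p: float = 0.7) -> List[List[str]]:
    """Return one candidate list per token (assumption: p is a similarity cutoff)."""
    candidates = []
    for token in tokens:
        similar = [w for w, score in TOY_NEIGHBOURS.get(token, []) if score >= p]
        # Keep the original token when no neighbour passes the cutoff.
        candidates.append(similar or [token])
    return candidates


def toy_augment(sentence: str, n_sent: int = 1, p: float = 0.7) -> List[Tuple[str, ...]]:
    """Combine per-token candidates into at most n_sent token tuples."""
    tokens = sentence.split()  # stand-in for the `tokenize` callable
    combos = product(*toy_modify_sent(tokens, p=p))
    return list(islice(combos, n_sent))


print(toy_augment("a good movie", n_sent=2, p=0.5))
# [('a', 'great', 'film'), ('a', 'fine', 'film')]
```

Taking the combinations lazily with islice avoids building the full Cartesian product when only a few sentences are requested.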

class thaitextaug.word2vec.BPEmbAug(lang: str = 'th', vs: int = 100000, dim: int = 300)

Thai text augmentation using word2vec embeddings from BPEmb

BPEmb: github.com/bheinzerling/bpemb

augment(sentence: str, n_sent: int = 1, p: float = 0.7) → List[Tuple[str]]

Augment text using word2vec embeddings from BPEmb

Parameters
  • sentence (str) – Thai sentence

  • n_sent (int) – number of sentences to generate

  • p (float) – per-word probability

Returns

list of augmented sentences

Return type

List[Tuple[str]]

load_w2v()

Load BPEmb model

tokenizer(text: str) → List[str]
Parameters

text (str) – Thai text

Return type

List[str]
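
BPEmb is built on SentencePiece subword units, so the tokens returned by tokenizer may be word pieces rather than whole words, and SentencePiece marks the start of a word with "▁" (U+2581). A small helper for turning such pieces back into readable text might look like the sketch below. The name pieces_to_text is hypothetical, and whether thaitextaug already strips the marker from its output is not stated on this page.

```python
from typing import Iterable

WORD_START = "\u2581"  # "▁", the SentencePiece word-boundary marker


def pieces_to_text(pieces: Iterable[str]) -> str:
    """Join subword pieces and turn boundary markers back into spaces."""
    return "".join(pieces).replace(WORD_START, " ").strip()


# Illustrative pieces only; a real BPEmb model may segment differently.
print(pieces_to_text(["▁สวัส", "ดี", "▁ครับ"]))
```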

class thaitextaug.word2vec.Thai2fitAug

Text augmentation using word2vec embeddings from Thai2Fit

Thai2Fit: github.com/cstorm125/thai2fit

augment(sentence: str, n_sent: int = 1, p: float = 0.7) → List[Tuple[str]]

Augment text using word2vec embeddings from Thai2Fit

Parameters
  • sentence (str) – Thai sentence

  • n_sent (int) – number of sentences to generate

  • p (float) – per-word probability

Returns

list of augmented sentences

Return type

List[Tuple[str]]

load_w2v()

Load the Thai2Fit word2vec model

tokenizer(text: str) → List[str]
Parameters

text (str) – Thai text

Return type

List[str]
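
All three classes expose the same augment(sentence, n_sent, p) signature, so code that grows a training set can be written against that method alone. The sketch below relies only on the documented signature and return type. The Augmenter protocol and the expand_dataset helper are hypothetical names, and joining tokens without a separator is an assumption suited to Thai text, which is normally written without spaces between words.

```python
from typing import Iterable, List, Protocol, Tuple


class Augmenter(Protocol):
    """Any object exposing the documented augment(sentence, n_sent, p) method."""

    def augment(
        self, sentence: str, n_sent: int = 1, p: float = 0.7
    ) -> List[Tuple[str, ...]]: ...


def expand_dataset(
    augmenter: Augmenter,
    sentences: Iterable[str],
    n_sent: int = 2,
    p: float = 0.7,
) -> List[str]:
    """Return the original sentences followed by their distinct augmentations."""
    expanded: List[str] = []
    for sentence in sentences:
        expanded.append(sentence)
        for tokens in augmenter.augment(sentence, n_sent=n_sent, p=p):
            candidate = "".join(tokens)  # assumption: no separator between tokens
            # Skip results that merely reproduce text we already have.
            if candidate != sentence and candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

With the classes documented here, this might be called as expand_dataset(Thai2fitAug(), corpus), assuming the constructor takes no arguments, as its signature above suggests.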