Getting Started
Stemming is one of the most essential preprocessing steps in NLP. The stem method of the TextNormalizer class reduces a word to its stem.
Example
# coding:utf-8
from indicLP.preprocessing import TextNormalizer

language = "hi"
normalizerHindi = TextNormalizer(language)
text = "खिलाड़ियों"
print(normalizerHindi.stem(text))
# Output is "खिलाड़"

language = "ta"
normalizerTamil = TextNormalizer(language)
text = "பெண்கள்"
print(normalizerTamil.stem(text))
# Output is "பெண்"
Input Arguments
The stem method takes the following input argument:
- word: The word to be stemmed, as a string.
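Because stem accepts one word at a time, stemming a whole sentence means splitting it into words first. The following is a minimal sketch and not part of indicLP: the helper name stem_words is hypothetical, whitespace splitting is an assumed stand-in for a proper tokenizer, and toy_stem only stands in for TextNormalizer.stem so that the snippet runs without the library installed.

```python
from typing import Callable, List


def stem_words(sentence: str, stem: Callable[[str], str]) -> List[str]:
    """Apply a single-word stemmer to every whitespace-separated word.

    Hypothetical helper, not part of indicLP. `stem` is any callable that
    maps one word to its stem, for example the bound method
    TextNormalizer("hi").stem.
    """
    return [stem(word) for word in sentence.split()]


def toy_stem(word: str) -> str:
    """Stand-in stemmer for illustration only: strips a trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 1 else word


stems = stem_words("players girls run", toy_stem)
print(stems)  # ['player', 'girl', 'run']
```

With indicLP installed, passing normalizerHindi.stem from the Example section in place of toy_stem should apply the library's stemmer to each word in the same way.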
Reference Materials
The following reference materials cover the stem method:
- Tokenizing Sentences: A reference blog post discussing the usage of the TextNormalizer class on hardcoded strings.
- Stemming and Lemmatization: NLP notes on stemming and lemmatization provided by Stanford.