Stem - Method

A method of the TextNormalizer class that performs stemming on a given word.

Getting Started

Stemming is one of the most common preprocessing steps in NLP. It reduces an inflected word to its stem, so that different forms of the same word can be treated alike.

Example

# coding:utf-8
from indicLP.preprocessing import TextNormalizer

# Hindi: create a normalizer for the "hi" language code
language = "hi"

normalizerHindi = TextNormalizer(language)
text = "खिलाड़ियों"
print(normalizerHindi.stem(text))
# Output is "खिलाड़"

# Tamil: create a normalizer for the "ta" language code
language = "ta"

normalizerTamil = TextNormalizer(language)
text = "பெண்கள்"
print(normalizerTamil.stem(text))
# Output is "பெண்"

Input Arguments

The stem method takes the following input argument:

word: The word to be stemmed, as a string.
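
Because stem works on one word at a time, longer text has to be split into words before stemming. The following is a minimal sketch of that step, assuming plain whitespace splitting is good enough for the input. The names stem_words and toy_stem are hypothetical helpers written for this illustration and are not part of indicLP; toy_stem is only a stand-in so the snippet runs without the library installed.

```python
from typing import Callable, List


def stem_words(text: str, stem_fn: Callable[[str], str]) -> List[str]:
    """Split text on whitespace and stem each word separately."""
    return [stem_fn(word) for word in text.split()]


def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer (not part of indicLP).

    Strips a trailing "s" from words longer than three characters.
    """
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


print(stem_words("players score goals", toy_stem))
# Output is ['player', 'score', 'goal']

# With indicLP installed, the real method can be passed in instead:
# stem_words(sentence, normalizerHindi.stem)
```

Whitespace splitting leaves punctuation attached to the neighbouring word, so text with punctuation marks would need to be tokenized properly before each word is passed to stem.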

Reference Materials

The following reference materials cover the stem method and stemming in general:

Tokenizing Sentences: A reference blog post discussing the usage of the TextNormalizer class on hardcoded strings.
Stemming and Lemmatization: NLP notes on stemming and lemmatization provided by Stanford.