Getting Started
The Preprocessing library contains the tools needed to clean input text before passing it to a model.
Example
from indicLP.preprocessing import TextNormalizer, Embedding

language = "ta"
normalizerTamil = TextNormalizer(language)
embedder = Embedding(lang=language)
Classes Contained
The preprocessing module contains the following classes:
- TextNormalizer: Class to perform tasks such as tokenizing, stopword removal and stemming.
- Embedding: Class for performing word embedding and finding closely associated words.
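To make the three TextNormalizer tasks concrete, here is a minimal plain-Python sketch of tokenizing, stopword removal and stemming. It does not call indicLP and does not reflect how TextNormalizer is implemented; the function names, the small English stopword set and the suffix list are all assumptions made up for this illustration.

```python
import re

# Toy English data for illustration only; these are NOT indicLP's
# stopword list or stemming rules.
STOPWORDS = {"the", "is", "a", "on", "and"}
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Run the three steps in order: tokenize, filter, stem."""
    return [stem(t) for t in remove_stopwords(tokenize(text))]


print(normalize("The cats jumped on the walls and played"))
# prints ['cat', 'jump', 'wall', 'play']
```

A real normalizer for an Indic language needs language-specific stopword lists and stemming rules, which is what the library class is meant to provide.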
Reference Materials
The following reference materials cover the preprocessing module:
- Tokenizing Sentences: A reference blog post discussing the usage of the TextNormalizer class on a hardcoded string.
- Word Embedding: A reference blog post discussing the usage of the Embedding class on a hardcoded string.
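As background for the word embedding topic, the idea of finding closely associated words can be sketched as a nearest-neighbour search over word vectors using cosine similarity. This is a generic illustration, not the Embedding class's API or method; the three-dimensional vectors and the `closest_words` helper are hypothetical.

```python
import math

# Hypothetical 3-dimensional vectors; real embeddings are learned from
# text and have far more dimensions.
VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.2, 0.9],
    "mango": [0.15, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_words(word, vectors, top_n=2):
    """Return the top_n other words ranked by cosine similarity to word."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:top_n]]


print(closest_words("king", VECTORS, top_n=1))
# prints ['queen']
```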