Remove Stop Words - Method

A method to remove stop words from a provided list of words.

Getting Started

The remove_stop_words method of the TextNormalizer class removes stop words from an input list of tokens.

Example

# coding:utf-8
from indicLP.preprocessing import TextNormalizer
import re

language = "hi"

normalizerHindi = TextNormalizer(language)

text = " नमस्ते! आप कैसे हैं? मुझे आपकी बहुत याद आयी।"
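# Split into sentences on Hindi and Latin sentence delimiters (raw string avoids an invalid escape)
text = re.split(r'[।;?.!]', text)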
text = list(filter(None, text))

print(text)
# Output is [' नमस्ते', ' आप कैसे हैं', ' मुझे आपकी बहुत याद आयी']

out = normalizerHindi.tokenizer(text, stem = True)
print(out)
# Output is [['▁नमस्त'], ['▁आप', '▁कैस', '▁हैं'], ['▁मुझ', '▁आपक', '▁बहुत', '▁याद', '▁आय']]

# Remove stop words from each tokenized sentence
clean_out = []
for i in out:
    clean_out.append(normalizerHindi.remove_stop_words(i))
print(clean_out)
# Output is [['▁नमस्त'], [], ['▁मुझ', '▁याद', '▁आय']]

Input Arguments

The remove_stop_words method takes the following input argument:

wordList: A list of strings representing a tokenized sentence.
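
To make the expected shape of wordList concrete, here is a minimal pure-Python sketch of stop word filtering on one tokenized sentence at a time. It is not indicLP's implementation: the function name filter_stop_words and the set HYPOTHETICAL_STOP_WORDS are assumptions made for illustration, with the set chosen only so that the result matches the example output shown earlier.

```python
from typing import Iterable, List, Set

# Hypothetical stop word set, for illustration only; NOT indicLP's actual list.
HYPOTHETICAL_STOP_WORDS: Set[str] = {"▁आप", "▁कैस", "▁हैं", "▁आपक", "▁बहुत"}


def filter_stop_words(word_list: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Return the tokens of word_list that are not in stop_words, preserving order."""
    return [token for token in word_list if token not in stop_words]


# Each inner list is one tokenized sentence, i.e. one wordList value.
tokenized = [["▁नमस्त"], ["▁आप", "▁कैस", "▁हैं"], ["▁मुझ", "▁आपक", "▁बहुत", "▁याद", "▁आय"]]
cleaned = [filter_stop_words(sentence, HYPOTHETICAL_STOP_WORDS) for sentence in tokenized]
print(cleaned)
```

A sentence made up entirely of stop words comes back as an empty list, so downstream code may want to skip empty results.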

Reference Materials

The following reference materials are relevant to the remove_stop_words method:

Tokenizing Sentences: A reference blog post discussing the use of the TextNormalizer class on a hardcoded string.
Dropping common terms: Stop Words: Stanford's NLP notes on stop word removal.