NLP: Part-Of-Speech Tagging
When we try to understand the meaning of a sentence, it is important to understand the syntactic relationships between its words. Sometimes these relationships determine the meaning of an ambiguous word. For instance, consider the word ‘like’ in the following two sentences:
- I like you.
- I am like you.
In the first sentence, ‘like’ is a verb expressing positive sentiment, whereas in the second sentence ‘like’ acts as a preposition and conveys a different meaning (similarity). Therefore, to understand a sentence better, it is important to understand the relationships between its words.
POS tagging is the process of identifying the grammatical tag (part of speech) of a word based on its definition and/or context. There are various ways to perform POS tagging, and the methods are broadly classified into four categories:
- Lexical Technique: Assigning each word the POS tag that occurs most frequently with it in the training data.
- Rule-Based Technique: Identifying and using rules based on previous observations to assign the POS tag of a word. For example, words that end with “ing” or “ed” are in most cases verbs, so this can serve as a rule. Rule-based techniques are often used together with lexical methods to determine the tag of words not previously observed in the training data.
- Probabilistic Technique: In probabilistic methods, the probability of a particular tag sequence occurring is estimated and used to identify the most likely tag for each word. This category includes methods like Conditional Random Fields and Hidden Markov Models.
- Deep Learning Technique: Recurrent Neural Network architectures like LSTM and GRU can be used to extract hidden patterns from the input sequence and determine the likely tag of the word in question.
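To make the first two techniques concrete, here is a minimal sketch of a lexical tagger with a rule-based fallback for unseen words. The toy training pairs, the function names (`train_lexical`, `rule_fallback`, `tag`) and the suffix rules are all assumptions made up for illustration; a real tagger would be trained on a large annotated corpus.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: (word, tag) pairs with Penn Treebank-style tags.
TRAIN = [
    ("I", "PRP"), ("like", "VBP"), ("you", "PRP"),
    ("I", "PRP"), ("am", "VBP"), ("like", "IN"), ("you", "PRP"),
    ("they", "PRP"), ("like", "VBP"), ("dogs", "NNS"),
]

def train_lexical(pairs):
    """Map each lowercased word to the tag it carries most often in the pairs."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def rule_fallback(word):
    """Guess a tag for an unseen word from its suffix (illustrative rules only)."""
    if word.endswith("ing"):
        return "VBG"
    if word.endswith("ed"):
        return "VBD"
    return "NN"  # default guess: noun

def tag(tokens, lexicon):
    """Use the lexicon when the word is known, otherwise fall back to the rules."""
    return [(t, lexicon.get(t.lower()) or rule_fallback(t.lower())) for t in tokens]

lexicon = train_lexical(TRAIN)
print(tag(["I", "like", "walking"], lexicon))
# [('I', 'PRP'), ('like', 'VBP'), ('walking', 'VBG')]
```

Note that this sketch tags ‘like’ as a verb even in "I am like you", because it ignores context. That limitation is what the probabilistic and deep learning techniques try to address.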
POS Tagging using NLTK
We will use Python's NLTK library to perform POS tagging. We will also use the word_tokenize function to split the given sentence into words. Before that, let us make sure we have the required libraries and data:
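import nltk
from nltk import word_tokenize
nltk.download('punkt')  # tokenizer models used by word_tokenize
nltk.download('averaged_perceptron_tagger')  # model used by nltk.pos_tag
# Newer NLTK releases may ask for 'punkt_tab' and 'averaged_perceptron_tagger_eng' instead.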
Now, we will tokenize the sentence into words. Note that we get words as tokens because we are using the word_tokenize function; it is also possible to tokenize text at other granularities, such as syllables.
text = word_tokenize("I shot an elephant in my pajamas.")
print(text)
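To illustrate what "granularity" means here, the following sketch contrasts word-level and character-level tokens using only the standard library. The helper names `simple_word_tokens` and `char_tokens` are hypothetical, and the regular expression is a rough approximation, not how word_tokenize works internally.

```python
import re

def simple_word_tokens(sentence):
    """Split into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def char_tokens(sentence):
    """Treat every non-whitespace character as its own token."""
    return [ch for ch in sentence if not ch.isspace()]

sentence = "I shot an elephant."
print(simple_word_tokens(sentence))  # ['I', 'shot', 'an', 'elephant', '.']
print(char_tokens(sentence)[:5])     # ['I', 's', 'h', 'o', 't']
```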
Finally, we will perform POS tagging on the tokens stored in text.
print(nltk.pos_tag(text))  # prints a list of (token, tag) pairs
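nltk.pos_tag returns a list of (token, tag) tuples, and by default the tags follow the Penn Treebank tagset (for example, verb tags start with "VB" and noun tags with "NN"). As a small sketch of how such a result can be consumed, the hypothetical helper `words_with_tag` below filters tokens by tag prefix. The example pairs are hand-written for illustration and are not output captured from the tagger.

```python
def words_with_tag(tagged, prefix):
    """Return the tokens whose tag starts with the given prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]

# Hand-written (token, tag) pairs in the same shape that nltk.pos_tag returns.
example = [("She", "PRP"), ("walked", "VBD"), ("home", "NN"), (".", ".")]

print(words_with_tag(example, "VB"))  # ['walked']
print(words_with_tag(example, "NN"))  # ['home']
```

The same helper can be applied directly to the tagger's result, for instance words_with_tag(nltk.pos_tag(text), "NN").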