Text Summarization Using Gensim

In today’s blogpost, we are going to be discussing about text summarization using the Gensim Library. Text summarization is a very popular task in NLP and plays a crucial role in modern times, due to the abundance of information on web. The intention of text summarization is to represent the large information in a concise manner. The applications on text summarization include media monitoring, video scripting, contract analysis, newsletter summarization etc.

Text summarization can be mainly classified into two categories, namely extraction based and abstraction based summarization.

Extraction Based Summarization

Extraction based summarization generally analyzes and assigns weights to various sections of the sentence and uses these weights to generate summary. Summary is generated using these weights, based on the relevance of the section and removing similar sections.

Figure 1. Extraction Based Text Summarization. Source: Flyodhub.com

Abstraction Based Summarization

Abstraction based summarization the text using a complex deep learning model, which understands the given input text and generate a completely new phrase which summarizes the given input. This can generally be done using attention models, to know which part of the input text needs to be given attention while generating the summary.

Figure 2. Abstraction Based Text Summarization. Source: Flyodhub.com

Implementation

TextRank is insured by the PageRank algorithm, which is generally used to assign weights to web page. PageRank uses this weights to determine on of how important the website is. TextRank similar determines how a part of important a text is important to the meaning of the whole passage is trying to convey. We are going to implementing program to summarize a Wikipedia page using gensim library.

Let us first import the libraries necessary for this.

import gensim
from gensim.summarization import summarize
import wikipedia

Next we are going to get the information from Wikipedia. We are going to try summarize the Youtube page of wikipedia. We are going to this using wikipedia library of python.

search = wikipedia.page("Youtube")
original_text = search.content

Next we will summarize the extracted text from wikipedia using the inbuilt function in gensim library. The summary function gets the most useful and informative sentences from the given input text.

short_summary = summarize(original_text,word_count=100)
print("Summary:")
print(short_summary)

You can find the code for this blogpost in this github repo.

Extraction Based Summarization

Abstraction Based Summarization

Implementation

Let’s Start a Project

Address