Named Entity Recognition (NER) Application

By Bill Sharlow

Day 5: DIY Natural Language Processing Applications

Welcome back to our NLP adventure! Today, we’re delving into Named Entity Recognition (NER), a technique for identifying and extracting named entities such as people, organizations, locations, and dates from text. In this post, we’ll explore the concepts behind NER, discuss its applications, and build a simple NER application with two Python libraries, NLTK and spaCy.

Understanding Named Entity Recognition (NER)

Named Entity Recognition (NER) is a subtask of information extraction that aims to locate and classify named entities mentioned in unstructured text into predefined categories such as person names, organization names, and geographic locations. NER plays a crucial role in various NLP applications, including information retrieval, question answering, and document summarization.
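To make “locate and classify” concrete, here is a minimal sketch of the task’s input and output using a hand-written lookup table. The GAZETTEER dictionary and the toy_ner function are hypothetical names invented for this illustration. Real NER systems such as NLTK and spaCy rely on trained statistical models rather than a fixed list, so treat this only as a picture of what goes in and what comes out.

```python
import re

# Hypothetical mini-gazetteer: surface form -> category (illustrative only).
GAZETTEER = {
    "Steve Jobs": "PERSON",
    "Apple": "ORGANIZATION",
    "Cupertino": "LOCATION",
    "California": "LOCATION",
}

def toy_ner(text):
    """Return (start, end, surface, category) tuples, sorted by position."""
    found = []
    for surface, category in GAZETTEER.items():
        # Word boundaries keep "Apple" from matching inside a longer word.
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.start(), match.end(), surface, category))
    return sorted(found)

sample = "Apple is headquartered in Cupertino, California, and was founded by Steve Jobs."
for start, end, surface, category in toy_ner(sample):
    print(f"{start:>3}-{end:<3} {category:<13} {surface}")
```

Each result pairs a character span (the “locate” part) with a category (the “classify” part). A lookup table cannot handle names it has never seen or ambiguous words, which is why the libraries used later in this post learn from annotated text instead.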

Common Types of Named Entities

Named entities can be categorized into several types, including:

  1. Person: Names of individuals, such as “John Smith” or “Mary Johnson”.
  2. Organization: Names of companies, institutions, or agencies, such as “Google” or “NASA”.
  3. Location: Names of geographic locations, such as “New York City” or “Mount Everest”.
  4. Date: Specific dates or periods of time, such as “January 1, 2022” or “the 20th century”.
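In code, libraries report these categories as short label strings. As a rough sketch, assuming the OntoNotes-style labels that spaCy’s English models use (PERSON, ORG, GPE, LOC, DATE), the four types above could be mapped as follows. The dictionary and helper names are made up for illustration, and other models or libraries may use different tags.

```python
# Assumed mapping from this post's four categories to spaCy-style label
# names; other models and libraries may use different tags.
CATEGORY_TO_SPACY_LABELS = {
    "Person": {"PERSON"},
    "Organization": {"ORG"},
    "Location": {"GPE", "LOC"},  # GPE: countries, cities, states; LOC: other places
    "Date": {"DATE"},
}

def category_for(label):
    """Return this post's category for a label, or None if it is not listed."""
    for category, labels in CATEGORY_TO_SPACY_LABELS.items():
        if label in labels:
            return category
    return None

print(category_for("GPE"))    # Location
print(category_for("MONEY"))  # None
```

Models typically define more labels than the four listed here, so a lookup like this should expect to see unlisted labels and handle them explicitly.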

Building a Simple NER Application with NLTK

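Let’s create a basic NER application using NLTK’s ne_chunk function. It works in three steps: tokenize the text, tag each token with its part of speech, then chunk the tagged tokens into named entities. Each step relies on NLTK data packages that must be downloaded once with nltk.download(). In this example, we’ll extract named entities from a sample text; the result is a tree in which each recognized entity is a labeled subtree.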

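import nltk

# One-time downloads of the data the pipeline depends on. These are the
# classic package names; newer NLTK releases may ask for different ones
# (the LookupError raised for a missing resource names the package to fetch).
for resource in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    nltk.download(resource, quiet=True)

def extract_entities(text):
    tokens = nltk.word_tokenize(text)              # split the text into word tokens
    tagged_tokens = nltk.pos_tag(tokens)           # attach a part-of-speech tag to each token
    named_entities = nltk.ne_chunk(tagged_tokens)  # group tagged tokens into entity subtrees
    return named_entities

# Example usage
sample_text = "Apple is headquartered in Cupertino, California, and was founded by Steve Jobs."
entities = extract_entities(sample_text)
print(entities)  # prints an nltk.Tree; entities appear as labeled subtrees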
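The value returned by ne_chunk is a tree rather than a flat list, so a small helper is often useful for pulling out just the entities. The sketch below is one possible approach; tree_to_entities is a hypothetical name, and it assumes that entity subtrees expose label() and leaves() (as nltk.Tree does) while ordinary tokens are plain (word, tag) tuples.

```python
def tree_to_entities(tree):
    """Flatten an ne_chunk-style tree into (entity_text, label) pairs."""
    entities = []
    for node in tree:
        # Entity chunks are subtrees with a label; other tokens are
        # plain (word, tag) tuples, which have no label attribute.
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((text, node.label()))
    return entities

# Usage with the function defined earlier in this section:
# entities = extract_entities(sample_text)
# print(tree_to_entities(entities))
```

The exact labels you get back depend on the chunker model, so it is worth inspecting the output on your own text before relying on any particular tag.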

Building a Simple NER Application with spaCy

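Now, let’s explore NER using spaCy, whose pretrained pipelines handle tokenization, tagging, and entity recognition in a single call. The example uses the small English model en_core_web_sm, which must be installed once with: python -m spacy download en_core_web_sm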

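import spacy

# Load the pipeline once at import time; reloading it on every call is slow.
# Install the model first: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')

def extract_entities_spacy(text):
    doc = nlp(text)
    # Each entity span exposes its text and a string label via ent.label_.
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return entities

# Example usage
sample_text = "Apple is headquartered in Cupertino, California, and was founded by Steve Jobs."
entities = extract_entities_spacy(sample_text)
print(entities)  # a list of (entity text, label) pairs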
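Once the entities are available as (text, label) pairs, a common next step is to group them by label. The helper below is a small sketch of that idea; group_by_label is a hypothetical name, and the example pairs are hand-written to match the shape of the list returned above, not captured model output.

```python
from collections import defaultdict

def group_by_label(pairs):
    """Group (text, label) pairs into {label: [texts]}, dropping repeats."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)

# Hand-written pairs in the shape extract_entities_spacy returns;
# illustrative input, not captured model output.
example_pairs = [
    ("Apple", "ORG"),
    ("Cupertino", "GPE"),
    ("California", "GPE"),
    ("Steve Jobs", "PERSON"),
    ("Apple", "ORG"),
]
print(group_by_label(example_pairs))
```

Grouping this way makes it easy to answer questions such as “which organizations does this document mention?” without scanning the full list each time.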

Conclusion

In today’s post, we’ve explored the concept of Named Entity Recognition (NER) and its significance in extracting named entities from unstructured text. We’ve built a simple NER application with both NLTK and spaCy: NLTK chunks part-of-speech-tagged tokens into a tree of labeled entities, while spaCy returns labeled entity spans from a pretrained pipeline.

In the next post, we’ll dive into sentiment analysis, another exciting application of NLP that focuses on understanding the sentiment or emotion expressed in text. Stay tuned for more hands-on examples and insights as we continue our NLP journey!

If you have any questions or thoughts on Named Entity Recognition, feel free to share them in the comments section below. Happy extracting, and see you in the next post!
