Motivating Named Entity Recognition
- An
EntityRecognizer
object will assign named entity labels to aSpan
of tokens - A
Span
object is a slice from aDoc
object - A named entity is a real-world object that is assigned a name
- These names typically refer to a person, country, product, book title, etc.
- Spacy is able to recognize various types of named entities
- An
EntityRecognizer
uses a CNN model to predict these named entities
Accessing Entity Annotations
- The standard way of accessing entity annotations is the
Doc.ents
property - This will produce a sequence of
Span
objects - The
Span
object acts as a sequence of tokens, so we can iterate over the entity - We can also retrieve information entity annotations for each
Token
using an IOB scheme -
The IOB scheme includes the following annotations:
I:
Token is inside an entityO:
Token is outside an entityB:
Token is the beginning of an entity
Sample Code
>>> import spacy
>>> from spacy.tokens import Span
>>> doc = nlp("San Francisco considers banning sidewalk delivery robots")
# Document level entity annotations
>>> [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
[('San Francisco', 0, 13, 'GPE')]
# Token level entity annotations
>>> [doc[0].text, doc[0].ent_iob_, doc[0].ent_type_]
['San', 'B', 'GPE']
>>> [doc[1].text, doc[1].ent_iob_, doc[1].ent_type_]
['Francisco', 'I', 'GPE']
# Create new Doc
>>> doc = nlp("fb is hiring a new vice president of global policy")
# Check if organization
>>> [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
[]
# Set entity annotation
>>> fb_ent = Span(doc, 0, 1, label="ORG") # create Span for new entity
>>> doc.ents = list(doc.ents) + [fb_ent]
>>> [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
[('fb', 0, 2, 'ORG')]
References
Previous
Next