Motivating Named Entity Recognition
- An EntityRecognizer object assigns named entity labels to a Span of tokens
- A Span object is a slice of a Doc object
- A named entity is a real-world object that has a name
- These names typically refer to a person, country, product, book title, etc.
- spaCy can recognize various types of named entities (for example PERSON, ORG, and GPE for countries, cities, and states)
- An EntityRecognizer uses a convolutional neural network (CNN) model to predict these named entities
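The relationship between these objects can be checked directly. The sketch below assumes spaCy v3 (where components are added by factory name with `nlp.add_pipe`) and uses a blank English pipeline so that no trained model has to be downloaded. The added "ner" component is untrained, so it is only inspected here, not run; `nlp.make_doc` is used because it tokenizes without calling the pipeline components.

```python
import spacy
from spacy.pipeline import EntityRecognizer
from spacy.tokens import Span

# Blank English pipeline: it only has a tokenizer, no trained weights
nlp = spacy.blank("en")

# Adding the built-in "ner" factory creates an (untrained) EntityRecognizer
ner = nlp.add_pipe("ner")
print(type(ner).__name__)

# make_doc only tokenizes, so the untrained "ner" component is never called
doc = nlp.make_doc("San Francisco considers banning sidewalk delivery robots")

# Slicing a Doc with token indices returns a Span that refers back to its Doc
span = doc[0:2]
print(isinstance(span, Span), span.text, span.start, span.end)
print(span.doc is doc)
```

A Span does not copy tokens; it stores a reference to its Doc plus start and end token offsets, which is why entity labels can be attached to it cheaply.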
Accessing Entity Annotations
- The standard way of accessing entity annotations is the Doc.ents property
- This produces a sequence of Span objects
- A Span object acts as a sequence of tokens, so we can iterate over the tokens of an entity
- We can also retrieve entity annotations for each Token using the IOB scheme, via the Token.ent_iob_ and Token.ent_type_ attributes
- The IOB scheme includes the following annotations:
  - I: Token is inside an entity
  - O: Token is outside an entity
  - B: Token is the beginning of an entity
  - An empty string means that no entity annotation has been set for the token
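To see how per-token IOB tags encode whole entities, here is a small pure-Python sketch that groups tags back into token spans. The function name `iob_to_entities` is hypothetical (it is not part of spaCy); its input triples are assumed to mirror `(token.text, token.ent_iob_, token.ent_type_)` as read from a spaCy Doc.

```python
from typing import List, Tuple


def iob_to_entities(tagged: List[Tuple[str, str, str]]) -> List[Tuple[int, int, str]]:
    """Group (text, iob, type) triples into (start, end, label) spans; end is exclusive."""
    entities = []
    start = None
    label = ""
    for i, (_text, iob, ent_type) in enumerate(tagged):
        if iob == "I" and start is not None:
            continue  # still inside the current entity
        if start is not None:
            entities.append((start, i, label))  # close the entity that was open
            start = None
        if iob in ("B", "I"):
            # "B" starts an entity; a stray "I" with no open entity is treated the same way
            start, label = i, ent_type
    if start is not None:
        entities.append((start, len(tagged), label))  # entity runs to the last token
    return entities


tagged = [
    ("San", "B", "GPE"),
    ("Francisco", "I", "GPE"),
    ("considers", "O", ""),
    ("banning", "O", ""),
]
print(iob_to_entities(tagged))
```

The separate "B" tag is what lets two adjacent entities of the same type stay distinct: with only I and O, "Apple Google" tagged as two ORG entities would collapse into one.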
Sample Code
>>> import spacy
>>> from spacy.tokens import Span
>>> nlp = spacy.load("en_core_web_sm")  # assumes the small English pipeline is installed
>>> doc = nlp("San Francisco considers banning sidewalk delivery robots")
# Document-level entity annotations
>>> [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
[('San Francisco', 0, 13, 'GPE')]
# Token-level entity annotations
>>> [doc[0].text, doc[0].ent_iob_, doc[0].ent_type_]
['San', 'B', 'GPE']
>>> [doc[1].text, doc[1].ent_iob_, doc[1].ent_type_]
['Francisco', 'I', 'GPE']
# Create a new Doc
>>> doc = nlp("fb is hiring a new vice president of global policy")
# Check whether "fb" is recognized as an organization
>>> [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
[]
# Set the entity annotation manually
>>> fb_ent = Span(doc, 0, 1, label="ORG") # create Span for new entity
>>> doc.ents = list(doc.ents) + [fb_ent] # keep existing entities and add the new one
>>> [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
[('fb', 0, 2, 'ORG')]
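The sample sets the entity from token indices. When only character offsets are known (for example from an external annotation file), `Doc.char_span` can build the Span instead. The sketch below is a hedged variant that uses a blank English pipeline, so no entities are predicted and no model download is assumed; it relies on spaCy v3 behavior where `char_span` returns None when the offsets do not line up with token boundaries.

```python
import spacy

# Blank English pipeline: tokenization only, so doc.ents starts out empty
nlp = spacy.blank("en")
doc = nlp("fb is hiring a new vice president of global policy")

# Map character offsets 0..2 ("fb") to a token-aligned Span labelled ORG
fb_ent = doc.char_span(0, 2, label="ORG")

# Offsets 0..1 cover only "f", part of the token "fb", so this returns None
misaligned = doc.char_span(0, 1, label="ORG")

# Keep any existing entities and add the new one
doc.ents = list(doc.ents) + [fb_ent]
print([(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents])
print(misaligned)
```

Checking for None before assigning to doc.ents is worthwhile in practice, since external character offsets often disagree with spaCy's tokenization.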