What is a Token?
- A Token represents a word, punctuation symbol, whitespace, etc.
- Each Token represents an input string from a Doc object encoded to hash values
- Linguistic annotations are available as Token attributes
- To get the readable string representation of an attribute, we need to add an underscore _ to its name
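The hash-versus-string point can be sketched as follows. This is a minimal illustration, assuming spaCy v3 is installed; it uses a blank English pipeline (spacy.blank("en")) so that no trained model is needed, and the sentence and variable names are chosen here only for the example.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough for this demo.
nlp = spacy.blank("en")
doc = nlp("Wow! Spacy is great.")

token = doc[0]

# Without the underscore, the attribute is an integer hash.
orth_hash = token.orth
# With the underscore, the same attribute is the readable string.
orth_text = token.orth_

print(type(orth_hash).__name__, orth_text)

# The vocabulary's string store maps between the two in both directions.
print(nlp.vocab.strings[orth_hash] == orth_text)
print(nlp.vocab.strings[orth_text] == orth_hash)
```

The same pattern applies to the other annotations: lemma and lemma_, pos and pos_, and so on.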
More on Token Attributes
- text: The input text content
- lemma_: Base form of the token
- pos_: Generic (coarse-grained) part-of-speech tag
- tag_: Specific (fine-grained) part-of-speech tag
- dep_: Syntactic dependency relation
- shape_: Orthographic features of the token
- is_alpha: Does the token consist of alphabetic characters?
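A short sketch of the purely lexical attributes in this list (text, shape_, is_alpha) is shown below. It assumes spaCy v3 and again uses a blank English pipeline, which is a choice made for this example; lemma_, pos_, tag_ and dep_ need a trained pipeline and are therefore left out here. The sentence and the rows variable are illustrative.

```python
import spacy

# Assumption: tokenizer-only pipeline; no tagger, parser, or lemmatizer is loaded.
nlp = spacy.blank("en")
doc = nlp("Wow! Spacy is great.")

# Collect (text, shape, is_alpha) for every token in the Doc.
rows = [(t.text, t.shape_, t.is_alpha) for t in doc]

for text, shape, is_alpha in rows:
    print(f"{text!r:10} {shape!r:10} {is_alpha}")
```

Punctuation tokens such as "!" report is_alpha as False, while word tokens report True.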
Sample Code
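>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")  # assumption: any trained English pipeline can be used here
>>> doc = nlp("Wow! Spacy is a great tool and I'm wanting to learn more. Please, teach me, sir.")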
# Text of token
>>> doc[10].text
'wanting'
# Lemma of token (a hash value without the underscore, a string with it)
>>> doc[10].lemma
7597692042947428029
>>> doc[10].lemma_
'want'
# Generic POS of token
>>> doc[10].pos_
'VERB'
# Specific POS of token
>>> doc[10].tag_
'VBG'
# Dependency of token
>>> doc[10].dep_
'ROOT'
# Shape of token
>>> doc[10].shape_
'xxxx'