More spaCy Objects
- The Token object is a word, punctuation mark, etc.
- The Doc object owns the sequence of tokens and all their annotations
- The StringStore object is a dictionary mapping hash values to strings
  - In other words, a StringStore object is a lookup table for hashes and their string values, and the lookup works in both directions
- The Vocab object is a set of Lexeme objects
- The Lexeme object is an entry in the vocabulary that holds the context-independent information about a word
  - For example, no matter whether "love" is used as a verb or a noun in some context, its spelling and whether it consists of alphabetic characters won't ever change
  - Its hash value will always be the same
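The relationship between the StringStore, the Vocab, and a Lexeme can be sketched in a few lines. This is a minimal illustration, assuming spaCy is installed; it uses a blank English pipeline so that no trained model is needed, and the example sentence and variable names are made up for the demonstration.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only, no trained model).
nlp = spacy.blank("en")
doc = nlp("I love coffee and I love tea")

# StringStore: look up the hash for a string, then the string for that hash.
love_hash = nlp.vocab.strings["love"]
love_text = nlp.vocab.strings[love_hash]

# Lexeme: the context-independent vocabulary entry for "love".
lexeme = nlp.vocab["love"]

# Both occurrences of "love" in the Doc share the same hash as the Lexeme.
first_love, second_love = doc[1], doc[5]
same_hash = first_love.orth == second_love.orth == lexeme.orth == love_hash

print(love_text, lexeme.is_alpha, same_hash)
```

Because the hash depends only on the string, the two "love" tokens resolve to the same Lexeme regardless of how each is used in the sentence.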
Summarizing the spaCy Architecture
- The Doc object owns the data
- The Span and Token objects are views that point into the Doc object
- The Doc object is constructed by the Tokenizer
- After the Doc object is created, it is modified in place by the components of the pipeline
- The Language object coordinates these components
  - Specifically, it takes raw text and sends it through the pipeline, returning an annotated document
  - It also orchestrates training and serialization
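The flow summarized above can be traced by hand. The sketch below assumes spaCy v3 (where components are added by string name) and uses a blank English pipeline with a single rule-based sentencizer component; the sample text is made up for illustration.

```python
import spacy

# Assumption: spaCy v3, blank pipeline plus one rule-based component.
nlp = spacy.blank("en")        # the Language object
nlp.add_pipe("sentencizer")    # adds sentence boundaries, no model required

text = "Hello world. This is spaCy."

# Step 1: the Tokenizer constructs the Doc.
doc = nlp.make_doc(text)

# Step 2: each pipeline component modifies the Doc in place and returns it.
for name, component in nlp.pipeline:
    doc = component(doc)

# Token and Span are views: both point back to the Doc that owns the data.
token = doc[0]
span = doc[0:2]
views_share_doc = token.doc is doc and span.doc is doc

# Calling the Language object runs both steps in one go.
full = nlp(text)
sentences = [sent.text for sent in full.sents]
print(nlp.pipe_names, views_share_doc, len(sentences))
```

Calling `nlp(text)` is the usual entry point; the manual loop is only there to make the two stages visible.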