Introducing the spaCy Pipeline
- When you call `nlp` on a text, spaCy first tokenizes the text to produce a `Doc` object
- The `Doc` is then processed in several different steps
- This sequence of steps is referred to as the processing pipeline
- The pipeline typically consists of a part-of-speech tagger, a dependency parser, and an entity recognizer
- Each pipeline component returns the processed `Doc`, which is then passed on to the next component
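
The hand-off described above can be sketched in plain Python. This is a toy model only, not spaCy's implementation: `ToyDoc`, `toy_tokenizer`, `toy_tagger`, `toy_ner`, and `run_pipeline` are hypothetical names invented for illustration, and the "tagging" rule is deliberately trivial.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a Doc: tokens plus annotations added later."""
    tokens: List[str]
    tags: List[str] = field(default_factory=list)
    ents: List[str] = field(default_factory=list)


# A component takes a document, annotates it, and returns it.
Component = Callable[[ToyDoc], ToyDoc]


def toy_tokenizer(text: str) -> ToyDoc:
    # The tokenizer is the only step that receives raw text.
    return ToyDoc(tokens=text.split())


def toy_tagger(doc: ToyDoc) -> ToyDoc:
    # Toy rule (an assumption for the sketch): label capitalized tokens.
    doc.tags = ["CAPITALIZED" if t[0].isupper() else "OTHER" for t in doc.tokens]
    return doc


def toy_ner(doc: ToyDoc) -> ToyDoc:
    # Relies on the tags set by the previous component.
    doc.ents = [t for t, tag in zip(doc.tokens, doc.tags) if tag == "CAPITALIZED"]
    return doc


def run_pipeline(text: str, components: List[Component]) -> ToyDoc:
    doc = toy_tokenizer(text)
    for component in components:
        # Each component returns the processed doc, which feeds the next one.
        doc = component(doc)
    return doc


doc = run_pipeline("Alice visited Paris", [toy_tagger, toy_ner])
print(doc.tokens, doc.tags, doc.ents)
```

Because `toy_ner` reads the tags written by `toy_tagger`, swapping the order of the two components would leave `ents` empty, which is why the order of components in a pipeline matters.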
Describing the Pipeline
- The `Tokenizer` component separates raw text into tokens
- The `Tagger` component assigns part-of-speech tags
- The `DependencyParser` component assigns dependency labels
- The `EntityRecognizer` component detects and labels named entities
- The `TextCategorizer` component assigns document labels
| Name | Component | Creates |
|---|---|---|
| tokenizer | `Tokenizer` | `Doc` |
| tagger | `Tagger` | `Doc[i].tag_` |
| parser | `DependencyParser` | `Doc[i].dep_` |
| ner | `EntityRecognizer` | `Doc.ents`, `Doc[i].ent_iob_`, `Doc[i].ent_type_` |
| textcat | `TextCategorizer` | `Doc.cats` |
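
To see which annotations come from the tokenizer and which depend on later components, a blank pipeline is a convenient starting point. The sketch below assumes spaCy is installed; `spacy.blank("en")` builds an English pipeline that contains only the tokenizer, so no trained model download is needed. The sample text is arbitrary.

```python
import spacy

# A blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

# The tokenizer is not listed among the pipeline components.
print(nlp.pipe_names)

doc = nlp("Hello world")

# The tokenizer has already produced the Doc and its tokens.
print([token.text for token in doc])

# Attributes normally filled in by later components are still empty,
# because no tagger, parser, entity recognizer, or text categorizer has run.
print(repr(doc[0].tag_))   # part-of-speech tag
print(repr(doc[0].dep_))   # dependency label
print(len(doc.ents))       # named entities
print(doc.cats)            # document categories
```

Loading a trained pipeline with `spacy.load` instead of `spacy.blank` would add the trained components, and the same attributes would then carry values.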