GoldParse

Motivating Model Training

spaCy's models are statistical, and every decision they make is a prediction.
This prediction is based on the examples the model has seen during training.
To train a model, we first need training data: examples of text, together with the labels we want the model to predict.
A label could be a part-of-speech tag, a named entity, or any other information.

Training

Training with Annotations

The GoldParse object is a collection of annotations for a given training example.
These annotated training examples are called the gold standard.
A GoldParse object is initialized with the Doc object it annotates.
Specifically, the GoldParse supplies the correct labels for that otherwise unlabeled Doc.
Here's an example of a simple GoldParse for part-of-speech tags:

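>>> # spaCy v2.x API (the spacy.gold module was removed in v3)
>>> from spacy.vocab import Vocab
>>> from spacy.tokens import Doc
>>> from spacy.gold import GoldParse
>>> vocab = Vocab(tag_map={"N": {"pos": "NOUN"},
...                        "V": {"pos": "VERB"}})
>>> doc = Doc(vocab, words=["I", "like", "stuff"])
>>> gold = GoldParse(doc, tags=["N", "V", "N"])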
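To make the idea of a gold standard concrete, here is a minimal sketch that uses only the Python standard library. ToyGold and its accuracy method are hypothetical names invented for this illustration; they are not part of spaCy and do not reproduce how GoldParse works internally. The sketch only shows the two properties described above: gold labels are aligned one-to-one with the words of an example, and a model's predictions can be judged against them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyGold:
    """Hypothetical stand-in for a gold-standard annotation (not spaCy's GoldParse)."""

    words: List[str]
    tags: List[str]

    def __post_init__(self) -> None:
        # Gold labels are token-aligned: exactly one tag per word.
        if len(self.words) != len(self.tags):
            raise ValueError("expected one tag per word")

    def accuracy(self, predicted: List[str]) -> float:
        """Return the fraction of predicted tags that match the gold tags."""
        if len(predicted) != len(self.tags):
            raise ValueError("expected one prediction per word")
        if not self.tags:
            return 0.0
        correct = sum(p == g for p, g in zip(predicted, self.tags))
        return correct / len(self.tags)


# Same words and tags as the GoldParse example above.
gold = ToyGold(words=["I", "like", "stuff"], tags=["N", "V", "N"])

# A made-up prediction that gets the verb wrong.
score = gold.accuracy(["N", "N", "N"])
print(f"{score:.2f}")
```

Here two of the three predicted tags agree with the gold tags, so the score is two thirds. During real training, a disagreement like this is the signal used to adjust the model.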
