Example: Token Classification
Token classification is a natural language understanding task in which a label is assigned to some tokens in a text.
Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging are two popular token classification subtasks. NER models can be trained to recognize specific entities in a text, such as dates, individuals, and locations, while PoS tagging identifies which tokens in a text are verbs, nouns, punctuation marks, and so on.
This guide walks you through an example of NER model monitoring using spaCy. Let's start by creating a dummy model:
And let's assume this is what our prediction function looks like (maybe it's part of an HTTP server, for example):
Each entity will include the text, the embedding, and the prediction as follows:
text (raw input) - entity.text
embedding - entity.vector
prediction - entity.label_ (the string label; in spaCy, entity.label is its integer ID)
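As a minimal sketch of the mapping above, the helper below turns each entity into one flat record. The function name `build_records`, the `request_id` parameter, the id scheme, and the record keys are illustrative assumptions, not part of spaCy or Aporia. It only relies on each entity exposing `text`, `vector`, and `label_`, so stand-in objects are used here to keep the example runnable without a trained pipeline.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def build_records(request_id: str, entities: Iterable[Any]) -> List[Dict[str, Any]]:
    """Turn entities into one flat record per entity.

    Assumes every entity exposes `text`, `vector`, and `label_`,
    as spaCy entity spans do. The record keys are hypothetical.
    """
    records = []
    for index, entity in enumerate(entities):
        records.append({
            # Hypothetical id scheme: request id plus entity position.
            "id": f"{request_id}_{index}",
            "raw_text": entity.text,
            # Convert to a plain list of floats so it serializes easily.
            "embedding": [float(x) for x in entity.vector],
            "prediction": entity.label_,
        })
    return records


# Stand-in entities with made-up vectors, used instead of a real pipeline.
fake_entities = [
    SimpleNamespace(text="Aporia", vector=[0.77, 0.87, 0.94], label_="ORG"),
    SimpleNamespace(text="2021-11-20", vector=[0.14, 0.55, 0.66], label_="DATE"),
]
records = build_records("req-1", fake_entities)
```

With a loaded spaCy pipeline `nlp`, the call would look like `build_records(request_id, nlp(raw_text).ents)`.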
The next step is to store your predictions in a data store, including the embeddings themselves. For more information, see the section on storing your predictions.
For example, you could use a Parquet file on S3 or a Postgres table that looks like this:
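| id | raw_text | embedding | prediction | timestamp |
| --- | --- | --- | --- | --- |
| 1 | I love cookies and Aporia | [0.77, 0.87, 0.94, ...] | Positive | 2021-11-20 13:41:00 |
| 2 | This restaurant was really bad | [0.97, 0.82, 0.13, ...] | Negative | 2021-11-20 13:45:00 |
| 3 | Hummus is a type of food | [0.14, 0.55, 0.66, ...] | Neutral | 2021-11-20 13:49:00 |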
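The following is a rough sketch of writing one such row. It uses an in-memory SQLite database purely as a stand-in for Postgres so that it runs anywhere; the table name `predictions`, the column names, the `store_prediction` helper, and the choice to store the embedding as JSON text are all assumptions made for illustration. The embedding below contains only the three values visible in the example row.

```python
import json
import sqlite3
from datetime import datetime, timezone

# In-memory SQLite stands in for Postgres; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE predictions (
        id TEXT PRIMARY KEY,
        raw_text TEXT NOT NULL,
        embedding TEXT NOT NULL,   -- JSON-encoded list of floats
        prediction TEXT NOT NULL,
        timestamp TEXT NOT NULL    -- ISO 8601, UTC
    )
    """
)


def store_prediction(conn, record_id, raw_text, embedding, prediction, timestamp=None):
    """Insert one prediction row, serializing the embedding as JSON."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO predictions VALUES (?, ?, ?, ?, ?)",
        (record_id, raw_text, json.dumps(list(embedding)), prediction,
         timestamp.isoformat()),
    )
    conn.commit()


store_prediction(
    conn, "1", "I love cookies and Aporia", [0.77, 0.87, 0.94], "Positive",
    datetime(2021, 11, 20, 13, 41, tzinfo=timezone.utc),
)

# Read the row back and decode the embedding.
row = conn.execute(
    "SELECT raw_text, embedding, prediction FROM predictions"
).fetchone()
stored_embedding = json.loads(row[1])
```

In Postgres, a native array column would likely be a better fit for the embedding than JSON text; the JSON encoding here only keeps the sketch portable.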
Aporia provides two dedicated types to help you integrate your NLP model: text and embedding.
The text type should be used with your raw_text column. Note that by default, every string column is automatically marked as categorical in the UI, but you can change it to text for NLP use cases.
The embedding type, as the name suggests, should be used with your embedding column. Note that by default, every array column is automatically marked as array in the UI, but you can change it to embedding for NLP use cases.
Create a custom dashboard for your model in Aporia - Drag & drop widgets to show different performance metrics, top drifted features, etc.
Visualize NLP drift using Aporia's Embeddings Projector - Use the Embeddings Projector widget within the investigation room to view drift between different datasets in production, using UMAP for dimension reduction.
Set up monitors to get notified about ML issues, including data integrity issues, model performance degradation, and model drift. For example:
Make sure the distribution of the different entity labels doesn't drift across time
Make sure the distribution of the embedding vector doesn't drift across time
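To make the two example checks concrete, here is a small illustrative sketch. It is not how Aporia computes drift: total variation distance over label frequencies and cosine distance between embedding centroids are stand-in metrics chosen for simplicity, and all function names and sample values are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def label_distribution(labels: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each label (assumes a non-empty input)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def total_variation_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Half the L1 distance between two label distributions, in [0, 1]."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors (assumes a non-empty input)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity (assumes neither vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Made-up baseline and current windows of predicted entity labels.
baseline_labels = ["PERSON", "PERSON", "DATE", "GPE"]
current_labels = ["PERSON", "DATE", "DATE", "DATE"]
label_drift = total_variation_distance(
    label_distribution(baseline_labels), label_distribution(current_labels)
)

# Made-up 2-dimensional embeddings for the same two windows.
baseline_embeddings = [[1.0, 0.0], [1.0, 0.0]]
current_embeddings = [[0.0, 1.0], [0.0, 1.0]]
embedding_drift = cosine_distance(
    centroid(baseline_embeddings), centroid(current_embeddings)
)
```

A monitor in this spirit would compare `label_drift` or `embedding_drift` against a threshold and raise an alert when it is exceeded; the threshold itself would need tuning per model.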
To integrate this type of model follow our .
Check out the for more information about how to connect from different data sources.
This type of model is a , with a text raw input and an embedding feature.