Example: Token Classification
Token classification is a natural language understanding task in which labels are assigned to tokens in a text.
Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging are two popular token classification subtasks. NER models can be trained to recognize specific entities in a text, such as dates, people, and locations, while PoS tagging identifies which words in a text are verbs, nouns, punctuation marks, and so on.
This guide walks you through an example of monitoring an NER model built with spaCy. Let's start by creating a dummy model:
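The original model code isn't reproduced here. Below is a rough, dependency-free stand-in for a spaCy pipeline. The hypothetical DummyNERModel tags exact phrase matches from a small gazetteer. Each Entity carries text, label, and character offsets, loosely mirroring the text, label_, start_char, and end_char attributes of a spaCy Span. The class, the gazetteer, and its labels are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One predicted entity; fields loosely mirror spaCy Span attributes."""
    text: str
    label: str
    start_char: int
    end_char: int


class DummyNERModel:
    """Hypothetical stand-in for a spaCy NER pipeline: exact phrase lookup.

    There are no tokenization or word-boundary checks, so "Apple" would also
    match inside "Applesauce". Good enough for a monitoring demo, nothing more.
    """

    def __init__(self, gazetteer: dict):
        # Skip empty phrases, and try longer phrases first so that
        # "New York Times" wins over "New York".
        self.phrases = sorted(
            ((p, label) for p, label in gazetteer.items() if p),
            key=lambda kv: -len(kv[0]),
        )

    def __call__(self, text: str) -> list:
        entities = []
        taken = [False] * len(text)  # characters already claimed by an entity
        for phrase, label in self.phrases:
            start = text.find(phrase)
            while start != -1:
                end = start + len(phrase)
                if not any(taken[start:end]):
                    entities.append(Entity(phrase, label, start, end))
                    taken[start:end] = [True] * len(phrase)
                start = text.find(phrase, end)
        return sorted(entities, key=lambda e: e.start_char)


model = DummyNERModel({
    "Apple": "ORG",
    "Tim Cook": "PERSON",
    "California": "GPE",
    "New York": "GPE",
    "New York Times": "ORG",
})
```

Calling model("Tim Cook runs Apple in California.") returns the PERSON, ORG, and GPE entities in text order. Matching the longest phrases first keeps "New York Times" from also being tagged as "New York". A real spaCy model learns these spans instead of looking them up.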
Let's assume this is what our prediction function looks like (it might be part of an HTTP server, for example):
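Here is a minimal sketch of such a prediction function. It assumes the model returns one (text, label, start, end) tuple per entity. The toy_model stub and the response shape are hypothetical; they exist only so the block runs without spaCy or a web framework.

```python
import json
import uuid
from typing import Callable

# Hypothetical model type: text -> list of (entity text, label, start, end).
NERModel = Callable[[str], list]


def toy_model(text: str) -> list:
    """Placeholder for the spaCy model: tags the literal word 'Aporia' as ORG."""
    start = text.find("Aporia")
    return [] if start == -1 else [("Aporia", "ORG", start, start + len("Aporia"))]


def predict(text: str, model: NERModel = toy_model) -> dict:
    """Run NER on one request and return a JSON-friendly response."""
    request_id = str(uuid.uuid4())  # lets later logging tie predictions to a request
    entities = [
        {"text": t, "label": label, "start_char": s, "end_char": e}
        for t, label, s, e in model(text)
    ]
    return {"id": request_id, "entities": entities}


if __name__ == "__main__":
    print(json.dumps(predict("We monitor models with Aporia."), indent=2))
```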
Now let's add some monitoring to this function! But before that, let's create a new model in Aporia:
This is a multiclass model, since each entity is classified into one of two or more entity labels.
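To make "multiclass" concrete, each logged record describes one entity, and its prediction is exactly one label from a fixed set. The label set, the SCHEMA dictionary, and its type names below are hypothetical illustrations, not Aporia's model-creation API. Consult the Aporia SDK documentation for the actual call and field types.

```python
# Hypothetical label set; a real spaCy English model uses its own label scheme.
ENTITY_LABELS = ("PERSON", "ORG", "GPE", "DATE")

# Hypothetical per-entity schema: section -> {field name: type name}.
SCHEMA = {
    "features": {"entity_text": "text", "embedding": "vector"},
    "predictions": {"entity_label": "categorical"},
}


def validate_record(record: dict) -> list:
    """Return the problems found in one per-entity record (empty list if valid)."""
    problems = []
    for section, fields in SCHEMA.items():
        missing = set(fields) - set(record.get(section, {}))
        problems.extend(f"missing {section}.{name}" for name in sorted(missing))
    label = record.get("predictions", {}).get("entity_label")
    if label is not None and label not in ENTITY_LABELS:
        # Multiclass: each prediction must be exactly one of the known classes.
        problems.append(f"unknown label {label!r}")
    return problems
```

A record such as {"features": {"entity_text": "Apple", "embedding": [0.1, 0.2]}, "predictions": {"entity_label": "ORG"}} passes with no problems. A label outside ENTITY_LABELS is reported.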
Now we can change the predict function to log predictions to Aporia:
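The Aporia SDK call itself isn't reproduced here. In this sketch, predict accepts a hypothetical log_prediction(prediction_id, features, predictions) callable. That way the logging logic can run and be tested without the SDK; in practice you would pass a thin wrapper around the real SDK call. Logging one record per entity, with IDs derived from the request ID, is an assumption that fits the multiclass setup. toy_model and its fake two-dimensional embeddings are placeholders for spaCy output.

```python
import uuid
from typing import Callable, Optional

# Hypothetical logger signature: (prediction_id, features, predictions) -> None.
# Wrap the real Aporia SDK call in a function with this shape.
LogFn = Callable[[str, dict, dict], None]


def toy_model(text: str) -> list:
    """Placeholder for spaCy: every capitalized word becomes a GPE entity
    with a fake 2-d embedding (word length, 1.0)."""
    return [(w, "GPE", [float(len(w)), 1.0]) for w in text.split() if w.istitle()]


def predict(text: str, log_prediction: LogFn, request_id: Optional[str] = None) -> list:
    """Run NER, log one prediction per entity, and return the entities."""
    request_id = request_id or str(uuid.uuid4())
    entities = toy_model(text)
    for i, (ent_text, label, embedding) in enumerate(entities):
        # The index suffix makes each entity's prediction ID unique within the request.
        log_prediction(
            f"{request_id}-{i}",
            {"entity_text": ent_text, "embedding": embedding},
            {"entity_label": label},
        )
    return [{"text": t, "label": label} for t, label, _ in entities]
```

For a quick local check, start with logged = [] and pass lambda *rec: logged.append(rec) as log_prediction. Calling predict("we visit Paris and Rome", ...) then records two predictions, one per entity.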
Now, here are some sample monitors you can define:
Make sure the distribution of the different entity labels doesn't drift across time
Make sure the distribution of the embedding vector doesn't drift across time
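Aporia computes drift with its own metrics, which this guide doesn't specify. As a swapped-in illustration of what the two monitors compare, this sketch uses two simple measures against a baseline window. Label drift is the total variation distance between the baseline and current label distributions. Embedding drift is the Euclidean distance between the two windows' mean embeddings. Window sizes and alert thresholds are left to you.

```python
import math
from collections import Counter


def label_distribution(labels):
    """Fraction of entities carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation(p, q):
    """Half the L1 distance between two label distributions (0 = same, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def mean_vector(vectors):
    """Component-wise mean of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def embedding_shift(baseline, current):
    """Euclidean distance between the mean embeddings of two windows."""
    return math.dist(mean_vector(baseline), mean_vector(current))


if __name__ == "__main__":
    before = ["ORG", "ORG", "PERSON", "GPE"]
    after = ["ORG", "PERSON", "PERSON", "PERSON"]
    print(total_variation(label_distribution(before), label_distribution(after)))  # 0.5
    print(embedding_shift([[0.0, 0.0], [2.0, 0.0]], [[3.0, 4.0], [5.0, 4.0]]))  # 5.0
```

Averaging embeddings is a deliberately coarse summary. It catches a shift in where the embeddings sit but can miss changes in their spread.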
General Metadata
But this is just the beginning. From here, you can get creative and attach more information to each prediction you log to Aporia.
First, if your prediction requests carry any general metadata (unrelated to the NER model itself), you can log it as raw inputs. This lets you check that the model doesn't drift on, or become biased against, specific segments of your data (e.g., gender or company type).
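Here is a sketch of the idea. It assumes each per-entity record carries a raw_inputs dictionary copied from the request metadata. The field name company_type and the record layout are hypothetical. Grouping labels by a metadata field lets you compare segments side by side.

```python
from collections import Counter, defaultdict


def build_records(request_id, entities, metadata):
    """Attach the same request-level metadata to every per-entity record.

    entities is a list of (entity text, label) pairs.
    """
    return [
        {
            "id": f"{request_id}-{i}",
            "predictions": {"entity_label": label},
            "raw_inputs": dict(metadata),  # copy so records don't share one dict
        }
        for i, (_, label) in enumerate(entities)
    ]


def labels_by_segment(records, key):
    """Count predicted labels per value of one raw-input field, e.g. company_type."""
    segments = defaultdict(Counter)
    for record in records:
        segment = record["raw_inputs"].get(key, "unknown")
        segments[segment][record["predictions"]["entity_label"]] += 1
    return dict(segments)
```

If one segment's label mix diverges sharply from the others, look for segment-specific drift or bias there.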
Entity-specific Metadata
Let's start with an example. For each entity, you can log its word count. Then you'll be able to monitor the word-count distribution of each label for drift.
For example, you might expect country entities to be 1-2 words long, but organization entities to span 1-5 words, with most organizations having 2-3 words. If you suddenly see an organization with 10 words, it is an outlier and probably not really an organization :)
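Here is a sketch of this entity-specific metadata. It assumes entities arrive as (text, label) pairs and that word count means a whitespace split. The COUNTRY and ORGANIZATION labels and their expected ranges are hypothetical, taken from the country and organization example. word_count_distribution gives the per-label distribution you would monitor for drift. word_count_outliers flags individual entities, like the 10-word organization.

```python
from collections import Counter, defaultdict

# Hypothetical expected word-count ranges per label (inclusive).
EXPECTED_WORDS = {"COUNTRY": (1, 2), "ORGANIZATION": (1, 5)}


def word_count(entity_text: str) -> int:
    """Number of whitespace-separated words in an entity."""
    return len(entity_text.split())


def word_count_distribution(entities):
    """Map each label to a Counter of entity word counts."""
    dist = defaultdict(Counter)
    for text, label in entities:
        dist[label][word_count(text)] += 1
    return dict(dist)


def word_count_outliers(entities, expected=EXPECTED_WORDS):
    """Return (text, label, words) for entities outside their label's range.

    Labels without an expected range are not checked.
    """
    outliers = []
    for text, label in entities:
        if label in expected:
            low, high = expected[label]
            words = word_count(text)
            if not low <= words <= high:
                outliers.append((text, label, words))
    return outliers
```

The fixed ranges are only a quick local sanity check. Logging word_count(text) as an extra per-entity field is what lets a drift monitor compare these distributions over time.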