# Example: Token Classification

Token classification is a natural language understanding task in which a label is assigned to individual tokens in a text.

**Named Entity Recognition (NER)** and **Part-of-Speech (PoS)** tagging are two popular token classification subtasks. NER models are trained to recognize specific entities in a text, such as dates, people, and locations, while PoS tagging identifies which words in a text are verbs, nouns, punctuation marks, and so on.

This guide will walk you through an example of NER model monitoring using spaCy. Let's start by loading a pretrained model:

```python
import spacy

# Load spaCy's small English pipeline
# (install it first with: python -m spacy download en_core_web_sm)
NER = spacy.load("en_core_web_sm")
```

And let’s assume this is what our prediction function looks like (perhaps it’s part of an HTTP server, for example):

```python
def predict(request_id: str, raw_text: str):
    # Map each recognized entity to its predicted label
    return {
        entity.text: entity.label_
        for entity in NER(raw_text).ents
    }
```

Each entity will include the text, the embedding, and the prediction, as follows:

* text (raw input) - `entity.text`
* embedding - `entity.vector`
* prediction - `entity.label_`
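
Putting the three fields together, each stored record might look like the sketch below. To keep the snippet runnable without a model download, it uses a small dataclass as a stand-in for spaCy's `Span`; in real code each `entity` would come from `NER(raw_text).ents`, and the column names here are assumptions matching the storage table in the next section.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Stand-in for spaCy's Span: a real entity exposes .text, .vector, and .label_
@dataclass
class Entity:
    text: str
    vector: list
    label_: str

def to_record(prediction_id: int, entity: Entity) -> dict:
    """Build one storage row: id, raw text, embedding, prediction, timestamp."""
    return {
        "id": prediction_id,
        "raw_text": entity.text,
        "embeddings": list(entity.vector),
        "prediction": entity.label_,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

entity = Entity(text="Aporia", vector=[0.77, 0.87, 0.94], label_="ORG")
record = to_record(1, entity)
print(record["raw_text"], record["prediction"])  # Aporia ORG
```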

## Storing your Predictions

The next step would be to store your predictions in a data store, including the embeddings themselves. For more information on storing your predictions, please check out the [Storing Your Predictions](https://docs.aporia.com/storing-your-predictions) section.

For example, you could use a Parquet file on S3 or a Postgres table that looks like this:

<table><thead><tr><th width="88.33333333333331">id</th><th width="292">raw_text (text)</th><th width="263">embeddings (embedding)</th><th width="162.66666666666674">prediction (categorical)</th><th width="207">timestamp (datetime)</th></tr></thead><tbody><tr><td>1</td><td>Aporia</td><td><code>[0.77, 0.87, 0.94, ...]</code></td><td><code>ORG</code></td><td>2021-11-20 13:41:00</td></tr><tr><td>2</td><td>November 20th</td><td><code>[0.97, 0.82, 0.13, ...]</code></td><td><code>DATE</code></td><td>2021-11-20 13:45:00</td></tr><tr><td>3</td><td>New York</td><td><code>[0.14, 0.55, 0.66, ...]</code></td><td><code>GPE</code></td><td>2021-11-20 13:49:00</td></tr></tbody></table>
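
As a sketch, a table like this could be assembled with pandas before being written out (the rows, column names, and file path below are illustrative assumptions, not requirements):

```python
import pandas as pd

# Illustrative rows matching the storage schema above
rows = [
    {"id": 1, "raw_text": "Aporia", "embeddings": [0.77, 0.87, 0.94],
     "prediction": "ORG", "timestamp": "2021-11-20 13:41:00"},
    {"id": 2, "raw_text": "November 20th", "embeddings": [0.97, 0.82, 0.13],
     "prediction": "DATE", "timestamp": "2021-11-20 13:45:00"},
]
df = pd.DataFrame(rows)

# Writing to Parquet requires pyarrow or fastparquet; an S3 path
# (with s3fs installed) would also work here:
# df.to_parquet("predictions.parquet", index=False)

print(df.columns.tolist())
```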

To integrate this type of model, follow our [Quickstart](https://docs.aporia.com/introduction/quickstart).

Check out the [data sources section](https://docs.aporia.com/data-sources) for more information about how to connect from different data sources.

### Schema mapping

This type of model is a [multiclass model](https://docs.aporia.com/model-types/multiclass-classification), with a `text` raw input and an `embedding` feature.

There are two unique field types in Aporia to help you integrate your NLP model: `text` and `embedding`.

The `text` type should be used for your raw\_text column. Note that by default, every string column is automatically marked as `categorical` in the UI, but you'll have the option to change it to `text` for NLP use cases.

The `embedding` type, as the name suggests, should be used for your embedding column. Note that by default, every array column is automatically marked as `array` in the UI, but you'll have the option to change it to `embedding` for NLP use cases.
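
Conceptually, the column-to-type mapping for this model boils down to the following (a plain illustration of the mapping, not an Aporia SDK call; the column names are assumptions matching the storage table above):

```python
# Each stored column mapped to its Aporia field type (illustration only)
schema_mapping = {
    "raw_text": "text",         # changed from the default "categorical"
    "embeddings": "embedding",  # changed from the default "array"
    "prediction": "categorical",
    "timestamp": "datetime",
}
print(schema_mapping)
```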

## Next steps

* **Create a custom dashboard for your model in Aporia** - Drag & drop widgets to show different performance metrics, top drifted features, etc.
* **Visualize NLP drift using Aporia's Embeddings Projector** - Use the Embedding Projector widget within the investigation room, to view drift between different datasets in production, using UMAP for dimension reduction.
* **Set up monitors to get notified for ML issues** - Including data integrity issues, model performance degradation, and model drift. For example:
  * Make sure the distribution of the different entity labels doesn’t drift across time
  * Make sure the distribution of the embedding vector doesn’t drift across time
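
As a rough illustration of the second monitor, embedding drift between two windows can be approximated by comparing their mean vectors. This is a simplified stand-in for a production drift metric, and the synthetic data below is purely an assumption for demonstration:

```python
import numpy as np

def mean_embedding_distance(reference: np.ndarray, current: np.ndarray) -> float:
    """Euclidean distance between the mean embedding vectors of two windows."""
    return float(np.linalg.norm(reference.mean(axis=0) - current.mean(axis=0)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(500, 96))  # e.g. training-window embeddings
no_drift = rng.normal(0.0, 1.0, size=(500, 96))   # production window, same distribution
shifted = rng.normal(0.5, 1.0, size=(500, 96))    # production window, shifted distribution

drift_none = mean_embedding_distance(reference, no_drift)
drift_shift = mean_embedding_distance(reference, shifted)
print(drift_none < drift_shift)  # the shifted window is farther from the reference
```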
