Example: Question Answering
Question answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document.
Throughout this guide, we will use a simple question answering model based on 🤗 Hugging Face.
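A minimal sketch of creating such a model with the Transformers pipeline helper. The variable name qa_model matches the name used in this guide. No checkpoint is named here, so the library picks its own default for the task.

```python
from transformers import pipeline

# With no model argument, pipeline() falls back to the library's default
# pretrained checkpoint (and matching tokenizer) for this task.
qa_model = pipeline("question-answering")
```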
This downloads a default pretrained model and tokenizer for question answering. Now you can use the qa_model on your target question / context:
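As an illustration, the call might look like the sketch below. The question comes from the table later in this guide. The context sentence is made up for this example and is not taken from the original data.

```python
from transformers import pipeline

qa_model = pipeline("question-answering")

question = "Where are the best cookies?"
# Hypothetical context, invented for this example.
context = "The best cookies are in the bakery on Main Street."

# For a single question/context pair the pipeline returns a dict with
# the keys "answer", "score", "start", and "end".
result = qa_model(question=question, context=context)
print(result["answer"], result["score"])
```

The start and end values are character offsets into the context, so result["answer"] is always a substring of it.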
Extract Embeddings
To extract embeddings from the model, we'll first need to do two things:
Pass output_hidden_states=True to our model params.
When we call pipeline(...), it does a lot of things for us: preprocessing, inference, and postprocessing. We'll need to break these steps apart so we can step in between them and get the embeddings 😉
In other words:
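One way to sketch the three steps explicitly is to load the tokenizer and model directly instead of going through the pipeline object. This is an illustration under stated assumptions, not necessarily the code this guide originally used: the checkpoint name is an assumption (any extractive question answering checkpoint should behave the same way), the context sentence is made up, and the greedy argmax decoding is a simplification of the pipeline's own postprocessing.

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Assumption: a SQuAD-finetuned DistilBERT checkpoint from the Hugging Face Hub.
MODEL_NAME = "distilbert-base-cased-distilled-squad"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# output_hidden_states=True is forwarded to the model config, so every
# forward pass also returns the per-layer hidden states.
model = AutoModelForQuestionAnswering.from_pretrained(
    MODEL_NAME, output_hidden_states=True
)
model.eval()

question = "Where are the best cookies?"
context = "The best cookies are in the bakery on Main Street."  # made-up context

# 1. Preprocessing: encode the question/context pair as tensors.
inputs = tokenizer(question, context, return_tensors="pt")

# 2. Inference: run the model without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

# 3. Postprocessing: pick the most likely start and end tokens (greedy).
#    If end < start, the decoded answer is simply an empty string.
start = int(outputs.start_logits[0].argmax())
end = int(outputs.end_logits[0].argmax())
answer = tokenizer.decode(
    inputs["input_ids"][0, start : end + 1], skip_special_tokens=True
)
start_prob = outputs.start_logits[0].softmax(dim=-1)[start]
end_prob = outputs.end_logits[0].softmax(dim=-1)[end]
score = float(start_prob * end_prob)
```

Because inference is now a separate step, outputs.hidden_states is available alongside the answer and its score.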
And finally, to extract embeddings for this prediction:
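The text here does not say how the per-token hidden states are reduced to a single vector. One common choice, shown as an assumption, is to average the last hidden layer over the non-padding tokens. mean_pool is a hypothetical helper name, and the random tensors below only stand in for the last entry of the model's hidden_states output and the tokenizer's attention mask.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sequence, ignoring padded positions.

    last_hidden_state: (batch, seq_len, hidden_size)
    attention_mask:    (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)  # guard against division by zero
    return summed / counts


# Stand-in tensors with a typical shape: 1 sequence, 12 tokens, 768 dimensions.
hidden = torch.randn(1, 12, 768)
mask = torch.ones(1, 12, dtype=torch.long)

embedding = mean_pool(hidden, mask)[0]  # one vector for the single input
```

With a real model, the same helper would take the last element of the returned hidden_states tuple and the attention_mask produced by the tokenizer.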
Storing your Predictions
The next step would be to store your predictions in a data store, including the embeddings themselves. For more information on storing your predictions, please check out the Storing Your Predictions section.
For example, you could use a Parquet file on S3 or a Postgres table that looks like this:
id | question (text) | context (text) | embeddings (embedding) | answer (text) | score (numeric) | timestamp (datetime) |
---|---|---|---|---|---|---|
1 | Where are the best cookies? | The best cookies are in... | | | 0.982 | 2021-11-20 13:41:00 |
2 | Where is the best hummus? | The best hummus is in... | | | 0.881 | 2021-11-20 13:45:00 |
3 | Where is the best burger? | The best burger is in... | | | 0.925 | 2021-11-20 13:49:00 |
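A rough sketch of assembling rows with these columns in pandas before writing them to a data store. make_record is a hypothetical helper. The embedding vector and the answer string are placeholders, since the table above leaves those cells empty.

```python
from datetime import datetime

import pandas as pd


def make_record(record_id, question, context, embeddings, answer, score, timestamp):
    """Build one row whose keys match the columns of the table above."""
    return {
        "id": record_id,
        "question": question,
        "context": context,
        "embeddings": [float(x) for x in embeddings],
        "answer": answer,
        "score": float(score),
        "timestamp": timestamp,
    }


rows = [
    make_record(
        1,
        "Where are the best cookies?",
        "The best cookies are in...",
        [0.0, 0.0, 0.0],   # placeholder vector, not real model output
        "<answer text>",   # placeholder answer
        0.982,
        datetime(2021, 11, 20, 13, 41, 0),
    )
]

df = pd.DataFrame(rows)
```

From here, df.to_parquet("predictions.parquet") would write the rows to a Parquet file, provided pyarrow or fastparquet is installed. The file name is only an example.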
To integrate this type of model, follow our Quickstart.
Check out the data sources section for more information about how to connect from different data sources.
Schema mapping
This type of model is a multiclass model, with a text raw input and an embedding feature.
There are 2 unique types in Aporia to help you integrate your NLP model: text and embedding.
The text type should be used with your raw_text column. Note that by default, every string column is automatically marked as categorical in the UI, but you'll have the option to change it to text for NLP use cases.
The embedding type, as the name suggests, should be used with your embedding column. Note that by default, every array column is automatically marked as array in the UI, but you'll have the option to change it to embedding for NLP use cases.
Next steps
Create a custom dashboard for your model in Aporia - Drag & drop widgets to show different performance metrics, top drifted features, etc.
Visualize NLP drift using Aporia's Embeddings Projector - Use the Embedding Projector widget within the investigation room to view drift between different datasets in production, using UMAP for dimensionality reduction.
Set up monitors to get notified for ML issues - Including data integrity issues, model performance degradation, and model drift. For example:
Make sure the distribution of the different entity labels doesn’t drift across time
Make sure the distribution of the embedding vector doesn’t drift across time