Ranking

Ranking models are often used in recommendation systems, ads, search engines, etc. In Aporia, these models are represented with the ranking model type.

There are two common ways to store a ranking model's data in the DB - Search Level and Candidate Level. The difference between these formats is whether each row in the DB represents a single use of the ranking model together with all of the candidates the model recommends (Search Level), or a single candidate of a specific search (Candidate Level).

Aporia natively supports both formats; we recommend using the one closest to how your data is actually stored.

Integrating Candidate Level Data

Data Format In DB

If you have a ranking or recommendations model and you store your data in a Candidate Level format, your database may look like the following:

id | Search_id | Candidate_id (text) | Position (number) | Features columns | Score (number) | Prediction (boolean) | Actual (boolean) | Timestamp (timestamp)
--- | --- | --- | --- | --- | --- | --- | --- | ---
1 | 1a | hotel1 | 1 | ... | 0.9 | true | true | 2014-10-19 10:23:54
2 | 1a | hotel2 | null | ... | -0.4 | false | null | 2014-10-19 10:23:54
3 | 1a | hotel3 | 2 | ... | 0.8 | true | false | 2014-10-19 10:23:54
4 | 1b | hotel1 | 2 | ... | 0.8 | true | true | 2014-10-19 10:24:24
5 | 1b | hotel2 | 3 | ... | 0.7 | true | false | 2014-10-19 10:24:24
6 | 1b | hotel3 | 1 | ... | 0.9 | true | false | 2014-10-19 10:24:24

Schema mapping

  • id - Unique identification of the row in the DB, as required for any dataset integration.
  • Search_id - Sometimes called context; should hold the id of a single search (a single use of the ranking model).
  • Candidate_id (optional) - Should hold a meaningful identifier of the specific candidate.
  • Position - Represents the position of the candidate in the model's predictions: 1 for the top recommendation, 2 for the second, and so on. The value should be null if the candidate was not recommended at all.
  • Features - Any feature columns go here; the features should describe each candidate. Search-level features should be repeated per candidate according to the relevant Search_id.
  • Score (optional) - Holds the numeric score generated by the ranking model for the specific candidate, if one exists.
  • Prediction - Boolean that indicates whether the candidate was recommended. Sometimes this is a virtual value derived from the score in the query, as in the sketch after this list. A prediction may also need to appear in the schema multiple times, once per actual it should be compared with.
  • Actual - Boolean that indicates whether the recommendation was actually used by the user.
  • Timestamp - The timestamp of the prediction.
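
For instance, when only a numeric score is stored, the boolean Prediction column can be generated as a virtual value. The following is a minimal sketch (not Aporia's API) that assumes candidate-level rows in a pandas DataFrame and a hypothetical cutoff of 0; in practice this is often done directly in the ingestion query:

```python
import pandas as pd

# Candidate-level rows as stored in the DB (feature columns omitted for brevity)
df = pd.DataFrame({
    "id": [1, 2, 3],
    "search_id": ["1a", "1a", "1a"],
    "candidate_id": ["hotel1", "hotel2", "hotel3"],
    "score": [0.9, -0.4, 0.8],
})

# Derive a "virtual" boolean prediction from the score.
# The 0.0 cutoff is an assumption for this example; use whatever threshold
# your ranking system actually applies when deciding what to recommend.
df["prediction"] = df["score"] > 0.0

print(df[["candidate_id", "score", "prediction"]])
```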

In the schema mapping, there are optional fields for ranking models that are used to group the candidates of the same search together when calculating ranking metrics such as nDCG (see the sketch after this list):

  • Group By - Should hold the Search_id, to group all the candidates of the same search together.
  • Order by - The column that indicates the order of the recommendations within a single search. In most cases, use the Position column if available.
  • Sort direction - Ascending or descending. Use this when the "Order by" column sorts the recommendations in reverse order of priority.
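
As a concrete illustration of how these three fields are used, the sketch below groups candidate-level rows by Search_id, orders each group by Position (ascending), and computes nDCG per search from the actual column. This is a simplified, hand-rolled calculation for illustration only, not Aporia's implementation; the sample data mirrors the table above, with the null-position candidate of search "1a" omitted for brevity:

```python
import numpy as np
import pandas as pd

def dcg(relevance):
    """Discounted cumulative gain for relevance values already in ranked order."""
    relevance = np.asarray(relevance, dtype=float)
    discounts = np.log2(np.arange(2, relevance.size + 2))
    return float(np.sum(relevance / discounts))

# Candidate-level rows mirroring the table above
df = pd.DataFrame({
    "search_id": ["1a", "1a", "1b", "1b", "1b"],
    "candidate_id": ["hotel1", "hotel3", "hotel1", "hotel2", "hotel3"],
    "position": [1, 2, 2, 3, 1],
    "actual": [True, False, True, False, False],
})

ndcg_per_search = {}
for search_id, group in df.groupby("search_id"):       # "Group By"  -> Search_id
    ranked = group.sort_values("position")              # "Order by"  -> Position, ascending
    relevance = ranked["actual"].astype(int).to_numpy()
    ideal = np.sort(relevance)[::-1]                     # best possible ordering of the same candidates
    ndcg_per_search[search_id] = dcg(relevance) / dcg(ideal) if dcg(ideal) > 0 else 0.0

print(ndcg_per_search)  # {'1a': 1.0, '1b': 0.63...}
```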

Integrating Search Level Data

Data Format In DB

If you have a ranking or recommendations model and you store your data in a Search Level format, your database may look like the following:

id | feature1 (numeric) | feature2 (boolean) | recommendations (array) | actual (array) | timestamp (datetime)
--- | --- | --- | --- | --- | ---
1 | 13.5 | True | [item1, item2, ...] | [item3, item4] | 2014-10-19 10:23:54
2 | -8 | False | [item3, item2, ...] | [item3] | 2014-10-19 10:24:24

Schema Mapping

  • id - Unique identification of the row in the DB, as required for any dataset integration.
  • features - Search-level features; each should appear as a single value per row.
  • recommendations - Ordered array of recommended candidates, with the most recommended first.
  • actual - Ordered array of the candidates actually used by the user, with the best option first.
  • Timestamp - The timestamp of the prediction.

In the schema mapping, the optional ranking fields "Group By", "Order by", and "Sort direction" are relevant only to Candidate Level data and should be left empty for this format.
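
Ranking metrics for this format can be computed directly from the two arrays. Below is a minimal, hand-rolled sketch (not Aporia's implementation) of precision@k, i.e. the share of the top-k recommended items that appear in the actual array; the k=2 cutoff and the item values are illustrative, loosely based on the table above:

```python
def precision_at_k(recommendations, actual, k=2):
    """Share of the top-k recommended items that were actually used."""
    top_k = recommendations[:k]
    hits = sum(1 for item in top_k if item in set(actual))
    return hits / k

# Rows loosely based on the search-level table above
rows = [
    {"id": 1, "recommendations": ["item1", "item2"], "actual": ["item3", "item4"]},
    {"id": 2, "recommendations": ["item3", "item2", "item1"], "actual": ["item3"]},
]

for row in rows:
    print(row["id"], precision_at_k(row["recommendations"], row["actual"], k=2))
# 1 -> 0.0 (neither item1 nor item2 was actually used)
# 2 -> 0.5 (item3 was used, item2 was not)
```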

To integrate a ranking model in either format, follow our Quickstart and build the schema as described above. For Search Level data, remember to include an array prediction field and an array actual field during the schema mapping and link them together.

Check out the data sources section for more information about how to connect from different data sources.
