Ranking
Ranking models are commonly used in recommendation systems, ads, and search engines. In Aporia, these models are represented with the ranking model type.
There are two common ways to store a ranking model's data in the DB: Search Level and Candidate Level. In the Search Level format, each row represents a single use of the ranking model together with all of the options the model recommended. In the Candidate Level format, each row represents a single option within a specific search.
Aporia natively supports both formats. We recommend using the one closest to your real data.
If you have a ranking or recommendations model and you store your data in a Candidate Level format then your database may look like the following:
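| id | search_id | candidate_id | position | features | score | prediction | actual | timestamp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1a | hotel1 | 1 | ... | 0.9 | true | true | 2014-10-19 10:23:54 |
| 2 | 1a | hotel2 | null | ... | -0.4 | false | null | 2014-10-19 10:23:54 |
| 3 | 1a | hotel3 | 2 | ... | 0.8 | true | false | 2014-10-19 10:23:54 |
| 4 | 1b | hotel1 | 2 | ... | 0.8 | true | true | 2014-10-19 10:24:24 |
| 5 | 1b | hotel2 | 3 | ... | 0.7 | true | false | 2014-10-19 10:24:24 |
| 6 | 1b | hotel3 | 1 | ... | 0.9 | true | false | 2014-10-19 10:24:24 |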
To integrate this type of model, follow our Quickstart and build the schema as follows:
id - Unique identifier of the row in the DB, as required for any dataset integration.
Search_id - Sometimes called context. Holds the ID of a single search (one use of the ranking system).
Candidate_id (optional) - Holds a meaningful identifier of the specific candidate.
Position - The position of the candidate in the recommendation model's predictions: 1 for the top recommendation, 2 for the second, and so on. The value should be null if the candidate was not recommended at all.
Features - Any feature columns go here. Features should describe each candidate; search-level features should be repeated on every candidate row of the relevant search_id.
Score (optional) - The numeric score that the ranking model generated for the specific candidate, if one exists.
Prediction - Boolean that indicates whether the candidate was recommended (sometimes generated in the query as a virtual value derived from the score). A prediction may need to appear in the schema multiple times, once for each actual it should be compared with.
Actual - Boolean that indicates whether the user used the recommendation.
Timestamp - Timestamp of the prediction.
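As a rough illustration of the schema above, the sketch below models a Candidate Level row in Python and derives a virtual prediction from the score. It is not part of the Aporia API: the `CandidateRow` class, the `prediction` method, and the threshold of 0.0 are assumptions made for this example, and in practice the virtual value would usually be computed in the query itself.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CandidateRow:
    """One Candidate Level row (hypothetical helper, not an Aporia type)."""
    id: int
    search_id: str
    candidate_id: str
    position: Optional[int]  # None when the candidate was not recommended
    score: Optional[float]   # None when the model produced no score
    actual: Optional[bool]   # None when no feedback was recorded
    timestamp: datetime

    def prediction(self, threshold: float = 0.0) -> bool:
        # Virtual prediction derived from the score.
        # The threshold value is an assumption for this sketch.
        return self.score is not None and self.score > threshold


ts = datetime(2014, 10, 19, 10, 23, 54)
rows = [
    CandidateRow(1, "1a", "hotel1", 1, 0.9, True, ts),
    CandidateRow(2, "1a", "hotel2", None, -0.4, None, ts),
    CandidateRow(3, "1a", "hotel3", 2, 0.8, False, ts),
]

predictions = [row.prediction() for row in rows]
print(predictions)  # [True, False, True]
```

With this assumed threshold, the derived values match the prediction column of the first three example rows.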
In the Schema mapping, ranking models have optional fields that group the candidates of the same search together when calculating recommendation metrics such as nDCG:
Group By - Should hold the Search_id, so that all candidates of the same search are grouped together.
Order by - The column that indicates the order of the recommendations within a single search. In most cases, use the position column if available.
Sort direction - ascend/descend. Set this when the "Order by" column orders the recommendations in reverse order of priority.
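To make the role of these three fields concrete, here is a minimal sketch of how a per-search nDCG could be computed from Candidate Level rows. It assumes binary relevance taken from the actual column and the common log2 discount; the exact formula Aporia uses is not stated here, and the function names are hypothetical.

```python
import math
from collections import defaultdict

# (search_id, position, actual) from the example rows; position None = not recommended.
rows = [
    ("1a", 1, True), ("1a", None, None), ("1a", 2, False),
    ("1b", 2, True), ("1b", 3, False), ("1b", 1, False),
]


def dcg(relevances):
    # Discounted cumulative gain with a log2 discount (rank starts at 1).
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg_per_search(rows, descending=False):
    # "Group By": collect the candidates of each search together.
    groups = defaultdict(list)
    for search_id, position, actual in rows:
        if position is not None:  # skip candidates that were not recommended
            groups[search_id].append((position, 1.0 if actual else 0.0))

    result = {}
    for search_id, items in groups.items():
        # "Order by" position; "Sort direction" flips the order when needed.
        items.sort(key=lambda item: item[0], reverse=descending)
        relevances = [rel for _, rel in items]
        ideal = dcg(sorted(relevances, reverse=True))
        result[search_id] = dcg(relevances) / ideal if ideal > 0 else 0.0
    return result


scores = ndcg_per_search(rows)
print(scores)
```

In this sketch, search 1a scores 1.0 because the used candidate was ranked first, while search 1b scores lower because the used candidate was ranked second.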
Check out the data sources section for more information about how to connect from different data sources.
If you have a ranking or recommendations model and you store your data in a Search Level format then your database may look like the following:
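| id | feature_1 | feature_2 | recommendations | actual | timestamp |
| --- | --- | --- | --- | --- | --- |
| 1 | 13.5 | True | [item1, item2, ...] | [item3, item4] | 2014-10-19 10:23:54 |
| 2 | -8 | False | [item3, item2, ...] | [item3] | 2014-10-19 10:24:24 |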
To integrate this type of model, follow our Quickstart. During the schema mapping, remember to include an array prediction field and an array actual field, and link them together. The schema should be as follows:
id - Unique identifier of the row in the DB, as required for any dataset integration.
features - Search-level features. Each feature should appear as a single value.
recommendations - Ordered array of recommendations. The most recommended item should appear first.
actual - Ordered array of the candidates actually used by the user. The actual best option should appear first.
Timestamp - Timestamp of the prediction.
In the Schema mapping, ranking models have three optional fields: "Group By", "Order by", and "Sort direction". These options are relevant only to Candidate Level data and should be left empty with this format.
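To show how the two formats relate, the following sketch collapses Candidate Level rows into one Search Level record per search. It is only an illustration under stated assumptions: the function name `to_search_level` is hypothetical, candidates with a null position are dropped, and the actual array is ordered by position because Candidate Level data does not record the order in which the user used the items.

```python
from collections import defaultdict

# (search_id, candidate_id, position, actual) from the Candidate Level example.
candidate_rows = [
    ("1a", "hotel1", 1, True),
    ("1a", "hotel2", None, None),
    ("1a", "hotel3", 2, False),
    ("1b", "hotel1", 2, True),
    ("1b", "hotel2", 3, False),
    ("1b", "hotel3", 1, False),
]


def to_search_level(rows):
    # Group the candidates of each search together.
    grouped = defaultdict(list)
    for search_id, candidate_id, position, actual in rows:
        grouped[search_id].append((candidate_id, position, actual))

    result = {}
    for search_id, items in grouped.items():
        # Keep only recommended candidates, most recommended first.
        ranked = sorted(
            (item for item in items if item[1] is not None),
            key=lambda item: item[1],
        )
        result[search_id] = {
            "recommendations": [cid for cid, _, _ in ranked],
            # Assumption: used candidates are listed in position order.
            "actual": [cid for cid, _, used in ranked if used],
        }
    return result


search_rows = to_search_level(candidate_rows)
print(search_rows["1b"])
```

Going the other way, from arrays back to one row per candidate, would need the per-candidate features and scores, which the Search Level format does not store.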
Check out the data sources section for more information about how to connect from different data sources.