XGBoost

Let's go through a simple example of integrating the Aporia SDK with an XGBoost model.

STEP 1: Add Model

Click the Add Model button on the Models page.

Enter the model name and optionally a description. Click Next.

STEP 2: Initialize the Aporia SDK

First, initialize the Aporia SDK and load a dataset for training the model.

import uuid
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

import aporia
aporia.init(token='123', environment='example')

data = pd.read_csv("./path_to_real_file_with_data.csv")
features = data.drop(["will_buy_insurance"], axis=1)
labels = data[["will_buy_insurance"]]
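Note that train_test_split is imported above but the example trains on the full dataset. If you want a held-out evaluation set, a minimal sketch (using a small synthetic DataFrame with hypothetical columns in place of the real CSV):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real insurance dataset (hypothetical columns).
data = pd.DataFrame({
    "age": [25, 40, 31, 58, 22, 47, 36, 29],
    "income": [30000, 80000, 52000, 91000, 28000, 77000, 61000, 45000],
    "will_buy_insurance": [0, 1, 0, 1, 0, 1, 1, 0],
})

features = data.drop(["will_buy_insurance"], axis=1)
labels = data[["will_buy_insurance"]]

# Hold out 25% of the rows for evaluation; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=42
)
```

The held-out X_test / y_test can then be used to sanity-check the trained model before logging predictions.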

STEP 3: Create Model Version

Next, we'll define a version for the new model:

aporia.create_model_version(
  model_id="my-model",
  model_version="v1",
  model_type="binary",
  features=aporia.pandas.infer_schema_from_dataframe(features),
  predictions=aporia.pandas.infer_schema_from_dataframe(labels)
)

STEP 4: Train Model

Now, let's train an XGBoost model, and log the training data:

dtrain = xgb.DMatrix(features, labels["will_buy_insurance"].values)
xgb_model = xgb.train({"objective": "binary:logistic"}, dtrain)

apr_model = aporia.Model(model_id="my-model", model_version="v1")
apr_model.log_training_set(features=features, labels=labels)

STEP 5: Predict

The last step is to log the predictions performed by the model.

# pred_features is a DataFrame containing the features for the predictions
prediction = xgb_model.predict(xgb.DMatrix(pred_features))

apr_model = aporia.Model(model_id="my-model", model_version="v1")
apr_model.log_prediction(
  id=str(uuid.uuid4()),
  features=aporia.pandas.pandas_to_dict(pred_features),
  predictions={
    "will_buy_insurance": prediction[0] > 0.7
  }
)
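One detail worth knowing: xgb_model.predict returns raw probabilities, and comparing a NumPy value against a threshold yields a numpy.bool_ rather than a native Python bool, which some serialization layers reject. A minimal thresholding sketch (the 0.7 cutoff is this example's choice, not a library default):

```python
import numpy as np

# Raw probabilities, as an XGBoost binary:logistic model would return them.
prediction = np.array([0.83, 0.41, 0.72], dtype=np.float32)

THRESHOLD = 0.7  # decision cutoff used in the example above

# Convert each probability to a native Python bool for safe serialization.
decisions = [bool(p > THRESHOLD) for p in prediction]
```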

Optional steps

Report version with feature importance

Feature importance can be reported using the optional argument feature_importance. The argument expects a mapping from each feature name to its importance.

aporia.create_model_version(
  model_id="my-model",
  model_version="v1",
  model_type="binary",
  features=aporia.pandas.infer_schema_from_dataframe(features),
  predictions=aporia.pandas.infer_schema_from_dataframe(labels),
  # Optional
  feature_importance=xgb_model.get_score(importance_type='gain')
)
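Since get_score(importance_type='gain') returns a plain dict mapping feature names to gain values, you can also build or post-process the mapping yourself. A sketch with hypothetical feature names and values, normalizing the gains so they sum to 1:

```python
# Hypothetical gain scores, shaped like get_score(importance_type='gain') output.
feature_importance = {"age": 12.5, "income": 30.0, "num_claims": 7.5}

# Normalize so the importances sum to 1 (optional, purely for readability).
total = sum(feature_importance.values())
normalized = {name: gain / total for name, gain in feature_importance.items()}
```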

Notes

  • The function get_score() returns the expected mapping of feature names to importance values only when the feature names were passed to the training DMatrix. Otherwise, it returns a mapping keyed by dummy feature names (f0, f1, ...). You can either pass the feature names via the fmap argument of get_score, or train the model with the feature names as follows:
    dtrain = xgb.DMatrix(features, labels["will_buy_insurance"].values, feature_names=features.columns.tolist())
    xgb_model = xgb.train({"objective": "binary:logistic"}, dtrain)
    
  • In case of multiple feature importance values per feature (for example, multiclass models), please contact us
  • For further information, see the get_score documentation