SDK Reference

Core API


aporia.core.core_api.create_model_version(model_id, model_version, model_type, features, predictions, raw_inputs=None, metrics=None)

Creates a new model version and defines a schema for it.

Parameters:

  • model_id (str, required): Model identifier, as received from the Aporia dashboard.
  • model_version (str, required): Model version - this can be any string that represents the model version, such as "v1" or a git commit hash.
  • model_type (str, required): Model type (also known as objective - see notes).
  • features (Dict[str, str], required): Schema for model features (see notes).
  • predictions (Dict[str, str], required): Schema for prediction results (see notes).
  • raw_inputs (Optional[Dict[str, str]], default None): Schema for raw inputs (see notes).
  • metrics (Optional[Dict[str, str]], default None): Schema for prediction metrics (see notes).

Notes

  • A schema is a dict, in which the keys are the fields you wish to report, and the values are the types of those fields. For example:
      {
          "feature1": "numeric",
          "feature2": "datetime"
      }
    
  • The supported model types are:
    • "regression" - for regression models
    • "binary" - for binary classification models
    • "multiclass" - for multiclass classification models
  • The valid field types (and corresponding Python types) are:

    Field Type      Python Types
    "numeric"       float, int
    "categorical"   int
    "boolean"       bool
    "string"        str
    "datetime"      datetime.datetime, or str representing a datetime in ISO-8601 format
    "vector"        list of floats
    "text"          str (to be used as free text)

Returns:

  • Optional[aporia.model.Model]: Model object for the new version.
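
For example, a minimal sketch of defining a model version (the model ID and field names are hypothetical, and create_model_version is assumed to be re-exported at the package root, like aporia.init()):

    import aporia

    # Hypothetical model ID and schema - replace with your own
    model = aporia.create_model_version(
        model_id="my-model",
        model_version="v1",
        model_type="binary",
        features={"age": "numeric", "signup_date": "datetime"},
        predictions={"will_churn": "boolean"},
    )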

aporia.core.core_api.init(token=None, host=None, environment=None, port=None, verbose=None, throw_errors=None, debug=None)

Initialize the Aporia SDK.

Parameters:

  • token (Optional[str], default None): Authentication token.
  • host (Optional[str], default None): Controller host.
  • environment (Optional[str], default None): Environment in which Aporia is initialized (e.g. production, staging).
  • port (Optional[int], default None): Controller port. Defaults to 443.
  • verbose (Optional[bool], default None): True to enable verbose error messages. Defaults to False.
  • throw_errors (Optional[bool], default None): True to cause errors to be raised as exceptions. Defaults to False.
  • debug (Optional[bool], default None): True to enable debug logs and stack traces in log messages. Defaults to False.

Notes

  • The token, host and environment parameters are required.
  • All of the parameters here can also be defined as environment variables:
    • token -> APORIA_TOKEN
    • host -> APORIA_HOST
    • environment -> APORIA_ENVIRONMENT
    • port -> APORIA_PORT
    • verbose -> APORIA_VERBOSE
    • throw_errors -> APORIA_THROW_ERRORS
    • debug -> APORIA_DEBUG
  • Values passed as parameters to aporia.init() override the values from the corresponding environment variables.
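
A minimal initialization sketch (the token and host values are placeholders):

    import aporia

    # Placeholder credentials - use the values from your Aporia dashboard
    aporia.init(
        token="<your-token>",
        host="<controller-host>",
        environment="production",
    )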

aporia.core.core_api.shutdown()

Shuts down the Aporia SDK.

Notes

  • It is advised to call flush() before calling shutdown(), to ensure that all of the data that was sent reaches the controller.
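
For example, a typical teardown sequence (model is a Model instance, and shutdown() is assumed to be re-exported at the package root, like init()):

    # Wait for queued data to reach the controller, then shut down
    model.flush()
    aporia.shutdown()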

Model Tags


aporia.core.model_tags.add_model_tags(model_id, tags)

Adds or updates tags on an existing model.

Each tag is a key:value pair; both the key and the value must be strings.

Parameters:

  • model_id (str, required): Model ID.
  • tags (Dict[str, str], required): A mapping of tag keys to tag values.

Notes

  • Each model is restricted to 10 tags
  • Tag keys are always converted to lowercase
  • If the tags parameter contains tag keys that were already defined for the model, their values will be updated.
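
For example (the model ID and tags are hypothetical):

    from aporia.core.model_tags import add_model_tags

    add_model_tags(
        model_id="my-model",
        tags={"team": "fraud-detection", "owner": "data-science"},
    )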

aporia.core.model_tags.delete_model_tag(model_id, tag_key)

Deletes a model tag.

Parameters:

  • model_id (str, required): Model ID.
  • tag_key (str, required): Tag key to delete.

Notes

  • This function is best-effort; it will not fail if the tag doesn't exist.

aporia.core.model_tags.get_model_tags(model_id)

Fetches the tag keys and values of a model.

Parameters:

  • model_id (str, required): Model ID.

Returns:

  • Optional[Dict[str, str]]: A dict mapping tag keys to values.
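
For example, fetching a model's tags and then deleting one (the model ID and tag key are hypothetical):

    from aporia.core.model_tags import delete_model_tag, get_model_tags

    tags = get_model_tags(model_id="my-model")  # e.g. {"team": "fraud-detection"}
    delete_model_tag(model_id="my-model", tag_key="team")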

Model Object


Model object for logging model events.

aporia.model.Model.__init__(self, model_id, model_version) special

Initializes a model object.

Parameters:

  • model_id (str, required): Model identifier, as received from the Aporia dashboard.
  • model_version (str, required): Model version - this can be any string that represents the model version, such as "v1" or a git commit hash.
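
A minimal construction sketch (the identifiers are hypothetical, and Model is assumed to be re-exported at the package root):

    import aporia

    # The model version is assumed to exist already
    model = aporia.Model(model_id="my-model", model_version="v1")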

aporia.model.Model.flush(self, timeout=None) inherited

Waits for all currently scheduled tasks to finish.

Parameters:

  • timeout (Optional[int], default None): Maximum number of seconds to wait for tasks to complete. Defaults to None (no timeout).

Returns:

  • Optional[int]: Number of tasks that haven't finished running.
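
For example:

    # Wait up to 10 seconds; a nonzero result means some tasks are still pending
    remaining = model.flush(timeout=10)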

aporia.model.Model.log_actuals(self, id, actuals) inherited

Logs actual values of a single prediction.

Parameters:

  • id (str, required): Prediction identifier.
  • actuals (Dict[str, Union[float, int, str, bool, datetime.datetime]], required): Actual prediction results.

Note

  • The fields reported in actuals must be a subset of the fields reported in predictions.
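
For example, reporting ground truth for a previously logged prediction (the identifier and field name are hypothetical):

    model.log_actuals(
        id="prediction-123",
        actuals={"will_churn": True},
    )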

aporia.model.Model.log_batch_actuals(self, batch_actuals) inherited

Logs actual values of multiple predictions.

Parameters:

  • batch_actuals (Iterable[dict], required): An iterable that produces actuals dicts. Each dict MUST contain the following keys:
    • id (str): Prediction identifier.
    • actuals (Dict[str, FieldValue]): Actual prediction results.

Note

  • The fields reported in actuals must be a subset of the fields reported in predictions.
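
For example (identifiers and field names are hypothetical):

    model.log_batch_actuals([
        {"id": "prediction-123", "actuals": {"will_churn": True}},
        {"id": "prediction-124", "actuals": {"will_churn": False}},
    ])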

aporia.model.Model.log_batch_prediction(self, batch_predictions) inherited

Logs multiple predictions.

Parameters:

  • batch_predictions (Iterable[dict], required): An iterable that produces prediction dicts.
    • Each prediction dict MUST contain the following keys:
      • id (str): Prediction identifier.
      • features (Dict[str, FieldValue]): Values for all the features in the prediction.
      • predictions (Dict[str, FieldValue]): Prediction result.
    • Each prediction dict MAY also contain the following keys:
      • occurred_at (datetime): Prediction timestamp.
      • metrics (Dict[str, FieldValue]): Prediction metrics.
      • confidence (Union[float, List[float]]): Prediction confidence.
      • raw_inputs (Dict[str, FieldValue]): Raw inputs of the prediction.
      • actuals (Dict[str, FieldValue]): Actual prediction results.

Notes

  • If occurred_at is None in any of the predictions, it will be reported as datetime.now()
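
For example (identifiers and field names are hypothetical):

    model.log_batch_prediction([
        {
            "id": "prediction-123",
            "features": {"age": 34.0},
            "predictions": {"will_churn": True},
            "confidence": 0.92,
        },
        {
            "id": "prediction-124",
            "features": {"age": 27.0},
            "predictions": {"will_churn": False},
        },
    ])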

aporia.model.Model.log_batch_pyspark_actuals(self, ids, actuals)

Logs actual values of multiple predictions.

Parameters:

  • ids (DataFrame, required): Prediction identifiers.
  • actuals (DataFrame, required): Actual prediction results of each prediction.

Notes

  • The ids dataframe must contain exactly one column
  • The ids and actuals dataframes must have the same number of rows

aporia.model.Model.log_batch_pyspark_prediction(self, ids, features, predictions, raw_inputs=None, actuals=None)

Logs multiple predictions.

Parameters:

  • ids (DataFrame, required): Prediction identifiers.
  • features (DataFrame, required): Values for all of the features in each prediction.
  • predictions (DataFrame, required): Prediction results.
  • raw_inputs (Optional[pyspark.sql.dataframe.DataFrame], default None): Raw inputs for each prediction.
  • actuals (Optional[pyspark.sql.dataframe.DataFrame], default None): Actual prediction results of each prediction.

Notes

  • The ids dataframe must contain exactly one column
  • The ids, features, predictions, raw_inputs and actuals dataframes must have the same number of rows
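
For example, a sketch using a local SparkSession (column names are hypothetical and must match the fields defined in create_model_version):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # One row per prediction; the ids dataframe has exactly one column
    ids = spark.createDataFrame([("prediction-123",), ("prediction-124",)], ["id"])
    features = spark.createDataFrame([(34.0,), (27.0,)], ["age"])
    predictions = spark.createDataFrame([(True,), (False,)], ["will_churn"])

    model.log_batch_pyspark_prediction(ids=ids, features=features, predictions=predictions)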

aporia.model.Model.log_batch_pyspark_raw_inputs(self, ids, raw_inputs)

Logs raw inputs of multiple predictions.

Parameters:

  • ids (DataFrame, required): Prediction identifiers.
  • raw_inputs (DataFrame, required): Raw inputs of each prediction.

Notes

  • The ids dataframe must contain exactly one column
  • The ids and raw_inputs dataframes must have the same number of rows

aporia.model.Model.log_batch_raw_inputs(self, batch_raw_inputs) inherited

Logs raw inputs of multiple predictions.

Parameters:

  • batch_raw_inputs (Iterable[dict], required): An iterable that produces raw_inputs dicts. Each dict MUST contain the following keys:
    • id (str): Prediction identifier.
    • raw_inputs (Dict[str, FieldValue]): Raw inputs of the prediction.

aporia.model.Model.log_json(self, data) inherited

Logs arbitrary data.

Parameters:

  • data (Any, required): Data to log; must be JSON serializable.
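
For example:

    # Any JSON-serializable payload works here (contents are illustrative)
    model.log_json({"experiment": "baseline", "notes": "first run"})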

aporia.model.Model.log_prediction(self, id, features, predictions, metrics=None, occurred_at=None, confidence=None, raw_inputs=None, actuals=None) inherited

Logs a single prediction.

Parameters:

  • id (str, required): Prediction identifier.
  • features (Dict[str, Union[float, int, str, bool, datetime.datetime]], required): Values for all the features in the prediction.
  • predictions (Dict[str, Union[float, int, str, bool, datetime.datetime]], required): Prediction result.
  • metrics (Optional[Dict[str, Union[float, int, str, bool, datetime.datetime]]], default None): Prediction metrics.
  • occurred_at (Optional[datetime.datetime], default None): Prediction timestamp.
  • confidence (Union[float, List[float]], default None): Prediction confidence.
  • raw_inputs (Optional[Dict[str, Union[float, int, str, bool, datetime.datetime]]], default None): Raw inputs of the prediction.
  • actuals (Optional[Dict[str, Union[float, int, str, bool, datetime.datetime]]], default None): Actual prediction results.

Note

  • If occurred_at is None, it will be reported as datetime.now()
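
For example (the identifier and field names are hypothetical):

    from datetime import datetime

    model.log_prediction(
        id="prediction-123",
        features={"age": 34.0, "signup_date": datetime(2021, 5, 1)},
        predictions={"will_churn": True},
        confidence=0.92,
    )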

aporia.model.Model.log_pyspark_test_set(self, features, predictions, labels, raw_inputs=None)

Logs test data from PySpark DataFrames.

Parameters:

  • features (DataFrame, required): Test set features.
  • predictions (DataFrame, required): Test set predictions.
  • labels (DataFrame, required): Test set labels.
  • raw_inputs (Optional[pyspark.sql.dataframe.DataFrame], default None): Test set raw inputs.

Notes

  • Each dataframe corresponds to a field category defined in create_model_version:
    • features -> features
    • predictions -> predictions
    • labels -> predictions
    • raw_inputs -> raw_inputs
  • Each column in the dataframe should match a field defined in create_model_version
    • Missing fields will be handled as missing values
    • Columns that do not match a defined field will be ignored
    • The column name must match the field name
  • This function is blocking and may take a while to finish running.

aporia.model.Model.log_pyspark_training_set(self, features, labels, raw_inputs=None)

Logs training data from PySpark DataFrames.

Parameters:

  • features (DataFrame, required): Training set features.
  • labels (DataFrame, required): Training set labels.
  • raw_inputs (Optional[pyspark.sql.dataframe.DataFrame], default None): Training set raw inputs.

Notes

  • Each dataframe corresponds to a field category defined in create_model_version:
    • features -> features
    • labels -> predictions
    • raw_inputs -> raw_inputs
  • Each column in the dataframe should match a field defined in create_model_version
    • Missing fields will be handled as missing values
    • Columns that do not match a defined field will be ignored
    • The column name must match the field name
  • This function is blocking and may take a while to finish running.

aporia.model.Model.log_raw_inputs(self, id, raw_inputs) inherited

Logs raw inputs of a single prediction.

Parameters:

  • id (str, required): Prediction identifier.
  • raw_inputs (Dict[str, Union[float, int, str, bool, datetime.datetime]], required): Raw inputs of the prediction.

aporia.model.Model.log_test_set(self, features, predictions, labels, raw_inputs=None, confidences=None)

Logs test data.

Parameters:

  • features (DataFrame, required): Test set features.
  • predictions (DataFrame, required): Test set predictions.
  • labels (DataFrame, required): Test set labels.
  • raw_inputs (Optional[pandas.core.frame.DataFrame], default None): Test set raw inputs.
  • confidences (Optional[numpy.ndarray], default None): Confidence values for the test predictions.

Notes

  • Each dataframe corresponds to a field category defined in create_model_version:
    • features -> features
    • predictions -> predictions
    • labels -> predictions
    • raw_inputs -> raw_inputs
  • Each column in the dataframe should match a field defined in create_model_version
    • Missing fields will be handled as missing values
    • Columns that do not match a defined field will be ignored
    • The column name must match the field name
  • This function is blocking and may take a while to finish running.
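
For example, a sketch using pandas (column names are hypothetical and must match the fields defined in create_model_version):

    import pandas as pd

    features = pd.DataFrame({"age": [34.0, 27.0]})
    predictions = pd.DataFrame({"will_churn": [True, False]})
    labels = pd.DataFrame({"will_churn": [True, True]})

    model.log_test_set(features=features, predictions=predictions, labels=labels)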

aporia.model.Model.log_training_set(self, features, labels, raw_inputs=None)

Logs training data.

Parameters:

  • features (DataFrame, required): Training set features.
  • labels (DataFrame, required): Training set labels.
  • raw_inputs (Optional[pandas.core.frame.DataFrame], default None): Training set raw inputs.

Notes

  • Each dataframe corresponds to a field category defined in create_model_version:
    • features -> features
    • labels -> predictions
    • raw_inputs -> raw_inputs
  • Each column in the dataframe should match a field defined in create_model_version
    • Missing fields will be handled as missing values
    • Columns that do not match a defined field will be ignored
    • The column name must match the field name
  • This function is blocking and may take a while to finish running.
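
For example (column names are hypothetical):

    import pandas as pd

    features = pd.DataFrame({"age": [34.0, 27.0]})
    labels = pd.DataFrame({"will_churn": [True, False]})

    model.log_training_set(features=features, labels=labels)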

Utils


aporia.pandas.pandas_utils.infer_schema_from_dataframe(data)

Infers model version schema from a pandas DataFrame or Series.

Field names and types are inferred from column names and types.

Parameters:

  • data (DataFrame, required): pandas DataFrame or Series.

Returns:

  • Optional[Dict[str, str]]: A schema describing the data, as required by the create_model_version function.

Notes

  • The field types are inferred using the following logic, based on the column dtypes:
    • dtype="category" with numeric (integer or float) categories -> categorical field
    • dtype="category" with non-numeric categories -> see the string/text rules below
    • Array of numeric values (integer or float) -> vector field
    • dtype="bool" -> boolean field
    • dtypes that represent signed/unsigned integers and floating point numbers -> numeric field
    • dtype is "string", "unicode", or "object" -> string field
    • dtype is "string", "unicode", or "object" with more than 50 values, of which more than 25% are unique -> text field
    • dtype is any datetime type (with or without timezone) -> datetime field
  • If data contains a column with a type that doesn't match any of the rules described above, an error will be raised.
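
For example (the data is illustrative):

    import pandas as pd
    from aporia.pandas.pandas_utils import infer_schema_from_dataframe

    df = pd.DataFrame({"age": [34.0, 27.0], "is_active": [True, False]})
    schema = infer_schema_from_dataframe(df)
    # Expected result under the rules above: {"age": "numeric", "is_active": "boolean"}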

aporia.pandas.pandas_utils.pandas_to_dict(data)

Converts a pandas DataFrame or Series to a dict for log_* functions.

Parameters:

  • data (Union[pandas.core.frame.DataFrame, pandas.core.series.Series], required): DataFrame or Series to convert.

Returns:

  • Optional[Dict[str, Union[float, int, str, bool, datetime.datetime]]]: The data converted to a dict, mapping field names to their values.

Notes

  • data must contain column names that match the fields defined in create_model_version
  • If data is a DataFrame, it must contain exactly one row
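
For example, converting a single-row DataFrame for use with log_prediction (the identifier and field names are hypothetical):

    import pandas as pd
    from aporia.pandas.pandas_utils import pandas_to_dict

    row = pd.DataFrame({"age": [34.0]})
    model.log_prediction(
        id="prediction-123",
        features=pandas_to_dict(row),
        predictions={"will_churn": True},
    )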

aporia.pyspark.pyspark_utils.infer_schema_from_pyspark_dataframe(data)

Infers model version schema from a PySpark DataFrame.

Field names and types are inferred from column names and types.

Parameters:

  • data (DataFrame, required): PySpark DataFrame.

Returns:

  • Optional[Dict[str, str]]: A schema describing the data, as required by the create_model_version function.

Notes

  • The field types are inferred using the following logic, based on the data schema:
    • Boolean data type -> boolean field
    • Datetime data type -> datetime field
    • Array data type with numeric elements -> vector field
    • Numeric data type:
      • At most 50 unique values, at most 25% of the values are unique -> categorical field
      • Otherwise -> numeric field
    • String data type:
      • At least 50 unique values, at least 25% of the values are unique -> text field
      • Otherwise -> string field
  • If data contains a column with a type that doesn't match any of the rules described above, an error will be raised.
  • If data is a large dataset (> 10000 rows), a sample of the data will be used to infer the schema.
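
For example (the data is illustrative):

    from pyspark.sql import SparkSession
    from aporia.pyspark.pyspark_utils import infer_schema_from_pyspark_dataframe

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(34.0, True), (27.0, False)], ["age", "is_active"])
    schema = infer_schema_from_pyspark_dataframe(df)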

See Also

  • https://spark.apache.org/docs/latest/sql-ref-datatypes.html#data-types