This guide describes how to connect Aporia to a PostgreSQL data source in order to monitor a new ML model in production.
We will assume that your model inputs, outputs, and (optionally) delayed actuals can be queried with SQL. The same data source can also connect to your model's training/test set, which can serve as a baseline for model monitoring.
Create a read-only user for PostgreSQL access
In order to provide Aporia with access to PostgreSQL, create a read-only user for Aporia in PostgreSQL.
Please use the SQL snippet below to create a user for Aporia. Before using the snippet, you will need to populate the following:
<aporia_password>: Strong password to be used by the user.
<your_database>: PostgreSQL database with your ML training / inference data.
<your_schema>: PostgreSQL schema with your ML training / inference data.
CREATE USER aporia WITH PASSWORD '<aporia_password>';

-- Grant access to DB and schema
GRANT CONNECT ON DATABASE <your_database> TO aporia;
GRANT USAGE ON SCHEMA <your_schema> TO aporia;

-- Grant access to multiple tables
GRANT SELECT ON table1 TO aporia;
GRANT SELECT ON table2 TO aporia;
GRANT SELECT ON table3 TO aporia;
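As an optional sanity check, you can verify the new user's read access before moving on. Below is a minimal sketch using the psycopg2 library (not part of this guide's requirements); the hostname and the database, schema, and table names are placeholders matching the snippet above:

import psycopg2

# Connect as the read-only user created above.
# All angle-bracket values are placeholders.
conn = psycopg2.connect(
    host="<POSTGRES_HOSTNAME>",
    dbname="<your_database>",
    user="aporia",
    password="<aporia_password>",
)
with conn.cursor() as cur:
    # Any table granted above works here; table1 is a placeholder
    cur.execute("SELECT COUNT(*) FROM <your_schema>.table1")
    print(cur.fetchone())
conn.close()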
Creating a PostgreSQL data source in Aporia
To create a new model to be monitored in Aporia, you can call the aporia.create_model(...) API:
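For example, a minimal sketch of creating a model (the token, environment, model ID, and model name are placeholders, and the exact aporia.init and aporia.create_model parameters may differ in your SDK version; consult the Aporia SDK reference):

import aporia

# Placeholders: token, environment, model ID, and model name are all
# illustrative; verify the exact arguments against your SDK version.
aporia.init(token="<APORIA_TOKEN>", environment="production")

apr_model = aporia.create_model("fraud-detection", "Fraud Detection Model")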
Each model in Aporia can contain multiple Model Versions. When you (re)train your model, you should create a new model version in Aporia.
Each raw input, feature, or prediction is mapped by default to the column of the same name in the PostgreSQL query.
For example, if you create a feature named amount or a prediction named proba, the PostgreSQL data source will expect the query results to contain columns named amount and proba, respectively.
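A model version whose schema matches that example might be declared as sketched below. This is an assumption, not the confirmed API: the method name and the field-type strings may differ in your SDK version, so check the Aporia SDK reference for the exact call.

# NOTE: method name and field-type strings are assumptions; verify them
# against your Aporia SDK version.
apr_model_version = apr_model.create_model_version(
    model_version="v1",
    features={"amount": "numeric"},      # maps to the "amount" column
    predictions={"proba": "numeric"},    # maps to the "proba" column
)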
Next, create an instance of PostgresJDBCDataSource and pass it to apr_model.connect_serving(...) or apr_model.connect_training(...):
Note that as part of the connect_serving API, you are required to specify 2 additional columns:
id_column - A column containing a unique ID for each prediction.
timestamp_column - A column representing when each prediction occurred.
data_source = PostgresJDBCDataSource(
url="jdbc:postgresql://<POSTGRES_HOSTNAME>/<DBNAME>",
query='SELECT * FROM "my_db"."model_predictions"',
user="<DB_USER>",
password="<DB_PASSWORD>",
# Optional - use the select_expr param to apply additional Spark SQL
select_expr=["<SPARK_SQL>", ...],
# Optional - use the read_options param to apply any Spark configuration
# (e.g. custom Spark resources necessary for this model)
read_options={...}
)
apr_model.connect_serving(
data_source=data_source,
# Names of the prediction ID and prediction timestamp columns
id_column="prediction_id",
timestamp_column="prediction_timestamp",
)
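Connecting a training set works the same way: build a second PostgresJDBCDataSource that queries your training data and pass it to apr_model.connect_training(...). A minimal sketch follows; the training table name is a placeholder, and connect_training may accept additional arguments in your SDK version:

training_data_source = PostgresJDBCDataSource(
    url="jdbc:postgresql://<POSTGRES_HOSTNAME>/<DBNAME>",
    # Placeholder table - point this at your training/test set
    query='SELECT * FROM "my_db"."model_training_set"',
    user="<DB_USER>",
    password="<DB_PASSWORD>",
)

apr_model.connect_training(data_source=training_data_source)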