# Overview

**Aporia monitors your models by connecting *directly* to your data.** If you don't store your predictions yet, see our guide on [Storing Your Predictions](https://docs.aporia.com/v1/storing-your-predictions) (recommended), or just [log them directly to Aporia](https://docs.aporia.com/v1/storing-your-predictions/logging-to-aporia-directly).

Aporia currently supports the following data sources:

* Amazon S3
* BigQuery
* Redshift
* Athena
* Snowflake
* PostgreSQL
* Delta Lake
* Glue Data Catalog

{% hint style="info" %}
If your storage or database is not listed here, please contact your Aporia account manager for further assistance.
{% endhint %}

### Configure Data Source

Connecting to a data source begins with configuring its connection details. For example, to connect to a Postgres database, we can create the following data source object:

```python
data_source = PostgresJDBCDataSource(
  url="jdbc:postgresql://<POSTGRES_HOSTNAME>/<DBNAME>",
  query="SELECT * FROM model_predictions",
  user="<DB_USER>",
  password="<DB_PASSWORD>"
)
```

Please refer to the documentation page of the relevant data source for a complete list of supported parameters and configuration options.
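The JDBC URL above is an ordinary string, so it can be assembled programmatically before being passed to the data source. A minimal sketch, plain Python and not part of the Aporia SDK (the helper name and hostname are illustrative):

```python
# Illustrative helper (not an Aporia API): builds a PostgreSQL JDBC URL
# from its parts, with the standard PostgreSQL port as the default.
def postgres_jdbc_url(host: str, dbname: str, port: int = 5432) -> str:
    return f"jdbc:postgresql://{host}:{port}/{dbname}"

url = postgres_jdbc_url("db.internal.example.com", "analytics")
# "jdbc:postgresql://db.internal.example.com:5432/analytics"
```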

### Connect Serving Data

After creating a data source, we can create a model version and connect it to the data source. For example:

```python
apr_model = aporia.create_model_version(
  model_id="<MODEL_ID>",
  model_version="v1",
  model_type="binary",
  
  raw_inputs={
    "raw_text": "text",
  },

  features={
    "amount": "numeric",
    "owner": "string",
    "is_new": "boolean",
    "embeddings": {"type": "tensor", "dimensions": [768]},
  },

  predictions={
    "will_buy_insurance": "boolean",
    "proba": "numeric",
  },
)

apr_model.connect_serving(
  data_source=data_source,

  # Names of the prediction ID and prediction timestamp columns
  id_column="prediction_id",
  timestamp_column="prediction_timestamp",
)
```

By default, each raw input, feature, and prediction is mapped to the column of the same name in the PostgreSQL query.

As part of the `connect_serving` API, you must specify the following two additional columns:

* `id_column` - A column containing a unique ID for each prediction.
* `timestamp_column` - A column containing the time at which each prediction occurred.
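To make the default name-based mapping concrete, here is a plain-Python sketch (hypothetical row data, not SDK code) of how each schema field is read from the query column of the same name, alongside the two required columns:

```python
# Hypothetical query result row; column names match the model schema above.
row = {
    "prediction_id": "a1b2c3",                      # id_column
    "prediction_timestamp": "2024-01-15T10:30:00Z", # timestamp_column
    "amount": 12.5,
    "owner": "alice",
    "is_new": True,
    "will_buy_insurance": False,
    "proba": 0.18,
}

# Default mapping: each schema field is looked up by its own name.
features = {name: row[name] for name in ("amount", "owner", "is_new")}
predictions = {name: row[name] for name in ("will_buy_insurance", "proba")}
```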

### Integrating Delayed Actuals

To integrate actuals, use the `labels` argument of the `connect_serving` API, which maps each Aporia prediction to the column containing its actual value.

For example, let's assume we have two columns: `will_buy_insurance` (the model prediction) and `did_buy_insurance` (the ground truth). To integrate them into Aporia:

```python
apr_model = aporia.create_model_version(
  ...
  predictions={
    "will_buy_insurance": "boolean"
  }
)

apr_model.connect_serving(
  data_source=data_source,

  id_column="prediction_id",
  timestamp_column="prediction_timestamp",

  labels={
    # Prediction name -> column name containing its actual value
    "will_buy_insurance": "did_buy_insurance"
  }
)
```

The ground truth column can be `NULL` until the actual value arrives, and that's perfectly fine.
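Conceptually, metrics that compare predictions to actuals only consider predictions whose label has already arrived. A plain-Python sketch of that idea (illustrative only, not SDK code), with `None` standing in for SQL `NULL`:

```python
# Each row pairs a prediction with its (possibly not-yet-available) actual.
rows = [
    {"will_buy_insurance": True,  "did_buy_insurance": True},
    {"will_buy_insurance": True,  "did_buy_insurance": None},   # actual not in yet
    {"will_buy_insurance": False, "did_buy_insurance": False},
]

# Only labeled predictions participate in performance metrics.
labeled = [r for r in rows if r["did_buy_insurance"] is not None]
accuracy = sum(
    r["will_buy_insurance"] == r["did_buy_insurance"] for r in labeled
) / len(labeled)
# accuracy is computed over the two labeled rows only
```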

### Connecting Training / Test Sets

To connect your model version to training or test sets, you can use the `connect_training` and `connect_testing` APIs.

For example:

```python
# Training set
apr_model.connect_training(
  data_source=training_set_data_source,
  id_column="id",
  timestamp_column="timestamp",
)

# Test set
apr_model.connect_testing(
  data_source=test_set_data_source,
  id_column="id",
  timestamp_column="timestamp",
)
```

### Advanced Mapping

By default, any column whose name matches a raw input, feature, or prediction in the model schema is mapped to that field.

However, you can override this mapping using the `raw_inputs`, `features`, `predictions`, and `labels` arguments to the `connect_serving` / `connect_training` / `connect_testing` APIs. Example:

```python
apr_model.connect_serving(
  data_source=aporia.GlueDataSource(
    database="datalake",
    query="""
      SELECT
        my_id,
        full_name,
        age,
        my_gender_col,
        decision,
        was_decision_correct,
        occurred_at
      FROM predictions
    """,
  ),

  id_column="my_id",
  timestamp_column="occurred_at",
  raw_inputs={
    "fullname": "full_name",
  },
  features={
    "age": "age",
    "gender": "my_gender_col",
  },
  predictions={
    "will_buy_insurance": "decision",
  },
  labels={
    "will_buy_insurance": "was_decision_correct"
  }
)
```
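The override logic can be thought of as "use the mapped column if one is given, otherwise fall back to a column with the field's own name." A minimal sketch of that rule (plain Python, illustrative only; the function name is hypothetical):

```python
def resolve_columns(schema_fields, overrides):
    """For each schema field, use the overridden column name if provided;
    otherwise fall back to a column with the same name as the field."""
    return {field: overrides.get(field, field) for field in schema_fields}

columns = resolve_columns(
    ["age", "gender", "will_buy_insurance"],
    {"gender": "my_gender_col", "will_buy_insurance": "decision"},
)
# {"age": "age", "gender": "my_gender_col", "will_buy_insurance": "decision"}
```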
