With just a few clicks, any Machine Learning model can be integrated and monitored in production with Aporia.

In this guide, we'll walk you through creating your first model in Aporia and integrating your first Serving Dataset, step by step. This example uses AWS S3 as the data source.

Create A Model

Go to the models page and click "Add Model" at the top right of the page.

A modal opens in which you can enter the model's name, select its type, add a short description, and choose its color and icon.

The model type can be regression, binary, multiclass, multi-label, or ranking. Refer to the documentation for each model type for more information.

Click "Add" to create your model.

Aporia creates the first version of the model for you, so you can start integrating your data right away.

Integrate A Serving Dataset

Now let's integrate our serving data from our S3 bucket.

Click the connect serving button.

A modal opens where you can select the data source to pull your data from. You can use a data source that you've already defined or create a new one.

Choose your data source and click "Continue".

Next, enter a regex that defines the path of the files to ingest from your bucket. Note that the regex should include the file extension. You also need to select the file format of the ingested files.
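As a rough illustration of a path regex that includes the file extension, the sketch below checks a pattern locally against a handful of object keys. The keys, the folder layout, and the pattern are all hypothetical, and the sketch assumes the whole key must match the pattern; the matching rules Aporia applies may differ, so treat this only as a way to sanity-check a pattern before entering it.

```python
import re

# Hypothetical object keys in a bucket; none of these come from Aporia.
keys = [
    "serving/2024-01-01/predictions.parquet",
    "serving/2024-01-02/predictions.parquet",
    "serving/2024-01-02/_SUCCESS",
    "training/2023-12-01/train.parquet",
]

# Assumed pattern: any date folder under serving/, Parquet files only.
# The trailing \.parquet keeps the file extension inside the regex.
pattern = re.compile(r"serving/\d{4}-\d{2}-\d{2}/.*\.parquet")

# Assumption: the entire key has to match, so fullmatch is used here.
matched = [key for key in keys if pattern.fullmatch(key)]
print(matched)
```

Only the two Parquet files under serving/ are selected; the marker file and the training file are skipped because they fail the folder or extension part of the pattern.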

Click "Test" to validate the ingestion and to check the automatically inferred field types.

Field Types

You can change the type shown at the top of each column. The supported field types are as follows:

  • numeric - valid examples: 1, 2.87, 0.53, 300.13

  • boolean - valid examples: True, False

  • categorical - a categorical field. In the case of numeric categories, the type won't be auto-inferred and should be selected manually.

  • datetime - contains either Python datetime objects or ISO-8601 timestamp strings

  • text - freeform text

  • array - useful for categorical arrays and unstructured data

  • embedding - useful for numeric arrays and unstructured data. The arrays should be of fixed size.

  • image_url - useful to report URLs of images (raw inputs for CV models).
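Two of the constraints above, ISO-8601 timestamps for datetime fields and fixed-size arrays for embedding fields, are easy to check on a sample of your data before ingesting it. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical and are not part of any Aporia SDK. Note that `datetime.fromisoformat` accepts only a subset of ISO-8601 on Python versions before 3.11, so it is a conservative check.

```python
from datetime import datetime


def is_iso8601(value):
    """Return True for datetime objects and strings that fromisoformat accepts."""
    if isinstance(value, datetime):
        return True
    if not isinstance(value, str):
        return False
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def is_fixed_size(arrays):
    """Return True if every array in the sample has the same length."""
    return len({len(array) for array in arrays}) <= 1


# Hypothetical sample values for a datetime column and an embedding column.
print(is_iso8601("2024-01-01T12:30:00+00:00"))
print(is_iso8601("01/02/2024"))
print(is_fixed_size([[0.1, 0.2], [0.3, 0.4]]))
print(is_fixed_size([[0.1], [0.2, 0.3]]))
```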

Once satisfied with the types, click "Continue" to define the dataset schema.

In the following step, you'll decide which columns you want to see in Aporia for this dataset.

On the left, you'll see all the available fields. You can multi-select fields and set their field group using the gray pop-up at the bottom of the modal. Note that at the top of the right section you should set the id column and the timestamp column, so that Aporia knows how to interact with your data when calculating metrics.

If you make a mistake, click the "x" next to the field and then choose a different field group.

To set field groups according to a pattern, use the search at the top left of the section together with "select all" beneath it.

If you set any actual fields, you'll need to connect each of them to the relevant prediction field, so that Aporia can calculate your performance metrics.
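The schema rules described in this step (an id column, a timestamp column, and every actual field linked to a prediction field) can be pictured as a small data structure plus a check. The sketch below is purely illustrative: the dictionary layout, the column names, and the `schema_problems` helper are hypothetical and do not reflect how Aporia stores or validates a schema.

```python
# Hypothetical columns found in the ingested files.
columns = {"prediction_id", "predicted_at", "age", "will_churn", "churned"}

# Hypothetical schema layout, used only to illustrate the rules in this step.
schema = {
    "id": "prediction_id",
    "timestamp": "predicted_at",
    "predictions": ["will_churn"],
    "actuals": {"churned": "will_churn"},  # actual field -> prediction field
}


def schema_problems(schema, columns):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    # The id and timestamp columns must be set and must exist in the data.
    for role in ("id", "timestamp"):
        if schema.get(role) not in columns:
            problems.append(f"{role} column is missing")
    # Every actual field must point at one of the prediction fields.
    for actual, prediction in schema.get("actuals", {}).items():
        if prediction not in schema.get("predictions", []):
            problems.append(f"actual '{actual}' is not linked to a prediction field")
    return problems


print(schema_problems(schema, columns))
```

Running the same check on a schema with no timestamp, or with an actual field pointing at a name that is not a prediction field, returns one message per problem.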

When you finish defining your schema, click "Continue" to view the summary of your dataset, and then click "Finish" if it is correct.

That's it! Aporia now starts calculating metrics for you, and within a few minutes you'll be able to see them in the system.

You can already access your model pages within Aporia and start creating dashboards and monitors!
