Training

Our training endpoints let you programmatically train models on datasets you've uploaded to Akkio. They can be viewed as a better-designed v2 of our legacy /v1/models route.

Training is a fundamentally long-running operation, so it is exposed through a polling-based mechanism in which you make the following calls:

  • One call to /new to submit the task
  • Polling calls to /{task_id}/status to check up on your task's status
  • One last call to /{task_id}/result once /status indicates it's done

Caution: These endpoints are currently in beta and are subject to change as we refine them.
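
As a minimal sketch, the three calls listed above can be expressed as path helpers. The /api/v1/models/train prefix is taken from the example calls later on this page; the function names are hypothetical, and percent-encoding the task ID is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote

# Prefix taken from the example calls on this page.
TRAIN_ROOT = "/api/v1/models/train"


def new_task_path() -> str:
    """Path for the single call that submits a training task."""
    return f"{TRAIN_ROOT}/new"


def status_path(task_id: str) -> str:
    """Path that is polled to check on a task's status."""
    # Encoding the ID is a precaution (assumption), in case it contains
    # characters that are not safe inside a URL path segment.
    return f"{TRAIN_ROOT}/{quote(task_id, safe='')}/status"


def result_path(task_id: str) -> str:
    """Path for the final call once the status indicates completion."""
    return f"{TRAIN_ROOT}/{quote(task_id, safe='')}/result"
```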

Example Model

Throughout this page, we'll assume a dataset already exists in Akkio with the following fields:

  • Job Title
  • Years of Experience
  • Company Size
  • Industry
  • Location
  • Webinars Attended
  • Whitepapers Downloaded
  • Pages Visited
  • Days Since Last Visit
  • Email Open Rate
  • Email Click Rate
  • Responded to Survey
  • Positive Lead

We'll assume that "Positive Lead" is the attribute we're interested in training the model to predict.

Request Payload

We expect, as input, a JSON object with the following fields.

Mandatory:

  • dataset_id is the ID of the dataset you'd like to train a model on.

Optional:

  • predict_fields is a list of fields you want the model to predict. This will almost always be exactly one field.
  • ignore_fields is a list of fields to exclude so that they are not factored into the trained model.
  • force always trains a fresh model, even if one has already been trained with the exact same parameters.
  • extra_attention helps improve accuracy on certain datasets that contain rare cases.
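
To make the mandatory/optional split concrete, here is a small sketch of a payload builder. The function name is hypothetical, and treating force and extra_attention as booleans is an assumption, since this page does not state their types. Optional fields are omitted from the payload unless you set them.

```python
from typing import Any, Dict, List, Optional


def build_training_payload(
    dataset_id: str,
    predict_fields: Optional[List[str]] = None,
    ignore_fields: Optional[List[str]] = None,
    force: Optional[bool] = None,            # assumed to be a boolean flag
    extra_attention: Optional[bool] = None,  # assumed to be a boolean flag
) -> Dict[str, Any]:
    """Build the JSON-serializable request body for a training task."""
    if not dataset_id:
        raise ValueError("dataset_id is mandatory")

    payload: Dict[str, Any] = {"dataset_id": dataset_id}
    optional = {
        "predict_fields": predict_fields,
        "ignore_fields": ignore_fields,
        "force": force,
        "extra_attention": extra_attention,
    }
    # Only include optional fields the caller actually provided.
    for name, value in optional.items():
        if value is not None:
            payload[name] = value
    return payload
```

For the example dataset, `build_training_payload("YTV32jCdVf5DbcMxzvX5", predict_fields=["Positive Lead"])` produces a body with just dataset_id and predict_fields.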

Example Call Sequence

Here's a rough outline of how you'd make a request to the training endpoints.

HTTP Headers

Header Name    Required    Value
X-API-Key      Yes         Your team's API key. See Authentication.
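
One way to assemble these headers in Python is sketched below. The AKKIO_API_KEY environment variable and the function name are hypothetical conveniences, and the Content-Type header is an assumption based on the JSON request body; only X-API-Key is listed as required on this page.

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the HTTP headers for a training request."""
    # Fall back to a hypothetical environment variable when no key is passed.
    key = api_key or os.environ.get("AKKIO_API_KEY")
    if not key:
        raise ValueError("an API key is required for the X-API-Key header")
    return {
        "X-API-Key": key,
        # Assumption: the JSON body is declared explicitly.
        "Content-Type": "application/json",
    }
```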

1. Create Request

First, we'll submit the task into our asynchronous processing queue.

POST /api/v1/models/train/new

{
  "dataset_id": "YTV32jCdVf5DbcMxzvX5",
  "predict_fields": ["Positive Lead"]
}

You'll receive an object like this containing a task ID:

{
  "task_id": "<task_id>"
}

We'll use this in the next request.
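
The submission step could be sketched with Python's standard library as follows. The function names are hypothetical; the host https://api.akkio.com comes from the note further down this page, and the Content-Type header is an assumption. The request is only constructed here, so nothing is sent until you pass it to urllib.request.urlopen.

```python
import json
import urllib.request
from typing import Any, Dict

SITE_ROOT = "https://api.akkio.com"


def build_submit_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Construct (but do not send) the POST request that submits a task."""
    return urllib.request.Request(
        url=f"{SITE_ROOT}/api/v1/models/train/new",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract_task_id(response_body: bytes) -> str:
    """Pull the task ID out of the JSON body returned by the submit call."""
    task_id = json.loads(response_body).get("task_id")
    if not task_id:
        raise ValueError("response did not contain a task_id")
    return task_id
```

To actually submit, you would open the request with `urllib.request.urlopen(request)` and pass the bytes from `response.read()` to `extract_task_id`.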

2. Query for Task Status

Next, we'll query the status endpoint at a regular interval to see whether the task is complete. This request might look something like this:

GET /api/v1/models/train/<task_id>/status

Note that you must use the same Task ID that you received from the task creation endpoint above.

This will provide you with a status field set to either SUBMITTED, IN_PROGRESS, FAILED, or SUCCEEDED. You can read more about each state on the Asynchronous Endpoints page.
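
A polling loop over these four states might look like the sketch below. The fetch_status argument is a hypothetical callable that performs the GET request and returns the parsed JSON; the 5-second interval and the attempt cap are illustrative defaults, not values recommended by this page.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_seconds: float = 5.0,   # illustrative default
    max_attempts: int = 120,         # illustrative default
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status until the task reaches SUCCEEDED or FAILED."""
    for attempt in range(max_attempts):
        response = fetch_status()
        status = response.get("status")
        if status == "SUCCEEDED":
            return response
        if status == "FAILED":
            raise RuntimeError(f"training task failed: {response}")
        # SUBMITTED and IN_PROGRESS both mean "keep waiting".
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"task not finished after {max_attempts} status checks")
```

Passing the sleep function as a parameter keeps the loop easy to test without real delays.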

Here's an example response you might get:

{
  "status": "IN_PROGRESS",
  "metadata": {
    "type": "IN_PROGRESS"
  }
}

You should retry ("poll") this endpoint at a regular cadence until you get a response that looks something like this:

{
  "status": "SUCCEEDED",
  "metadata": {
    "type": "SUCCEEDED",
    "location": "/api/v1/models/train/<task_id>/result"
  }
}

Note: The location field is always relative to the API root (https://api.akkio.com/api/v1), not the overall website root (https://api.akkio.com). You'll need to construct the final URL from the site name, API root, and the provided location.
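
Here is a hedged sketch of building the final URL from a location value. The function name is hypothetical. Because the sample response on this page shows a location that already begins with /api/v1, the helper strips that prefix when present before joining; this is a defensive assumption so that both forms resolve to the same URL.

```python
SITE_ROOT = "https://api.akkio.com"
API_PREFIX = "/api/v1"


def resolve_location(location: str) -> str:
    """Turn a location value from the status response into a full URL."""
    # Normalize to exactly one leading slash.
    path = "/" + location.lstrip("/")
    # Defensive assumption: tolerate a location that already carries the
    # API prefix, so the prefix is never duplicated in the final URL.
    if path == API_PREFIX or path.startswith(API_PREFIX + "/"):
        path = path[len(API_PREFIX):]
    return SITE_ROOT + API_PREFIX + path
```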

Armed with this information, we'll move to the last request.

3. Query for Result

Using the location from the status call, we'll request the final result.

GET /api/v1/models/train/<task_id>/result

You'll get a response that looks something like this:

{
  "status": "success",
  "model_id": "AyooDGG4IB7iJDkgLKJR",
  "stats": [
    [
      {
        "field": 12,
        "field_name": "Positive Lead",
        "field_type": "category",
        "class": 0,
        "class_name": "0",
        "count": 19,
        "true positives": 19,
        "false positives": 0,
        "false negatives": 0,
        "precision": 1.0,
        "recall": 1.0,
        "f1": 1.0,
        "frequency": 0.95
      },
      {
        "field": 12,
        "field_name": "Positive Lead",
        "field_type": "category",
        "class": 1,
        "class_name": "1",
        "count": 1,
        "true positives": 1,
        "false positives": 0,
        "false negatives": 0,
        "precision": 1.0,
        "recall": 1.0,
        "f1": 1.0,
        "frequency": 0.05
      }
    ]
  ],
  "field_importance": {
    "Job Title": 0.09528297930955887,
    "Years of Experience": 0.09974530339241028,
    "Company Size": 0.08637341856956482,
    "Industry": 0.08510678261518478,
    "Location": 0.050445690751075745,
    "Webinars Attended": 0.1286168396472931,
    "Whitepapers Downloaded": 0.10489732772111893,
    "Pages Visited": 0.08934492617845535,
    "Days Since Last Visit": 0.07931949943304062,
    "Email Open Rate": 0.08199360221624374,
    "Email Click Rate": 0.05456322059035301,
    "Responded to Survey": 0.04431045055389404,
    "Positive Lead": 8.27786672541464e-10
  },
  "data_story": [
    {
      "name": "Positive Lead",
      "type": "category",
      "outcomes": [
        {
          "outcome": "0",
          "causes": [
            {
              "field": "Webinars Attended",
              "top_value": "3",
              "bottom_value": "3"
            },
            {
              "field": "Whitepapers Downloaded",
              "top_value": "3",
              "bottom_value": "7"
            },
            {
              "field": "Years of Experience",
              "top_value": "6",
              "bottom_value": "7"
            },
            {
              "field": "Job Title",
              "top_value": "Executive",
              "bottom_value": "Assistant"
            },
            {
              "field": "Pages Visited",
              "top_value": "27",
              "bottom_value": "1"
            }
          ],
          "top_case": 1.0,
          "avg_case": 0.94,
          "bottom_case": 1.0
        },
        {
          "outcome": "1",
          "causes": [
            {
              "field": "Webinars Attended",
              "top_value": "3",
              "bottom_value": "3"
            },
            {
              "field": "Whitepapers Downloaded",
              "top_value": "7",
              "bottom_value": "3"
            },
            {
              "field": "Years of Experience",
              "top_value": "7",
              "bottom_value": "6"
            },
            {
              "field": "Job Title",
              "top_value": "Assistant",
              "bottom_value": "Executive"
            },
            {
              "field": "Pages Visited",
              "top_value": "1",
              "bottom_value": "27"
            }
          ],
          "top_case": 0.0,
          "avg_case": 0.06,
          "bottom_case": 0.0
        }
      ]
    }
  ]
}

You can take this result and use it however you wish.
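
As one illustration of using the result, the sketch below ranks fields by importance and flattens the per-class metrics. The function names are hypothetical, and the reading of stats as a list containing one inner list per predicted field is an assumption inferred from the sample response above.

```python
from typing import Any, Dict, Iterable, List, Tuple


def rank_field_importance(
    result: Dict[str, Any], exclude: Iterable[str] = ()
) -> List[Tuple[str, float]]:
    """Return (field, importance) pairs, most important first."""
    skipped = set(exclude)
    pairs = [
        (name, score)
        for name, score in result.get("field_importance", {}).items()
        if name not in skipped
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def per_class_metrics(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten stats into one row of precision/recall/f1 per class."""
    rows = []
    # Assumption: each inner list holds the classes of one predicted field.
    for field_stats in result.get("stats", []):
        for entry in field_stats:
            rows.append({
                "field_name": entry["field_name"],
                "class_name": entry["class_name"],
                "precision": entry["precision"],
                "recall": entry["recall"],
                "f1": entry["f1"],
            })
    return rows
```

You may want to pass the predicted field (here, "Positive Lead") in exclude, since it appears in field_importance with a near-zero score in the sample response.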

Additional Resources