All of the REST API calls presented below use bearer tokens for authorization. The {prefix} of each API is configurable on the hosting servers. This protocol is inspired by Delta Sharing.
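As a minimal sketch of the two conventions above (bearer-token authorization and a configurable prefix), the snippet below builds a request without sending it. The host, the prefix value, and the `models` path are illustrative assumptions, not part of the specification.

```python
from urllib.request import Request


def build_request(host: str, prefix: str, path: str, token: str) -> Request:
    """Build (but do not send) a GET request carrying a bearer token."""
    # Normalize slashes so the configurable prefix can be given with or without them.
    url = f"{host.rstrip('/')}/{prefix.strip('/')}/{path.lstrip('/')}"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Hypothetical host, prefix, path, and token, chosen only for illustration.
req = build_request("https://sharing.example.com", "/api/v1/", "models", "my-token")
print(req.full_url)
print(req.get_header("Authorization"))
```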
Example:
{
  "models": [
    { "name": "Model 1", "id": "6d4b571a-80ca-41ef-bc67-b158f4352ad8" },
    { "name": "Model 2", "id": "70d9ab9d-9a64-49a8-be4d-d3a678b4ab16" },
    { "name": "Model 3", "id": "99914a97-5d2e-4b9f-b81a-1d43c9409162" },
    { "name": "Model 4", "id": "8295bfda-7901-43e8-9d31-81fd1c3210ee" },
    { "name": "Model 5", "id": "0693c224-3a3f-4fe7-bbbe-c70f93d15f12" }
  ],
  "nextPageToken": "3xXc4ZAsqZQwgejt"
}
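The `nextPageToken` in the example response suggests token-based pagination. The sketch below walks all pages through a caller-supplied `fetch_page` function (hypothetical, standing in for the real HTTP call). It assumes that a missing or empty `nextPageToken` marks the last page, which the text above does not state explicitly.

```python
from typing import Callable, Iterator, Optional


def iter_models(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every model entry, following nextPageToken until it runs out."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("models", [])
        token = page.get("nextPageToken")
        if not token:  # assumption: absent or empty token means no more pages
            break


# Fake two-page listing used in place of a real server.
_PAGES = {
    None: {
        "models": [{"name": "Model 1", "id": "6d4b571a-80ca-41ef-bc67-b158f4352ad8"}],
        "nextPageToken": "3xXc4ZAsqZQwgejt",
    },
    "3xXc4ZAsqZQwgejt": {
        "models": [{"name": "Model 2", "id": "70d9ab9d-9a64-49a8-be4d-d3a678b4ab16"}],
    },
}


def fake_fetch(token: Optional[str]) -> dict:
    return _PAGES[token]


names = [m["name"] for m in iter_models(fake_fetch)]
print(names)
```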
Example:
{
  "id": "6d4b571a-80ca-41ef-bc67-b158f4352ad8",
  "name": "Model 1",
  "revision": 3,
  "format": { "name": "PMML", "version": "4.3" },
  "algorithm": "Neural Network",
  "tags": [ "Anomaly detection", "Banking" ],
  "dependency": "",
  "creator": "John Doe",
  "description": "This is a predictive model. Refer to {input} and {output} for the detailed format of each field, such as its value range, as well as the possible predictions the model will give. You may also refer to the example data here.",
  "input": {
    "fields": [
      { "name": "Account ID", "opType": "categorical", "dataType": "string", "taxonomy": "ID", "example": "account abc-001", "allowMissing": false, "description": "unique value" },
      { "name": "Account Balance", "opType": "continuous", "dataType": "double", "taxonomy": "currency", "example": "1,378,560.00", "allowMissing": true, "description": "Minimum: 0, Maximum: 999,999,999.00" }
    ],
    "ref": "http://dmg.org/pmml/v4-3/pmml-4-3.xsd"
  },
  "output": {
    "fields": [
      { "name": "Churn", "opType": "continuous", "dataType": "string", "taxonomy": "ID", "example": "0.67", "allowMissing": false, "description": "the likelihood that the account stops doing business with the company over 6 months" }
    ],
    "ref": "http://dmg.org/pmml/v4-3/pmml-4-3.xsd"
  },
  "performance": { "metric": "accuracy", "value": 0.85 },
  "rating": 5,
  "url": "uri://link_to_the_model"
}
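One possible use of the `input.fields` metadata is a client-side check before scoring. The sketch below lists fields that a record is missing when `allowMissing` is false. Reading `allowMissing` as "this field may be absent or null" is an assumption drawn from the field name, and `missing_required` is a hypothetical helper, not part of the API.

```python
def missing_required(fields: list, record: dict) -> list:
    """Return names of fields that must be present but are absent or None."""
    return [
        f["name"]
        for f in fields
        if not f.get("allowMissing", False) and record.get(f["name"]) is None
    ]


# Field definitions trimmed from the example metadata above.
input_fields = [
    {"name": "Account ID", "dataType": "string", "allowMissing": False},
    {"name": "Account Balance", "dataType": "double", "allowMissing": True},
]

problems = missing_required(input_fields, {"Account Balance": 1378560.0})
print(problems)
```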
errorCode and message for each API call

Name | Affiliation |
---|---|
Cupid Chan | Pistevo Decision |
Xiangxiang Meng | Redfin |
Deepak Karuppiah | MicroStrategy |
Nancy Rausch | SAS |
Dalton Ruer | Qlik |
Sachin Sinha | Microsoft |
Yi Shao | IBM |
Jeffrey Tang | Predibase |
Lingyan Yin | Salesforce |
function TrainModel(inputs, outputs, modelOptions, dataConfig) -> UUID
Example params:
{ "providerSpecificOption": "value" },
Model configuration is based on configs from the open-source Ludwig project. At a minimum, we should be able to define inputs and outputs in a fairly standard way. Other model configuration parameters are subsumed by the options field.
The data stanza provides a bearer token allowing the ML provider to access the required data table(s) for training. The provided SQL query indicates how the training data should be extracted from the source.
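To make the shape of a training request concrete, here is a hedged sketch that assembles the four arguments of `TrainModel` into a JSON body. Every key name (`inputs`, `outputs`, `options`, `data`, `bearerToken`, `query`) and the feature `type` values are assumptions chosen for illustration, loosely following the name/type style of Ludwig feature entries. The actual wire format is not defined in this section.

```python
import json


def build_train_request(inputs, outputs, model_options, data_token, sql_query) -> str:
    """Serialize a hypothetical TrainModel request body as JSON."""
    payload = {
        "inputs": inputs,          # input feature definitions
        "outputs": outputs,        # output (target) feature definitions
        "options": model_options,  # provider-specific model options
        "data": {                  # data stanza: access token plus extraction query
            "bearerToken": data_token,
            "query": sql_query,
        },
    }
    return json.dumps(payload, indent=2)


body = build_train_request(
    inputs=[{"name": "Account Balance", "type": "number"}],
    outputs=[{"name": "Churn", "type": "binary"}],
    model_options={"providerSpecificOption": "value"},
    data_token="data-access-token",
    sql_query="SELECT * FROM accounts",
)
print(body)
```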
Example response:
{ |
Consider also a fully SQL-like interface taking BigQuery ML model creation as an example and generalizing:
CREATE MODEL ( AS (SELECT foo FROM BAR) |
function ListModels() -> List[UUID, Status]
Example response:
{ |
function GetModelConfig(UUID) -> Config
Example response:
{ |
The response here is essentially a pared-down version of the original training configuration.
function GetModelStatus(UUID) -> Status
Example response:
{ |
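Because training is asynchronous (TrainModel returns only a UUID), a client would typically poll GetModelStatus until training ends. The sketch below shows one way to do that. The status strings `QUEUED`, `TRAINING`, `COMPLETED`, and `FAILED` are hypothetical, since this section does not list the real values, and `get_status` stands in for the actual API call.

```python
import time
from typing import Callable

# Hypothetical terminal status values; the real set is not given in this section.
TERMINAL = {"COMPLETED", "FAILED"}


def wait_for_model(get_status: Callable[[str], str], model_id: str,
                   interval_s: float = 5.0, max_polls: int = 100) -> str:
    """Poll get_status until a terminal status is seen or max_polls is reached."""
    for _ in range(max_polls):
        status = get_status(model_id)
        if status in TERMINAL:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"model {model_id} did not finish after {max_polls} polls")


# Fake status sequence used in place of a real server.
_fake_statuses = iter(["QUEUED", "TRAINING", "COMPLETED"])
final = wait_for_model(lambda _id: next(_fake_statuses),
                       "6d4b571a-80ca-41ef-bc67-b158f4352ad8", interval_s=0)
print(final)
```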
Get core evaluation metrics for a trained model.
function GetModelMetrics(UUID) -> Metrics
Example response:
{ |
function PredictWithModel(UUID, dataConfig) -> Predictions
Example params:
{ |
The data stanza is very similar to the one in the train request; it designates the feature data on which to predict.
Example response (as JSON here for convenience, not necessarily for large responses):
{ |
Note that directly returning a large response set is not a good idea. In practice, the results could be streamed through something like a persistent socket connection.
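One way to avoid materializing a large response is to emit predictions in small batches. The sketch below yields newline-delimited JSON chunks from an iterable of prediction rows. The chunk format and the `stream_predictions` helper are assumptions for illustration, and the transport (socket, chunked HTTP, or otherwise) is left open, as in the text above.

```python
import json
from typing import Iterable, Iterator


def stream_predictions(rows: Iterable[dict], batch_size: int = 2) -> Iterator[str]:
    """Yield newline-delimited JSON chunks of at most batch_size rows each."""
    batch = []
    for row in rows:
        batch.append(json.dumps(row))
        if len(batch) == batch_size:
            yield "\n".join(batch) + "\n"
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield "\n".join(batch) + "\n"


chunks = list(stream_predictions([{"Churn": 0.67}, {"Churn": 0.12}, {"Churn": 0.91}]))
for chunk in chunks:
    print(chunk, end="")
```

A consumer can parse each line independently, so it never needs to hold the full prediction set in memory.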