All of the REST API calls presented below use bearer tokens for authorization. The {prefix} of each API is configurable on the hosting server. This protocol is inspired by Delta Sharing.
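As a minimal sketch of this convention, the snippet below builds (but does not send) an authorized request. The host name, the `/models` path, and the helper name `build_request` are assumptions made for illustration and are not taken from the specification.

```python
from urllib.request import Request


def build_request(host: str, prefix: str, path: str, token: str) -> Request:
    """Build a GET request that carries the bearer token (nothing is sent)."""
    # Normalise slashes so "api/v1", "/api/v1" and "/api/v1/" all work as prefix.
    url = f"{host.rstrip('/')}/{prefix.strip('/')}/{path.lstrip('/')}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# Hypothetical host, prefix and path, used only to show the shape of the call.
req = build_request("https://obaic.example.com", "/api/v1", "/models", "my-token")
print(req.full_url)
print(req.get_header("Authorization"))
```

Keeping the prefix as a separate argument mirrors the fact that each deployment can configure it independently of the API paths.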
* Blue text below, whether in a diagram or in a description, marks functionality that is out of scope for OBAIC and is left to the BI tool, AI platform, or data source vendor to implement. OBAIC is the connective tissue that coordinates communication among these 3 major components and extends their capabilities.
(a) Obtain a token with the permissions associated with the user making the request. This token is passed to the AI platform, allowing it to access the training data by running a SQL statement against the datastore.
(b) The BI tool, on behalf of the user, asks the AI platform through OBAIC to train/prepare a model that accepts features of certain types (numeric, categorical, text, etc.).
Model configuration is based on configs from the open-source Ludwig project. At a minimum, we should be able to define inputs and outputs in a fairly standard way. Other model configuration parameters are subsumed by the options field. The data stanza provides a bearer token that allows the ML provider to access the data table(s) required for training. The provided SQL query indicates how the training data should be extracted from the source. Do not confuse the Bearer token, which authenticates the caller with OBAIC, with the dbToken, which is created in 2(a) and which the AI platform uses to access the data source for training.
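The paragraph above can be illustrated with a rough sketch of a training request body. The exact schema is not shown in this section, so the layout here is an assumption: the wrapper key `modelConfig` and the function name are hypothetical, the feature type names follow the wording of this document rather than a verified Ludwig release, and the keys inside `data` simply mirror the pass-by-reference result example further below.

```python
import json


def build_training_request(db_token: str, query: str) -> dict:
    """Assemble a hypothetical training request body (schema is assumed)."""
    return {
        "modelConfig": {
            # Ludwig-style declarations: each feature has a name and a type.
            "inputs": [
                {"name": "AccountBalance", "type": "numeric"},
                {"name": "YearOpened", "type": "numeric"},
            ],
            "outputs": [{"name": "Churn", "type": "numeric"}],
            # Any other model configuration parameter goes under "options".
            "options": {},
        },
        "data": {
            "sourceType": "snowflake",
            "endpoint": "some/endpoint",
            # dbToken from step 2(a); NOT the bearer token used to call OBAIC.
            "bearerToken": db_token,
            "query": query,
        },
    }


body = build_training_request("db-token-from-2a", "SELECT * FROM trainingTable")
print(json.dumps(body, indent=2))
```

The OBAIC bearer token would travel in the `Authorization` header of the request itself, while the dbToken travels inside the body, which keeps the two credentials clearly separated.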
Beyond the REST API, a SQL-like interface is an alternative, since its syntax is also well known. One approach is to take BigQuery ML model creation as an example and generalize it.
The BI tool polls for the status or retrieves the training result. If training is still in progress, the status is returned. When training is completed, the results and performance of the model are returned.
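A minimal polling loop for this step might look like the sketch below. The `status` field and its values (`in_progress`, `completed`) are assumptions, since the response schema is not shown here, and `fetch_status` is a hypothetical callable standing in for the actual HTTP request.

```python
import time
from typing import Callable


def poll_training(fetch_status: Callable[[], dict],
                  interval_s: float = 5.0,
                  max_attempts: int = 60) -> dict:
    """Call fetch_status until training is no longer in progress."""
    for _ in range(max_attempts):
        response = fetch_status()
        # Assumed convention: any status other than "in_progress" is final.
        if response.get("status") != "in_progress":
            return response
        time.sleep(interval_s)
    raise TimeoutError("training did not finish within the polling budget")


# Simulated server responses, so the sketch runs without a network call.
responses = iter([
    {"status": "in_progress"},
    {"status": "in_progress"},
    {"status": "completed", "modelID": "6d4b571a-80ca-41ef-bc67-b158f4352ad8"},
])
result = poll_training(lambda: next(responses), interval_s=0)
print(result["status"], result["modelID"])
```

Injecting the fetch function keeps the loop independent of any particular HTTP client and makes the attempt limit easy to test.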
1. When a BI user wants to extend their capabilities with AI, the BI tool reaches out to the AI platform and requests the list of available models that the credential behind the provided token is authorized to see.
Example:
{ "models": [ { "name": "Model 1", "id": "6d4b571a-80ca-41ef-bc67-b158f4352ad8" }, { "name": "Model 2", "id": "70d9ab9d-9a64-49a8-be4d-d3a678b4ab16" }, { "name": "Model 3", "id": "99914a97-5d2e-4b9f-b81a-1d43c9409162" }, { "name": "Model 4", "id": "8295bfda-7901-43e8-9d31-81fd1c3210ee" }, { "name": "Model 5", "id": "0693c224-3a3f-4fe7-bbbe-c70f93d15f12" } ], "nextPageToken": "3xXc4ZAsqZQwgejt" }
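The `nextPageToken` field in this example suggests token-based pagination. The sketch below walks all pages under two assumptions that the section does not confirm: the token of one response is passed back on the next request, and a missing or empty token marks the last page. `fetch_page` is a hypothetical callable standing in for the HTTP request.

```python
from typing import Callable, Iterator, Optional


def iter_models(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every model entry, following nextPageToken across pages."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("models", [])
        token = page.get("nextPageToken")
        if not token:  # assumed: no token means there are no more pages
            return


# Two simulated pages keyed by the token that requests them.
pages = {
    None: {"models": [{"name": "Model 1", "id": "a"}], "nextPageToken": "t1"},
    "t1": {"models": [{"name": "Model 2", "id": "b"}]},
}
names = [m["name"] for m in iter_models(lambda tok: pages[tok])]
print(names)
```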
2. After the list of models is returned, the BI user can selectively retrieve the details of individual model(s). This step can also be called right after a newly trained model is completed, as described in the previous section, since the modelID is returned as a result of the training request.
Example:
{ "id": "6d4b571a-80ca-41ef-bc67-b158f4352ad8", "name": "Model 1", "revision": 3, "format": { "name": "PMML", "version": "4.3" }, "algorithm": "Neural Network", "tags": [ "Anomaly detection", "Banking" ], "dependency": "", "creator": "John Doe", "description": "This is a predictive model. Refer to {input} and {output} for the detailed format of each field, such as the value range of a field, as well as the possible predictions the model will give. You may also refer to the example data here.", "input": { "fields": [ { "name": "Account ID", "opType": "categorical", "dataType": "string", "taxonomy": "ID", "example": "account abc-001", "allowMissing": false, "description": "unique value" }, { "name": "Account Balance", "opType": "continuous", "dataType": "double", "taxonomy": "currency", "example": "1,378,560.00", "allowMissing": true, "description": "Minimum: 0, Maximum: 999,999,999.00" } ], "ref": "http://dmg.org/pmml/v4-3/pmml-4-3.xsd" }, "output": { "fields": [ { "name": "Churn", "opType": "continuous", "dataType": "string", "taxonomy": "ID", "example": "0.67", "allowMissing": false, "description": "the probability that the account stops doing business with a company over 6 months" } ], "ref": "http://dmg.org/pmml/v4-3/pmml-4-3.xsd" }, "performance": { "metric": "accuracy", "value": 0.85 }, "rating": 5, "url": "uri://link_to_the_model" }
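One way a BI tool could use these field descriptions is to check a data row before requesting a prediction. The sketch below reads only the `name` and `allowMissing` keys from the example above. The helper name is hypothetical, and treating an absent `allowMissing` as `false` is an assumption.

```python
def missing_required_fields(model: dict, row: dict) -> list:
    """Return the input field names that are required but absent or None in row."""
    missing = []
    for field in model["input"]["fields"]:
        # Assumed default: a field without "allowMissing" is treated as required.
        required = not field.get("allowMissing", False)
        if required and row.get(field["name"]) is None:
            missing.append(field["name"])
    return missing


# Trimmed copy of the model detail example, keeping only the keys used here.
model = {
    "input": {
        "fields": [
            {"name": "Account ID", "allowMissing": False},
            {"name": "Account Balance", "allowMissing": True},
        ]
    }
}
print(missing_required_fields(model, {"Account Balance": 1378560.0}))
print(missing_required_fields(model, {"Account ID": "account abc-001"}))
```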
3. The BI tool displays the information retrieved from the AI platform to the user, including which types of models are available and how they perform. It can optionally match the models against the user's data and suggest good candidates.
4. The user interacts with the results the BI tool presents and decides which model is a good fit for making predictions on a given set of data. Note that the model can also be returned as the result of the training step described in the previous section. In that case, the user may bypass these 2 steps and go directly to the result.
5. Once the BI user/developer decides which model to run for predictions, they take the appropriate actions in the BI tool to prepare the data, and the BI tool calls OBAIC to request that the model be run on that data.
Use POST instead of GET
Query should be in the body
Explanation
Pass by value is recommended only for small data sets
6. The AI platform connects to the underlying data source and runs the prediction using the information provided by the BI tool.
7. If the result cannot be returned immediately because of the prediction volume, the BI tool can poll for the result.
In case of pass by reference:
{ "resultStatus": "ready", "result":{ "sourceType":"snowflake", "endpoint":"some/endpoint", "bearerToken":"7CA4D3C152646DDEFB527A958C45B", "query":"SELECT * FROM resultTable" } }
In case of pass by value:
{ "resultStatus": "ready", "result":[ { "AccountBalance": 100, "YearOpened": 1990, "Churn": 80 }, { "AccountBalance": 200, "YearOpened": 1995, "Churn": 40 } ] }
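The two response shapes above can be told apart by the type of `result`: an object for pass by reference, a list of rows for pass by value. The sketch below dispatches on that difference. The function name is hypothetical, and treating any `resultStatus` other than `ready` as "poll again" is an assumption.

```python
def describe_result(response: dict) -> str:
    """Summarise a prediction response without fetching any data."""
    if response.get("resultStatus") != "ready":
        return "not ready"  # assumed: the caller should poll again later
    result = response["result"]
    if isinstance(result, list):
        # Pass by value: the predicted rows are inline.
        return f"by value: {len(result)} rows"
    # Pass by reference: the caller must query the data source itself.
    return f"by reference: run {result['query']!r} against {result['sourceType']}"


by_reference = {
    "resultStatus": "ready",
    "result": {
        "sourceType": "snowflake",
        "endpoint": "some/endpoint",
        "bearerToken": "7CA4D3C152646DDEFB527A958C45B",
        "query": "SELECT * FROM resultTable",
    },
}
by_value = {
    "resultStatus": "ready",
    "result": [
        {"AccountBalance": 100, "YearOpened": 1990, "Churn": 80},
        {"AccountBalance": 200, "YearOpened": 1995, "Churn": 40},
    ],
}
print(describe_result(by_reference))
print(describe_result(by_value))
```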
errorCode and message for each API call

Name | Affiliation |
---|---|
Cupid Chan | Pistevo Decision |
Xiangxiang Meng | Redfin |
Deepak Karuppiah | MicroStrategy |
Nancy Rausch | SAS |
Dalton Ruer | Qlik |
Sachin Sinha | Microsoft |
Yi Shao | IBM |
Jeffrey Tang | Predibase |
Lingyan Yin | Salesforce |