...

  • Open Business and Artificial Intelligence Connectivity (OBAIC) borrows the concept from Open Database Connectivity (ODBC), an interface that makes it possible for applications to access data from a variety of database management systems (DBMSs). The aim of OBAIC is to define an interface that allows BI tools to access machine learning models and to run inferencing against those models on a variety of AI platforms - “AI ODBC for BI”
  • Through OBAIC, BI vendors can connect to any AI platform freely without concerning themselves with the underlying implementation, or with how the AI platform trains the model or infers results. It is just like what ODBC provides for databases: the caller does not need to care how the database stores the data or executes queries.
  • The committee has decided this standard will only define the REST API protocol for how AI and BI communicate. The design or actual implementation of OBAIC, such as whether it should be server-based, serverless, or Docker-based, is left up to the implementing vendors. If this protocol grows into a separate open-source project, that team may provide such implementation guidance/examples.
  • There are three key aspects designed into this standard:
    • BI - What specific calls does this standard need to provide so that a BI tool can better leverage any underlying AI/ML platform counterpart?
    • AI - What should the common denominator be for an AI platform that supports this standard?
    • Data - Should data be moved around in the communication between AI and BI (passed by value), or should it remain in its source location (passed by reference)? (See the sketch after this list.)
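As a small illustration of the pass-by-reference option, the draft APIs below describe the training data only by a descriptor: where it lives, a credential to read it, and a SQL query to extract it. The rows themselves never travel through OBAIC. A minimal Python sketch of such a descriptor (field names taken from the training example later in this document; all values are placeholders):

    # Pass by reference: the AI platform pulls the rows itself using this descriptor.
    data_by_reference = {
        "sourceType": "snowflake",                    # kind of datastore holding the data
        "endpoint": "some/endpoint",                  # where the AI platform can reach it
        "bearerToken": "<data-source-access-token>",  # credential scoped to that data
        "query": "SELECT foo FROM bar WHERE baz",     # how to extract the training rows
    }
    # A pass-by-value variant would embed the rows in the request instead; the draft
    # below does not define such a payload, so no shape is sketched here.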

...

  1. An End User is analyzing data using their BI tool and determines that predictive analytics for the data would be valuable; they wish to "train" a model with the data for that purpose. This is the traditional step where a user interacts with BI.
  2. (a) Obtain a token with permissions associated with the user making the request. This token is passed to the AI platform, allowing it to access the training data via a SQL statement run against the datastore. (b) The BI tool, on behalf of the user, requests through OBAIC that the AI platform train/prepare a model that accepts features of a certain type (numeric, categorical, text, etc.).

    API to train model using provided dataset

    Model configuration is based on configs from the open-source Ludwig project. At a minimum, we should be able to define inputs and outputs in a fairly standard way. Other model configuration parameters are subsumed by the options field.

    The data stanza provides a bearer token allowing the ML provider to access the required data table(s) for training. The provided SQL query indicates how the training data should be extracted from the source.

    Do not confuse the Bearer token, which is used to authenticate with OBAIC, with the dbToken, which is created in step 2(a) and used by the AI platform to access the data source for training.

    HTTP Request    Value
    Method          POST
    Header          Authorization: Bearer {token}
    URL             {prefix}/models/

    Body

    {
      "dbToken": "D41C4A382C27A4B5DF824E2D4F148",
      "inputs": [
        {
          "name": "customerAge",
          "type": "numeric"
        },
        {
          "name": "activeInLastMonth",
          "type": "binary"
        }
      ],
      "outputs": [
        {
          "name": "canceledMembership",
          "type": "binary"
        }
      ],
      "modelOptions": {
        "providerSpecificOption": "value"
      },
      "data": {
        "sourceType": "snowflake",
        "endpoint": "some/endpoint",
        "bearerToken": "...",
        "query": "SELECT foo FROM bar WHERE baz"
      }
    }
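
    For illustration only, a minimal client-side sketch of this call in Python using the requests library. The OBAIC prefix, both tokens, and the data-source values are placeholders, and error handling is omitted.

    import requests

    OBAIC_PREFIX = "https://ai.example.com/obaic"  # assumed value of {prefix}
    OBAIC_TOKEN = "<token issued for OBAIC>"       # authenticates the BI tool with OBAIC
    DB_TOKEN = "<dbToken created in step 2(a)>"    # lets the AI platform read the training data

    payload = {
        "dbToken": DB_TOKEN,
        "inputs": [
            {"name": "customerAge", "type": "numeric"},
            {"name": "activeInLastMonth", "type": "binary"},
        ],
        "outputs": [{"name": "canceledMembership", "type": "binary"}],
        "modelOptions": {},  # provider-specific options, if any
        "data": {
            "sourceType": "snowflake",
            "endpoint": "some/endpoint",
            "bearerToken": "<data-source bearer token>",
            "query": "SELECT foo FROM bar WHERE baz",
        },
    }

    resp = requests.post(
        f"{OBAIC_PREFIX}/models/",
        headers={"Authorization": f"Bearer {OBAIC_TOKEN}"},
        json=payload,
    )
    model_id = resp.json()["modelID"]  # kept for status polling and later inference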



    Alternatively, we may also consider supporting a SQL-like syntax for model training

    If we go beyond just a REST API, a SQL-like syntax is an alternative, as the syntax is also well known.

    Using BigQuery ML model creation as an example and generalizing:

    CREATE MODEL (
      customerAge WITH ENCODING (
        type=numeric
      ),
      activeInLastMonth WITH ENCODING (
        type=binary
      ),
      canceledMembership WITH DECODING (
        type=binary
      )
    )
    FROM myData (
      sourceType=snowflake,
      endpoint="some/endpoint",
      bearerToken=<...>
    )
    AS (SELECT foo FROM BAR)
    WITH OPTIONS ();



    200: Training is started and the corresponding ID is returned for future reference


    HTTP Response   Value
    Header          Content-Type: application/json; charset=utf-8
    Body

    {
      "modelID": "d677b054-2cd4-4711-959b-971af0081a73"
    }

    • modelID is generated and returned to the caller if training is started successfully. It will be used to check the status of the training or for future inference (see the Inference section below).



  3. The AI platform fulfills the request by connecting to the data source with the provided token and extracting the training data specified by the SQL query. How the AI platform interacts with the data source to perform the training is up to the platform.
  4. The BI tool polls for the status or retrieves the training result. If the training is still in progress, the status is returned. When training is completed, the results and performance of the model are returned. (A polling sketch follows the status response below.)

    API to get model status


    HTTP Request    Value
    Method          GET
    Header          Authorization: Bearer {token}
    URL             {prefix}/modelStatus?modelID=

    Query Parameters

    modelID (type: String): The modelID returned from a previous OBAIC call, either from training or from the list of models.



    200: Status of the model returned


    HTTP Response   Value
    Header          Content-Type: application/json; charset=utf-8
    Body

    {
      "modelID": "d677b054-2cd4-4711-959b-971af0081a73",
      "status": "training",
      "progress": "80"
    }

    • modelID is the same ID provided in the request
    • status can be training | inferencing | ready
    • progress is the estimated progress of the current status
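
    As a sketch only, one way a BI tool might poll this endpoint until training finishes, in Python. The prefix, token, and polling interval are illustrative; the shape of the final training result is not defined in this excerpt, so only the status loop is shown.

    import time
    import requests

    OBAIC_PREFIX = "https://ai.example.com/obaic"  # assumed value of {prefix}
    OBAIC_TOKEN = "<token issued for OBAIC>"

    def wait_until_ready(model_id: str, interval_s: float = 10.0) -> None:
        """Poll modelStatus until the model reports the 'ready' status."""
        while True:
            resp = requests.get(
                f"{OBAIC_PREFIX}/modelStatus",
                headers={"Authorization": f"Bearer {OBAIC_TOKEN}"},
                params={"modelID": model_id},
            )
            body = resp.json()
            if body["status"] == "ready":
                return
            # status is "training" or "inferencing"; progress is an estimate
            print(f"{body['status']}: {body.get('progress', '?')}%")
            time.sleep(interval_s)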



  5. The BI tool presents the result to the user in its own way, which is each vendor's "secret sauce".

...

5. Once the BI user/developer decides which model to run for predictions, they take the appropriate actions in the BI tool to prepare the data, then call OBAIC to request that it run that model with the data.

Note: use POST instead of GET; the query should be in the request body.

...

API to infer from a model using provided dataset by reference


HTTP Request    Value
Method          POST
Header          Authorization: Bearer {token}
URL             {prefix}/models/{modelID}

Body

{
  "dbToken": "string",
  "action": "string",
  "data": {
      "sourceType": "string",
      "endpoint": "string",
      "bearerToken": "string",
      "query": "string"
  }
}

Example

{
  "dbToken": "D41C4A382C27A4B5DF824E2D4F148",
  "action": "infer",
  "data": {
      "sourceType": "snowflake",
      "endpoint": "some/endpoint",
      "bearerToken": "7CA4D3C152646DDEFB527A958C45B",
      "query": "SELECT foo FROM bar WHERE baz"
  }
}
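
For illustration, the corresponding client call in Python. The prefix and tokens are placeholders, and the inference response format is not shown in this excerpt, so only the request is sketched.

    import requests

    OBAIC_PREFIX = "https://ai.example.com/obaic"  # assumed value of {prefix}
    OBAIC_TOKEN = "<token issued for OBAIC>"
    model_id = "d677b054-2cd4-4711-959b-971af0081a73"  # from training or the model list

    resp = requests.post(
        f"{OBAIC_PREFIX}/models/{model_id}",
        headers={"Authorization": f"Bearer {OBAIC_TOKEN}"},
        json={
            "dbToken": "<token scoped to the data to score>",
            "action": "infer",
            "data": {
                "sourceType": "snowflake",
                "endpoint": "some/endpoint",
                "bearerToken": "<data-source bearer token>",
                "query": "SELECT foo FROM bar WHERE baz",
            },
        },
    )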


...