DACI: How to implement a repository with history?

Status: Released
Impact: High
Driver: Chris Grote
Approver:
Contributors:
Informed:
Due date:
Outcome: Progressed option 2 to a release state

Contributors: I am seeking the right people to get involved in the decision. Add your comments to this page to get the conversation started.

Please add:

  • The people directly impacted by this, so we can include them.
  • Any references to previous work and investigations that we can leverage.
  • Any constraints and challenges we need to consider in making this decision and the action plan that follows.
  • Any additional options we should consider before making the decision.

    Recommendations: We agreed to examine the potential of Option 2 in more detail, and have since taken that approach to a released state.



    Background

    Almost all metadata repositories we have seen lack the ability to store historical information about metadata and to respond to point-in-time inquiries. Egeria's type system and APIs have been built from the beginning to support such history, but we have not yet implemented a backend storage option that provides it.

    Because this need comes up frequently, even as a way to augment existing metadata repositories, providing such a historical store for metadata could be a somewhat narrow but nonetheless extremely common adoption point for Egeria.
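
To make "point-in-time inquiry" concrete, here is a minimal Python sketch. It is not Egeria's API: a hypothetical in-memory `HistoryStore` keeps every version of an instance and returns whichever version was current at the requested time. The class, its method names, and the example GUID and property values are all assumptions made for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timezone


class HistoryStore:
    """Hypothetical in-memory store that never overwrites earlier versions."""

    def __init__(self):
        # guid -> list of (valid_from, properties), kept sorted by valid_from
        self._versions = {}

    def save(self, guid, valid_from, properties):
        """Record a new version of the instance, effective from valid_from."""
        versions = self._versions.setdefault(guid, [])
        versions.append((valid_from, dict(properties)))
        versions.sort(key=lambda version: version[0])

    def as_of(self, guid, as_of_time):
        """Return the properties that were current at as_of_time, or None."""
        versions = self._versions.get(guid, [])
        times = [valid_from for valid_from, _ in versions]
        index = bisect_right(times, as_of_time)
        return versions[index - 1][1] if index else None


# Illustrative data only: one instance whose display name changes in June.
store = HistoryStore()
store.save("guid-1", datetime(2020, 1, 1, tzinfo=timezone.utc),
           {"displayName": "Customer"})
store.save("guid-1", datetime(2020, 6, 1, tzinfo=timezone.utc),
           {"displayName": "Customer Account"})

march = datetime(2020, 3, 1, tzinfo=timezone.utc)
print(store.as_of("guid-1", march))
```

A repository without history would only ever hold the latest version, so the March query could not be answered once the June update had been saved.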

    Current state

    We are currently considering implementation options for an initial approach to such a repository.

    Data for decision support

    • Identification of potential technologies to use as the backing store for such a repository.

    Options considered

     


    Option 1: bi-temporal RDBMS | Option 2: bi-temporal graph | Option 3: search index

    Description


    Option 1: Using a bi-temporal relational database like DB2

    Option 2: Using a bi-temporal graph store like Crux

    Option 3: Using a search index like Elastic

    Rollout plan



    Option 2: Start with some initial proof of concept activities, such as building some of the basic methods in a repository connector.

    Option 3: Left as an alternative approach that was suggested; no further details are available.

    Pros and cons
    Option 1 (bi-temporal RDBMS):

    Pro (Native): Handles historical information natively at the storage layer, so point-in-time inquiry should be simpler to implement.

    Con (New approach): Uses a relational backing store, whereas our existing implementations are graph-based.

    Con (Commercial): We are unaware of any open source, native bi-temporal RDBMS, so this would introduce a dependency on licensed commercial software.

    Con (Schema): Requires a fixed schema, which raises questions about how to support efficient queries (rather than storing things as unqueryable blobs) while also managing history when the type system itself (and therefore the schema) may have changed over the course of that history (e.g. deprecated attributes and types).

    Option 2 (bi-temporal graph):

    Pro (Native): Handles historical information natively at the storage layer, so point-in-time inquiry should be simpler to implement.

    Pro (Similar to existing): Aligns closely with our current repository approaches, which are more graph-focused than relational.

    Pro (Embedded option): Provides a simple option to run in an embedded capacity, which could be useful for demonstration purposes because it requires no additional infrastructure or components.

    Pro (Pluggable backends): Crux's own storage backends are pluggable, with both open source and commercial options.

    Pro (Schemaless): It sounds like each document in Crux is essentially schema-less (tuple / triple-based), so it may be feasible to store multiple versions of a type across the history of a given instance of metadata (open question).
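
The "native" and "schemaless" points can be illustrated with a small Python sketch. It is a simplified model of bi-temporal storage in general, not Crux's API or its exact semantics: each version carries a valid time (when the fact became true) and a transaction time (when the store recorded it), and each document is a plain dictionary, so different versions may carry different attribute sets. `BiTemporalStore`, `Version`, and all attribute names and values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Version:
    valid_from: datetime  # when the fact became true in the real world
    tx_time: datetime     # when the store recorded the fact
    doc: dict             # schema-less: any set of attributes


class BiTemporalStore:
    """Hypothetical append-only store indexed by valid and transaction time."""

    def __init__(self):
        self._history = {}  # guid -> list of Version

    def put(self, guid, doc, valid_from, tx_time):
        self._history.setdefault(guid, []).append(
            Version(valid_from, tx_time, dict(doc)))

    def get(self, guid, valid_at, known_at) -> Optional[dict]:
        """Return the document valid at valid_at, as the store knew it at known_at."""
        visible = [v for v in self._history.get(guid, [])
                   if v.valid_from <= valid_at and v.tx_time <= known_at]
        if not visible:
            return None
        # Latest valid time wins; ties are broken by the latest transaction.
        return max(visible, key=lambda v: (v.valid_from, v.tx_time)).doc


jan, mar, apr = datetime(2020, 1, 1), datetime(2020, 3, 1), datetime(2020, 4, 1)
jun, sep = datetime(2020, 6, 1), datetime(2020, 9, 1)

store = BiTemporalStore()
store.put("guid-1", {"name": "Customer", "owner": "alice"}, jan, jan)
# A later version of the type replaces the "owner" attribute with "ownerId".
store.put("guid-1", {"name": "Customer", "ownerId": "u-42"}, jun, jun)
# A correction recorded in September, effective retroactively from March.
store.put("guid-1", {"name": "Customer", "owner": "bob"}, mar, sep)

print(store.get("guid-1", valid_at=apr, known_at=jun))  # before the correction
print(store.get("guid-1", valid_at=apr, known_at=sep))  # after the correction
```

Because no column layout is fixed up front, the version that uses the older attribute and the version that uses its replacement can sit side by side in one instance's history, which is the property the schema concern under Option 1 calls into question for a relational store.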


    Risks



    Option 2, risk (Scalability): The resource requirements for a "true production" rollout, and the volume to which the store can scale, are unclear. We heard mention of "16 TB" (which sounds like plenty) but also "10 million triples" (which sounds small, given history and one triple per attribute value, per instance). Subsequent conversations confirmed that the figure is 10 billion triples rather than 10 million, which alleviates our immediate concerns.
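
The reasoning behind the scalability concern can be sketched as back-of-envelope arithmetic. The attribute and version counts below are purely hypothetical and are not measurements of any Egeria deployment. The sketch also assumes the worst case in which every new version writes one triple for every attribute value.

```python
def max_instances(triple_budget, attributes_per_instance, versions_per_instance):
    """Estimate how many instances fit, at one triple per attribute value per version."""
    triples_per_instance = attributes_per_instance * versions_per_instance
    return triple_budget // triples_per_instance


ATTRIBUTES = 20  # hypothetical average number of attribute values per instance
VERSIONS = 10    # hypothetical average number of historical versions per instance

print(max_instances(10_000_000, ATTRIBUTES, VERSIONS))      # 50000
print(max_instances(10_000_000_000, ATTRIBUTES, VERSIONS))  # 50000000
```

Under these assumptions a budget of 10 million triples would cover only about fifty thousand instances, whereas 10 billion triples would cover about fifty million.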


    Estimated cost and effort





    FAQ

    Q1.

    A1.


    References


    Relevance: Link
    Original GitHub issue: https://github.com/odpi/egeria/issues/2545
    Discussion with Crux team: 2020-11-27 Meeting notes







    Follow-up action items

    •