Date

Attendees

Goals

Discussion items

Time | Item | Who | Notes
60 mins | Crux and Egeria | All | Open discussion, please see notes below

Notes

Crux is an open source project that implements a graph database, which supports efficient point-in-time queries. The implementation is in Clojure, and Crux can either be embedded as a library or run separately.

Crux offers a choice of backend stores: LMDB or RocksDB.

Crux uses graph indexes which are more flexible than the indexes you would typically find in an RDBMS. The project actually started out trying to build a graph layer on top of Oracle, but the team discovered that was a non-starter and built Crux.

Compared to Datomic (https://www.datomic.com): if you perform a historical query in Datomic, it will result in index scans. In contrast, Crux always incorporates historical state into every query and doesn't rely on index scans.

In the Crux graph model each entity (vertex) or relationship (edge) is stored as a document. A document can include (untyped) references to other documents; so a relationship would be stored as a document with references to two other (entity) documents. A document can also have properties. A document is effectively the value of an entity (or relationship) at a point in time. The 'history' or 'evolution' of the object is stored as a time-series of documents. This schema avoids the need for JOINs.
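To make the document model above concrete, here is a minimal Python sketch of entities and relationships stored as time-series of documents. It is only an illustration of the idea as described in the meeting: the names `History`, `put`, `as_of`, the `end1`/`end2` keys and the sample ids are all hypothetical, and none of this is Crux's actual API or storage format.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional

Document = Dict[str, Any]


@dataclass
class History:
    """Time-ordered versions of one entity or relationship (hypothetical)."""
    times: List[datetime] = field(default_factory=list)
    docs: List[Document] = field(default_factory=list)

    def put(self, valid_from: datetime, doc: Document) -> None:
        # Keep both lists sorted by time so lookups can use binary search.
        i = bisect_right(self.times, valid_from)
        self.times.insert(i, valid_from)
        self.docs.insert(i, doc)

    def as_of(self, t: datetime) -> Optional[Document]:
        # The value of the object at time t is the latest version at or before t.
        i = bisect_right(self.times, t)
        return self.docs[i - 1] if i > 0 else None


store: Dict[str, History] = {}


def put(doc_id: str, valid_from: datetime, doc: Document) -> None:
    store.setdefault(doc_id, History()).put(valid_from, doc)


def as_of(doc_id: str, t: datetime) -> Optional[Document]:
    history = store.get(doc_id)
    return history.as_of(t) if history else None


# Two entity documents, one of which changes over time.
put("term-1", datetime(2020, 1, 1), {"name": "Order"})
put("asset-1", datetime(2020, 1, 1), {"name": "orders"})
put("asset-1", datetime(2020, 6, 1), {"name": "orders_v2"})

# A relationship is just another document holding untyped references (ids).
put("rel-1", datetime(2020, 2, 1), {"end1": "term-1", "end2": "asset-1"})

# Follow the relationship at two points in time: a lookup by id, not a JOIN.
rel = as_of("rel-1", datetime(2020, 3, 1))
early = as_of(rel["end2"], datetime(2020, 3, 1))
late = as_of(rel["end2"], datetime(2020, 7, 1))
```

In this sketch, following a relationship is a lookup of the referenced id at the same point in time, which is one way to read the remark that the schema avoids JOINs.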

Deletes are soft (reversible). Evictions are permanent. I'm not sure this is quite like our delete (soft) and purge - because I think Egeria may have separate requirements for permanent deletion (i.e. an instance is not recoverable) vs totally forgetting something ever existed, e.g. after 7 years. I'm not sure quite how we support these separately today.
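The difference between the two operations can be sketched as follows. This is a toy Python model under my own assumptions, not Crux's API: a delete appends a tombstone so earlier versions remain queryable, while an eviction removes the whole history. The function names and the use of integers for time are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# (time, document) pairs; a document of None is a tombstone (soft delete).
Version = Tuple[int, Optional[Dict[str, Any]]]
history: Dict[str, List[Version]] = {}


def put(doc_id: str, t: int, doc: Dict[str, Any]) -> None:
    history.setdefault(doc_id, []).append((t, doc))


def delete(doc_id: str, t: int) -> None:
    # Soft delete: record that the object has no value from time t onwards.
    history.setdefault(doc_id, []).append((t, None))


def evict(doc_id: str) -> None:
    # Eviction: drop every version, so no past state can be recovered.
    history.pop(doc_id, None)


def as_of(doc_id: str, t: int) -> Optional[Dict[str, Any]]:
    current = None
    for when, doc in sorted(history.get(doc_id, []), key=lambda v: v[0]):
        if when <= t:
            current = doc
    return current


put("a", 1, {"name": "x"})
delete("a", 5)
before_delete = as_of("a", 3)   # still visible in the past
after_delete = as_of("a", 6)    # tombstoned
put("a", 8, {"name": "x"})      # a soft delete can be reversed by a later put
restored = as_of("a", 9)
evict("a")
after_evict = as_of("a", 3)     # history is gone
```

Whether either operation lines up with Egeria's delete and purge is exactly the open question in the note above; the sketch only shows the two behaviours side by side.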

The query language used by Crux is Datalog. This is a recursive language, similar to Prolog or SPARQL. It is much lower level than Gremlin, for example, and really seems to consist of a small number of logical operations (AND, OR, NOT and some predicates).
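As an illustration of what "recursive" means here, the classic Datalog rules `reachable(X, Y) :- edge(X, Y).` and `reachable(X, Z) :- edge(X, Y), reachable(Y, Z).` can be evaluated by repeating a simple logical step until nothing new is derived. The Python sketch below does that naively; it is not Crux's query syntax or engine, and the function and variable names are made up for the example.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def reachable(edges: Set[Pair]) -> Set[Pair]:
    """Naive fixed-point evaluation of the two reachable/2 rules."""
    facts = set(edges)  # rule 1: every edge is reachable
    while True:
        # rule 2: edge(X, Y) AND reachable(Y, Z) implies reachable(X, Z)
        derived = {(x, z) for (x, y) in edges for (y2, z) in facts if y == y2}
        new = derived - facts
        if not new:
            return facts  # fixed point: no rule produces anything new
        facts |= new


edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = reachable(edges)
```

For the three-edge chain above, the result holds the original edges plus the derived pairs such as ("a", "d").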

In terms of mapping Egeria concepts to Crux, our instance properties (core and type-defined) would all be stored as document properties. Despite the time-series nature of Crux, Jeremy recommended that core properties such as createTime and updateTime are retained as first-class properties of instances, rather than relying on the time-series information.
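A rough sketch of that mapping, in Python for brevity: core and type-defined properties are flattened into one document, with createTime and updateTime kept as ordinary properties. The `to_document` helper, the `core.`/`prop.` key prefixes, the `id` key and the sample values are assumptions made for illustration only; they are not an agreed design and not part of Egeria or Crux.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def to_document(guid: str,
                core: Dict[str, Any],
                type_defined: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one instance into a single document (hypothetical layout)."""
    doc: Dict[str, Any] = {"id": guid}
    # Core properties stay first-class, including createTime and updateTime.
    doc.update({f"core.{name}": value for name, value in core.items()})
    # Type-defined properties sit alongside them in the same document.
    doc.update({f"prop.{name}": value for name, value in type_defined.items()})
    return doc


doc = to_document(
    "guid-123",
    core={
        "createTime": datetime(2020, 1, 1, tzinfo=timezone.utc),
        "updateTime": datetime(2020, 6, 1, tzinfo=timezone.utc),
    },
    type_defined={"displayName": "orders"},
)
```

Prefixing the keys is just one way to keep core and type-defined names from colliding.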

In Crux, to find out when a document was added to the database you need to perform a scan.

Jeremy described Crux as having historical query capability, and said it is also important that it provides 'consistent queries'.

Action items