Comment: Formatting, added some actions, and link to suggested tutorial on Datalog

...

Goals

Discussion items

Time    | Item            | Who | Notes
60 mins | Crux and Egeria | All | Open discussion, please see notes below

Notes

Crux:

  • Jeremy Taylor (Product Manager)
  • Jon Pither (Technical)
  • Steve (Commercial Manager)

...

  • Chris Grote
  • David Radley
  • Graham Wallis

Crux is an open source project that implements a graph database, which supports efficient point-in-time queries. The implementation is in Clojure, and Crux can either be embedded as a library or run separately.
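To make "point-in-time query" concrete, here is a minimal sketch in Python. It is a toy in-memory store, not Crux's API: the class `VersionedStore`, its methods and the sample entity are all hypothetical names chosen for illustration. It assumes each version of an entity is stored with the time from which it is valid, and a lookup returns whichever version was current at the requested time.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedStore:
    """Toy store that keeps every version of each entity (hypothetical, not Crux's API)."""

    def __init__(self):
        # entity id -> parallel lists of valid-from times and documents, kept sorted by time
        self._times = {}
        self._docs = {}

    def put(self, entity_id, doc, valid_from):
        """Record a new version of the entity, valid from `valid_from`."""
        times = self._times.setdefault(entity_id, [])
        docs = self._docs.setdefault(entity_id, [])
        pos = bisect_right(times, valid_from)
        times.insert(pos, valid_from)
        docs.insert(pos, doc)

    def entity_at(self, entity_id, as_of):
        """Return the version that was current at `as_of`, or None if none existed yet."""
        times = self._times.get(entity_id, [])
        pos = bisect_right(times, as_of)
        return self._docs[entity_id][pos - 1] if pos else None


store = VersionedStore()
store.put("asset-1", {"name": "draft"}, datetime(2020, 1, 1))
store.put("asset-1", {"name": "final"}, datetime(2020, 6, 1))

march = store.entity_at("asset-1", datetime(2020, 3, 1))    # the January version
before = store.entity_at("asset-1", datetime(2019, 1, 1))   # nothing existed yet
```

The binary search over sorted times is only one way to make such lookups cheap; how Crux indexes its history was not covered in the discussion.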

Crux offers a choice of backend stores - LMDB or RocksDB.

  • LMDB is up to 3x faster than RocksDB.
  • A RocksDB store can scale up to around 16TB; beyond that you would need to scale out by replicating. Crux actually started out built on Kafka, with the ability to scale out the backend for size or availability. The embedded RocksDB implementation was incorporated later, but it is good for evaluation and development purposes.

Crux uses graph indexes which are more flexible than the indexes you would typically find in an RDBMS. The project actually started out trying to build a graph layer on top of Oracle, but the team discovered that was a non-starter and built Crux.

...

The query language in Crux is called Datalog. This is a recursive language, similar to Prolog or SPARQL. It is much lower level than Gremlin, for example, and appears to consist of a small number of logical operations (AND, OR, NOT) and some predicates.

  • It supports wildcard searches, so you should be able to search for all entities that have a particular substring in an attribute. It does not support 'wild' wildcarding or 'wild' navigation: in both cases you need to know the attribute you are testing the predicate against. This would make it impossible to find all paths between an arbitrary pair of vertices (entities, for example).
  • Suggested initial learning via: http://www.learndatalogtoday.org. This is a very simple tutorial, but the final chapter also illustrates how the simple predicates can be combined into your own "rules" (think functions) that abstract commonly used combinations of predicates for your particular schema and scenarios.
  • There will be a deeper dive on Datalog when Jeremy presents at 'Reclojure' on Thursday 3rd December (15:00 - 18:00).
  • Performance tests using the Waterloo SPARQL Diversity Test Suite (WatDiv), with 2 to 3GB of data and 10M 'triples', have shown that Crux gets to within 'an order of magnitude' of Neo4j or RDF4J.
  • A graph query will result in the generation of tens of thousands of RocksDB lookups, but there is extensive caching.
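As an illustration of the points above, the following Python sketch evaluates a Datalog-style query as a conjunction (AND) of entity/attribute/value patterns followed by a predicate filter. It is not Crux's query engine or syntax. The function names, the `?var` convention and the sample attributes (`name`, `partOf`) are assumptions made for this example. The attribute position must be a constant, mirroring the restriction that you need to know which attribute you are testing.

```python
def is_var(term):
    """Variables are strings that start with '?', e.g. '?table' (a convention for this sketch)."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, fact, binding):
    """Extend `binding` so that `pattern` matches `fact`; return None if they conflict."""
    result = dict(binding)
    for term, value in zip(pattern, fact):
        if is_var(term):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(facts, patterns, predicates=()):
    """Evaluate a conjunction of triple patterns, then filter the bindings with predicates."""
    bindings = [{}]
    for pattern in patterns:
        if is_var(pattern[1]):
            # No 'wild' navigation: the attribute must be known up front.
            raise ValueError("attribute position must be a known attribute")
        extended = []
        for binding in bindings:
            for fact in facts:
                new_binding = match(pattern, fact, binding)
                if new_binding is not None:
                    extended.append(new_binding)
        bindings = extended
    return [b for b in bindings if all(p(b) for p in predicates)]


# Hypothetical facts as (entity, attribute, value) triples.
facts = [
    ("e1", "name", "customer_table"),
    ("e2", "name", "orders_table"),
    ("e3", "name", "customer_id"),
    ("e3", "partOf", "e1"),
]

# Find every ?col that is part of a ?table whose name contains "customer".
rows = query(
    facts,
    patterns=[("?col", "partOf", "?table"), ("?table", "name", "?n")],
    predicates=[lambda b: "customer" in b["?n"]],
)
```

The substring test is expressed as a predicate on a named attribute, which is the kind of wildcard search described above. A pattern with a variable in the attribute position is rejected.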

In terms of mapping Egeria concepts to Crux, our instance properties (both core and type-defined) would all be stored as document properties. Despite the time-series nature of Crux, Jeremy recommended that core properties such as createTime and updateTime are retained as first-class properties of instances, rather than relying on the time-series information.
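A rough sketch of what that mapping could look like, in Python. Everything here is an assumption for illustration: the function `instance_to_document`, the string form of the id key (Crux documents carry an id, written `:crux.db/id` in Clojure), the `typeName` key and the sample property names other than createTime and updateTime. The only point it makes is that core and type-defined properties end up side by side in one flat document, with createTime and updateTime stored explicitly.

```python
from datetime import datetime, timezone


def instance_to_document(guid, type_name, core, type_defined):
    """Flatten an instance into a single document (hypothetical mapping, not Egeria's code)."""
    clashes = set(core) & set(type_defined)
    if clashes:
        raise ValueError(f"property names used twice: {sorted(clashes)}")
    doc = {"crux.db/id": guid, "typeName": type_name}  # key names are assumptions
    doc.update(core)          # createTime, updateTime, ... kept as ordinary properties
    doc.update(type_defined)  # properties defined by the instance's type
    return doc


doc = instance_to_document(
    guid="11111111-2222-3333-4444-555555555555",  # made-up identifier
    type_name="GlossaryTerm",
    core={
        "createTime": datetime(2020, 11, 1, tzinfo=timezone.utc),
        "updateTime": datetime(2020, 11, 20, tzinfo=timezone.utc),
    },
    type_defined={"displayName": "Customer"},
)
```

Keeping the timestamps in the document means they can be queried like any other attribute, independently of the store's own history.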

...

Jeremy described Crux as having historical query capability, and stressed that it is also important that it provides 'consistent queries'.

Action items