Versions Compared

Old Version 1

Changes made by Graham Wallis

Saved on Nov 27, 2020

compared with

New Version 2

Changes made by Chris Grote

Saved on Nov 27, 2020

Comment: Formatting, added some actions, and link to suggested tutorial on Datalog

...

Graham Wallis
Chris Grote
David Radley
Jeremy Taylor (Product Manager)
Jon Pither (Technical)
Steve H. (Commercial Manager)

Goals

Take a first look at the Crux OSS database and its potential use as a repository with history

Discussion items

Time	Item	Who	Notes
60mins	Crux and Egeria	All	Open discussion, please see notes below

Notes

Crux:

Jeremy Taylor (Product Manager)
Jon Pither (Technical)
Steve (Commercial Manager)

...

Chris Grote
David Radley
Graham Wallis
Crux is an open source project that implements a graph database, which supports efficient point-in-time queries. The implementation is in Clojure, and Crux can either be embedded as a library or run separately.
Crux offers a choice of backend stores - LMDB or RocksDB.
LMDB is up to 3x faster than RocksDB.
A single RocksDB instance can scale up to around 16TB; beyond that you would need to scale out by replicating. Crux actually started out built on Kafka with the ability to scale out the backend, for size or availability. The embedded RocksDB implementation was incorporated later but is good for evaluation/development purposes.
Crux uses graph indexes which are more flexible than the indexes you would typically find in an RDBMS. The project actually started out trying to build a graph layer on top of Oracle, but the team discovered that was a non-starter and built Crux.
...
The query language in Crux is Datalog. This is a recursive language, similar to Prolog or SPARQL. It is much lower level than Gremlin, for example, and essentially consists of a small number of logical operations (AND, OR, NOT and some predicates).
It supports wildcard searches, so you should be able to search for all entities that have a particular substring in an attribute, but it does not support 'wild' wildcarding or 'wild' navigation, i.e. in both cases you need to know the attribute you are testing the predicate against. This would make it impossible to find all paths between an arbitrary pair of vertices (entities, for example).
Suggested initial learning via: http://www.learndatalogtoday.org (very simple tutorial, but the final chapter also illustrates how the very simple predicates can be combined into your own defined "rules" (think functions) that abstract commonly used combinations of predicates for your particular schema and scenarios)
There will be a deeper dive on Datalog when Jeremy presents at 'Reclojure' on Thursday 3rd December (15:00 - 18:00).
Performance tests using the Waterloo Diversity SPARQL benchmark, with 2 to 3GB of data and 10M 'triples', have shown that Crux gets to within 'an order of magnitude' of Neo4j or RDF4J.
A graph query will result in the generation of tens of thousands of RocksDB lookups, but there is extensive caching.
In terms of mapping Egeria concepts to Crux, our instance properties (core and type defined) would all be stored as document properties. Despite the time-series nature of Crux, Jeremy recommended that core properties such as createTime and updateTime are retained as first class properties of instances, rather than relying on the time series information.
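To make the Datalog and mapping points above more concrete, here is a minimal Python sketch. It is not Crux's API and not a real Datalog engine. It assumes hypothetical documents loosely modelled on Egeria instances (the ids, the `linkedTo` attribute and the helper names are all invented for illustration), keeps createTime and updateTime as ordinary document properties, and shows two ideas from the notes: a substring predicate that has to name the attribute it tests, and a user-defined recursive "rule" built from a simple edge predicate.

```python
# Hypothetical documents; ids, types and the "linkedTo" attribute are
# illustrative assumptions, not an agreed Egeria-to-Crux mapping.
DOCS = [
    {"id": "term-1", "type": "GlossaryTerm", "displayName": "Customer Address",
     "createTime": "2020-11-01T09:00:00Z", "updateTime": "2020-11-20T10:30:00Z",
     "linkedTo": ["col-1"]},
    {"id": "col-1", "type": "RelationalColumn", "displayName": "ADDR_LINE1",
     "createTime": "2020-11-02T09:00:00Z", "updateTime": "2020-11-02T09:00:00Z",
     "linkedTo": ["table-1"]},
    {"id": "table-1", "type": "RelationalTable", "displayName": "CUSTOMER",
     "createTime": "2020-11-02T09:00:00Z", "updateTime": "2020-11-02T09:00:00Z"},
]


def to_facts(docs):
    """Flatten documents into (entity, attribute, value) facts."""
    facts = set()
    for doc in docs:
        for attr, value in doc.items():
            if attr == "id":
                continue
            values = value if isinstance(value, list) else [value]
            for v in values:
                facts.add((doc["id"], attr, v))
    return facts


def entities_where(facts, attribute, predicate):
    """Entities whose named attribute satisfies the predicate.

    The attribute must be given explicitly: there is no way here to ask
    for "any attribute", mirroring the limitation described in the notes.
    """
    return {e for (e, a, v) in facts if a == attribute and predicate(v)}


def reachable(facts, attribute):
    """Recursive rule, evaluated as a naive fixpoint:

    reachable(x, y) :- edge(x, y).
    reachable(x, z) :- edge(x, y), reachable(y, z).
    """
    edges = {(e, v) for (e, a, v) in facts if a == attribute}
    closure = set(edges)
    while True:
        derived = {(x, z) for (x, y) in edges for (y2, z) in closure if y == y2}
        if derived <= closure:
            return closure
        closure |= derived


facts = to_facts(DOCS)
# Substring search against a known attribute.
print(sorted(entities_where(facts, "displayName", lambda v: "addr" in v.lower())))
# Everything reachable by following the one named edge attribute.
print(sorted(reachable(facts, "linkedTo")))
```

Note that `reachable` still follows a single, named attribute; following arbitrary attributes between two arbitrary entities is exactly the 'wild' navigation that the notes say is not supported.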
...
Jeremy described Crux as having historical query capability, and stressed that it is also important that it provides 'consistent queries'.
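As a rough illustration of what a historical, consistent query could mean in practice, the following Python sketch keeps every version of a document against the time from which it is valid and answers lookups "as of" a chosen time. This is an assumption-laden toy, not Crux's API or storage design: `VersionedStore`, `put` and `entity` are hypothetical names, and only a single time axis is modelled.

```python
from bisect import bisect_right
from datetime import datetime, timezone


class VersionedStore:
    """Toy store that keeps all versions of each document (hypothetical)."""

    def __init__(self):
        self._times = {}  # entity id -> sorted list of valid-from times
        self._docs = {}   # entity id -> documents aligned with _times

    def put(self, entity_id, doc, valid_from):
        """Record a new version without overwriting earlier ones."""
        times = self._times.setdefault(entity_id, [])
        docs = self._docs.setdefault(entity_id, [])
        pos = bisect_right(times, valid_from)
        times.insert(pos, valid_from)
        docs.insert(pos, dict(doc))

    def entity(self, entity_id, as_of):
        """Return the version valid at as_of, or None if none existed yet."""
        times = self._times.get(entity_id, [])
        pos = bisect_right(times, as_of)
        return self._docs[entity_id][pos - 1] if pos else None


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


store = VersionedStore()
store.put("term-1", {"displayName": "Address"}, utc(2020, 11, 1))
store.put("term-1", {"displayName": "Customer Address"}, utc(2020, 11, 20))

query_time = utc(2020, 11, 10)
before = store.entity("term-1", as_of=query_time)

# A later write does not change the answer to a query pinned to query_time.
store.put("term-1", {"displayName": "Postal Address"}, utc(2020, 11, 27))
after = store.entity("term-1", as_of=query_time)

print(before, after)
```

Pinning every lookup in a query to the same `as_of` value is one simple way to get repeatable answers while writes continue; how Crux itself achieves consistency was not covered in detail in the meeting.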

Action items

Graham Wallis to raise during Innovation section of next week's TSC (potential to start a small special interest group, e.g. under innovation and adoption)
Chris Grote / Graham Wallis / David Radley to consider attending the Datalog session at Reclojure (TBC)
Chris Grote / Graham Wallis to follow up with Crux on next steps, post TSC discussion