One of the most common expectations we come across in the metadata landscape is the ability to access historical views of metadata.

The benefits of historical metadata appear in various use cases:

Addressing such use cases can be thought of as a simple matter of providing point-in-time inquiry. Egeria provides interfaces to make such point-in-time inquiries, and a native repository (based on Crux) to efficiently serve them.

However, the reality of temporality introduces some nuances of which we should be aware – particularly in the context of the distributed metadata landscape that Egeria enables.

Bi-temporality with regard to history: "validity" vs "transaction" time

The first place to delve deep is perhaps the conceptual: distinguishing what we really mean by "history" and recognising that there are in fact multiple dimensions involved. There are various excellent examples and articles already out there which I will not attempt to repeat; suffice it to say that it is important to distinguish between these concepts:

The first, valid time, represents when something actually happened in the real world. For example, the Declaration of Independence was signed on July 4, 1776.

The latter, transaction time, represents when something became known (e.g. was captured by a system). For example, King George III only learned of the Declaration of Independence about a month later (around August 10, 1776); however, America's Independence Day has always been recognised as July 4, not August 10.

In an ideal universe, we might hope that the two would be the same – but the real world is rarely so simple.

What we often need to do is capture what we believe to be the truth about reality (the valid time), but accept that we may need to retroactively change that based on new information. That new information could arrive at any point (transaction time), but needs to be able to change what we previously believed to be a historical truth. Being able to identify both dimensions allows us to know what reality was in the past (via valid time), and also the point at which we learned about that reality (via transaction time), which could have been days, weeks, months or even years after the fact.
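To make the two dimensions concrete, here is a minimal Python sketch of a bi-temporal log. It is purely illustrative: the Fact and BitemporalLog names are hypothetical, this is not how Egeria or Crux store data, and the January 1 "British rule" entry is a placeholder added only so that there is an earlier belief to supersede.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """Hypothetical record carrying both time dimensions."""
    key: str
    value: str
    valid_from: datetime   # valid time: when this became true in the real world
    recorded_at: datetime  # transaction time: when the system learned of it


class BitemporalLog:
    """Append-only log: facts are never overwritten, only superseded."""

    def __init__(self) -> None:
        self._facts = []

    def record(self, fact: Fact) -> None:
        self._facts.append(fact)

    def as_of(self, key: str, valid_at: datetime,
              known_at: Optional[datetime] = None) -> Optional[str]:
        """Value of `key` at `valid_at`, using only what was known by `known_at`."""
        candidates = [
            f for f in self._facts
            if f.key == key
            and f.valid_from <= valid_at
            and (known_at is None or f.recorded_at <= known_at)
        ]
        if not candidates:
            return None
        # Latest valid time wins; ties go to the most recently recorded fact.
        best = max(candidates, key=lambda f: (f.valid_from, f.recorded_at))
        return best.value


log = BitemporalLog()
# Placeholder earlier belief (illustrative dates only).
log.record(Fact("colonies", "British rule",
                datetime(1776, 1, 1), datetime(1776, 1, 1)))
# Valid from July 4, but only recorded (by the King) around August 10.
log.record(Fact("colonies", "independence declared",
                datetime(1776, 7, 4), datetime(1776, 8, 10)))

mid_july = datetime(1776, 7, 20)
# What was believed in mid-July about mid-July.
believed_then = log.as_of("colonies", mid_july, known_at=mid_july)
# What was known by mid-August about that same mid-July date.
known_later = log.as_of("colonies", mid_july, known_at=datetime(1776, 8, 15))
```

Here believed_then is "British rule" while known_later is "independence declared": the same valid-time question gets a different answer depending on the transaction time from which it is asked.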

In Egeria, this comes to life through the distributed nature of the metadata landscape, for example:

Reflecting the past, as best we can...

To cater for these differences in times, and the potential for out-of-order events and information, in Egeria we focus entirely on valid time:

This has a number of implications, not all of which may be immediately obvious:

Other considerations

If you are thinking of embarking on creating your own historical repository to integrate with Egeria, you may want to think through some of these other nuances and implications that we came across when developing the Crux connector:

Classifications

Classifications are instances of a sort (they extend InstanceAuditHeader, but not InstanceHeader), so you may want to store them as a distinct kind of instance just like entity and relationship (this is the approach taken by the JanusGraph connector, for example).

However, classifications are only accessible through an entity – they cannot be retrieved independently – so you may also want to consider whether it is more optimal to store them directly on the entity to which they are inherently attached (as we have done with the Crux connector).

In either case, remember that the lifecycle of a classification (its creation and update times, and versions) is still handled independently from that of the entity – and this could have an impact on history:
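As one hypothetical illustration of that independence, the sketch below assumes a store that embeds classifications in the entity document and writes a new snapshot on every change. Under that assumption, the stored history contains more snapshots than there are entity versions, so a consumer asking for the entity's own version history has to collapse them. The class and function names are invented for this example and do not come from Egeria or the Crux connector.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ClassificationSnapshot:
    name: str
    version: int  # versioned independently of the entity


@dataclass(frozen=True)
class EntitySnapshot:
    guid: str
    version: int  # the entity's own version
    classifications: Tuple[ClassificationSnapshot, ...] = ()


def entity_versions(history: List[EntitySnapshot]) -> List[EntitySnapshot]:
    """Collapse stored snapshots (oldest first) to one per entity version.

    The most recent snapshot for each entity version is kept, so the result
    carries the latest classification state seen while that version was current.
    """
    latest = {}
    for snap in history:
        latest[snap.version] = snap
    return [latest[v] for v in sorted(latest)]


# Four stored snapshots, but only two entity versions: the middle two writes
# were caused by classification changes alone.
history = [
    EntitySnapshot("guid-1", 1),
    EntitySnapshot("guid-1", 1, (ClassificationSnapshot("Confidentiality", 1),)),
    EntitySnapshot("guid-1", 1, (ClassificationSnapshot("Confidentiality", 2),)),
    EntitySnapshot("guid-1", 2, (ClassificationSnapshot("Confidentiality", 2),)),
]
collapsed = entity_versions(history)
```

Whether to collapse in this way, or to expose every stored snapshot, is a design choice for the repository; the sketch only shows why the two counts can differ.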

Instance re-identification

Egeria also provides an interface for re-identification (changing the GUID) of instances – both entities and relationships. To implement this while retaining history, you will need to make use of the reIdentifiedFromGUID property that was added to InstanceHeader in release 2.9:

In this way, a consumer can then still navigate through the history:
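A minimal sketch of that navigation follows, assuming the consumer can look up, for any GUID, the reIdentifiedFromGUID value recorded on that instance's header. The dictionary stands in for that lookup and guid_lineage is a hypothetical helper, not part of Egeria's API.

```python
from typing import Dict, List, Optional


def guid_lineage(re_identified_from: Dict[str, Optional[str]],
                 guid: str) -> List[str]:
    """Return `guid` followed by each earlier GUID it was re-identified from.

    `re_identified_from` maps a GUID to the reIdentifiedFromGUID value on its
    header (absent or None when the instance was never re-identified).
    """
    lineage = []
    seen = set()
    current: Optional[str] = guid
    # The `seen` set guards against a malformed, cyclic chain.
    while current is not None and current not in seen:
        lineage.append(current)
        seen.add(current)
        current = re_identified_from.get(current)
    return lineage


# Hypothetical chain: guid-A was re-identified to guid-B, then to guid-C.
headers = {"guid-C": "guid-B", "guid-B": "guid-A"}
lineage = guid_lineage(headers, "guid-C")
```

With the lineage in hand, a consumer could request the history of each GUID in turn to reconstruct the full timeline of the instance.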