  • Recent talks
  • Release 0.10.0
  • Flink integration
    • Entry point: built a Flink example app to find out whether metadata and schema are extractable
    • Maciej also successfully read data from Iceberg
    • Flink provides two APIs
    • Created integration tests for all use cases, added them to CircleCI
    • New Java client: different configs for HTTP, Kafka endpoints
    • Missing feature: make sure crashing integration doesn't kill a Flink job
    • Coming soon: experimental version
      • not focused on streaming currently
      • focus: how to extract info from Flink
      • feedback from community desired
    • Q & A
      • Will: is the code an extension of OL or an integration?
        • an integration akin to the dbt integration
      • Willy: any changes to the spec/schema? Is the state part of the payload?
        • new state should be added (currently "other")
  • New docs site
    • Up until today, docs have been on the website and spread throughout READMEs
    • Docusaurus deployment now available
    • Changes to structure as well as content welcome
    • Not currently live but will be soon
    • Can be hosted at docs.openlineage.io
    • Everything is in Markdown
    • Another motivation: Keboola use case not part of the codebase, so a docs site could describe it
    • Next milestone: we all decide to publish it
    • Q & A
      • Willy: let's add a section on defining custom facets
      • Ross: feel free to add another page stub
      • Ross: also need a FAQ
      • Julien: we could autogenerate some docs
      • Ross: there are downsides to such an approach
      • Julien: let's open issues when answers aren't good enough
      • Willy: descriptions of facets could be improved
      • Julien: we could version them
      • Ross: I'll look for signs that people are not finding docs on the version they are using
  • Discussion: streaming in Flink integration
    • Has there been any evolution in the thinking on support for streaming?
      • Julien: start event, complete event, and snapshots in between, limited to a certain number per time interval
      • Paweł: we can make the snapshot volume configurable
    • Does Flink support sending data to multiple tables like Spark?
      • Yes, multiple outputs supported by OpenLineage model
      • Marquez, the reference implementation of OL, combines the outputs
    • Looking forward to seeing this documented on the new docs site
  • Open discussion
    • What's the logical approach to avoid overloading the backend with lineage events? [Colin]
      • Paweł: we only send events when checkpoints change; configurable for more events
      • Will: at Microsoft we're working on a fix that caches and consolidates OL events
    • It'd be awesome to see example payloads for streaming in the docs [Colin]
      • Ross: they're currently spread out; it'd be nice to have them in one place
    • How can we create custom facets? [Sandeep]
      • Julien: two options; anyone can create a custom facet without asking permission, or open a proposal/issue
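
As a rough illustration of the first option Julien describes (creating a custom facet without asking permission), here is a minimal sketch that builds a run event carrying a custom run facet using only the Python standard library. The facet name `example_checkpointInfo`, its `checkpointCount` field, the producer URL, and both schema URLs are hypothetical placeholders, not names from the meeting or from any integration. The sketch assumes the spec's convention that custom facet keys carry a distinct prefix and that each facet includes `_producer` and `_schemaURL` fields.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical values for illustration only; replace with your own.
PRODUCER = "https://example.com/my-integration"
FACET_SCHEMA_URL = "https://example.com/schemas/ExampleCheckpointInfoRunFacet.json"
# Assumed spec URL; check the spec version you actually target.
EVENT_SCHEMA_URL = "https://openlineage.io/spec/1-0-0/OpenLineage.json#/definitions/RunEvent"
# Custom facet keys use a distinct prefix ("example_" here) to avoid
# colliding with standard facets.
FACET_KEY = "example_checkpointInfo"


def make_custom_facet(checkpoint_count):
    """Return a custom run facet as a plain dict."""
    return {
        "_producer": PRODUCER,
        "_schemaURL": FACET_SCHEMA_URL,
        "checkpointCount": checkpoint_count,  # hypothetical field
    }


def make_run_event(event_type, job_namespace, job_name, run_id, run_facets):
    """Assemble a run event dict that carries the given run facets."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": PRODUCER,
        "schemaURL": EVENT_SCHEMA_URL,
        "run": {"runId": run_id, "facets": run_facets},
        "job": {"namespace": job_namespace, "name": job_name},
    }


if __name__ == "__main__":
    event = make_run_event(
        "OTHER",  # the state the notes say is currently used for Flink
        "flink-demo",
        "word_count",
        str(uuid.uuid4()),
        {FACET_KEY: make_custom_facet(3)},
    )
    print(json.dumps(event, indent=2))
```

The second option, opening a proposal or issue, would be the route if the facet should become part of the standard set rather than stay specific to one producer.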