
...

All are welcome.


Next meeting:

...

Feb 9th 2022 (9am PT)

Jan 12th 2022 (9am PT)

Attendees:

Tentative agenda:


  • TSC:


    • Mike Collado: Eng, Datakin
    • Mandy Chessel: Lead Egeria project
    • Maciej Obuchowski: Eng GetInData, OpenLineage contributor
    • Willy Lulciuc: Co-creator of Marquez
    • Julien: OpenLineage Project lead
  • And:
    • Michael Robinson: Developer Relations
    • Ross Turk: VP Marketing, Datakin
    • Minkyu Park: Developer at Datakin, learning about Marquez and OpenLineage
    • Conor Beverland: Senior Director of Product, Astronomer
    • Srikanth Venkat: Product Management, Privacera

Agenda:

Meeting: 

...

Meeting: 

Notes:

0.4 release [Willy]:

  • Encouraged by Iceberg adoption
  • Using the new features is highly recommended

0.5 preview [Willy]:

  • Thanks to Mike Collado for his work on PRs and the proposal, and to Mandy for her work on the HTTP backend over the last two months
  • The HTTP client should reduce confusion about how to capture metadata
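To illustrate what an HTTP client does, here is a minimal sketch that builds an OpenLineage-style RunEvent and prepares a JSON POST using only the Python standard library. It is not the OpenLineage client library's API. Assumptions: `/api/v1/lineage` is Marquez's lineage endpoint, the producer URI is a placeholder, the `schemaURL` version is illustrative, and `make_run_event` and `build_request` are hypothetical helper names.

```python
import json
import uuid
import urllib.request
from datetime import datetime, timezone

# Placeholder values (assumptions): use your own producer URI and the spec version your backend expects.
PRODUCER = "https://example.com/my-pipeline"
SCHEMA_URL = "https://openlineage.io/spec/1-0-2/OpenLineage.json#/definitions/RunEvent"


def make_run_event(event_type, namespace, job_name, run_id=None):
    """Build a minimal OpenLineage-style RunEvent as a plain dict (hypothetical helper)."""
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},  # reuse the runId to tie events together
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [],
        "outputs": [],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


def build_request(base_url, event):
    """Prepare (but do not send) a JSON POST of the event to an HTTP lineage endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/lineage",  # assumed Marquez lineage endpoint
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


start = make_run_event("START", "my_namespace", "my_job")
complete = make_run_event("COMPLETE", "my_namespace", "my_job", run_id=start["run"]["runId"])
req = build_request("http://localhost:5000", start)  # assumes Marquez on its default API port
# urllib.request.urlopen(req) would actually send the event to a running backend.
```

The COMPLETE event reuses the START event's `runId`, so a backend can record both as one run.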

TaskListener for OpenLineage Airflow Integration [Maciej]:

1.10: required modifying each DAG, which was cumbersome and not compatible with 2.1

2.1: lineage backend comparable to Apache Atlas’ old backend

  • benefit: provides all info about events
  • downside: cannot notify about task starts/failures

2.3: Airflow Event Listener

  • Status: not merged yet, in final reviews for deployment with 0.6
  • Improvements: transparent, less exposure, enables pull model using queue, enables Egeria and other projects in the future (e.g., DataHub)
  • Discussion [Julien, Maciej, Willy, Mike]:
    • generic: supports additional functionality
    • extendable to different kinds of events, e.g., scheduling
    • makes more data available
    • much less brittle because it depends on a public API
    • requires little configuration
    • will not do away with registration of listeners/extractors
    • entry-point mechanism comparable to ServiceLoader in Java; requires environment variables
    • theoretically possible to backport it to earlier versions of Airflow (as far back as 1.10)
    • possibly helpful to document that we have 3 approaches but are not recommending older ones, mention that this changes only how we collate
    • older approaches can be deprecated; it will be important to monitor the community to determine timing of this
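The entry-point discovery discussed above can be sketched with Python's standard `importlib.metadata`: installed packages advertise implementations under a named group in their packaging metadata, and the host loads them without hard-coding class names, much like `java.util.ServiceLoader`. This is a minimal illustration, not Airflow's or the OpenLineage integration's code; the group name `example.lineage_listeners` and the hook name `on_task_start` are hypothetical.

```python
from importlib.metadata import entry_points

# Hypothetical entry-point group; the real integration may use a different name.
GROUP = "example.lineage_listeners"


def load_listeners(group=GROUP):
    """Discover and instantiate every plugin registered under an entry-point group."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+: selectable entry points
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9: a dict keyed by group name
        selected = eps.get(group, [])
    # Each entry point resolves to a class, which is instantiated with no arguments.
    return [ep.load()() for ep in selected]


def notify(listeners, hook, **kwargs):
    """Call `hook` (e.g. a hypothetical 'on_task_start') on every listener that defines it."""
    for listener in listeners:
        method = getattr(listener, hook, None)
        if callable(method):
            method(**kwargs)
```

Because listeners only implement the hooks they care about, a start or failure notification can be added without changing existing listeners, which is the advantage over a backend that only runs after a task finishes.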

Egeria Support for OpenLineage [Mandy]:

  • Monthly releases
  • OpenLineage support ready in recent release
  • Metaphor: Lego blocks
    • OL events can be brought in through the API or a proxy backend with Kafka
    • events can be augmented in Egeria, then stored in or published to Marquez, Kafka (for distribution), or a log store (e.g., file system)
  • Can validate that a process is running correctly
  • See the Egeria documentation on the proxy backend, extensions, and API mechanism
  • A diagram in the documentation illustrates the capabilities
  • Discussion [Julien, Mandy, Srikanth, Mike]:
    • Egeria sees value in OpenLineage
    • Engine is decoupled from receivers
    • Endpoint is simple, allowing independent management of processes
    • Some transformation of the payload during storage
    • Kafka integration coming in 0.5
    • Customers expect the ability to filter data
    • Varying granularity of metadata is already possible through versioning with Marquez
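The "Lego blocks" idea (events enter, are optionally filtered or augmented, then fan out to several destinations) can be sketched as a chain of small stages. This is a generic Python illustration under stated assumptions, not Egeria's API: `run_pipeline`, `namespace_filter`, and `tag_source` are hypothetical names, events are plain dicts, and the `receivedFrom` key is an illustrative addition that is not part of the OpenLineage spec.

```python
from typing import Callable, Iterable, List, Optional

Event = dict
Stage = Callable[[Event], Optional[Event]]  # a stage returns None to drop the event
Sink = Callable[[Event], None]


def run_pipeline(events: Iterable[Event], stages: List[Stage], sinks: List[Sink]) -> int:
    """Pass each event through the stages in order, then fan it out to every sink.

    Returns the number of events delivered.
    """
    delivered = 0
    for event in events:
        for stage in stages:
            event = stage(event)
            if event is None:  # a filter stage dropped the event
                break
        else:  # no stage dropped it: deliver to all sinks
            for sink in sinks:
                sink(event)
            delivered += 1
    return delivered


def namespace_filter(allowed):
    """Hypothetical filter stage: keep only events whose job namespace is allowed."""
    def stage(event):
        return event if event["job"]["namespace"] in allowed else None
    return stage


def tag_source(source):
    """Hypothetical augmentation stage: copy the event and record where it entered."""
    def stage(event):
        return {**event, "receivedFrom": source}
    return stage
```

Because each stage either returns an event or `None`, filters and augmenters compose freely, and a sink can be any callable, from a message-queue producer to `list.append` in a test.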

Open Discussion:

Proposal to convert licenses to SPDX [Michael]: no objections

Dec 8th 2021 (9am PT)

TODO: add notes

...