...
All are welcome.
Next meeting:
...
Feb 9th 2022 (9am PT)
Jan 12th 2022 (9am PT)
Attendees:
- TSC:
- Mike Collado: Eng, Datakin
- Mandy Chessell: Lead of the Egeria project
- Maciej Obuchowski: Eng GetInData, OpenLineage contributor
- Willy Lulciuc: Co-creator of Marquez
- Julien: OpenLineage Project lead
- And:
- Michael Robinson: dev rel
- Ross Turk: VP marketing Datakin
- Minkyu Park: Dev at Datakin, learning about MQZ and OL.
- Conor Beverland: Senior Dir of Product, Astronomer
- Srikanth Venkat: Product Management, Privacera
Agenda:
- OpenLineage recent releases overview [Julien]
- OpenLineage 0.4 release overview: https://github.com/OpenLineage/OpenLineage/releases/tag/0.4.0
- Databricks install README and init scripts (by Will)
- Iceberg integration (by Pawel)
- Kafka read and write support (by Olek and Mike)
- Arbitrary parameters supported in HTTP URL construction (by Will)
- Increased coverage (Pawel/Maciej)
- OpenLineage 0.5 release overview: https://github.com/OpenLineage/OpenLineage/compare/0.4.0...main
- Egeria support for OpenLineage [Mandy]
- Airflow TaskListener for OpenLineage integration [Maciej]
- Open discussion
Meeting:
- Slides
- Passcode:
- Zoom link
Notes:
0.4 release [Willy]:
- Encouraged by Iceberg adoption
- Using the new features is highly recommended
0.5 preview [Willy]:
- Thanks to Mike Collado for work on the PRs and proposal, and to Mandy for work on the HTTP backend over the last two months
- HTTP client will decrease confusion about how to capture metadata
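
As a rough illustration of what capturing metadata over HTTP can look like, here is a minimal Python sketch that builds an OpenLineage-style run event and describes the POST an HTTP client would make. It is not the client discussed in the meeting. The producer URL and the helper names `build_run_event` and `to_http_request` are hypothetical, the `/api/v1/lineage` path is the one Marquez exposes and is only an assumption for other backends, and the event carries a small subset of the fields in the full specification.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumed values for illustration only; neither comes from the meeting notes.
PRODUCER = "https://example.com/hypothetical-producer"
LINEAGE_PATH = "/api/v1/lineage"  # Marquez's lineage path; an assumption for other backends


def build_run_event(event_type, job_namespace, job_name, run_id=None):
    """Return a minimal OpenLineage-style run event as a plain dict."""
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [],
        "outputs": [],
        "producer": PRODUCER,
    }


def to_http_request(base_url, event):
    """Describe the POST a client would send, without touching the network."""
    return {
        "method": "POST",
        "url": base_url.rstrip("/") + LINEAGE_PATH,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(event),
    }
```

Calling `to_http_request("http://localhost:5000", build_run_event("START", "my_namespace", "my_job"))` yields a dict that any HTTP library could send as is; keeping construction separate from transport makes the payload easy to inspect.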
TaskListener for OL Integration [Maciej]:
Airflow 1.10: integration required modifying each DAG, which was cumbersome and not compatible with 2.1
Airflow 2.1: lineage backend comparable to Apache Atlas’ old backend
- benefit: provides all info about events
- downside: cannot notify about task starts/failures
Airflow 2.3: Event Listener
- Status: not merged yet, in final reviews for deployment with 0.6
- Improvements: transparent, less exposure, enables pull model using queue, enables Egeria and other projects in the future (e.g., DataHub)
- Discussion [Julien, Maciej, Willy, Mike]:
- generic: supports additional functionality
- extendable to different kinds of events, e.g., scheduling
- makes more data available
- much less brittle because depends on public API
- requires little configuration
- will not do away with registration of listeners/extractors
- entry point mechanism comparable to Java's ServiceLoader; requires env variables
- theoretically possible to backport it to earlier versions of Airflow (as far back as 1.10)
- possibly helpful to document that we have 3 approaches but do not recommend the older ones, and to mention that this changes only how we collate
- older approaches can be deprecated; it will be important to monitor the community to determine timing of this
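
To make the listener approach concrete, here is a minimal standalone sketch of how task state changes could map to OpenLineage event types, including the start and failure notifications the lineage backend cannot provide. The hook names follow Airflow's task-instance listener hooks, but the class does not import Airflow or register itself as a plugin, the exact hook signatures may differ between Airflow versions, and `SketchLineageListener` and its `emit` callback are hypothetical names.

```python
class SketchLineageListener:
    """Maps task-instance state changes to OpenLineage event types.

    Illustration only: method names mirror Airflow's listener hooks, but
    nothing here is wired into Airflow.
    """

    def __init__(self, emit):
        # emit is any callable taking (event_type, task_id)
        self.emit = emit

    def on_task_instance_running(self, previous_state, task_instance, session=None):
        # A task start is visible here, unlike with the lineage backend.
        self.emit("START", task_instance.task_id)

    def on_task_instance_success(self, previous_state, task_instance, session=None):
        self.emit("COMPLETE", task_instance.task_id)

    def on_task_instance_failed(self, previous_state, task_instance, session=None):
        # Failures are reported too, without modifying any DAG.
        self.emit("FAIL", task_instance.task_id)
```

Because the listener only reacts to hooks, the `emit` callable could push to a queue instead of sending events directly, which is one way to read the pull model mentioned above.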
Egeria Support for OpenLineage [Mandy]:
- Monthly releases
- OpenLineage support ready in recent release
- Metaphor: Lego blocks
- OL events can be brought in through the API or the proxy backend with Kafka
- events can be augmented in Egeria, then stored in or published to Marquez or Kafka for distribution, or written to a log store (e.g., file system)
- Can validate that a process is running correctly
- See documentation in Egeria about proxy backend and extensions, API mechanism
- Diagram in documentation illustrates capabilities
- Discussion [Julien, Mandy, Srikanth, Mike]:
- Egeria sees value of OpenLineage
- Engine is uncoupled from receivers
- Endpoint is simple, allowing independent management of processes
- Some transformation of payload during storage
- Kafka integration coming in 0.5
- Customers expect ability to filter data
- Varying granularity of metadata already possible through versioning with Marquez
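
The receive, augment, filter, and distribute flow described above can be sketched generically as follows. This is not Egeria's API: the function names `augment`, `distribute`, and `file_sink` and the flat merge of extra metadata are assumptions chosen to show how receivers stay decoupled from the component that accepts events.

```python
import json


def augment(event, extra):
    """Return a copy of the event with extra top-level metadata merged in."""
    merged = dict(event)
    merged.update(extra)
    return merged


def distribute(event, sinks, keep=lambda e: True):
    """Send the event to every sink if it passes the filter.

    Returns the number of sinks that received the event.
    """
    if not keep(event):
        return 0
    for sink in sinks:
        sink(event)
    return len(sinks)


def file_sink(path):
    """Build a sink that appends events to a file as JSON lines (a simple log store)."""
    def write(event):
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(event) + "\n")
    return write
```

A sink is just a callable, so a Kafka producer or an HTTP call to Marquez could be plugged in alongside `file_sink("events.log")` without changing `distribute`.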
Open Discussion:
Proposal to convert licenses to SPDX [Michael]: no objections
Dec 8th 2021 (9am PT)
TODO: add notes
...