...
- Attendees:
- TSC:
Ryan Blue
Maciej Obuchowski
Michael Collado
Daniel Henneberger
Willy Lulciuc
Mandy Chessell
Julien Le Dem
- And:
Peter Hicks
Minkyu Park
Daniel Avancini
- Meeting recording:
- zoom link
- Passcode: =RBUj01C
- Meeting notes:
- Agenda:
- Coming in OpenLineage 0.1
- OpenLineage spec versioning
- Clients
- Marquez integrations imported in OpenLineage
- Apache Airflow:
- BigQuery
- Postgres
- Snowflake
- Redshift
- Great Expectations
- Apache Spark
- dbt
- OpenLineage 0.2 scope discussion
- Facet versioning mechanism (Issue #153)
- OpenLineage Proxy Backend (Issue #152)
- Kafka client
- Roadmap
- Open discussion
- Coming in OpenLineage 0.1
- Slides: https://docs.google.com/presentation/d/1Lxp2NB9xk8sTXOnT0_gTXicKX5FsktWa/edit#slide=id.ge80fbcb367_0_14
- Notes:
- OpenLineage 0.1 is being published
- Coming in OpenLineage 0.1
- OpenLineage spec versioning
- Clients (Java, Python)
- Marquez integrations imported in OpenLineage
- Apache Airflow:
- BigQuery
- Postgres
- Snowflake
- Redshift
- Great Expectations
- Apache Spark
- dbt
- Question: How does Airflow capture OpenLineage events?
- The openlineage-airflow package is installed on the Airflow instance.
- It uses adapters defined per operator.
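The "adapters per operator" answer above can be illustrated with a small registry sketch. This is not the openlineage-airflow code: `LineageMetadata`, `ADAPTERS`, `adapter_for`, `extract`, and the dictionary shape used for a task are all hypothetical names chosen for illustration, and no Airflow import is needed to run it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LineageMetadata:
    """Inputs and outputs extracted from one task (hypothetical shape)."""
    job_name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


# Registry mapping an operator class name to the adapter that understands it.
ADAPTERS: Dict[str, Callable[[dict], LineageMetadata]] = {}


def adapter_for(operator_name: str):
    """Decorator that registers an adapter for one operator type."""
    def register(func: Callable[[dict], LineageMetadata]):
        ADAPTERS[operator_name] = func
        return func
    return register


@adapter_for("PostgresOperator")
def postgres_adapter(task: dict) -> LineageMetadata:
    # A real adapter would parse the operator's SQL; here the tables
    # are passed in directly to keep the sketch self-contained.
    return LineageMetadata(task["task_id"], task.get("reads", []), task.get("writes", []))


def extract(task: dict) -> LineageMetadata:
    """Dispatch to the adapter registered for the task's operator, if any."""
    adapter = ADAPTERS.get(task["operator"])
    if adapter is None:
        # Operators without an adapter still yield a job, without datasets.
        return LineageMetadata(task["task_id"])
    return adapter(task)
```

Supporting a new operator in this sketch means registering one more function, which is the property the per-operator design is after.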
- OpenLineage 0.2 scope discussion
- Facet versioning mechanism (Issue #153)
- OpenLineage Proxy Backend (Issue #152)
- Questions:
- What is the advantage of the proxy backend?
- The consumer does not need to implement an endpoint and can consume from Kafka instead.
- What to do with events can be configured independently of the various integrations.
- It is a first step toward a routing mechanism:
- to send events to multiple consumers
- to have rule-based routing
- to enable archiving the events in addition to sending them
- Is it included in OpenLineage?
- Yes (Otherwise it would have to be in Egeria)
- Does it include error management or retry policy? What if the proxy dies? Do we care about durability?
- Yes, we care about durability.
- The first implementation will be synchronous: a single transaction to Kafka per event.
- In the future, this might be configurable depending on context (guaranteed delivery vs. performance through batching).
- What technology should we use?
- Proposed: Java + Spring Boot (like Egeria)
- Discussion of using Java + Dropwizard, like Marquez
- General consensus on using Java (framework TBD)
- In the future, there might be a Go implementation to enable a lightweight sidecar pattern
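The proxy behavior discussed above (synchronous delivery, one send per event, rule-based routing to several consumers, optional archiving) can be sketched as follows. This is only an illustration of the flow, not the proposed implementation: the notes settle on Java, and `Proxy`, `Sink`, `InMemorySink`, and `add_route` are hypothetical names. The in-memory sink stands in for a Kafka topic or an archive so the sketch runs without a broker.

```python
import json
from typing import Callable, List, Protocol, Tuple


class Sink(Protocol):
    """Anything that can accept a serialized event."""
    def send(self, payload: bytes) -> None: ...


class InMemorySink:
    """Stand-in for a Kafka topic or an archive (assumption for this sketch)."""
    def __init__(self) -> None:
        self.messages: List[bytes] = []

    def send(self, payload: bytes) -> None:
        self.messages.append(payload)


Rule = Callable[[dict], bool]


class Proxy:
    """Forwards each incoming event to every sink whose rule matches."""
    def __init__(self) -> None:
        self.routes: List[Tuple[Rule, Sink]] = []

    def add_route(self, rule: Rule, sink: Sink) -> None:
        self.routes.append((rule, sink))

    def handle(self, event: dict) -> int:
        """Deliver one event synchronously and return the number of sinks reached."""
        payload = json.dumps(event, sort_keys=True).encode("utf-8")
        delivered = 0
        for rule, sink in self.routes:
            if rule(event):
                # No buffering or batching: a failing sink raises here,
                # so the caller learns about the failure immediately.
                sink.send(payload)
                delivered += 1
        return delivered
```

A route with an always-true rule and an archive sink covers the "archive in addition to sending" case, while more selective rules cover routing to specific consumers.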
- Kafka client
- Roadmap
- Open discussion
- How do we define extension points for integrations? For example, hooks in Spark and Airflow that let users add adapters/facets without having to modify OpenLineage (OL).
- TODO: create a ticket to track this
- Apache Iceberg interest in OpenLineage:
- Would want to add additional notifications
- How many files were read or written
- How long a commit took
- How many attempts to commit were needed
- TODO: create ticket to enable Iceberg facets to be added to OpenLineage events
- Iceberg needs to send events independently of where the library is used (for example, a plain Java process or another environment).
- TODO: need ticket for this => #167 Iceberg integration
- TODO: ticket for PrestoDB/Trino integrations
- => #164 Trino and #165 PrestoDB
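As a rough illustration of what the Iceberg notifications listed above could look like as a custom facet, here is a minimal sketch. The `_producer` and `_schemaURL` keys follow the OpenLineage facet convention; everything else is an assumption made for illustration: the facet name, the field names, both example.com URLs, and the choice to attach the facet under `run.facets`.

```python
from typing import Any, Dict

# Hypothetical values: neither URL is defined by OpenLineage or Iceberg.
PRODUCER = "https://example.com/iceberg-integration"
SCHEMA_URL = "https://example.com/schemas/IcebergCommitRunFacet.json"


def iceberg_commit_facet(files_read: int, files_written: int,
                         commit_duration_ms: int, commit_attempts: int) -> Dict[str, Any]:
    """Build a facet carrying the commit metrics mentioned in the notes."""
    if min(files_read, files_written, commit_duration_ms, commit_attempts) < 0:
        raise ValueError("commit metrics must be non-negative")
    return {
        "_producer": PRODUCER,
        "_schemaURL": SCHEMA_URL,
        "filesRead": files_read,                 # hypothetical field name
        "filesWritten": files_written,           # hypothetical field name
        "commitDurationMs": commit_duration_ms,  # hypothetical field name
        "commitAttempts": commit_attempts,       # hypothetical field name
    }


def attach_run_facet(event: Dict[str, Any], name: str,
                     facet: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event with the facet added under run.facets."""
    run = dict(event.get("run", {}))
    facets = dict(run.get("facets", {}))
    facets[name] = facet
    run["facets"] = facets
    return {**event, "run": run}
```

Because the helper copies the event instead of mutating it, the same base event could be enriched by several integrations independently.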
- Egeria has a weekly community call
- The September 1st call will be about OpenLineage
- There is also an upcoming webinar
...