...
- Announcements [Julien]
- The first Data Lineage Meetup will be taking place in Providence on March 9th at 6 pm. More information: https://openlineage.io/blog/data-lineage-meetup/
- Recent release 0.20.4 [Michael R.]
Added
- Airflow: add new extractor for GCSToGCSOperator #1495 @sekikn
Adds a new extractor for this operator.
- Flink: resolve topic names from regex, support 1.16.0 #1522 @pawel-big-lebowski
Adds support for Flink 1.16.0 and makes the integration resolve topic names from Kafka topic patterns.
- Proxy: implement lineage event validator for client proxy #1469 @fm100
Implements logic in the proxy (which is still in development) for validating and handling lineage events.
Changed
- CI: use ruff instead of flake8, isort, etc., for linting and formatting #1526 @mobuchowski
Adopts the ruff package, which combines several linters and formatters into one fast binary.
- Thanks to all our contributors!
- More details: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
- AIP: OpenLineage in Airflow [Julien]
- Motivations
- Key goal of project: provide a central spec everyone can use for lineage
- Ultimate goal for integrations: house them in their home projects, not OpenLineage
- Specific challenge of separate, locally hosted integrations: changes to Airflow have broken the integration
- First-class, built-in support would mean more stability and less effort
- Two-fold proposal
- turn the OpenLineage-Airflow integration package into an Airflow provider
- the lineage extraction logic will live in the operators themselves, not in separate extractors
- Benefits
- increased stability
- easier maintenance over time
- Downside
- burden of maintenance shifts to Airflow community
- but this is logical, and the Airflow community will grow as a result
- More information: https://docs.google.com/document/d/1aN5i8WV2Za7XiHTtyrewZscQ-4eXs1ZNfPw58JscFEw/edit?usp=sharing
- Next step: to hold a vote on the Airflow mailing list
- Q & A:
- Maciej: Jakub and I will be there to help in the Airflow community
- Julien: I agree, and contributors will likely become Airflow committers
- Enrico: if you were to write a provider today, would you start externally or in Airflow?
- Julien: I would start externally and iterate, then submit for provider status
- Julien: Ross, is the current posture in Airflow to expect provider codebase owners to maintain their code in separate repositories?
- Ross: yes, due to ease of maintenance when APIs change, etc.
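Illustration of the second part of the proposal (lineage logic living in the operator rather than in a separate extractor). This is only a rough sketch: `Dataset`, `OperatorLineage`, `CopyObjectOperator`, and `get_lineage` are hypothetical names made up for these notes, not Airflow or OpenLineage APIs and not the interface defined in the AIP.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Dataset:
    """A dataset identified by namespace and name (hypothetical helper)."""
    namespace: str
    name: str


@dataclass
class OperatorLineage:
    """Inputs and outputs reported by an operator (hypothetical helper)."""
    inputs: List[Dataset] = field(default_factory=list)
    outputs: List[Dataset] = field(default_factory=list)


class CopyObjectOperator:
    """Stand-in for an operator that copies one object between buckets.

    Not a real Airflow class; it only shows where the lineage logic would sit.
    """

    def __init__(self, source_bucket: str, source_object: str,
                 destination_bucket: str, destination_object: str) -> None:
        self.source_bucket = source_bucket
        self.source_object = source_object
        self.destination_bucket = destination_bucket
        self.destination_object = destination_object

    def get_lineage(self) -> OperatorLineage:
        # The operator already knows its own parameters, so it can report
        # inputs and outputs directly instead of relying on an external
        # extractor that has to track the operator's internals.
        return OperatorLineage(
            inputs=[Dataset(f"gs://{self.source_bucket}", self.source_object)],
            outputs=[Dataset(f"gs://{self.destination_bucket}",
                             self.destination_object)],
        )
```

Because the lineage method ships with the operator, a change to the operator's parameters and the matching lineage update can land in the same commit, which is the stability argument made under Benefits.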
- Discussion topic: real-world implementation of OpenLineage (i.e., "What IS lineage, anyway?") [Sheeri]
- Ross: opened an issue about creating a validation suite
- ideas: make Marquez into a validation suite, use the seed data
- Sheeri: minimum coverage: nodes and transformations
- what do you think?
- Brad: best practices for clean extractions but allow for extensibility (e.g., external extractors)
- we plan to use all the core elements (datasets, runs, jobs, etc.)
- John: two pieces are involved: validating emitted events and assessing compliance of facets
- also: naming conventions are becoming unwieldy
- Maciej:
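Illustration of the first piece John mentioned (validating emitted events). This is a minimal sketch, not the validation suite from the issue: `validate_event` is a hypothetical helper, and the list of required fields is an assumption chosen for the example rather than a copy of the OpenLineage spec. Facet compliance, the second piece, is not covered.

```python
from typing import Any, Dict, List

# Assumed set of required top-level fields, for illustration only.
REQUIRED_TOP_LEVEL = ("eventTime", "producer", "run", "job")


def validate_event(event: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems found in an emitted event."""
    problems: List[str] = []

    for key in REQUIRED_TOP_LEVEL:
        if key not in event:
            problems.append(f"missing field: {key}")

    # A run is assumed to need an identifier.
    run = event.get("run")
    if isinstance(run, dict) and "runId" not in run:
        problems.append("missing field: run.runId")

    # A job is assumed to need a namespace and a name.
    job = event.get("job")
    if isinstance(job, dict):
        for key in ("namespace", "name"):
            if key not in job:
                problems.append(f"missing field: job.{key}")

    return problems
```

An empty list means the event passed these structural checks; a real suite would more likely validate against the published JSON Schema and then check each facet separately.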
January 12, 2023 (10am PT)
...