...
- Announcements [Julien]
- The first Data Lineage Meetup will be taking place in Providence on March 9th at 6 pm. More information: https://openlineage.io/blog/data-lineage-meetup/
- Recent release 0.20.4 [Michael R.]
Added
- Airflow: add new extractor for GCSToGCSOperator #1495@sekikn
Adds a new extractor for this operator.
- Flink: resolve topic names from regex, support 1.16.0 #1522@pawel-big-lebowski
Adds support for Flink 1.16.0 and makes the integration resolve topic names from Kafka topic patterns.
- Proxy: implement lineage event validator for client proxy #1469@fm100
Implements logic in the proxy (which is still in development) for validating and handling lineage events.
Changed
- CI: use ruff instead of flake8, isort, etc., for linting and formatting #1526@mobuchowski
Adopts the ruff package, which combines several linters and formatters into one fast binary.
- Thanks to all our contributors!
- More details: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
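As a rough illustration of the Flink item above (resolving concrete topic names from a Kafka topic pattern), the idea can be sketched in a few lines. This is not the integration's code, which is written in Java; `resolve_topics` and the topic names are hypothetical, and full-string matching is an assumption.

```python
import re
from typing import Iterable, List


def resolve_topics(pattern: str, known_topics: Iterable[str]) -> List[str]:
    """Return the known topics whose full name matches the pattern.

    Hypothetical helper: assumes the pattern must match the whole topic name.
    """
    compiled = re.compile(pattern)
    return sorted(t for t in known_topics if compiled.fullmatch(t))


# Example with made-up topic names.
topics = ["orders.eu", "orders.us", "payments.eu"]
matched = resolve_topics(r"orders\..*", topics)
```

With a resolved list like `matched`, each topic can be reported as its own dataset rather than as one opaque pattern.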
- AIP: OpenLineage in Airflow [Julien]
- Motivations
- Key goal of project: provide a central spec everyone can use for lineage
- Ultimate goal for integrations: house them in their home projects, not OpenLineage
- Specific challenge of separate, locally hosted integrations: changes to Airflow have broken the integration
- First-class, built-in support would mean more stability and less effort
- Two-fold proposal
- turn the OpenLineage-Airflow integration package into an Airflow provider
- the lineage extraction logic will live in the operators themselves, not in separate extractors
- Benefits
- increased stability
- easier maintenance over time
- Downside
- burden of maintenance shifts to Airflow community
- but this is logical, and the Airflow community will grow as a result
- More information: AIP OpenLineage in Airflow, https://docs.google.com/document/d/1aN5i8WV2Za7XiHTtyrewZscQ-4eXs1ZNfPw58JscFEw/edit#heading=h.1fv5dvtexgcg
- Next step: to hold a vote on the Airflow mailing list
- Q & A:
- Maciej: Jakub and I will be there to help in the Airflow community
- Julien: I agree, and contributors will likely become Airflow committers
- Enrico: if you were to write a provider today, would you start externally or in Airflow?
- Julien: I would start externally and iterate, then submit for provider status
- Julien: Ross, is the current posture in Airflow to expect provider codebase owners to maintain their code in separate repositories?
- Ross: yes, due to ease of maintenance when APIs change, etc.
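The second half of the proposal (lineage extraction living in the operators themselves rather than in separate extractors) might look roughly like the sketch below. Everything here is hypothetical and for illustration only: `Dataset`, `OperatorLineage`, `CopyObjectOperator`, `get_lineage` and `collect_lineage` are invented names, not Airflow or OpenLineage APIs, and the AIP may settle on a different interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    """Hypothetical stand-in for a lineage dataset (namespace + name)."""
    namespace: str
    name: str


@dataclass
class OperatorLineage:
    """Hypothetical container for the lineage an operator reports about itself."""
    inputs: List[Dataset] = field(default_factory=list)
    outputs: List[Dataset] = field(default_factory=list)


class CopyObjectOperator:
    """Toy operator, NOT a real Airflow class."""

    def __init__(self, source_bucket, source_object, dest_bucket, dest_object):
        self.source_bucket = source_bucket
        self.source_object = source_object
        self.dest_bucket = dest_bucket
        self.dest_object = dest_object

    def get_lineage(self) -> OperatorLineage:
        # The operator knows its own inputs and outputs, so no external
        # extractor has to reverse-engineer them from its attributes.
        return OperatorLineage(
            inputs=[Dataset(f"gs://{self.source_bucket}", self.source_object)],
            outputs=[Dataset(f"gs://{self.dest_bucket}", self.dest_object)],
        )


def collect_lineage(operator) -> OperatorLineage:
    """What a provider-side hook might do: ask the operator, fall back to empty."""
    hook = getattr(operator, "get_lineage", None)
    return hook() if callable(hook) else OperatorLineage()


lineage = collect_lineage(CopyObjectOperator("raw", "a.csv", "clean", "a.csv"))
```

The point of the design is that a change to the operator and the matching change to its lineage logic land in the same codebase, which is where the stability benefit discussed above comes from.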
- Discussion topic: real-world implementation of OpenLineage (i.e., "What IS lineage, anyway?") [Sheeri]
- Ross: opened an issue about creating a validation suite
- ideas: make Marquez into a validation suite, use the seed data
- Sheeri: minimum coverage: nodes and transformations
- what do you think?
- Brad: best practices for clean extractions but allow for extensibility (e.g., external extractors)
- we plan to use all the core elements (datasets, runs, jobs, etc.)
- John: two pieces are involved: validating emitted events and assessing compliance of facets
- also: naming conventions are becoming unwieldy
- Maciej: we have been experimenting with providing different facets – custom facets are not a bad thing, and not everything belongs in the core spec
- Julien: custom facets are intended for specific requirements not supported by the core spec
- we need to balance between centralization, where everything must be approved, and chaos, where nothing is – it's a trade-off
- Sheeri: would everyone be willing to write down their custom facets somewhere?
- Julien: we need a place where core and custom facets are all defined – maybe we should work from a Google doc or a PR
- Eric: there is a lot of opportunity to discover custom facets
- setting up an incentive structure to create/share custom facets would be valuable
- Julien: there is a mechanism for discovering custom facets
- a list of all the existing custom facets is available at runtime
- a registration process might be useful for static discovery
- See the Slack channel that is available for continuing this discussion: #spec-compliance
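To make the two pieces John mentioned more concrete (validating emitted events, and assessing which facets an event carries), here is a minimal sketch. It is not an existing validation suite: the function names are hypothetical, the required-field list is an assumption that should be checked against the OpenLineage spec, and `CORE_FACETS` is only an illustrative subset of the core facet names.

```python
from typing import Any, Dict, List, Set

# Assumed for illustration; the authoritative definitions live in the spec.
REQUIRED_TOP_LEVEL = ("eventTime", "producer", "run", "job")
CORE_FACETS: Set[str] = {"schema", "dataSource", "sql", "nominalTime"}


def validate_event(event: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an event (empty means it passed)."""
    problems = [f"missing field: {k}" for k in REQUIRED_TOP_LEVEL if k not in event]
    if "runId" not in event.get("run", {}):
        problems.append("missing field: run.runId")
    job = event.get("job", {})
    for key in ("namespace", "name"):
        if key not in job:
            problems.append(f"missing field: job.{key}")
    return problems


def custom_facets(event: Dict[str, Any]) -> Set[str]:
    """Collect facet names on the run, job and datasets that are not in CORE_FACETS."""
    names: Set[str] = set()
    for section in ("run", "job"):
        names.update(event.get(section, {}).get("facets", {}))
    for side in ("inputs", "outputs"):
        for dataset in event.get(side, []):
            names.update(dataset.get("facets", {}))
    return names - CORE_FACETS


# Made-up event; "acme_cost" and "acme_owner" are invented custom facet names.
event = {
    "eventTime": "2023-02-09T10:00:00Z",
    "producer": "https://example.com/producer",
    "run": {"runId": "1", "facets": {"nominalTime": {}, "acme_cost": {}}},
    "job": {"namespace": "ns", "name": "daily_load"},
    "outputs": [{"namespace": "db", "name": "t", "facets": {"schema": {}, "acme_owner": {}}}],
}
problems = validate_event(event)
found = custom_facets(event)
```

Running something like `custom_facets` over real events is one way to build the shared list of custom facets that Sheeri and Julien asked for.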
January 12, 2023 (10am PT)
...