...

  • Announcements [Julien]
  • Recent release 0.20.4 [Michael R.]
    • Added

      • Airflow: add new extractor for GCSToGCSOperator #1495@sekikn
        Adds a new extractor for this operator.
      • Flink: resolve topic names from regex, support 1.16.0 #1522@pawel-big-lebowski
        Adds support for Flink 1.16.0 and makes the integration resolve topic names from Kafka topic patterns.
      • Proxy: implement lineage event validator for client proxy #1469@fm100
        Implements logic in the proxy (which is still in development) for validating and handling lineage events.

    • Changed

      • CI: use ruff instead of flake8, isort, etc., for linting and formatting #1526@mobuchowski
        Adopts the ruff package, which combines several linters and formatters into one fast binary.
    • Thanks to all our contributors!
    • More details: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
  • AIP: OpenLineage in Airflow [Julien]
    • Motivations 
      • Key goal of project: provide a central spec everyone can use for lineage
      • Ultimate goal for integrations: house them in their home projects, not OpenLineage
      • Specific challenge of separate, locally hosted integrations: changes to Airflow have broken the integration
      • First-class, built-in support would mean more stability and less effort
    • Two-fold proposal
      • turn the OpenLineage-Airflow integration package into an Airflow provider
      • the lineage extraction logic will live in the operators themselves, not in separate extractors
    • Benefits
      • increased stability
      • easier maintenance over time
    • Downside
      • burden of maintenance shifts to Airflow community
      • but this is logical, and the Airflow community will grow as a result
    • More information: https://docs.google.com/document/d/1aN5i8WV2Za7XiHTtyrewZscQ-4eXs1ZNfPw58JscFEw/edit?usp=sharing

    • Next step: hold a vote on the Airflow mailing list
    • Q & A:
      • Maciej: Jakub and I will be there to help in the Airflow community
      • Julien: I agree, and contributors will likely become Airflow committers
      • Enrico: if you were to write a provider today, would you start externally or in Airflow?
      • Julien: I would start externally and iterate, then submit for provider status
      • Julien: Ross, is the current posture in Airflow to expect provider codebase owners to maintain their code in separate repositories?
      • Ross: yes, due to ease of maintenance when APIs change, etc.
  • Discussion topic: real-world implementation of OpenLineage (i.e., "What IS lineage, anyway?") [Sheeri]
    • Ross: opened an issue about creating a validation suite
      • ideas: make Marquez into a validation suite, use the seed data
    • Sheeri: minimum coverage: nodes and transformations
      • what do you think?
    • Brad: best practices for clean extractions but allow for extensibility (e.g., external extractors)
      • we plan to use all the core elements (datasets, runs, jobs, etc.)
    • John: two pieces are involved: validating emitted events and assessing compliance of facets
      • also: naming conventions are becoming unwieldy
    • Maciej: 

January 12, 2023 (10am PT)

...