
  • Announcements [Julien]
  • Recent release 0.20.4 [Michael R.]
    • Added

      • Airflow: add new extractor for GCSToGCSOperator #1495 @sekikn
        Adds a new extractor for this operator.
      • Flink: resolve topic names from regex, support 1.16.0 #1522 @pawel-big-lebowski
        Adds support for Flink 1.16.0 and makes the integration resolve topic names from Kafka topic patterns.
      • Proxy: implement lineage event validator for client proxy #1469 @fm100
        Implements logic in the proxy (which is still in development) for validating and handling lineage events.

    • Changed

      • CI: use ruff instead of flake8, isort, etc., for linting and formatting #1526 @mobuchowski
        Adopts the ruff package, which combines several linters and formatters into one fast binary.
    • Thanks to all our contributors!
    • More details: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
  • AIP: OpenLineage in Airflow [Julien]
    • Motivations 
      • Key goal of project: provide a central spec everyone can use for lineage
      • Ultimate goal for integrations: house them in their home projects, not OpenLineage
      • Specific challenge of separate, locally hosted integrations: changes to Airflow have broken the integration
      • First-class, built-in support would mean more stability and less effort
    • Two-fold proposal
      • turn the OpenLineage-Airflow integration package into an Airflow provider
      • the lineage extraction logic will live in the operators themselves, not in separate extractors
    • Benefits
      • increased stability
      • easier maintenance over time
    • Downside
      • burden of maintenance shifts to Airflow community
      • but this is logical, and the Airflow community will grow as a result
    • More information:
      • https://docs.google.com/document/d/1aN5i8WV2Za7XiHTtyrewZscQ-4eXs1ZNfPw58JscFEw/edit#heading=h.1fv5dvtexgcg
    • Next step: to hold a vote on the Airflow mailing list
    • Q & A:
      • Maciej: Jakub and I will be there to help in the Airflow community
      • Julien: I agree, and contributors will likely become Airflow committers
      • Enrico: if you were to write a provider today, would you start externally or in Airflow?
      • Julien: I would start externally and iterate, then submit for provider status
      • Julien: Ross, is the current posture in Airflow to expect provider codebase owners to maintain their code in separate repositories?
      • Ross: yes, due to ease of maintenance when APIs change, etc.
  • Discussion topic: real-world implementation of OpenLineage (i.e., "What IS lineage, anyway?") [Sheeri]
    • Ross: opened an issue about creating a validation suite
      • ideas: make Marquez into a validation suite, use the seed data
    • Sheeri: minimum coverage: nodes and transformations
      • what do you think?
    • Brad: best practices for clean extractions but allow for extensibility (e.g., external extractors)
      • we plan to use all the core elements (datasets, runs, jobs, etc.)
    • John: two pieces are involved: validating emitted events and assessing compliance of facets
      • also: naming conventions are becoming unwieldy
    • Maciej: we have been experimenting with providing different facets – custom facets are not a bad thing, and not everything belongs in the core spec
    • Julien: custom facets are intended for specific requirements not supported by the core spec
      • we need to balance between centralization, where everything must be approved, and chaos, where nothing is – it's a trade-off
    • Sheeri: would everyone be willing to write down their custom facets somewhere?
    • Julien: we need a place where core and custom facets are all defined – maybe we should work from a Google doc or a PR
    • Eric: there is a lot of opportunity to discover custom facets
      • setting up an incentive structure to create/share custom facets would be valuable
    • Julien: there is a mechanism for discovering custom facets
      • a list of all the existing custom facets is available at runtime
      • a registration process might be useful for static discovery
    • See the Slack channel that is available for continuing this discussion: #spec-compliance
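
The two pieces John describes (validating emitted events and assessing facets) could be prototyped along the lines below. This is only a rough sketch for discussion, not an agreed design or an existing OpenLineage tool: the list of required fields, the `CORE_FACETS` set, the `acme_rowCount` facet, and the function names are all assumptions made up for illustration.

```python
from typing import Any

# Assumption: these top-level fields are treated as required for a run event.
REQUIRED_FIELDS = ("eventTime", "producer", "schemaURL", "run", "job")

# Hypothetical list of facets considered "core"; a real suite would load this
# from wherever core and custom facets end up being documented.
CORE_FACETS = {"nominalTime", "sql", "schema"}


def missing_fields(event: dict[str, Any]) -> list[str]:
    """Return the required fields that are absent from an event."""
    missing = [field for field in REQUIRED_FIELDS if field not in event]
    if "runId" not in (event.get("run") or {}):
        missing.append("run.runId")
    job = event.get("job") or {}
    for key in ("namespace", "name"):
        if key not in job:
            missing.append(f"job.{key}")
    return missing


def facet_names(event: dict[str, Any]) -> set[str]:
    """Collect facet names from the run, the job, and each input/output dataset."""
    names: set[str] = set()
    for key in ("run", "job"):
        names.update((event.get(key) or {}).get("facets", {}).keys())
    for key in ("inputs", "outputs"):
        for dataset in event.get(key) or []:
            names.update(dataset.get("facets", {}).keys())
    return names


def custom_facets(event: dict[str, Any], core: set[str]) -> list[str]:
    """Return the facets in the event that are not in the given core set."""
    return sorted(facet_names(event) - set(core))


# Illustrative event; all values are placeholders.
EXAMPLE_EVENT = {
    "eventTime": "2023-01-12T18:00:00Z",
    "producer": "https://example.com/my-producer",
    "schemaURL": "https://example.com/schema",
    "run": {"runId": "run-1", "facets": {"nominalTime": {}}},
    "job": {"namespace": "demo", "name": "daily_load", "facets": {"sql": {}}},
    "inputs": [
        {"namespace": "db", "name": "orders",
         "facets": {"schema": {}, "acme_rowCount": {}}},
    ],
    "outputs": [],
}

if __name__ == "__main__":
    print("missing:", missing_fields(EXAMPLE_EVENT))
    print("custom facets:", custom_facets(EXAMPLE_EVENT, CORE_FACETS))
```

Running a check like `custom_facets` over events collected from several integrations would give a rough, static inventory of the custom facets in use, which is one possible starting point for the shared list Sheeri and Julien discuss.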

January 12, 2023 (10am PT)
