Next meeting: Feb 9th 2022 (9am PT)

Tentative agenda:

Attendees:

  • TSC:
    • Mike Collado: Staff Software Engineer, Datakin
    • Maciej Obuchowski: Software Engineer, GetInData, OpenLineage contributor
    • Julien Le Dem: OpenLineage Project lead
  • And:
    • Michael Robinson: Dev Rel Engineer
    • Ross Turk: VP of Marketing, Datakin
    • Minkyu Park: Senior Software Engineer, Datakin
    • Srikanth Venkat: Product Manager, Privacera
    • John Thomas: Support Engineer, Datakin
    • Peter Scharling: EI Group
    • Peter Hicks: Senior Software Engineer, Datakin
    • Dalin Kim: Data Engineer, Northwestern Mutual
    • Kevin Mellott: Data Engineer, Northwestern Mutual
    • Will Johnson: Senior Cloud Solution Architect, Azure Cloud, Microsoft
    • Kelsy Brennan: EI Group
    • Aaron Colcord: Data Engineer, Northwestern Mutual

Agenda:

  • OpenLineage recent release overview (0.5.1) [Julien]
    • No 0.5.0 release due to a bug
    • Support for dbt-spark adapter
    • New backend to proxy OL events
    • Support for custom facets
  • TaskInstanceListener now official way to integrate with Airflow [Julien]
    • Integration runs on worker side
    • Will be in the next release of Airflow (2.3)
    • Thanks to Maciej for his work on this
  • Apache Flink integration [Julien]
    • Ticket for discussion available
    • Integration test setup
    • Early stages
  • Dagster integration demo [Dalin]
    • Initiated by Dalin Kim
    • OL used with Dagster on orchestration layer
    • Utilizes Dagster sensor
    • Introduces OL sensor that can be added to Dagster repo definition
    • Uses cursor to keep track of ID
    • Looking for feedback after review complete
    • Discussion:
      • Dalin: needed: a way to interpret Dagster assets for OL
      • Julien: common code from Great Expectations/Dagster integrations
      • Michael C: do you pass the parent run ID in the child job when sending the job to Marquez (MZ)?
      • Hierarchy can be extended indefinitely – parent/child relationship can be modeled
      • Maciej: the sensor kept failing – does this mean the events persisted while it was down?
      • Dalin: yes - the sensor’s cursor is tracked, so even if the repo goes down it should be able to pick up from the last cursor
      • Dalin: hoping for more feedback
      • Julien: slides will be posted on the Slack channel, along with the tickets
  • Open discussion
    • Will: how is OL ensuring consistency of datasets across integrations? 
    • Julien: (jokingly) Read the docs! Naming conventions for datasets can be found there
    • Julien: need for tutorial on creating integrations
    • Srikanth: have done some of this work in Atlas
    • Kevin: are there libraries on the horizon to play this role?
    • Julien: yes
    • Srikanth: good idea to have model spec to provide enforceable standard
    • Julien: agreed; currently models are based on the JSON schema spec
    • Julien: contributions welcome; opening a ticket about this makes sense
    • Will: on the Flink integration: MZ is focused on batch jobs
    • Julien: we want to confirm whether we need to add checkpointing
    • Julien: there will be discussion in the OL and MZ communities about this
      • In MZ, there are questions about what counts as a version or not
    • Julien: a consistent model is needed
    • Julien: one solution being looked into is Arrow
    • Julien: everyone should feel welcome to propose agenda items (even old projects)
    • Srikanth: who are you working with on the Flink comms side? Will get back to you.
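
Illustration of the cursor behavior Dalin described in the Dagster discussion: a minimal sketch in plain Python, not the actual Dagster or OpenLineage API. The names `EventLog`, `fetch_after`, and `sensor_tick` are hypothetical, and record IDs are assumed to be increasing integers. The idea is that the cursor only advances after a record has been delivered, so a later tick resumes from the last delivered record.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class EventLog:
    """Hypothetical append-only log of run events; IDs start at 1."""
    records: List[str] = field(default_factory=list)

    def fetch_after(self, cursor: int) -> List[Tuple[int, str]]:
        # Return (record_id, record) pairs with an ID greater than the cursor.
        return [(i + 1, r) for i, r in enumerate(self.records) if i + 1 > cursor]


def sensor_tick(log: EventLog, cursor: int, emit: Callable[[str], None]) -> int:
    """Deliver records newer than `cursor` and return the updated cursor."""
    for record_id, record in log.fetch_after(cursor):
        try:
            emit(record)
        except ConnectionError:
            # Backend unreachable: stop here and keep the cursor at the
            # last record that was actually delivered.
            break
        cursor = record_id  # advance only after a successful delivery
    return cursor
```

Calling `sensor_tick` again with the returned cursor retries from the first undelivered record, so nothing is skipped while the backend is unavailable.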

Meeting:

  • Zoom link
  • Zoom password: RTf3k*DZ
  • Slides
  • OpenLineage recent release overview (0.5.1)
  • Flink effort
  • Dagster integration
  • Open discussion

Jan 12th 2022 (9am PT)

Attendees:

...