The OpenLineage Technical Steering Committee meets monthly on the second Wednesday, from 9:00am to 10:00am US Pacific. The link to join the meeting is https://us02web.zoom.us/j/81831865546?pwd=RTladlNpc0FTTDlFcWRkM2JyazM4Zz09

All are welcome.

August 11th 2021

  • Attendees: 
    • TSC:
  • Meeting recording:
  • Meeting notes:
    • Agenda:
      • Coming in OpenLineage 0.1
        • OpenLineage spec versioning
        • Clients
        • Marquez integrations imported into OpenLineage
          • Apache Airflow:
            • BigQuery 
            • Postgres
            • Snowflake
            • Redshift
            • Great Expectations
          • Apache Spark
          • dbt
      • OpenLineage 0.2 scope discussion
        • Facet versioning mechanism
        • OpenLineage Proxy Backend (Issue #152)
        • Kafka client
      • Roadmap
      • Open discussion

July 14th 2021

  • Attendees: 
    • TSC:
      • Julien Le Dem
      • Mandy Chessell
      • Michael Collado
      • Willy Lulciuc
  • Meeting recording:
  • Meeting notes:
    • Agenda:
    • Notes: 

      Mission statement:

      Spec versioning mechanism:

      • The goal is to commit to compatible changes once 0.1 is published

      • We need a follow-up to separate core facet versioning

        • => TODO: create a separate GitHub ticket

      • The lineage event should have a field that identifies which version of the spec it was produced with

        • => TODO: create a GitHub issue for this

      • TODO: add an issue to document version number semantics (SchemaVer)

      Extend Event State notion:

      OpenLineage 0.1:

      • Finalize a few spec details for 0.1: a few items are left to discuss.

        • In particular, job naming

        • parent job model

      • Importing Marquez integrations into OpenLineage

      Open Discussion:

      • connecting the consumer and producer

        • TODO: ticket to track distribution mechanism

        • options:

          • Would we need a consumption client to make it easy for consumers to get events from Kafka, for example?

          • OpenLineage provides client libraries to serialize/deserialize events as well as sending them.

        • We can have documentation on how to send to backends that are not Marquez, using HTTP and existing gateway mechanisms to queues.

        • Do we have a mutual third party, or does the client know where to send?

      • Source code location finalization

      • job naming convention

        • you don't always have a nested execution

          • can call a parent

        • parent job

        • You can have a job calling another one.

        • always distinguish a job and its run

      • need a separate notion for job dependencies

      • need to capture event driven: TODO: create ticket.


      TODO(Julien): update job naming ticket to have the discussion.

June 9th 2021

  • Attendees: 
    • TSC:
      • Julien Le Dem: Marquez, Datakin
      • Drew Banin: dbt, CPO at Fishtown Analytics
      • Maciej Obuchowski: Marquez, GetInData consulting company
      • Zhamak Dehghani: Data Mesh. An open protocol of observability for the data ecosystem is a big piece of Data Mesh
      • Daniel Henneberger: building a database, interested in lineage
      • Mandy Chessell: lead of Egeria, metadata exchange. Lineage is a great extension that volunteers lineage
      • Willy Lulciuc: co-creator of Marquez
      • Michael Collado: Datakin, OpenLineage end-to-end holistic approach
    • And:
      • Kedar Rajwade: consulting on distributed systems
      • Barr Yaron: dbt, PM at Fishtown Analytics on metadata
      • Victor Shafran: co-founder at databand.ai, a pipeline monitoring company. Lineage is a common issue
    • Excused: Ryan Blue, James Campbell
  • Meeting recording:
  • Meeting notes:

    Agenda:

    • project communication

    • Technical charter review

    • medium term roadmap discussion

    Notes:

    • project communication

      • GitHub: for specs, designs, reviews and building consensus (issues and PRs)

      • email: for announcements, notes, etc

      • Slack: transient discussions; it does not maintain history. Any decisions or notes should go to a persistent medium (email and GitHub)

      • monthly meeting: recorded, notes and recording published on the wiki

    • Technical Charter review:

      • TODO: Finalize the mission statement. TSC members to comment in the doc.

