The OpenLineage Technical Steering Committee meets monthly on the second Wednesday from 9:30am to 10:30am US Pacific. Here's the meeting info.

All are welcome.

Next meeting: May 8, 2024 (9:30am PT)

April 10, 2024 (9:30am PT)

Attendees:

  • TSC:
    • Julien Le Dem, OpenLineage project lead, LF AI & Data
    • Michael Robinson, Community Manager, Astronomer
    • Harel Shein, Lineage at Datadog
  • And:

...

    • Sheeri Cabral, Product Manager, ETL, Collibra
    • Eric Veleker, Partnerships, Atlan
    • Jens Pfau, Engineering Manager, Lineage, Google
    • David Twaddell, Architect, HSBC


Agenda:

  • Announcements
  • Recent release highlights
  • Discussion items
    • supporting job-to-job dependencies in the spec
    • improving naming conventions
  • Open discussion

Meeting:


March 13, 2024 (9:30am PT)

Tentative agenda:

  • Announcements
  • Release 1.9.1 highlights
  • Scala 2.13 support in Spark overview by @Damien Hawes
  • Circuit breaker in Spark & Flink, built-in lineage in Spark by @Paweł Leszczyński
  • Discussion items
  • Open discussion

...

  • TSC:
    • Julien Le Dem, OpenLineage project lead, LF AI & Data
    • Michael Robinson, Community Manager, Astronomer
    • Damien Hawes, Booking.com
    • Harel Shein, Engineering Manager, Datadog
    • Maciej Obuchowski, Software Engineer, GetInData, OpenLineage committer
    • Mike Collado, Sr. Software Engineer, Snowflake
  • And:
    • Suraj Gupta, Atlan
    • Eric Veleker, Atlan
    • Sheeri Cabral, Product Manager, Collibra
    • Ernie Ostic, IBM/Manta

...

  • Recent releases
  • Announcements
  • Coming soon: simplified job hierarchy in the Spark integration
  • Discussion items
  • Open discussion

Meeting:

http://youtube.com/watch?v=O7-ZNCbt880

http://youtube.com/watch?v=z-MdLO3lxR8

http://youtube.com/watch?v=hvUIaziS2TI

http://youtube.com/watch?v=Ql7DR59wdpE

Notes:

Summary

    1. We have added a new communication resource, a LinkedIn company page.
    2. We announced a new committer, Damien Hawes, from Booking.com, who has made significant contributions to the project.
    3. Astronomer and Collibra are co-sponsoring a data lineage meetup on March 19th at the Microsoft New England Conference Center.
    4. Members have talks upcoming at Kafka Summit and Data Council.
    5. We discussed upcoming improvements to job hierarchy in Spark and how they can help answer questions about job scheduling and dependencies.
    6. Damien shared his contributions to the Apache Spark integration, specifically addressing versioning conflicts with Scala.
    7. Eric provided a general update on the interest in and adoption of OpenLineage, particularly in the enterprise space.
    8. Atlan is considering releasing a DAG (Directed Acyclic Graph) instead of a plugin to help users with configuration and troubleshooting.
    9. The next monthly call will be held at a different "location," and participants were encouraged to look out for the updated Zoom link.

...

Update on Spark Integration
    - Damien shared his experience adding Scala 2.13 support to the Apache Spark integration. His team deployed the OpenLineage Spark integration in their own internal pipelines, and it worked well.
    - However, when they moved to new clusters running a different Scala version, the jobs failed due to conflicting Scala major versions. This happens because when Java code is compiled, the compiler records the fully qualified class names and the full type signature of each method it calls, including the return type and the parameter types.
    - When a method in Apache Spark is called and the signature present at runtime differs from the one the caller was compiled against, the JVM throws a runtime error. The solution is to compile the entire project against the Apache Spark libraries built for the matching Scala version.
    - Damien explained how to configure the build to consume the relevant jars and run integration tests for different versions of Spark, with the exception of Spark 2.4, which only uses Scala 2.12. Maciej thanked Damien for his contribution and expressed a desire for faster reviews.
    - Michael Robinson congratulated Damien on becoming a committer and thanked him for his contributions. Eric provided a general update on interest in the Airflow and Spark integrations, with a focus on enterprise adoption and versioning conflicts.
    - Atlan plans to release a DAG instead of a plugin to help users with configuration. Michael Robinson concluded the call and announced the next meeting.
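To make the signature problem concrete, here is a minimal Java sketch. It is illustrative only and is not code from the OpenLineage Spark integration. The sketch builds JVM method descriptors, which are the strings the compiler records at each call site. It assumes the well-known Scala 2.13 change in which the default `Seq` alias moved from `scala.collection.Seq` to `scala.collection.immutable.Seq`. The class name `DescriptorDemo` and the method name `children` are hypothetical examples.

```java
import java.util.List;

/**
 * Minimal sketch (hypothetical class): builds JVM method descriptors the way
 * the compiler records them at a call site, to show why jars built for one
 * Scala version fail to link against another.
 */
public class DescriptorDemo {

    // Converts a fully qualified class name (e.g. "scala.collection.Seq")
    // into a JVM type descriptor (e.g. "Lscala/collection/Seq;").
    static String typeDescriptor(String className) {
        return "L" + className.replace('.', '/') + ";";
    }

    // Builds a method descriptor of the form "(params)return" from class names.
    static String methodDescriptor(String returnType, List<String> paramTypes) {
        StringBuilder sb = new StringBuilder("(");
        for (String p : paramTypes) {
            sb.append(typeDescriptor(p));
        }
        sb.append(')').append(typeDescriptor(returnType));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Scala 2.12: the default Seq alias points at scala.collection.Seq.
        String compiledAgainst212 = methodDescriptor("scala.collection.Seq", List.of());
        // Scala 2.13: the default Seq alias points at scala.collection.immutable.Seq.
        String presentIn213 = methodDescriptor("scala.collection.immutable.Seq", List.of());

        // "children" is only an illustrative no-argument method name.
        System.out.println("call site (built for 2.12): children" + compiledAgainst212);
        System.out.println("method on 2.13 classpath:   children" + presentIn213);

        // The JVM resolves a call by name AND descriptor, so a mismatch means
        // the method cannot be found and NoSuchMethodError is thrown at runtime.
        System.out.println("descriptors match: " + compiledAgainst212.equals(presentIn213));
    }
}
```

The two descriptors differ only in their return type. Because of that, a call site compiled against Scala 2.12 Spark jars cannot resolve the method on a Scala 2.13 classpath. Building a separate artifact per Scala version avoids the mismatch.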

...

January 11, 2024 (10am PT)

Attendees:

  • TSC:
    • Julien Le Dem, OpenLineage project lead, LF AI & Data
    • Harel Shein, Datadog Engineering
    • Michael Robinson, Community Manager, Astronomer
  • And:
    • Tatiana Al-Chueyr, Staff Software Engineer, Astronomer
    • Alex Jaglale, Executive, DataGalaxy
    • Jens Pfau, Engineering Manager, Google
    • Eric Veleker, Atlan

Agenda:

  • Recent releases
  • Announcements
  • Discussion items
  • Open discussion

Meeting:

http://youtube.com/watch?v=6_XOON9kf6E

http://youtube.com/watch?v=itbm8hHAtPQ

Notes:

Summary

    1. We closed our first-ever annual ecosystem survey, and the results will be published soon.
    2. Our first meetup in London, an in-person event, is coming up on January 31st.
    3. We have a talk at Kafka Summit London in March, with key contributors speaking.
    4. We recently released version 1.7.0, which includes an important compatibility notice for the Airflow integration.
    5. There was a discussion about possible improvements to job hierarchy semantics in the Spark integration.
    6. Julien updated the registry proposal and it is close to being implemented.
    7. Eric (Atlan) shared that there is growing demand and adoption of OpenLineage, and organizations are pressing forward due to the perceived business value.
    8. Eric mentioned the need for better documentation and support for different versions and integrations.
    9. Jens suggested expanding the integration matrix to include more dimensions, such as types of data sources and facets.

...

December 14, 2023 (10am PT)

Attendees:

  • TSC:
    • Julien Le Dem, OpenLineage project lead, LF AI & Data
    • Harel Shein, Datadog Engineering
    • Michael Robinson, Community Manager, Astronomer
    • Mandy Chessell, Egeria Project Lead
    • Pawel Leszczynski, Software Engineer, Astronomer/GetInData
  • And:
    • Eric Veleker, Atlan

Agenda:

  • Recent releases
  • Announcements
  • Proposal updates
  • Open discussion

Meeting:

http://youtube.com/watch?v=HW3Dd75UXLY

http://youtube.com/watch?v=ozxLWjSOfiY

http://youtube.com/watch?v=GN-ic0bjNoo

Notes:

Summary

    1. Harel Shein provided announcements about upcoming meetups and shared metrics on community growth.
    2. Harel Shein discussed the release of version 1.6.2, highlighting new features and bug fixes.
    3. Harel Shein shared metrics on contributors and commits, showing an increase in both.
    4. Jens Pfau presented two proposals for column-level lineage, focusing on transformation types and descriptions.
    5. Mandy Chessell suggested including the name of the masking function as an additional property for masking transformations.
    6. Harel Shein expressed appreciation for the proposals and encouraged community members to review and provide feedback.
    7. Eric Veleker expressed gratitude for the momentum and adoption of OpenLineage, thanking the community for their hard work.
    8. Harel Shein echoed Eric's sentiments and acknowledged the project's growth and its status as an industry standard.
    9. Harel Shein thanked all contributors and adopters for their contributions to the community.

...

  • TSC:
    • Julien Le Dem, OpenLineage project lead
    • Michael Robinson, Community team, Astronomer
    • Jakub Dardziński, Software Engineer, GetInData
    • Harel Shein, Engineering Manager, Astronomer
    • Maciej Obuchowski, Software Engineer, Astronomer/GetInData, OpenLineage committer
    • Paweł Leszczyński, Software Engineer, Astronomer/GetInData
  • And:
    • Eric Veleker, Atlan
    • Harsh Loomba, Engineer, Upgrade
    • Sheeri Cabral, Product Manager, Collibra
    • Peter Huang, Software Engineer, Apple
    • Jens Pfau, Engineering Manager, Google
    • Shubhambharadwaj, Associate Manager

...