...

  • Recent releases
  • Demo: creating a new OpenLineage consumer [Daniel]
  • Discussion topic: real-world implementation of OpenLineage (i.e., "What IS lineage, anyway?")
  • Announcement & discussion topic: the thinking behind namespaces
  • Open discussion

January 12, 2023 (10am PT)

...

Attendees:

  • TSC:
        • Mike Collado, Staff Software Engineer, Astronomer
        • Julien Le Dem, OpenLineage Project lead
        • Willy Lulciuc, Co-creator of Marquez
        • Michael Robinson, Software Engineer, Dev. Rel., Astronomer
        • Maciej Obuchowski, Software Engineer, GetInData, OpenLineage contributor
        • Mandy Chessell, Egeria Project Lead
        • Daniel Henneberger
        • Will Johnson, Senior Cloud Solution Architect, Azure Cloud, Microsoft
        • Jakub "Kuba" Dardziński, Software Engineer, GetInData, OpenLineage contributor
      • And:
        • Petr Hajek, Information Management Professional, Profinit
        • Harel Shein, Director of Engineering, Astronomer
        • Minkyu Park, Senior Software Engineer, Astronomer
        • Sam Holmberg, Software Engineer, Astronomer
        • Ernie Ostic, SVP of Product, MANTA
        • Sheeri Cabral, Technical Product Manager, Lineage, Collibra
        • John Thomas, Software Engineer, Dev. Rel., Astronomer

      Agenda:

      • Announcements
      • Recent release 0.19.2
      • Update on column-level lineage
      • Overview of recent improvements to the Airflow integration
      • Discussion topic: real-world implementation of OpenLineage (i.e., "What IS lineage, anyway?")
      • Announcement & discussion topic: the thinking behind namespaces

      Meeting:

      http://youtube.com/watch?v=hSDTXrZqQmQ

      Notes:

      • Announcements
        • OpenLineage earned Incubation status with the LFAI & Data Foundation at their December TAC meeting!
          • Represents our maturation in terms of governance, code quality assurance practices, documentation, and more
          • Required earning the OpenSSF Silver Badge, sponsorship, at least 300 GitHub stars
          • Next up: Graduation (expected in early summer)
      • Recent release 0.19.2 [Michael R.]
      • Column-level lineage update [Maciej]
        • What is the OpenLineage SQL parser?
          • At its core, it’s a Rust library that parses SQL statements and extracts lineage data from them
          • An 80/20 solution: we will not be able to parse every possible SQL statement, because each database has custom extensions and its own syntax, so we focus on standard SQL
          • A good example of a complicated extension: Snowflake COPY INTO https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
          • We primarily use the parser in the Airflow integration and the Great Expectations integration
          • Why? Airflow does not “understand” much of what some operators do, for example PostgresOperator
          • We also have a Java support package for the parser
        • What changed previously?
          • The parser in the current release can emit column-level lineage!
          • At the last OL meeting, Piotr Wojtczak, the primary author of this change, presented the new parser core that enabled this functionality:
            https://www.youtube.com/watch?v=Lv_bODeAVYQ
          • Still, the fact that the Rust code can do that does not mean we get it for free everywhere
        • What has changed recently?
          • We wrote “glue code” that allows us to use the new parser constructs in the Airflow integration
          • Error handling just got much easier: the SQL parser can “partially” parse a SQL construct and report the errors it encountered, along with the particular statements that caused them
        • Usage
          • Airflow integration extractors based on SqlExtractor (e.g., PostgresExtractor, SnowflakeExtractor, TrinoExtractor…) can now extract column-level lineage
          • Near future: Spark will be able to extract lineage from JDBCRelation
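To make the column-level lineage discussion above concrete, here is a minimal sketch of the kind of payload a columnLineage dataset facet carries: each output column lists the input fields it was derived from. It uses plain Python dicts only. The helper name `column_lineage_facet`, the namespace, the producer URL, the placeholder schema URL, and the table and column names are all assumptions made for illustration; this is not the SQL parser's or the Airflow integration's actual code.

```python
import json

# Hypothetical values, used only for illustration.
NAMESPACE = "postgres://example-host:5432"
PRODUCER = "https://example.com/my-lineage-producer"
SCHEMA_URL = "https://example.com/schemas/ColumnLineageDatasetFacet.json"  # placeholder


def column_lineage_facet(mapping, namespace=NAMESPACE):
    """Build a columnLineage-style facet.

    `mapping` maps each output column name to a list of
    (source_table, source_column) pairs it was derived from.
    """
    fields = {}
    for out_col, sources in mapping.items():
        fields[out_col] = {
            "inputFields": [
                {"namespace": namespace, "name": table, "field": col}
                for table, col in sources
            ]
        }
    return {"_producer": PRODUCER, "_schemaURL": SCHEMA_URL, "fields": fields}


# Lineage one would expect for a statement such as:
#   INSERT INTO orders_summary
#   SELECT o.customer_id, o.amount * r.rate AS amount_usd
#   FROM orders o JOIN rates r ON o.currency = r.currency
facet = column_lineage_facet(
    {
        "customer_id": [("public.orders", "customer_id")],
        "amount_usd": [("public.orders", "amount"), ("public.rates", "rate")],
    }
)

print(json.dumps(facet, indent=2))
```

In this sketch the mapping is written by hand; in the integration it would come from the parser's output for the SQL statement.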
      • Recent improvements to the Airflow integration [Kuba]
        • OpenLineage facets
          • Facets are pieces of metadata that can be attached to the core entities: run, job or dataset
          • Facets provide context to OpenLineage events
          • They can be defined as either part of the OpenLineage spec or custom facets
        • Airflow generic facet
          • Previously there were multiple custom facets with no standard
            • AirflowVersionRunFacet is an example of a rapidly growing facet that accumulated information unrelated to the version
          • Introduced AirflowRunFacet with Task, DAG, TaskInstance and DagRun properties
          • The old facets will be deprecated soon; currently both old and new facets are emitted
            • AirflowRunArgsRunFacet, AirflowVersionRunFacet and AirflowMappedTaskRunFacet will be removed
            • All information from the facets above has moved to AirflowRunFacet
        • Other improvements (added in 0.19.2)
          • SQL extractors now send column-level lineage metadata
          • Further facet standardization
            • Introduced ProcessingEngineRunFacet
              • provides processing engine information, e.g. Airflow or Spark version
            • Improved support for nominal start & end times
              • makes use of data interval (introduced in Airflow 2.x)
              • nominal end time now matches next schedule time
            • DAG owner added to OwnershipJobFacet
            • Added support for S3FileTransformOperator and TrinoOperator (@sekikn’s great contribution)
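As a rough illustration of how facets attach to a run, the sketch below builds a simplified run event as a plain Python dict, with a nominal-time facet derived from a schedule's data interval and a processing-engine facet. It is a sketch under stated assumptions, not the Airflow integration's implementation: the helper names, the producer URL, the placeholder schema URLs, the job namespace and name, and the exact facet keys (`nominalTime`, `processing_engine`) are assumptions, and the event omits fields a fully spec-valid event would carry.

```python
import uuid
from datetime import datetime, timedelta, timezone

PRODUCER = "https://example.com/my-scheduler"  # hypothetical producer URL
SCHEMA_BASE = "https://example.com/schemas"    # placeholder, not the real spec URLs


def _facet(schema_name, **fields):
    """Wrap facet fields with the two metadata keys every facet carries."""
    return {
        "_producer": PRODUCER,
        "_schemaURL": f"{SCHEMA_BASE}/{schema_name}.json",
        **fields,
    }


def nominal_time_facet(interval_start, interval_end):
    """Nominal start is the data interval start; nominal end is the interval
    end, which for a regular schedule coincides with the next scheduled run."""
    return _facet(
        "NominalTimeRunFacet",
        nominalStartTime=interval_start.isoformat(),
        nominalEndTime=interval_end.isoformat(),
    )


def build_run_event(job_name, interval_start, interval_end, engine_version):
    """Assemble a simplified START event with run facets attached."""
    return {
        "eventType": "START",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": PRODUCER,
        "run": {
            "runId": str(uuid.uuid4()),
            "facets": {
                "nominalTime": nominal_time_facet(interval_start, interval_end),
                "processing_engine": _facet(
                    "ProcessingEngineRunFacet",
                    name="Airflow",
                    version=engine_version,
                ),
            },
        },
        "job": {"namespace": "example", "name": job_name, "facets": {}},
        "inputs": [],
        "outputs": [],
    }


# A daily schedule: the data interval spans exactly one day.
start = datetime(2023, 1, 11, tzinfo=timezone.utc)
end = start + timedelta(days=1)
event = build_run_event("example_dag.example_task", start, end, "2.5.0")
```

Keeping the facets as separate keys under `run.facets` mirrors the point made in the notes: context is layered onto the core run, job and dataset entities rather than baked into them.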

      December 8, 2022 (10am PT)

      ...

      • TSC:
        • Mike Collado, Staff Software Engineer, Astronomer
        • Julien Le Dem, OpenLineage Project lead
        • Maciej Obuchowski, Software Engineer, GetInData, OpenLineage contributor
        • Mandy Chessell, Egeria Project Lead
        • Willy Lulciuc, Co-creator of Marquez
        • Paweł Leszczyński, Software Engineer, GetInData
        • Ross Turk, Senior Director of Community, Astronomer
        • Howard Yoo, Staff Product Manager, Astronomer
        • Tomasz Nazarewicz, Software Engineer, GetInData
        • Michael Robinson, Software Engineer, Dev. Rel., Astronomer
      • And:
        • Ann Mary Justine, Research Engineer, HP Enterprise
        • Martin Foltin, Master Technologist, HP Enterprise
        • Sam Holmberg, Software Engineer, Astronomer
        • Aalap Tripathy, Principal Research Engineer, HP Enterprise
        • Petr Hajek, Information Management Professional, Profinit
        • Harel Shein, Director of Engineering, Astronomer
        • Minkyu Park, Senior Software Engineer, Astronomer
        • Benji Lampel, Ecosystem Engineer, Astronomer
        • Suparna Bhattacharya, Distinguished Technologist, HP Enterprise
        • John Thomas, Software Engineer, Dev. Rel., Astronomer
        • Sergey Serebryakov, Research Engineer, HP Enterprise
        • Glyn Bowden, Chief Technologist, HP Enterprise, CMF
        • Nigel Jones, Maintainer, Egeria/IBM
        • Sheeri Cabral, Technical Product Manager, Lineage, Collibra
        • Prachi Mishra, Senior Software Engineer, Astronomer

      ...