...

  • TSC:
    • Mike Collado: Eng, Datakin
    • Mandy Chessell: Lead Egeria project
    • Maciej Obuchowski: Eng GetInData, OpenLineage contributor
    • Willy Lulciuc: Co-creator of Marquez
    • Julien: OpenLineage Project lead
  • And:
    • Michael Robinson: Dev Rel
    • Ross Turk: VP Marketing, Datakin
    • Minkyu Park: Dev at Datakin, learning about MQZ and OL.
    • Conor Beverland: Senior Dir of Product, Astronomer
    • Srikanth Venkat, Product Management, Privacera
    • Mark Taylor, Technical P.M., Microsoft
    • Harish Sune, Technical Architect, NE Analytics
    • Joshua Wankowski, Associate Data Engineer, Northwestern Mutual
    • Arpita Grange, Senior Technical Lead for Business Intelligence Solutions, Asurion

Agenda:

...

Proposal to convert licenses to SPDX [Michael]: no objections

Dec 8th 2021 (9am PT)

Attendees:

TSC:

  • Mike Collado, Staff Engineer, Datakin
  • Willy Lulciuc, Co-creator of Marquez, Datakin
  • Mandy Chessell, Egeria Project Lead
  • Julien Le Dem, OpenLineage Project Lead, CTO Datakin

And:

  • Peter Hicks, Software Engineer, Datakin
  • Srikanth Venkat, Product Management, Microsoft
  • Ross Turk, VP Marketing, Datakin
  • Maciej Obuchowski, Engineer, GetInData, OpenLineage contributor
  • John Thomas, Support Engineer, Datakin
  • Minkyu Park, Engineer, Datakin
  • Michael Robinson, Dev Rel Engineer
  • Will Johnson, Senior Cloud Solution Architect, Azure Cloud, Microsoft
  • Mark Taylor, Principal Technical PM, Microsoft
  • Travis Hilbert, Associate Consultant, Microsoft

Agenda:

  • SPDX headers [Mandy]
  • Azure Purview + OpenLineage [Will and Mark]
  • Logging backend (OpenTelemetry) [Julien]
  • Open discussion

Meeting recording:

Notes:

Software Package Data Exchange (SPDX) Tags [Mandy]

  • Open standard for creating software bill of materials
  • Includes set of short identifiers for open source licenses
    • both human readable and machine processable
    • easy to maintain and validate
  • Full license added in License file at top of git repository
  • Each file includes the SPDX-License-Identifier tag
  • Proposed: we use this approach in OpenLineage
  • Becoming a best practice in open source development
  • Julien: "a no brainer" 
  • Next question: how to integrate (implement going forward or add tags throughout project?)
  • Willy: throughout existing; should also do with Marquez
  • Mike: update build check to check for tags in new source files? 
  • Julien: must find right build plugins, two passes might be necessary
  • Julien: all agreed? Proposal adopted; someone should create an issue
  • Julien: Maven plugins exist to check and add tag if missing
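
The build check Mike raised could look roughly like the sketch below. It is only an illustration of the idea, not the plugin setup the project adopted: the function names, the file suffixes, and the choice to scan only the first few lines of each file are all assumptions made for this example.

```python
from pathlib import Path
from typing import List, Sequence

# The tag name comes from the SPDX convention discussed above.
SPDX_TAG = "SPDX-License-Identifier:"


def has_spdx_tag(text: str, max_lines: int = 5) -> bool:
    """Return True if the SPDX tag appears in the first `max_lines` lines."""
    return any(SPDX_TAG in line for line in text.splitlines()[:max_lines])


def files_missing_tag(root: Path, suffixes: Sequence[str] = (".java", ".py")) -> List[Path]:
    """List source files under `root` that lack the tag (hypothetical helper)."""
    missing = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if not has_spdx_tag(text):
                missing.append(path)
    return missing
```

A CI step could call `files_missing_tag(Path("."))` and fail the build when the returned list is not empty. The second pass Julien mentioned, adding the tag where it is missing, is left out here.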

Azure Purview Integration [Srikanth, Will]

  • Overview of Azure Purview
    • New metadata and governance platform across Microsoft
    • End-to-end governance practices
    • Goal is to fill gaps in lineage
  • Database Lineage in Azure Purview
    • Began as hackathon project at Microsoft
    • Sought way to send lineage data directly to Purview (rather than use architecture of Marquez)
    • Azure Functions used to send data from Databricks through serverless compute and event hub to Purview
    • Required adapter pattern to make emissions conform to Atlas
    • Challenges:
      • automating getting most recent OL jar into Databricks; created PR for this with emit script
      • needed to use API key passed in URL parameter; support for this integrated with PR
    • Have goal of extending use of OpenLineage inside of Spark further 
    • Motivation: didn't want to be dependent on catalog API, particular flavor of Spark
    • Plans include other integrations, including dbt
    • Want to be respectful of OpenLineage's global scope, even if it means metadata on Purview side not real-time 
    • Want to incorporate filtering capability, make it customizable based on particular connector
    • Interest extends beyond Databricks (e.g., Snowflake)
    • Eager to see issue #181 addressed: ability to add a Microsoft jar to an installation where OpenLineage is present
    • Possible PR in future: emit metadata outside a run (e.g., as dataset facets); would meet need at MS
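
Passing an API key as a URL parameter, as mentioned under Challenges, can be sketched as follows. This is not the code from the PR: the helper name, the default parameter name `code`, and the example URL are assumptions made for illustration, and the notes do not say which parameter name the integration uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_key(url: str, api_key: str, param: str = "code") -> str:
    """Return `url` with the API key set as a query parameter.

    `param` is a hypothetical default; use whatever name the receiving
    endpoint expects. An existing value for `param` is replaced.
    """
    parts = urlsplit(url)
    # Keep other query parameters, drop any earlier value of `param`.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_api_key("https://example.invalid/api/v1/lineage", "secret")` returns `https://example.invalid/api/v1/lineage?code=secret`. A key carried in the URL can end up in access logs, so it is worth confirming that logging on the receiving side is configured accordingly.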

Logging backends [Julien]

  • Open suggestion: add ability to send events to a logging aggregator (e.g., Datadog)
  • Mandy: needed in addition to proxy backend?
  • Proxy backend could be distribution endpoint, first location for this
  • Use case: experimentation
  • Proposed: open a ticket
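
As a rough illustration of the suggestion, an event could be written as one JSON line to a standard logger, which a log aggregator's agent can then collect. This is a sketch under assumptions, not a proposed design: the function name is hypothetical, and the event dictionary in the usage note shows only a few illustrative fields.

```python
import json
import logging


def log_event(event: dict, logger: logging.Logger) -> str:
    """Serialize `event` as compact JSON and log it at INFO level.

    Returns the serialized line so callers can inspect or reuse it.
    """
    line = json.dumps(event, sort_keys=True, separators=(",", ":"))
    logger.info(line)
    return line
```

A call such as `log_event({"eventType": "START", "job": {"namespace": "n", "name": "j"}}, logging.getLogger("lineage"))` emits a single line per event. Where those lines go (file, stdout, or an aggregator) is then purely a matter of logger configuration.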

Discussion

  • Azure PRs, other merged PRs will be in 0.4
  • zoom link
  • Passcode: SnEa9zJ?
  • Slides

Nov 10th 2021 (9am PT)

Attendees:

...