...
- TSC:
- Mike Collado: Engineer, Datakin
- Mandy Chessel: Lead Egeria project
- Maciej Obuchowski: Engineer, GetInData, OpenLineage contributor
- Willy Lulciuc: Co-creator of Marquez
- Julien: OpenLineage Project lead
- And:
- Michael Robinson: Dev Rel
- Ross Turk: VP Marketing, Datakin
- Minkyu Park: Dev at Datakin, learning about Marquez and OpenLineage
- Conor Beverland: Senior Dir of Product, Astronomer
- Srikanth Venkat, Product Management, Privacera
- Mark Taylor, Technical P.M., Microsoft
- Harish Sune, Technical Architect, NE Analytics
- Joshua Wankowski, Associate Data Engineer, Northwestern Mutual
- Arpita Grange, Senior Technical Lead for Business Intelligence Solutions, Asurion
Agenda:
- OpenLineage recent releases overview [Julien]
- OpenLineage 0.4 release overview: https://github.com/OpenLineage/OpenLineage/releases/tag/0.4.0
- Databricks install README and init scripts (by Will)
- Iceberg integration (by Pawel)
- Kafka read and write support (by Olek and Mike)
- Arbitrary parameters supported in HTTP URL construction (by Will)
- Increased coverage (Pawel/Maciej)
- OpenLineage 0.5 release overview
- Egeria support for OpenLineage [Mandy]
- Airflow TaskListener for OpenLineage integration [Maciej]
- Open discussion
...
Proposal to convert licenses to SPDX [Michael]: no objections
Dec 8th 2021 (9am PT)
Attendees:
TSC:
- Mike Collado, Staff Engineer, Datakin
- Willy Lulciuc, Co-creator of Marquez, Datakin
- Mandy Chessel, Egeria Project Lead
- Julien Le Dem, OpenLineage Project Lead, CTO Datakin
And:
- Peter Hicks, Software Engineer, Datakin
- Srikanth Venkat, Product Management, Microsoft
- Ross Turk, VP Marketing, Datakin
- Maciej Obuchowski, Engineer, GetInData, OpenLineage contributor
- John Thomas, Support Engineer, Datakin
- Minkyu Park, Engineer, Datakin
- Michael Robinson, Dev Rel Engineer
- Will Johnson, Senior Cloud Solution Architect, Azure Cloud, Microsoft
- Mark Taylor, Principal Technical PM, Microsoft
- Travis Hilbert, Associate Consultant, Microsoft
Agenda:
- SPDX headers [Mandy]
- Azure Purview + OpenLineage [Will and Mark]
- Logging backend (OpenTelemetry) [Julien]
- Open discussion
Meeting recording:
Notes:
Software Package Data Exchange (SPDX) Tags [Mandy]
- Open standard for creating software bills of materials (SBOMs)
- Includes a set of short identifiers for open source licenses
- Both human-readable and machine-processable
- Easy to maintain and validate
- Full license text added in a License file at the root of the Git repository
- Each source file includes an SPDX-License-Identifier tag
- Proposed: we use this approach in OpenLineage
- Becoming a best practice in open source development
- Julien: "a no brainer"
- Next question: how to roll it out (apply only to new files going forward, or add tags throughout the existing project?)
- Willy: add throughout existing code; should also do the same for Marquez
- Mike: update build check to check for tags in new source files?
- Julien: need to find the right build plugins; two passes might be necessary
- Julien: all agreed? Adopted; someone should create an issue
- Julien: Maven plugins exist to check and add tag if missing
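The build check Mike suggested could look roughly like the sketch below. It is a minimal, hypothetical Python version, not the Maven plugins Julien mentioned: it assumes a tag such as "SPDX-License-Identifier: Apache-2.0" sits within the first few lines of each file, and the file suffixes and header window (SOURCE_SUFFIXES, HEADER_LINES) are illustrative assumptions only.

```python
from pathlib import Path

# Hypothetical settings for illustration; a real build check would
# configure its own file types and header window.
SOURCE_SUFFIXES = {".java", ".py", ".scala"}
EXPECTED_TAG = "SPDX-License-Identifier:"
HEADER_LINES = 5  # only look near the top of each file


def has_spdx_tag(path: Path) -> bool:
    """Return True if the SPDX tag appears in the first HEADER_LINES lines."""
    with path.open(encoding="utf-8", errors="replace") as f:
        for _, line in zip(range(HEADER_LINES), f):
            if EXPECTED_TAG in line:
                return True
    return False


def find_missing(root: Path) -> list[Path]:
    """List source files under root that lack the SPDX tag."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in SOURCE_SUFFIXES and not has_spdx_tag(p)
    )


def main(argv: list[str]) -> int:
    """Report untagged files; return a nonzero exit code if any are found."""
    root = Path(argv[0]) if argv else Path(".")
    missing = find_missing(root)
    for p in missing:
        print(f"missing SPDX tag: {p}")
    return 1 if missing else 0
```

A CI step could call sys.exit(main(sys.argv[1:])) so the build fails when a new file is missing its tag; the "two passes" Julien mentioned would correspond to a separate step that inserts missing tags before this check runs.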
Azure Purview Integration [Srikanth, Will]
- Overview of Azure Purview
- New metadata and governance platform across Microsoft
- End-to-end governance practices
- Goal is to fill gaps in lineage
- Database Lineage in Azure Purview
- Began as hackathon project at Microsoft
- Sought a way to send lineage data directly to Purview (rather than use the Marquez architecture)
- Azure Functions used to send data from Databricks through serverless compute and Event Hubs to Purview
- Required an adapter pattern to make emitted events conform to Atlas
- Challenges:
- automating getting the most recent OpenLineage jar into Databricks; created a PR for this with an init script
- needed to pass an API key as a URL parameter; support for this was added in a PR
- Have goal of extending use of OpenLineage inside of Spark further
- Motivation: avoid depending on a catalog API or a particular flavor of Spark
- Plans include other integrations, including dbt
- Want to respect OpenLineage's global scope, even if it means metadata on the Purview side is not real-time
- Want to add a filtering capability, customizable per connector
- Interest extends beyond Databricks (e.g., Snowflake)
- Eager to see issue #181 addressed: ability to add a Microsoft jar to an installation that already includes OpenLineage
- Possible future PR: emit metadata outside a run (e.g., as dataset facets); would meet a need at Microsoft
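The API-key challenge above comes down to adding arbitrary query parameters to the endpoint URL an event is posted to. The sketch below shows the general technique with Python's standard urllib.parse. It is not the OpenLineage client's actual configuration mechanism, and the function name build_endpoint_url, the example host, and the parameter name "code" are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_endpoint_url(base_url: str, extra_params: dict[str, str]) -> str:
    """Append arbitrary query parameters (e.g. an API key) to an endpoint URL,
    keeping any parameters that are already present."""
    parts = urlsplit(base_url)
    # Existing query parameters, preserved in their original order.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(extra_params.items())
    # urlencode percent-encodes values (spaces become '+').
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, build_endpoint_url("https://example.com/api/v1/lineage", {"code": "abc123"}) yields "https://example.com/api/v1/lineage?code=abc123". Encoding the parameters rather than concatenating strings keeps keys containing special characters valid.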
Logging backends [Julien]
- Open suggestion: add ability to send events to a logging aggregator (e.g., Datadog)
- Mandy: is this needed in addition to the proxy backend?
- The proxy backend could serve as the distribution endpoint and the first place to implement this
- Use case: experimentation
- Proposed: open a ticket
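To make the logging-backend suggestion concrete, here is a minimal, hypothetical sketch: each lineage event is serialized as one JSON line and written through Python's standard logging module, where a separately configured log shipper (e.g., a Datadog agent) could collect it. The logger name, the helper functions, and the producer URL are assumptions. The event shape is a simplified OpenLineage-style run event, not a complete or validated one.

```python
import json
import logging
import uuid
from datetime import datetime, timezone

# Hypothetical logger name; collecting its output is left to the
# logging/agent configuration, which is outside this sketch.
logger = logging.getLogger("openlineage.events")


def log_event(event: dict) -> str:
    """Serialize a lineage event as a single JSON line and emit it via logging."""
    line = json.dumps(event, separators=(",", ":"), sort_keys=True)
    logger.info(line)
    return line


def example_run_event(namespace: str, job_name: str) -> dict:
    """Build a simplified OpenLineage-style START run event."""
    return {
        "eventType": "START",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [],
        "outputs": [],
        "producer": "https://example.com/hypothetical-producer",
    }
```

One JSON object per line is a format most log aggregators can parse without custom rules, which fits the experimentation use case discussed above.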
Discussion
Nov 10th 2021 (9am PT)
Attendees:
...