...
Notes:
Announcements
- A warm welcome to new committer Harel Shein (harels)! Harel's main contributions have been to project leadership, facilitating discussions, and advocating for the project. Thanks, Harel!
- Upcoming talks include one by Paweł Leszczyński at the Data Science Summit in Warsaw/online, November 23-24, and another by Julien Le Dem at Scale By The Bay in Oakland, CA, on November 15.
- The call for papers deadline for Data Council has been extended to November 17th.
Recent Releases
- OpenLineage 1.5.0
- Added
- Flink: add Flink lineage for Cassandra Connectors #2175 @HuangZhenQiu
- Spark: support rdd and toDF operations available in Spark Scala API #2188 @pawel-big-lebowski
- Spark: support Databricks Runtime 13.3 #2185 @pawel-big-lebowski
- Changed
- Airflow: loosen attrs and requests versions #2107 @JDarDagran
- dbt: render yaml configs lazily #2221 @JDarDagran
- Thanks to all the contributors, including new contributor @sophiely!
Recent Additions to the Flink Integration - Peter Huang (Apple)
- I work on the Flink team at Apple with a focus on meeting legal requirements
- Current priorities include improving lineage from Iceberg
- Users here also employ Cassandra, so we have contributed Cassandra support
- Apple has an open-source contribution review process, so I can't contribute more at the moment
- I hope that the review process will be completed in the coming weeks, so we can make more contributions
- Planned improvements include:
- addition of more catalog information to Iceberg lineage
- support for Flink 1.18
Recent Additions to the Spark Integration - Paweł Leszczyński (GetInData)
- Added support for Spark 3.5
- Added support for Databricks Runtime (most recent version)
- 2188: fix in Scala integration
- RDD issue was hard to reproduce
- 2233: Jackson library upgrade
- Jackson library in the project was an old version
- upgrade includes a security vulnerability fix
- merged but not yet released
- Planned:
- Support for Iceberg and Delta for Spark 3.5
- Spark parentRun AKA Spark Application Events (by mobuchowski)
- Meetup talk: "How to become a spark-openlineage contributor in 5 steps?"
Proposals in Discussion - Julien Le Dem (Project Lead)
- Open proposals:
- 2187: ColumnLineageDatasetFacet
- privacy use cases
- 2186: formalizing transformation types
- column lineage facet improvements
- 2163: define an integration certification process for OpenLineage
- defines integration certification process
- currently collecting use cases
- related to registry proposal
- input/feedback needed
- 2162: dataset support in Spark LogicalPlan Nodes
- optional API we could add to the Nodes
- prototype coming soon
- 2161: registry of producers and consumers
- comments welcome on the PR on GitHub
- producers would be able to register custom facet prefix, URI and link to documentation, etc.
- consumers would be able to declare the facets they consume, link to documentation, etc.
- name registration:
- unique naming
- name would be used in shorter URI prefixes
- CI validation would enforce consistent facet naming and validate facet schemas
- documentation would be published automatically
- additional documentation for specific use cases
- self-contained registry containing all facets for producers and consumers
- name path in the registry with a CODEOWNERS file, delegating approval so that changes do not have to go through the project's review process
- path for facet JSON
- more information
- Pros:
- producers and consumers would be able to define codeowners to approve changes to the registry
- CI could guarantee that changes would not produce inconsistencies
- producers would not need to host and maintain their own subset of the registry
- publication would be automated
- freedom and independence for defining custom facets without the project being a bottleneck
- Cons:
- registered entities would have to maintain their list of codeowners
- Q&A:
- producers that define multiple facets?
- granularity of this and other aspects might or might not be desirable
- consumed facets: mandatory or optional?
- always optional
- custom facets or core facets?
- core facets are currently in a different directory, but it would be nice to move them to the registry
- add tests as with core facets?
- would be useful as examples and for validation
- could be optional
- please add this to the PR
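As a rough illustration of the CI validation discussed for the registry proposal (2161), the sketch below checks two of the properties mentioned in the notes: unique registered names and consistent facet naming. The entry layout, the field names (`name`, `facet_prefix`, `produces`, `consumes`), the `<prefix>_` naming rule, and the function `validate_registry` are all assumptions made for this example; the proposal itself may define these differently.

```python
# Hypothetical registry entries. Every field name here is an assumption
# made for illustration, not the schema defined in proposal 2161.
REGISTRY = [
    {
        "name": "acme",
        "facet_prefix": "acme",
        "produces": ["acme_jobOwner"],
        "consumes": ["columnLineage"],
    },
    {
        "name": "example",
        "facet_prefix": "ex",
        "produces": ["ex_costEstimate"],
        "consumes": [],
    },
]


def validate_registry(entries):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    seen_names = set()
    for entry in entries:
        name = entry["name"]
        prefix = entry["facet_prefix"]

        # Unique naming: each registered name may appear only once.
        if name in seen_names:
            problems.append(f"duplicate registered name: {name}")
        seen_names.add(name)

        # Consistent facet naming (assumed rule): every custom facet an
        # entry produces must start with that entry's registered prefix.
        for facet in entry.get("produces", []):
            if not facet.startswith(prefix + "_"):
                problems.append(
                    f"{name}: facet '{facet}' does not start with '{prefix}_'"
                )
    return problems


if __name__ == "__main__":
    # A CI job could fail the build whenever any problem is reported.
    for problem in validate_registry(REGISTRY):
        print(problem)
```

A check like this would run on every pull request against the registry, which is how CI could guarantee that a change approved by an entry's codeowners does not introduce inconsistencies. Schema validation of the facet JSON itself is left out of this sketch.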
October 12, 2023 (10am PT)
...