...

Recording: http://youtube.com/watch?v=Lc-IVvMleJU

Notes:

Announcements

  • A warm welcome to new committer Harel Shein (harels)! Harel's main contributions have been to project leadership, facilitating discussions, and advocating for the project. Thanks, Harel!
  • Upcoming talks include one by Paweł Leszczyński at the Data Science Summit in Warsaw/online, November 23-24, and another by Julien Le Dem at Scale By The Bay in Oakland, CA, on November 15.
  • The call for papers deadline for Data Council has been extended to November 17th.

Recent Releases

Recent Additions to the Flink Integration - Peter Huang (Apple)

  • I work on the Flink team at Apple with a focus on meeting legal requirements
  • Current priorities include improving lineage from Iceberg
  • Our users also rely on Cassandra, so we have contributed Cassandra support
  • Apple has an open-source contribution review process, so I can't contribute more at the moment
  • I hope the review will be completed in the coming weeks so that we can make more contributions
  • Planned improvements include:
    • addition of more catalog information to Iceberg lineage
    • support for Flink 1.18

Recent Additions to the Spark Integration - Paweł Leszczyński (GetInData)

  • Added support for Spark 3.5
  • Added support for Databricks Runtime (most recent version)
  • 2188: fix in Scala integration
    • RDD issue was hard to reproduce
  • 2233: Jackson library upgrade
    • Jackson library in the project was an old version
    • upgrade includes a security vulnerability fix
    • merged but not yet released
  • Planned:
    • Support for Iceberg and Delta for Spark 3.5
    • Spark parentRun AKA Spark Application Events (by mobuchowski)
    • Meetup talk: "How to become a spark-openlineage contributor in 5 steps?"

Proposals in Discussion - Julien Le Dem (Project Lead)

  • Open proposals:
    • 2187: ColumnLineageDatasetFacet
      • privacy use cases
    • 2186: formalizing transformation types
      • column lineage facet improvements
    • 2163: define an integration certification process for OpenLineage
      • currently collecting use cases
      • related to registry proposal
      • input/feedback needed
    • 2162: dataset support in Spark LogicalPlan Nodes
      • an optional API that could be added to LogicalPlan nodes
      • prototype coming soon
    • 2161: registry of producers and consumers
      • comments welcome on the PR on GitHub
      • producers would be able to register a custom facet prefix, a URI, a link to documentation, etc.
      • consumers would be able to declare the facets they consume, a link to documentation, etc.
      • name registration:
        • unique naming
        • name would be used in shorter URI prefixes
      • CI validation would enforce consistent facet naming and validate facet schemas
      • documentation would be published automatically
      • additional documentation for specific use cases
      • self-contained registry containing all facets for producers and consumers
        • a path per registered name, with a CODEOWNERS file delegating approval to that name's owners so changes need not go through the core review process
        • a path for the facets' JSON schemas
        • more information
      • Pros:
        • producers and consumers would be able to define codeowners to approve changes to the registry
        • CI could guarantee that changes would not produce inconsistencies
        • producers would not need to host and maintain their own subset of the registry
        • publication would be automated
        • freedom and independence for defining custom facets without the project being a bottleneck
      • Cons:
        • registered entities would have to maintain their list of codeowners
      • Q&A:
        • producers that define multiple facets?
          • granularity of this and other aspects might or might not be desirable
        • consumed facets: mandatory or optional?
          • always optional
        • custom facets or core facets?
          • core facets are currently in a different directory, but it would be nice to move them into the registry
        • add tests as with core facets?
          • would be useful as examples and for validation
          • could be optional
          • please add this to the PR
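The CI validation idea in proposal 2161 (unique names, consistent facet naming, schema checks) can be illustrated with a small check. This is a minimal sketch under stated assumptions: the `RegistryEntry` fields, the rule that facet names carry the producer's prefix, and the rule that each schema's `$id` sits under the producer's registered URI are all hypothetical, chosen only to show how such rules could be enforced automatically. The proposal's PR on GitHub defines the actual registry format.

```python
"""Hypothetical CI check for the facet registry discussed in proposal 2161.

The entry fields and validation rules below are illustrative assumptions,
not the proposal's actual format.
"""
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    # Hypothetical fields mirroring what a producer might register.
    name: str          # unique registered name
    facet_prefix: str  # prefix every facet name from this producer must use
    uri: str           # base URI under which this producer's schemas live
    facets: dict[str, dict] = field(default_factory=dict)  # facet name -> JSON schema


def validate_registry(entries: list[RegistryEntry]) -> list[str]:
    """Return human-readable errors; an empty list means the check passes."""
    errors: list[str] = []
    seen: set[str] = set()
    for entry in entries:
        # Unique naming: compare case-insensitively so "Acme" and "acme" collide.
        key = entry.name.lower()
        if key in seen:
            errors.append(f"duplicate registered name: {entry.name}")
        seen.add(key)
        for facet_name, schema in entry.facets.items():
            # Consistent naming: every facet must carry the producer's prefix.
            if not facet_name.startswith(entry.facet_prefix):
                errors.append(
                    f"{entry.name}: facet {facet_name!r} lacks prefix {entry.facet_prefix!r}"
                )
            # Schema check: the schema must declare an $id under the producer's URI.
            schema_id = schema.get("$id", "")
            if not schema_id.startswith(entry.uri):
                errors.append(
                    f"{entry.name}: facet {facet_name!r} has $id outside {entry.uri}"
                )
    return errors


if __name__ == "__main__":
    # Example registry with one valid entry and one that breaks every rule.
    entries = [
        RegistryEntry(
            "acme", "acme_", "https://example.com/acme/",
            {"acme_qualityFacet": {"$id": "https://example.com/acme/acme_qualityFacet.json"}},
        ),
        RegistryEntry("Acme", "other_", "https://example.com/other/", {"badFacet": {}}),
    ]
    for err in validate_registry(entries):
        print(err)
```

In a real setup the entries would presumably be loaded from the registry's directory tree, and a non-empty error list would fail the CI job; keeping the rules in one pure function makes them easy to test in isolation.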

October 12, 2023 (10am PT)

...