...

  1. Announcements [Julien]
  2. Recent releases [Michael R.]
  3. Recent Releases
        - Michael shared a release update on 1.1.0. It includes support for creating the OpenLineage configuration from the Flink configuration in the Flink integration, a fix for multiple Spark jobs with the same job name writing to different datasets, and missing Javadoc for the Java client. The new default behavior can be turned off with an environment variable, and more information is available in the release notes.
        - Michael also thanked new contributors and mentioned bug fixes.
        - Maciej and Julien discussed the fact that Airflow changes are not included in the changelog because the Airflow OpenLineage integration is now part of the Airflow project.

  4. Demo: Spark integration tests in Databricks runtime [Pawel]
        - Pawel thanked the participants and introduced himself. He talked about upgrading the Spark version and the issues the team faced with the Databricks integration.
        - They had to test the changes manually, which was time-consuming. However, Databricks released a Java library that allowed them to run integration tests easily.
        - They also implemented a file transport to capture lineage events and verify that the events contain what they expected. This change sped up their work and improved code quality.
        - Julien asked if there were any questions.
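
The verification step Pawel described can be sketched as follows. This is a minimal illustration, not the actual test code from the demo: it assumes the captured events are stored one JSON object per line, and the helper names (`load_events`, `output_names`) and all sample values are hypothetical. Only the `eventType`, `job`, `inputs`, and `outputs` field names come from the OpenLineage run event model.

```python
import json
import tempfile
from pathlib import Path


def load_events(path):
    """Read captured lineage events, one JSON object per non-empty line."""
    events = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            events.append(json.loads(line))
    return events


def output_names(events, event_type="COMPLETE"):
    """Collect output dataset names from events of the given type."""
    names = set()
    for event in events:
        if event.get("eventType") == event_type:
            for dataset in event.get("outputs", []):
                names.add(dataset["name"])
    return names


# Hypothetical events standing in for what an integration test run would write.
sample_events = [
    {
        "eventType": "START",
        "job": {"namespace": "test", "name": "demo_job"},
        "inputs": [],
        "outputs": [],
    },
    {
        "eventType": "COMPLETE",
        "job": {"namespace": "test", "name": "demo_job"},
        "inputs": [{"namespace": "file", "name": "/tmp/in"}],
        "outputs": [{"namespace": "file", "name": "/tmp/out"}],
    },
]

with tempfile.TemporaryDirectory() as tmp:
    events_file = Path(tmp) / "events.jsonl"
    events_file.write_text("\n".join(json.dumps(e) for e in sample_events) + "\n")
    captured = load_events(events_file)

found = output_names(captured)
# The test passes only if the job reported the dataset we expected it to write.
assert found == {"/tmp/out"}
```

In a real test the file would be written by the integration under test rather than by the test itself; the assertions stay the same.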

  5. Discussion items
    1. OpenLineage Registry Proposal [Julien]
          - Julien explained the concept of OpenLineage and the need for a registry to define custom facets and producers. He shared a Google doc for feedback and listed the goals of the registry, including allowing third parties to register their implementation or custom extension and shortening the producer and schema URL values.
          - Custom facets are an easy way to extend the spec without requiring any approval, and producers and consumers can list the facets they produce or consume without requiring approval.
          - Mandy joined the call and expressed support for the idea of a registry but suggested that facets should be themed to avoid every producer defining their own facets. She proposed having a set of themes like data facets and meeting assets to cluster similar facets together in the registry.
          - Mandy expresses concern about naming custom facets after specific technologies, as it can lead to unnecessary duplication. Julien explains that the Airflow facet is specific to Airflow, while core facets provide benefits for generic things.
          - Core facets are sometimes added, and there are things specific to what people are doing. Mandy agrees and gives an example of how types are aligned with technologies, leading to duplication.
          - Ernie suggests adding a protocol for something in the registry to become a core facet. Julien explains that there is a template for adding to the spec and that custom facets can be defined as long as they have a prefix in the facet name and publish their schema.
          - To become a core facet, a proposal can be opened on the OpenLineage project, and usage of the custom facet can be leveraged to show that it works.
          - Mandy suggests having a state on the registry to show whether something is private, under proposal, or being adopted. Julien agrees and explains that some custom facets are specifically in the domain of the producer and should live in the registry, while others are shared.
          - Nick interjects and expresses his appreciation for the community aspect of OpenLineage. He suggests that producers provide examples and tests for consumers to use.
          - Mandy asks for clarification on what he means by tests, and Nick explains that it could be a set of payloads or actually running the runtime to produce events.
          - Nick would like to see both examples and payloads for consumers and producers, respectively. He suggests that putting them in a registry would facilitate everything all around like the tests.
          - Julien explains that for the core spec, they have the definition of facets, a JSON Schema for each facet, and documentation. They also added an example of each core facet and a test for the schema validation.
          - He suggests making it easier for producers to describe what facet they're producing.
          - Mandy asks who did the recent addition, and Julien explains that it was part of getting data. Mandy thanks him for the information.
          - Julien suggests that there could be more done to make it easier for producers to describe what facet they're producing. Nick agrees and suggests a framework for testing where producers can provide enough information for the test to be generated.
          - Julien explains that they currently use schema validation, but it's just a small portion of what Nick is describing. Nick agrees that it's a start.
          - Julien suggests that producers need a registry mechanism to create their own facets and make them explicitly defined. Consumers would also benefit from a programmatic definition of facets they're consuming.
          - He mentions the OpenLineage website's ecosystem page and how it points to documentation, but a more programmatic definition would be great.
          - Nick agrees that it would be great to have a more programmatic definition of facets.
          - Julien proposed a registry and discussed the trade-offs between a self-contained registry and delegating to other registries. He also mentioned the benefits of using shorter URLs for custom facets.
          - Nick asked about how other communities handle this and suggested looking at successful practices of similar organizations. Pawel agreed.
          - There were questions about whether there should be a registry folder under spec or a registry repo in the OpenLineage GitHub organization, and how to handle core facets and versioning. The group discussed using an owners file in a repo to approve updates to the registry.
          - Julien emphasized that this was just to start the conversation and that there were many different ways to implement the registry.
          - Julien mentioned producing a list of schema URL as a third party and discussed the benefits of a self-contained registry, including the ability to run checks against it and ensure consistency.
          - Julien explained that defining a name and putting a list of information would allow for shorter URLs for custom facets.
          - Julien used ol: as an example of a shorter prefix for schema URLs.
          - Julien mentioned that there were questions about whether there should be a registry repo in the OpenLineage GitHub organization or a registry folder under spec.
          - Julien discussed using a JSON file to contain information about customers and their defined names.
          - Julien compared the registry to the Maven repository and discussed using an owners file to approve updates to the registry.
          - Julien mentioned using CI to verify consistency and avoid breaking the registry.
          - Nick asked about successful practices of similar organizations in handling registries.
          - Nick mentioned that smaller organizations might be more flexible while larger organizations might have more legal requirements for using other registries.
          - Pawel agreed with Nick's suggestion to look at successful practices of similar organizations.
          - Julien explains that data-driven decisions are important and mentions the trade-off of how complicated it is to maintain a repository and whether it is self-service for producers. He suggests adding files to an existing open source repo for small organizations, while big organizations may need legal approval to contribute.
          - He also mentions the need for licensing and PR processes.
          - Nick responds with agreement.
          - Julien says that he will share the draft doc on the OpenLineage Slack for feedback and follow the OpenLineage proposal process. He mentions other models for the implementation, such as the Maven repository, and welcomes other examples.
          - He also asks if there are any questions or things people want to share about OpenLineage.
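
The registry ideas discussed above (a JSON file of registered names, shorter prefixed schema URLs such as `ol:`, and a consistency check that could run in CI) can be sketched as follows. This is only an illustration under stated assumptions: the proposal was still a draft, so the file layout, the key names (`baseUrl`, `facetPrefix`), the `acme` entry, and the functions `check_registry` and `expand` are all hypothetical and not part of OpenLineage.

```python
import json

# Hypothetical registry content. Names, keys, and URLs are illustrative only.
REGISTRY_JSON = """
{
  "ol":   {"baseUrl": "https://openlineage.io/spec/facets/", "facetPrefix": ""},
  "acme": {"baseUrl": "https://example.com/lineage/facets/", "facetPrefix": "acme_"}
}
"""


def check_registry(registry):
    """Consistency check of the kind a CI job could run on every registry update."""
    seen_prefixes = {}
    for name, entry in registry.items():
        base_url = entry["baseUrl"]
        if not (base_url.startswith("https://") and base_url.endswith("/")):
            raise ValueError(f"{name}: baseUrl must be https and end with '/'")
        prefix = entry["facetPrefix"]
        if prefix in seen_prefixes:
            raise ValueError(
                f"{name}: facet prefix {prefix!r} already used by {seen_prefixes[prefix]}"
            )
        seen_prefixes[prefix] = name


def expand(url, registry):
    """Expand a short 'name:rest' URL; return anything else unchanged."""
    name, sep, rest = url.partition(":")
    if sep and name in registry:
        return registry[name]["baseUrl"] + rest
    return url


registry = json.loads(REGISTRY_JSON)
check_registry(registry)

short = "acme:1-0-0/AcmeJobFacet.json"
full = expand(short, registry)
assert full == "https://example.com/lineage/facets/1-0-0/AcmeJobFacet.json"

# Full URLs pass through untouched because "https" is not a registered name.
assert expand(full, registry) == full
```

Keeping every entry in one self-contained file is what makes the check possible: a duplicate prefix or malformed base URL is caught before the update is merged, which is the trade-off Julien raised against delegating to external registries.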

...