...

  • Recent releases (0.13.0, 0.13.1, 0.14.0, 0.14.1) [Michael R.]
  • Native data quality in Airflow with OpenLineage [Benji]
    • Why Airflow?
      • In-pipeline checks
      • Immediate alerts
      • Lineage support
    • Use case
      • static checks
        • typed values
        • data ranges
        • temporal intervals
    • Two operators
      • SQL column check operator
        • "On Rails operator"
        • supports tolerance
        • supports partitioning via a parameter
        • available checks:
          • min
          • max
          • unique check
          • distinct check
          • null check
        • qualifiers:
          • greater_than
          • geq_to
          • less_than
          • leq_to
          • equal_to
      • SQL table check operator
        • flexible
        • supports static checks
        • supports partitioning via a parameter
        • use cases:
          • checks that include aggregate values using the whole table
          • row count checks
          • schema checks
          • comparisons between multiple columns, both aggregated and not aggregated
    • Innovation: operators can now pass data quality results directly to a lineage consumer (e.g., Marquez)
    • Note: the UI in the demo is part of the Datakin product
    • Can you talk about the OL packets?
      • the existing OL data quality facets are being used
  • MANTA integrations using OpenLineage [Petr]
    • MANTA & MANTA Flow tools
      • unique column-level lineage parser covering most data technologies
      • parses code to build a database and reconstruct detailed column-level lineage using static analysis
      • represents end-to-end dependencies (direct and indirect) across technologies at the enterprise level
      • challenge: integrating runtime lineage
      • MANTA connectors
        • reverse-engineer code
      • integration gets lineage from OpenLineage producers
        • e.g., Keboola, dbt, Airflow, Snowflake, Spark
        • converts the OpenLineage JSON files to MANTA objects
        • currently limited to the table level
        • for some technologies, Marquez libraries were used
      • MANTA repository model
        • underlying graph database
        • nodes: hierarchically organized objects
        • edges: relations
        • layers: physical, logical, runtime...
        • resources: all OpenLineage metadata sources of the integration
          • used to distinguish where each piece of metadata came from
      • column-level lineage project
        • column-level lineage can currently be obtained only if it is provided in facets
        • idea: extend the OpenLineage model with facet extensions that MANTA then analyzes statically
        • passes code, Base64-encoded, as artifacts in job facets
      • status: in testing, beginning with Keboola
      • hope: use the integration to increase the number of producers we can consume lineage from
    • Q & A
      • Have you used JSON files for metadata in the past?
      • No, but we are now, and we are also using API calls
      • Egeria was in a similar situation
  • Open Discussion
    • the common metadata framework project at HP Enterprise will be added to the agenda for a future meeting
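
The SQL column check operator item pairs each check (min, max, null check, ...) with qualifiers and an optional tolerance. The sketch below shows how such a mapping might be evaluated against already-computed check values. It is not Airflow's code: the dictionary shape is modeled loosely on the column mapping described in the talk, and the tolerance rule (scale lower bounds by 1 - tolerance and upper bounds by 1 + tolerance, assuming non-negative bounds) is an assumption.

```python
from typing import Dict, Mapping, Optional, Tuple


def matches(result: float, qualifiers: Mapping[str, float],
            tolerance: Optional[float] = None) -> bool:
    """Return True if `result` satisfies every qualifier (assumes non-negative bounds)."""
    tol = tolerance or 0.0
    ok = True
    # Lower bounds are loosened downward by the tolerance (assumed rule).
    if "geq_to" in qualifiers:
        ok = ok and result >= qualifiers["geq_to"] * (1 - tol)
    if "greater_than" in qualifiers:
        ok = ok and result > qualifiers["greater_than"] * (1 - tol)
    # Upper bounds are loosened upward by the tolerance (assumed rule).
    if "leq_to" in qualifiers:
        ok = ok and result <= qualifiers["leq_to"] * (1 + tol)
    if "less_than" in qualifiers:
        ok = ok and result < qualifiers["less_than"] * (1 + tol)
    # Equality becomes a band around the expected value.
    if "equal_to" in qualifiers:
        target = qualifiers["equal_to"]
        ok = ok and target * (1 - tol) <= result <= target * (1 + tol)
    return ok


def run_column_checks(column_mapping, observed) -> Dict[Tuple[str, str], bool]:
    """Evaluate {column: {check: {qualifier: value, "tolerance": t}}} against observed values."""
    results = {}
    for column, checks in column_mapping.items():
        for check, spec in checks.items():
            qualifiers = {k: v for k, v in spec.items() if k != "tolerance"}
            value = observed[column][check]
            results[(column, check)] = matches(value, qualifiers, spec.get("tolerance"))
    return results


# Hypothetical example: a "price" column with a null check and a tolerant min check.
column_mapping = {
    "price": {
        "null_check": {"equal_to": 0},
        "min": {"geq_to": 10, "tolerance": 0.1},
    }
}
observed = {"price": {"null_check": 0, "min": 9.5}}
print(run_column_checks(column_mapping, observed))
```

Here the observed minimum of 9.5 misses `geq_to: 10` on its own but passes once the 10% tolerance lowers the bound to 9.0.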
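
The SQL table check operator item describes free-form checks: aggregate checks over the whole table, row counts, and comparisons between columns, optionally restricted by a partition parameter. A minimal sketch of that idea using Python's built-in `sqlite3` follows. Each check is assumed to be a boolean SQL expression wrapped in `CASE WHEN ... THEN 1 ELSE 0 END`; the function name, the `check_statement` key, and the `partition_clause` argument are illustrative choices, not a claim about how Airflow builds its queries.

```python
import sqlite3
from typing import Dict, Mapping, Optional


def run_table_checks(conn: sqlite3.Connection, table: str,
                     checks: Mapping[str, Mapping[str, str]],
                     partition_clause: Optional[str] = None) -> Dict[str, bool]:
    # Table name, checks and partition clause are interpolated into SQL, so they
    # must come from trusted pipeline configuration, never from user input.
    where = f" WHERE {partition_clause}" if partition_clause else ""
    results = {}
    for name, spec in checks.items():
        sql = f"SELECT CASE WHEN {spec['check_statement']} THEN 1 ELSE 0 END FROM {table}{where}"
        rows = conn.execute(sql).fetchall()
        # Aggregate statements yield one row; row-level comparisons yield one row
        # per table row, and every row must pass.
        results[name] = all(row[0] == 1 for row in rows)
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, qty INTEGER, unit_price REAL, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                 [(1, 2, 5.0, 10.0), (2, 1, 3.0, 3.0), (3, 4, 2.5, 10.0)])
checks = {
    "row_count_check": {"check_statement": "COUNT(*) = 3"},          # row count
    "total_check": {"check_statement": "qty * unit_price = total"},  # multi-column, not aggregated
    "max_qty_check": {"check_statement": "MAX(qty) <= 3"},           # aggregate over the table
}
print(run_table_checks(conn, "orders", checks))
```

Passing `partition_clause="qty >= 2"` runs the same checks against only the matching rows, which is the role the partitioning parameter plays in the notes.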
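
The answer about OL packets says the operators reuse the existing OpenLineage data quality facets. As a rough sketch, assuming the `dataQualityAssertions` input dataset facet with `assertion`, `success`, and optional `column` fields, check results could be attached to an input dataset like this. The result-dictionary shape and the sample namespace are hypothetical, and the `_producer` and `_schemaURL` fields a real facet carries are left out for brevity.

```python
import json
from typing import Dict, Optional, Tuple


def assertions_facet(results: Dict[Tuple[Optional[str], str], bool]) -> dict:
    """Turn {(column or None, check_name): passed} into a dataQualityAssertions-style facet."""
    assertions = []
    # Sort for a stable order; table-level checks (column None) sort first.
    for (column, check), success in sorted(results.items(),
                                           key=lambda kv: (kv[0][0] or "", kv[0][1])):
        entry = {"assertion": check, "success": success}
        if column is not None:
            entry["column"] = column
        assertions.append(entry)
    # A real facet also carries "_producer" and "_schemaURL"; omitted in this sketch.
    return {"assertions": assertions}


def input_dataset(namespace: str, name: str,
                  results: Dict[Tuple[Optional[str], str], bool]) -> dict:
    """Build an input dataset entry with the assertions under inputFacets."""
    return {
        "namespace": namespace,
        "name": name,
        "inputFacets": {"dataQualityAssertions": assertions_facet(results)},
    }


# Hypothetical results: one column-level check passed, one table-level check failed.
results = {("price", "min"): True, (None, "row_count_check"): False}
print(json.dumps(input_dataset("postgres://db:5432", "public.orders", results), indent=2))
```

A lineage consumer such as Marquez could then display pass/fail status per dataset without re-running the checks.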
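
The MANTA integration converts OpenLineage JSON into MANTA objects and is currently limited to the table level. MANTA's object model is not described in the notes, so the sketch below stops at plain (input table, job, output table) triples taken from one OpenLineage run event. The event values are made-up samples, and treating every input as feeding every output is an assumption that matches table-level granularity.

```python
import json
from typing import List, Tuple


def table_edges(event_json: str) -> List[Tuple[str, str, str]]:
    """Return (input_table, job, output_table) triples from one OpenLineage run event."""
    event = json.loads(event_json)
    job = f'{event["job"]["namespace"]}/{event["job"]["name"]}'

    def qualified(dataset: dict) -> str:
        # OpenLineage identifies a dataset by its namespace plus name.
        return f'{dataset["namespace"]}/{dataset["name"]}'

    # Table level only: every input feeds every output; column facets are ignored.
    return [(qualified(src), job, qualified(dst))
            for src in event.get("inputs", [])
            for dst in event.get("outputs", [])]


# Sample event; all identifiers below are hypothetical.
event = json.dumps({
    "eventType": "COMPLETE",
    "eventTime": "2022-01-01T00:00:00Z",
    "run": {"runId": "3f5e2c1a-0000-4000-8000-000000000001"},
    "job": {"namespace": "dbt", "name": "orders_model"},
    "inputs": [{"namespace": "snowflake://acct", "name": "raw.orders"}],
    "outputs": [{"namespace": "snowflake://acct", "name": "analytics.orders"}],
    "producer": "https://example.com/sample-producer",
})
print(table_edges(event))
```

Keeping the job in the middle of each triple preserves which process created the dependency, which a repository with separate physical and runtime layers would likely need.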
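
For the column-level idea, the notes say the producer passes code, Base64-encoded, as artifacts in job facets so MANTA can analyze it statically. The facet name and field names are not given, so `sourceCodeBase64`, `language`, and `code` below are hypothetical. The sketch shows only the encode and decode round trip around an OpenLineage-style job dictionary.

```python
import base64

# Hypothetical facet key; the notes do not name the facet or its fields.
FACET_KEY = "sourceCodeBase64"


def attach_code(job: dict, code: str, language: str) -> dict:
    """Return a copy of a job dict with the code Base64-encoded in a job facet."""
    encoded = base64.b64encode(code.encode("utf-8")).decode("ascii")
    facets = dict(job.get("facets", {}))
    facets[FACET_KEY] = {"language": language, "code": encoded}
    return {**job, "facets": facets}


def extract_code(job: dict) -> str:
    """Consumer side: decode the code so it can be analyzed statically."""
    return base64.b64decode(job["facets"][FACET_KEY]["code"]).decode("utf-8")


job = {"namespace": "keboola", "name": "transform_orders"}
sql = "INSERT INTO analytics.orders SELECT id, qty * unit_price FROM raw.orders"
enriched = attach_code(job, sql, "sql")
print(enriched["facets"][FACET_KEY]["code"])
print(extract_code(enriched) == sql)
```

Base64 keeps arbitrary code (quotes, newlines, non-ASCII) safe inside a JSON string; the consumer decodes it before parsing.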

August 11, 2022 (10am PT)

...