...
- Announcements:
- We recently removed support for Airflow 1.x
- Ross gave a talk on OpenLineage at ApacheCon in New Orleans last week
- Upcoming opportunities to give talks about OpenLineage:
- Data Teams Summit (January 2023)
- Subsurface Live (January 2023)
- Data Council Austin (March 2023)
- Giving a talk on data lineage soon? Ping Michael R. on Slack to let us know.
- Recent release 0.15.1 [Michael R.]
Added
- Airflow: improve development experience #1101 @JDarDagran
- Documentation: update issue templates for proposal & add new integration template #1116 @rossturk
- Spark: add description for URL parameters in readme, change overwriteName to appName #1130 @tnazarew
Changed
- Airflow: lazy load BigQuery client #1119 @mobuchowski
Fixed
- Spark: fix column lineage #1069 @pawel-big-lebowski
- Spark: set log level of Init OpenLineageContext to DEBUG #1064 **new contributor @varuntestaz**
- Java client: update version of SnakeYAML #1090 **new contributor Lukáš AKA @TheSpeedding**
- CI: build macos release package on medium resource class #1131 @mobuchowski
- Project roadmap review [Harel]
- Improved understanding of Airflow
- Track DAG runs
- Native lineage in operators
- Increased adoption of OpenLineage consumers
- Collaborate with data catalogs
- Coverage by event producers
- Increased support for Snowflake access history using tags
- Data quality frameworks
- Start thinking about data consumption integrations (e.g., on the BI layer)
- Continue experimenting with a Flink integration, streaming in general
- Increased support of column level lineage (e.g., SQL operators)
- Column-level lineage workshop [Howard]
- Tutorial available in the OpenLineage/workshops GitHub repo
- Uses Jupyter and Spark
- Covers:
- Installing Marquez and Jupyter
- Using column lineage feature in a Jupyter notebook
- Requires:
- Docker 17.05+
- Docker Compose 1.29.1+
- Git (preinstalled on most versions of macOS; verify with `git version`)
- 4 GB of available memory (the minimum for Docker; more is strongly recommended)
- Preconfigured, including a token for Jupyter
- Notebook contains scripts to set up the environment, run Marquez, and start a Spark session
- Allows you to see Marquez in action and understand how the APIs work
- Scripts return the JSON payloads
- Other features are also well-suited to Jupyter notebooks, so more tutorials will be forthcoming
- We welcome your contribution of additional tutorials!
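For readers who have not yet seen a column-lineage payload, the following is a minimal sketch of the general shape of a `columnLineage` dataset facet, written in Python. It is not taken from the workshop notebook. The dataset and column names, the `PRODUCER` value, and the `SCHEMA_URL` placeholder are all assumptions made for illustration; check the OpenLineage spec for the actual schema URL and the full set of fields.

```python
import json

# Hypothetical values, for illustration only (not from the workshop).
PRODUCER = "https://example.com/column-lineage-sketch"
# Placeholder: take the real schema URL from the OpenLineage spec.
SCHEMA_URL = "https://example.com/placeholder/ColumnLineageDatasetFacet.json"


def column_lineage_facet(field_sources):
    """Build a columnLineage facet for one output dataset.

    field_sources maps each output column name to a list of
    (namespace, dataset, column) tuples naming its input columns.
    """
    fields = {}
    for output_column, sources in field_sources.items():
        fields[output_column] = {
            "inputFields": [
                {"namespace": ns, "name": dataset, "field": column}
                for ns, dataset, column in sources
            ]
        }
    return {
        "columnLineage": {
            "_producer": PRODUCER,
            "_schemaURL": SCHEMA_URL,
            "fields": fields,
        }
    }


# Example: an output column "total" derived from two input columns.
facet = column_lineage_facet(
    {
        "total": [
            ("workshop", "orders", "price"),
            ("workshop", "orders", "quantity"),
        ]
    }
)
print(json.dumps(facet, indent=2))
```

In a real event, a facet like this would sit under the `facets` key of an output dataset; the Spark integration produces it automatically, so building it by hand is only useful for understanding the structure.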
September 8, 2022 (10am PT)
...