...
- Recent releases [Michael R.]
- 0.18.0
Added
- Airflow: support SQLExecuteQueryOperator #1379 @JDarDagran
- Airflow: introduce a new extractor for SFTPOperator #1263 @sekikn
- Airflow: add Sagemaker extractors #1136 @fhoda
- Airflow: add S3 extractor for Airflow operators #1166 @fhoda
- Spec: add spec file for ExternalQueryRunFacet #1262 @howardyoo
- Docs: add a TSC doc #1303 @merobi-hub
Bug fixes and more details: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
- 0.17.0
Added
- Spark: support latest Spark 3.3.1 #1183 @pawel-big-lebowski
- Spark: add Kinesis Transport and support config Kinesis in Spark integration #1200 @yogyang
- Spark: disable specified facets #1271 @pawel-big-lebowski
- Python: add facets implementation to Python client #1233 @pawel-big-lebowski
- SQL: add Rust parser interface #1172 @StarostaGit @mobuchowski
- Proxy: add helm chart for the proxy backend #1068 @wslulciuc
- Spec: include possible facets usage in spec #1249 @pawel-big-lebowski
- Website: publish YML version of spec to website #1300 @rossturk
- Docs: update language on nominating new committers #1270 @rossturk
Changed
- Website: publish spec into new website repo location #1295 @rossturk
- Airflow: change how pip installs packages in tox environments #1302 @JDarDagran
Bug fixes and more details: https://github.com/OpenLineage/OpenLineage/blob/main/CHANGELOG.md
- Rust implementation of the SQL integration [Piotr]
- About me: dev with GetInData
- Goal of project: to make it easier to add support for more languages in the future
- Separated into components: a separate backend package, integrated through language bindings, including a new Java interface
- Components
- openlineage_sql: main implementation with table + column lineage extraction
- openlineage_sql_python: Python bindings, uses the pyo3 crate, produces a Python wheel
- openlineage_sql_java: Java bindings, using JNI, produces a jar
- Changes
- switch to a visitor pattern to traverse the AST
- introduce Context Frames (like scopes) to resolve aliases, implicit contexts and shadowing
- column lineage is a synthesized attribute over the tree – easy to compute with a visitor
- Demo
- Shout outs
- Maciej Obuchowski (@mobuchowski)
- Will Johnson (@wjohnson)
- Hannah Moazam (@hmoazam)
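Note on the "Changes" items above: the following is a minimal sketch of how a visitor, a stack of context frames, and column lineage as a synthesized attribute can fit together. It is an illustration only, written in Python for brevity, and is not the Rust code in openlineage_sql. The toy AST classes (Table, Column, Select, Subquery) and the LineageVisitor class are hypothetical names invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical toy AST; a real SQL parser produces a far richer tree.
@dataclass
class Table:
    name: str
    alias: str | None = None


@dataclass
class Column:
    qualifier: str  # table name or alias, e.g. "o" in o.price
    name: str


@dataclass
class Select:
    projections: dict[str, list[Column]]  # output column -> referenced columns
    sources: list[Table | Subquery]


@dataclass
class Subquery:
    select: Select
    alias: str


class LineageVisitor:
    """Walks the tree and returns column lineage bottom-up."""

    def __init__(self) -> None:
        # Stack of context frames: each maps an alias either to a real
        # table name or to the lineage already computed for a subquery.
        self.frames: list[dict] = []

    def resolve(self, qualifier: str):
        # Innermost frame wins, so inner aliases shadow outer ones.
        for frame in reversed(self.frames):
            if qualifier in frame:
                return frame[qualifier]
        raise KeyError(f"unknown table or alias: {qualifier}")

    def visit_select(self, node: Select) -> dict[str, set[tuple[str, str]]]:
        frame: dict = {}
        for source in node.sources:
            if isinstance(source, Table):
                frame[source.alias or source.name] = source.name
            else:
                # Synthesized attribute: the child's lineage is computed
                # first and then attached to the subquery's alias.
                frame[source.alias] = self.visit_select(source.select)
        self.frames.append(frame)
        try:
            lineage: dict[str, set[tuple[str, str]]] = {}
            for out_name, refs in node.projections.items():
                origins: set[tuple[str, str]] = set()
                for ref in refs:
                    target = self.resolve(ref.qualifier)
                    if isinstance(target, str):
                        origins.add((target, ref.name))
                    else:
                        origins |= target.get(ref.name, set())
                lineage[out_name] = origins
            return lineage
        finally:
            self.frames.pop()


# Roughly: SELECT c.name, s.total AS spend FROM customers c,
#   (SELECT o.customer_id, o.price * o.qty AS total FROM orders o) s
inner = Select(
    projections={
        "customer_id": [Column("o", "customer_id")],
        "total": [Column("o", "price"), Column("o", "qty")],
    },
    sources=[Table("orders", alias="o")],
)
outer = Select(
    projections={"name": [Column("c", "name")], "spend": [Column("s", "total")]},
    sources=[Table("customers", alias="c"), Subquery(inner, alias="s")],
)
visitor = LineageVisitor()
lineage = visitor.visit_select(outer)
```

Here the outer column spend is traced through the alias s back to orders.price and orders.qty, which is the kind of alias resolution the context frames are meant to handle.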
- Open discussion
- Spark implementation: where do deps need to be added? [Will]
- it depends on which sub-project you want to modify
- if you want to modify all, import the dependency in shared
- Implementing the spec discussion [Sheeri]
- 100% compliance is not required – it's a spec, after all, just like "standard" SQL
- bottom line: compatibility between producers and consumers
- minimum viable lineage
- at least one circle
- zero or more lines
- associated information
- data model: event runs a job on a dataset
- What's required by the spec?
- run: UUID
- run state: transition, event time
- job: namespace, job name
- datasets: namespace, dataset name
- But what is a run?
- all the events for one UUID
- Necessary per run:
- at least one box
- at least one line
- everything else is optional
- eventTime, etc.
- OL query example:
- run ID required for a run (but not a job, which can/should be a view)
- inputs
- outputs
- producer
- schemaURL
- start event
- complete event
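Note on the required fields and the query example above: the sketch below builds a start and a complete event for one run and groups events by run UUID, assuming the JSON field names used by the OpenLineage spec (eventType, eventTime, run.runId, job, inputs, outputs, producer, schemaURL). The namespace, job and dataset names, the producer URL, and the schemaURL value are placeholders, and the helpers make_event, missing_fields, and group_by_run are hypothetical names for this example, not part of any OpenLineage client.

```python
import json
import uuid
from collections import defaultdict
from datetime import datetime, timezone

# Placeholder values, assumed for illustration only.
PRODUCER = "https://example.com/my-producer"
SCHEMA_URL = "https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent"
NAMESPACE = "example-namespace"


def make_event(event_type, run_id, job_name, inputs, outputs):
    """Build one run event as a plain dict, ready for json.dumps."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": NAMESPACE, "name": job_name},
        "inputs": [{"namespace": NAMESPACE, "name": name} for name in inputs],
        "outputs": [{"namespace": NAMESPACE, "name": name} for name in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


def missing_fields(event):
    """List the fields from the notes' 'required' list that are absent."""
    missing = []
    if not event.get("run", {}).get("runId"):
        missing.append("run.runId")
    for key in ("eventType", "eventTime"):
        if not event.get(key):
            missing.append(key)
    for key in ("namespace", "name"):
        if not event.get("job", {}).get(key):
            missing.append(f"job.{key}")
    for side in ("inputs", "outputs"):
        for i, dataset in enumerate(event.get(side, [])):
            for key in ("namespace", "name"):
                if not dataset.get(key):
                    missing.append(f"{side}[{i}].{key}")
    return missing


def group_by_run(events):
    """A run is all the events that share one run UUID."""
    runs = defaultdict(list)
    for event in events:
        runs[event["run"]["runId"]].append(event)
    return dict(runs)


run_id = str(uuid.uuid4())
start = make_event("START", run_id, "daily_orders", ["raw.orders"], [])
complete = make_event("COMPLETE", run_id, "daily_orders", ["raw.orders"], ["clean.orders"])
runs = group_by_run([start, complete])
payload = json.dumps(complete)
```

A check like missing_fields could serve as a starting point for the self-test idea raised in the discussion, though what counts as compliant was still an open question at this meeting.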
- Needed: discussion of what it means to be compliant with the spec, perhaps a test/self-test
- maybe the test outputs categories (e.g., "design lineage") for compatibility between producers and consumers
- Following up on main threads here [Julien]:
- create Slack channel, Google docs
- Sheeri will take the lead
- we'll write a proposal that we eventually add to the spec
November 10, 2022 (10am PT)
...