...
Notes:
- Docs Site Update [Ross]
- Lots of activity:
- 19 closed PRs!
- Infrastructure is becoming robust but not ready to launch yet
- URL: openlineage.io/docs
- Needed:
- additions to About, Getting Started
- additions to Object Model section
- Completion of the Integration landing page
- Stretch goal for next month: put it in production
- Recent releases [Michael R.]
- 0.11.0
- Added:
- PMD to Java and Spark builds in CI #898 @merobi-hub
- HTTP option to override timeout and properly close connections in openlineage-java lib. #909 @mobuchowski
- Dynamic mapped tasks support to Airflow integration #906 @JDarDagran
- SqlExtractor to Airflow integration #907 @JDarDagran
- Changed:
- Render templates as start of integration tests for TaskListener in the Airflow integration #870 @mobuchowski
- When testing extractors in the Airflow integration, set the extractor length assertion dynamic #882 @denimalpaca
- Fixed:
- Spark casting error and session catalog support for iceberg in Spark integration #856 @wslulciuc
- Dependencies bundled with openlineage-java lib. #855 @collado-mike
- PMD reported issues #891 @pawel-big-lebowski
- 0.12.0
- Added:
- Spark 3.3.0 support #950 @pawel-big-lebowski
- Apache Flink integration #951 @mobuchowski
- Ability to extend column level lineage mechanism #922 @pawel-big-lebowski
- ErrorMessageRunFacet #897 @mobuchowski
- SQLCheckExtractors #717 @denimalpaca
- RedshiftSQLExtractor & RedshiftDataExtractor #930 @JDarDagran
- Dataset builder for AlterTableCommand #927 @tnazarew
- Changed:
- Airflow integration: allow lineage metadata to flow through inlets and outlets #914 @fenil25
- Limit Delta events #905 @pawel-big-lebowski
- Fixed:
- Fix noclassdef error #942 @pawel-big-lebowski
- Limit size of serialized plan #917 @pawel-big-lebowski
- Extractors: example and tutorial [Maciej]
- Airflow: tasks are pieces of code executed by operators (which number in the hundreds)
- Extraction of data
- Operator example
- accesses operator object
- processes it in customizable way
- runtime information can also be extracted
- additional method (`extract_on_complete`)
- Metadata matches the structure of the OpenLineage spec
- supplemented by facets (`job_facets`)
- How to expose:
- set up env vars supplying full paths to extractor classes (separated by commas)
- Help available from OpenLineage side:
- SQL parser
- common library covering a few systems
- community help on Slack and GitHub (please contribute your custom extractors!)
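
The operator example above can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the openlineage-airflow API: `TaskMetadataSketch`, `CopyTableOperatorSketch`, `CopyTableExtractorSketch`, and the `rowsCopied` facet key are hypothetical stand-ins, and `extract_on_complete` is shown without any arguments the real method may take.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TaskMetadataSketch:
    """Hypothetical stand-in for the metadata an extractor returns."""
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    job_facets: Dict[str, Any] = field(default_factory=dict)
    run_facets: Dict[str, Any] = field(default_factory=dict)


@dataclass
class CopyTableOperatorSketch:
    """Hypothetical operator; stands in for one of the many Airflow operators."""
    task_id: str
    source_table: str
    target_table: str
    sql: str
    rows_copied: Optional[int] = None  # only known after the task has run


class CopyTableExtractorSketch:
    """Hypothetical extractor: reads the operator object and builds metadata."""

    def __init__(self, operator: CopyTableOperatorSketch) -> None:
        self.operator = operator

    def extract(self) -> TaskMetadataSketch:
        # Static information, available before the task executes
        op = self.operator
        return TaskMetadataSketch(
            name=op.task_id,
            inputs=[op.source_table],
            outputs=[op.target_table],
            job_facets={"sql": {"query": op.sql}},
        )

    def extract_on_complete(self) -> TaskMetadataSketch:
        # Runtime information, available only once the task has finished
        metadata = self.extract()
        if self.operator.rows_copied is not None:
            metadata.run_facets["rowsCopied"] = {"count": self.operator.rows_copied}
        return metadata
```

In this sketch, `extract` covers what is known up front, and `extract_on_complete` adds what only exists after execution.
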
- Typical problems
- incorrect path provided
- more debugging info would help in this case – help welcome!
- Imports from Airflow
- import cycles in Python cause the extractor to fail
- use local imports instead, with type checking
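
The local-import workaround might look like the sketch below. It assumes "with type checking" refers to guarding annotation-only imports with `typing.TYPE_CHECKING`. To keep it runnable without Airflow, `decimal.Decimal` stands in for an Airflow import, and `ExampleExtractor` is a hypothetical class.

```python
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    # Seen only by static type checkers; never executed at runtime,
    # so it cannot take part in an import cycle.
    from decimal import Decimal  # stand-in for an Airflow import


class ExampleExtractor:
    """Hypothetical extractor-like class; not part of any library."""

    def extract(self, raw_values: List[str]) -> "Decimal":
        # Local import: resolved when extract() is called,
        # not when this module is first loaded.
        from decimal import Decimal

        return sum((Decimal(value) for value in raw_values), Decimal("0"))
```

The return annotation is a string, so the name is not needed at module load time.
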
- What's the future?
- debuggability
- additional coverage – PythonOperator, TaskFlow
- watching AIP-44 in Airflow to make it more data-aware
- covering hooks
- e.g., with PythonOperator
- See also: new doc about this on the forthcoming docs site
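
The registration step and the "incorrect path provided" problem noted above both come down to resolving a dotted path to a class. The sketch below is a generic illustration, not the integration's actual loader: the variable name `EXAMPLE_EXTRACTOR_PATHS` and both function names are hypothetical, and the comma separator follows these notes. It shows how error messages can name the part of the path that failed.

```python
import importlib
import os
from typing import List

# Hypothetical variable name, used only for this sketch
ENV_VAR = "EXAMPLE_EXTRACTOR_PATHS"


def load_class(full_path: str) -> type:
    """Resolve 'package.module.ClassName' to the object it names."""
    module_path, _, class_name = full_path.strip().rpartition(".")
    if not module_path:
        raise ValueError(f"{full_path!r} is not a full dotted path to a class")
    try:
        module = importlib.import_module(module_path)
    except ImportError as exc:
        raise ImportError(
            f"cannot import module {module_path!r} for {full_path!r}: {exc}"
        ) from exc
    try:
        return getattr(module, class_name)
    except AttributeError:
        raise ImportError(
            f"module {module_path!r} has no attribute {class_name!r}"
        ) from None


def load_classes_from_env(var_name: str = ENV_VAR) -> List[type]:
    """Load every class listed, comma-separated, in the environment variable."""
    raw = os.environ.get(var_name, "")
    return [load_class(path) for path in raw.split(",") if path.strip()]
```

Distinguishing "module not found" from "class not found in module" is one small way to make a wrong path easier to diagnose.
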
- Q & A
- Does the documentation link out to the extractors currently in the Airflow library? Helpful for examples
- we need to add links to the doc
- Open Discussion
- Mandy: presenting at Open Source Summit, Dublin, 9/15
- Ross: talking at ApacheCon in New Orleans
- Ross: should we create a calendar of events?
- Maciej: we're looking for feedback on the Flink integration
- let us know if it solves your problems, etc.
- Mandy: Egeria running a hackathon as part of the Grace Hopper Open Source Day event on 9/16; theme: sustainability
...