Help Us Improve the Wiki

This wiki is owned by the LF AI & Data Foundation community. Contributions are always welcome! To contribute, select Log In in the upper right of this page. You will need a Linux Foundation ID (created at https://identity.linuxfoundation.org/) to log in. For a Confluence overview, click here.

Welcome to the LF AI & Data Foundation wiki, where you will find information with a cross-project focus. For individual projects, follow the links below.

The LF AI & Data Foundation is a project of The Linux Foundation that supports open source innovation in artificial intelligence, machine learning, deep learning and data. It was created to support numerous technical projects within this important space.

With the LF AI & Data Foundation, members are working to create a neutral space for harmonization and acceleration of separate technical projects focused on AI, ML, DL and Data technologies.

For more information, please view the How to Get Involved deck.

Questions? Please email info@lfaidata.foundation.

Technical Advisory Council

Wiki: Technical Advisory Council Home

Email: tac-general@lists.lfaidata.foundation

Outreach Committee

Wiki: Outreach Committee Home

Email: outreach-committee@lists.lfaidata.foundation

& Workstreams

Committee Name	Mailing List - Subscribe To:	Zoom Link	Meeting Cadence and Day/Time	Contact Person
Technical Advisory Council (TAC)	tac-general+subscribe@lists.lfaidata.foundation	https://zoom-lfx.platform.linuxfoundation.org/meeting/95332329356?password=c708f2ee-fb78-4a12-91a3-47daa19b708f	Biweekly on Thursdays at 6am PT/9am ET/1300h UTC	Vini Jaiswal, TAC Chair
Generative AI Commons	gen-ai-commons+subscribe@lists.lfaidata.foundation	https://zoom-lfx.platform.linuxfoundation.org/meeting/94803498072?password=be13e19e-48c3-4e51-bb32-be076bf79352	Biweekly on Tuesdays at 7am PT/10am ET/1400h UTC	Matt White, Director; Anni Lai, GAC Chair; Arnaud Le Hors, GAC Vice Chair
Generative AI Commons Applications Workstream	gac-applications-workstream+subscribe@lists.lfaidata.foundation	https://zoom-lfx.platform.linuxfoundation.org/meeting/93618826131?password=b56fb629-d5d5-4995-a442-fd71347a24ae	Biweekly on Wednesdays at 8am PT/11am ET/1500h UTC	Sachin Varghese and Raghavan Muthuregunathan, Workstream Leads
Generative AI Commons Frameworks Workstream	gac-frameworks-workstream+subscribe@lists.lfaidata.foundation	https://zoom-lfx.platform.linuxfoundation.org/meeting/91503008905?password=c2bcd124-eb36-4ff5-bf35-b001b1e9ad20	Biweekly on Tuesdays at 8:30am PT/11:30am ET/1530h UTC	Ahmed Abdelmonsef, Workstream Lead
Generative AI Commons Education and Outreach Workstream	gac-education-outreach-workstream+subscribe@lists.lfaidata.foundation	https://zoom-lfx.platform.linuxfoundation.org/meeting/94104696248?password=8b861c67-fef0-4931-8095-d9c18f4a5347	Biweekly on Tuesdays at 7am PT/10am ET/1400h UTC	Ofer Hermoni, Workstream Lead
Generative AI Commons Models and Data Workstream	gac-models-and-data-workstream+subscribe@lists.lfaidata.foundation	https://zoom-lfx.platform.linuxfoundation.org/meeting/92955579052?password=3d31aec7-df21-4800-9010-d942c27e06e8	Biweekly on Thursdays at 8:30am PT/11:30am ET/1530h UTC	Nick Chase, Workstream Lead
Generative AI Commons Responsible AI Workstream	gac-responsible-ai-workstream+subscribe@lists.lfaidata.foundation	https://zoom-lfx.platform.linuxfoundation.org/meeting/91390993751?password=8a107f34-6986-409a-a101-b3ee093cb117	Biweekly on Thursdays at 7am PT/10am ET/1400h UTC	Susan Malaika, Workstream Lead
BI & AI Committee	biai-announce+subscribe@lists.lfaidata.foundation, biai-discussion+subscribe@lists.lfaidata.foundation, biai-private+subscribe@lists.lfaidata.foundation	https://zoom-lfx.platform.linuxfoundation.org/meeting/95947262019?password=058cdfdd-72a4-4e0f-b9d6-b6aaef4e6869	Biweekly on Wednesdays at 1pm PT/4pm ET/2000h UTC	Cupid Chan, Committee Chair
ML Security Committee	mlsecurity-committee+subscribe@lists.lfaidata.foundation	https://zoom-lfx.platform.linuxfoundation.org/meeting/93501761975?password=fd4b4b5c-be2a-4bed-93e6-c47453bbaa69	Monthly on the second Thursday of the month at 8am PT/11am ET/1500h UTC	Alejandro Saucedo, Committee Chair
Outreach Committee (in formation)	outreach-committee+subscribe@lists.lfaidata.foundation	TBA	TBA	Richard Bian, Committee Chair

Current Projects

Project	Status	Description
1chipML	SANDBOX	1chipML is an open source library for basic numerical crunching and machine learning for microcontrollers. As the Internet of Things and Edge Computing are becoming a ubiquitous reality, we need a reliable and open framework to use on limited, low-power hardware. GitHub: https://github.com/1chipML/1chipML
Acumos	GRADUATE	Acumos is an open source platform that supports design, integration and deployment of AI models. Furthermore, Acumos supports an AI marketplace that empowers data scientists to publish adaptive AI models, while shielding them from the need to custom develop fully integrated solutions. GitHub: https://github.com/acumos

Adlik	INCUBATION	Adlik is an end-to-end optimizing framework for deep learning models. The goal of Adlik is to accelerate the deep learning inference process in both cloud and embedded environments. GitHub: https://github.com/Adlik
Amundsen	INCUBATION	Amundsen is a data discovery and metadata engine for improving the productivity of data analysts, data scientists and engineers when interacting with data. GitHub: https://github.com/amundsen-io
Angel	GRADUATE	Angel is a high-performance distributed machine learning platform based on the philosophy of Parameter Server. It is tuned for performance with big data from Tencent and has a wide range of applicability and stability, demonstrating increasing advantage in handling higher-dimension models. GitHub: https://github.com/Angel-ML/angel
Adversarial Robustness Toolbox (ART)	GRADUATE	Adversarial Robustness Toolbox (ART) provides tools that enable developers and researchers to evaluate, defend, certify and verify Machine Learning models and applications against adversarial threats. GitHub: https://github.com/Trusted-AI/adversarial-robustness-toolbox
AI Explainability 360	INCUBATION	AI Explainability 360 is an open source toolkit that can help users better understand the ways that machine learning models predict labels using a wide variety of techniques throughout the AI application lifecycle. GitHub: https://github.com/Trusted-AI/AIX360
AI Fairness 360	INCUBATION	AI Fairness 360 is an extensible open source toolkit that can help users understand and mitigate bias in machine learning models throughout the AI application lifecycle. GitHub: https://github.com/Trusted-AI/AIF360
Artigraph	SANDBOX	Artigraph is a tool to improve the authorship, management, and quality of data. It emphasizes that the core deliverable of a data pipeline or workflow is the data, not the tasks. Artigraph aims to shift tooling focus towards managing the entire data lifecycle (lineage, metadata, schema, storage formats and systems, etc). GitHub: https://github.com/artigraph/artigraph
BeyondML	SANDBOX	BeyondML is a framework for developing sparse neural networks that can perform multiple tasks across multiple data domains. This framework provides value to the community by: simplifying the development and deployment of advanced machine learning capabilities for use on low-end devices and in dynamic environments characteristic of the resource-constrained edge; reducing the complexity and cost of deploying ML models or systems of models to cloud platforms; and reducing the carbon footprint of deployed ML models. GitHub: https://github.com/Beyond-ML-Labs

Image Removed

INCUBATIONDatashim is enabling and accelerating data access for Kubernetes/Openshift workloads in a transparent and declarative way. It's opensource since September of 2019 and it is growing to support use-cases related to data access in AI projects.

GitHub: https://github.com/

IBM/dataset-lifecycle-framework

acumos

Image Added

Image Removed

INCUBATION

DataPractices.org was pioneered by data.world as a “Manifesto for Data Practices” of four values and 12 principles that illustrate the most effective, ethical, and modern approach to data teamwork. As a member of the foundation, datapractices.org will expand to offer open courseware and establish a collaborative approach to defining and refining data best practices.

Adlik is an end-to-end optimizing framework for deep learning models. The goal of Adlik is to accelerate deep learning inference process both on cloud and embedded environment.

GitHub

Github

: https://github.com/

datadotworld/data-practices-site

Adlik

Image Added

Image Removed

INCUBATION

DELTA is a deep learning based end-to-end natural language and speech processing platform. DELTA aims to provide easy and fast experiences for using, deploying, and developing natural language processing and speech models for both academia and industry use cases. DELTA is mainly implemented using TensorFlow and Python 3.

Amundsen is a data discovery and metadata engine for improving the productivity of data analysts, data scientists and engineers when interacting with data.

GitHub: https://github.com/

didi/delta

Image Removed

SANDBOXDocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer the multi-modal data with a Pythonic API.

amundsen-io

Image Added

GRADUATE

Angel is a high-performance distributed machine learning platform based on the philosophy of Parameter Server. It is tuned for performance with big data from Tencent and has a wide range of applicability and stability, demonstrating increasing advantage in handling higher dimension model.

GitHub: https://github.com/Angel-ML/

docarray

angel

Image Removed

Image Added

INCUBATION

Elastic Deep Learning (EDL) optimizes the global utilization of the cluster running deep learning job and the waiting time of job submitters. It includes two parts: a Kubernetes controller for the elastic scheduling of distributed deep learning jobs, and a fault-tolerable deep learning framework.

GRADUATE

Adversarial Robustness Toolbox (ART) provides tools that enable developers and researchers to evaluate, defend, certify and verify Machine Learning models and applications against the adversarial threats.

GitHub: https://github.com/

PaddlePaddle/edl

Image Removed

GraduateEgeria is an open source project dedicated to making metadata open and automatically exchanged between tools and platforms, no matter which vendor they come from.

Trusted-AI/adversarial-robustness-toolbox

Image Added

INCUBATION

AI Explainability 360 is an open source toolkit that can help users better understand the ways that machine learning models predict labels using a wide variety of techniques throughout the AI application lifecycle.

GitHub: https://github.com/

odpi

Trusted-AI/

egeria

AIX360

Image Removed

Image Added

Sandbox

INCUBATION

Elyra

Image Removed

INCUBATIONFATE (Federated AI Technology Enabler) is the world's first industrial grade federated learning open source framework to enable enterprises and institutions to collaborate on data while protecting data security and privacy. It implements secure computation protocols based on homomorphic encryption and multi-party computation (MPC). Supporting various federated learning scenarios, FATE now provides a host of federated learning algorithms, including logistic regression, tree-based algorithms, deep learning and transfer learning.

Image Removed

INCUBATION

Feast is an open source feature store for machine learning. It was developed as a collaboration between Gojek and Google in 2018. Feast aims to: -- Provide scalable and performant access to feature data for ML models during training or serving. -- Provide a consistent view of features for both training and serving. -- Enable re-use of features through discovery, documentation, and metadata tracking. --Ensures model performance by tracking, validating, and monitoring features in production.

Image Removed

Sandbox

Feathr is an enterprise-grade, high-performance feature store. Feathr automatically computes your feature values and joins them to your training data, using point-in-time-correct semantics to avoid data leakage, and supports materializing and deploying your features for use online in production.

GitHub:

AI Fairness 360 is an extensible open

-source low code / no code framework for creating reproducible, scalable and component based data science pipelines. It allows senior data scientist to create reusable components easily. Citizen data scientists can reuse their code without programming skills. MLOps engineers are provided with tested and maintainable deliverables, and scale on Kubeflow, Airflow and others.

source toolkit that can help users understand and mitigate bias in machine learning models throughout the AI application lifecycle. GitHub: https://github.com/Trusted-AI/AIF360
Image Added	SANDBOX	Artigraph is a tool to improve the authorship, management, and quality of data. It emphasizes that the core deliverable of a data pipeline or workflow is the data, not the tasks. Artigraph aims to shift tooling focus towards managing the entire data lifecycle (lineage, metadata, schema, storage formats and systems, etc). GitHub: https://github.com/artigraph/artigraph
Image Added	SANDBOX	BeyondML is a framework for developing sparse neural networks that can perform multiple tasks across multiple data domains. This framework provides value to the community by: - simplifying of the development and deployment of advanced machine learning capabilities for use on low-end devices and in dynamic environments characteristic of the resource-constrained edge - reducing in the complexity and cost of deploying ML models or systems of models to cloud platforms - reducing in the carbon footprint of deployed ML models GitHub: https://github.com/Beyond-ML-Labs
Image Added	SANDBOX	Within the Bitol project, the primary objective is to tackle multiple challenges, such as data normalization, ensuring the relevance of documentation, establishing service-level expectations, simplifying data and tool integration, and promoting a data product-oriented approach. These efforts offer several advantages, including stimulating innovation and streamlining integration processes. Github:

https://github.com/

linkedin/feathr

Image Removed

Sandbox

FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale model. Our goal is to support training, fine-tuning, and deployment of large-scale models on various downstream tasks with multi-modality. Currently, we are focusing on NLP models and tasks. In the near future, we will support for other modalities.

GitHub:

bitol-io

Image Added

SANDBOX

CLAIMED (Component Library for AI, Machine Learning, ETL and Data Science) is a runtime and programming language agnostic Data & AI component framework abstracting away all complexity for advanced MLOps and TrustedAI. Donated by the IBM Center for Open Source Data and AI Technologies, the project aims to democratize Data & AI by providing empowerment for Data Scientists to take on MLOps and TrustedAI, as well as Citizen Data Scientists to to use no-code/low-code tooling.

GitHub:

https://github.com/

BAAI

claimed-

Open

framework/

FlagAI

Image Removed

Image Added

Graduate

INCUBATION

Flyte is a production-grade, declarative, structured and highly scalable cloud-native workflow orchestration platform. It allows users to describe their ML/Data pipelines using Python, Java or (in the future other languages) and Flyte manages the data flow, parallelization, scaling and orchestration of these pipelines. Flyte builds on top of Docker containers and kubernetes

Datashim is enabling and accelerating data access for Kubernetes/Openshift workloads in a transparent and declarative way. It's opensource since September of 2019 and it is growing to support use-cases related to data access in AI projects.

GitHub: https://github.com/

flyteorg/flyte

Image Removed

INCUBATION

ForestFlow is a scalable policy-based cloud-native machine learning model server. ForestFlow strives to strike a balance between the flexibility it offers data scientists and the adoption of standards while reducing friction between Data Science, Engineering and Operations teams.

GitHub:

IBM/dataset-lifecycle-framework

Image Added

INCUBATION

DataPractices.org was pioneered by data.world as a “Manifesto for Data Practices” of four values and 12 principles that illustrate the most effective, ethical, and modern approach to data teamwork. As a member of the foundation, datapractices.org will expand to offer open courseware and establish a collaborative approach to defining and refining data best practices.

GitHub:

https://github.com/

ForestFlow/ForestFlow

Image Removed

GRADUATE

Horovod, a distributed training framework for TensorFlow, Keras and PyTorch, improves speed, scale and resource allocation in machine learning training activities. Uber uses Horovod for self-driving vehicles, fraud detection, and trip forecasting. It is also being used by Alibaba, Amazon and NVIDIA. Contributors to the project outside Uber include Amazon, IBM, Intel and NVIDIA.

GitHub:

datadotworld/data-practices-site

Image Added

SANDBOX

DeepCausality is a hyper-geometric computational causality library that enables fast and deterministic context-aware causal reasoning over complex multi-stage causality models. Deep Causality adds only minimal overhead and thus is suitable for real-time applications without additional acceleration hardware.

GitHub:

https://github.com/

horovod/horovod

Image Removed

INCUBATIONJanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster

deepcausality-rs/deep_causality

Image Added

SANDBOX

DeepRec is a high-performance recommendation deep learning framework based on TensorFlow.

GitHub:

https://github.com/

janusgraph

DeepRec-AI/

janusgraph

DeepRec

Image Removed

Image Added

INCUBATIONKedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering best-practice and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning

INCUBATION

DELTA is a deep learning based end-to-end natural language and speech processing platform. DELTA aims to provide easy and fast experiences for using, deploying, and developing natural language processing and speech models for both academia and industry use cases. DELTA is mainly implemented using TensorFlow and Python 3.

GitHub: https://github.com

/kedro-org

Image Removed

INCUBATIONKompute is a general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing use cases

/didi/delta

Image Added

INCUBATION

DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer the multi-modal data with a Pythonic API.

GitHub: https://github.com/

KomputeProject

docarray

Image Removed

Image Added

INCUBATION

KServe provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high abstraction interfaces for common ML frameworks like Tensorflow, XGBoost, ScikitLearn, PyTorch, and ONNX. It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting edge serving features like GPU Autoscaling, Scale to Zero, and Canary Rollouts to your ML deployments. It enables a simple, pluggable, and complete story for Production ML Serving including prediction, pre-processing, post-processing and explainability.

Elastic Deep Learning (EDL) optimizes the global utilization of the cluster running deep learning job and the waiting time of job submitters. It includes two parts: a Kubernetes controller for the elastic scheduling of distributed deep learning jobs, and a fault-tolerable deep learning framework.

GitHub: https://github.com/PaddlePaddle/edl

Image Added

GRADUATE

Egeria is an open source project dedicated to making metadata open and automatically exchanged between tools and platforms, no matter which vendor they come from.

GitHub: https://github.com/

kserve

odpi/egeria

Image Removed

Image Added

INCUBATIONGitHub:

SANDBOX

Ludwig is a toolbox built on top of TensorFlow that allows to train and test deep learning models without the need to write code. All you need to provide is your data, a list of fields to use as inputs, and a list of fields to use as outputs, Ludwig will do the rest. Simple commands can be used to train models both locally and in a distributed way, and to use them to predict on new data.

Elyra is an open-source low code / no code framework for creating reproducible, scalable and component based data science pipelines. It allows senior data scientist to create reusable components easily. Citizen data scientists can reuse their code without programming skills. MLOps engineers are provided with tested and maintainable deliverables, and scale on Kubeflow, Airflow and others.

Github:

https://github.com/

uber

elyra-ai/

ludwig

elyra

Image RemovedGitHub:

Image Added

INCUBATION

Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. It maintains the provenance of how datasets are consumed and produced, provides global visibility into job runtime and frequency of dataset access, centralization of dataset lifecycle management, and much more.

FATE (Federated AI Technology Enabler) is the world's first industrial grade federated learning open source framework to enable enterprises and institutions to collaborate on data while protecting data security and privacy. It implements secure computation protocols based on homomorphic encryption and multi-party computation (MPC). Supporting various federated learning scenarios, FATE now provides a host of federated learning algorithms, including logistic regression, tree-based algorithms, deep learning and transfer learning.

GitHub:

https://github.com/

MarquezProject

FederatedAI

Image Removed

Image Added

INCUBATION

Milvus

GitHub: https://github.com/milvus-io

Image Removed

INCUBATION

NNStreamer (Neural Network Support as Gstreamer Plugins) is a set of Gstreamer plugins that support ease and efficiency for Gstreamer developers adopting neural network models and neural network developers managing neural network pipelines and their filters.

GitHub:

Feast is an open source

similarity search engine for massive-scale feature vectors. Built with heterogeneous computing architecture for the best cost efficiency. Searches over billion-scale vectors take only milliseconds with minimum computing resources. Milvus can be used in a wide variety of scenarios to boost AI development.

feature store for machine learning. It was developed as a collaboration between Gojek and Google in 2018. Feast aims to: -- Provide scalable and performant access to feature data for ML models during training or serving. -- Provide a consistent view of features for both training and serving. -- Enable re-use of features through discovery, documentation, and metadata tracking. --Ensures model performance by tracking, validating, and monitoring features in production.

Github:

https://github.com/

nnstreamer

Image Removed

Sandbox

OpenBytes aims to facilitate wider sharing of, and collaboration with, data in the AI community through the promotion of data standards and formats and enabling contributions of data. The value of this project lies in its stimulus on academic interest and AI innovation by promoting high-quality datasets and pushing the boundaries of science further.

feast-dev/feast

Image Added

SANDBOX

Feathr is an enterprise-grade, high-performance feature store. Feathr automatically computes your feature values and joins them to your training data, using point-in-time-correct semantics to avoid data leakage, and supports materializing and deploying your features for use online in production.

GitHub: https:

//github.com/linkedin/

Project-OpenBytes

feathr

OpenDataology

Image Added

Sandbox

OpenDataology is an open source dataset license compliance analysis project. It enables users of publicly available datasets and users who curate a dataset from multiple data sources (particularly for use as a part of machine learning models) to identify the potential license compliance risks. The project is primarily comprised of three key components.

A dataset license compliance analysis workflow that ascertains the final allowed rights and the required obligations associated with using a publicly available dataset or a dataset that is curated from multiple data sources for any purpose.
A growing database and a web portal that documents the final rights and obligations (after the license compliance analysis is conducted) associated with the datasets and the data sources analyzed in our project. The database also documents the metadata collected and used to conduct the compliance workflow
An online license generation toolkit that creators of dataset to generate custom licenses depending on the exact rights and obligations that they want to allow (instead of having to rely of existing available and limited dataset specific licenses)

GitHub: https://github.com/OpenDataology/OpenDataology

SANDBOX

FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale model. Our goal is to support training, fine-tuning, and deployment of large-scale models on various downstream tasks with multi-modality. Currently, we are focusing on NLP models and tasks. In the near future, we will support for other modalities.

GitHub: https://github.com/BAAI-Open/FlagAI

Image Added

GRADUATE

Flyte is a production-grade, declarative, structured and highly scalable cloud-native workflow orchestration platform. It allows users to describe their ML/Data pipelines using Python, Java or (in the future other languages) and Flyte manages the data flow, parallelization, scaling and orchestration of these pipelines. Flyte builds on top of Docker containers and kubernetes.

GitHub: https://github.com/flyteorg/flyte

Image Added

INCUBATION

ForestFlow is a scalable policy-based cloud-native machine learning model server. ForestFlow strives to strike a balance between the flexibility it offers data scientists and the adoption of standards while reducing friction between Data Science, Engineering and Operations teams

Image Removed

INCUBATIONOpenDS4All is a project created to accelerate the creation of data science curricula at academic institutions. Our goal is to provide recommendations, slide sets, sample Jupyter notebooks, and other materials for creating, customizing, and delivering data science and data engineering education

.

GitHub: https://github.com/

odpi

ForestFlow/

OpenDS4All

ForestFlow

Image Removed

Image Added

Graduate

ONNX is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them. ONNX is developed and supported by a community of partners

GRADUATE

Horovod, a distributed training framework for TensorFlow, Keras and PyTorch, improves speed, scale and resource allocation in machine learning training activities. Uber uses Horovod for self-driving vehicles, fraud detection, and trip forecasting. It is also being used by Alibaba, Amazon and NVIDIA. Contributors to the project outside Uber include Amazon, IBM, Intel and NVIDIA.

GitHub: https://github.com/

onnx

horovod/horovod

Image Removed

Image Added

Graduate

Pyro is a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the backend. Pyro enables flexible and expressive deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling.

GitHub: https://github.com/pyro-ppl/pyro

Image Removed

Sandbox

RosaeNLG is a template-based Natural Language Generation (NLG) automates the production of relatively repetitive texts based on structured input data and textual templates, run by a NLG engine. Production usage is widespread in large corporations, especially in the financial industry.

SANDBOX

Intersectional Fairness (ISF) is a bias detection and mitigation technology for intersectional bias, which combinations of multiple protected attributes cause.
ISF leverages the existing single-attribute bias mitigation methods to make a machine-learning model fair regarding intersectional bias.
Approaches applicable to ISF are pre-, in-, and post-processing. For now, ISF supports Adversarial Debiasing, Equalized Odds, Massaging, and Reject Option Classification.

GitHub: https:

//github.com/

RosaeNLG

intersectional-fairness/isf

Image Removed

Image Added

INCUBATION

SOAJS is an open source microservices and API management platform, SOAJS eliminates the IT plumbing challenges, so you can deploy microservices significantly earlier and faster. IT initiatives such as digital transformation are simplified, accelerated, cost reduced, and risk mitigated. Our fully integrated, world-class API lifecycle management, multi-cloud orchestration, release management, and IT Ops automation capabilities eliminate your IT organization’s modernization pain

The goal of the Interoperability Initiative is to enable voice and conversational AI to work like the web. Before us is a future where users can freely find and make use of any conversational assistant and language model that addresses their goals, just as they do now with web pages. Our path toward achieving this goal is to define, develop, and promote standards – beginning with an open, universal application programming interface (API) – which will enable any conversational assistant that follows the standards to freely interoperate with other standards-using assistants to connect, communicate, and transfer content and control across assistants, platforms, and language models. We encourage you to learn more about our interoperability work by following the links below, where you can learn about our approach, process, future work, and how to get involved.
Image Added	INCUBATION	JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. GitHub: https://github.com/janusgraph/janusgraph
Image Added	INCUBATION	Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering best-practice and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning. GitHub: https://github.com/kedro-org
Image Added	INCUBATION	Kompute is a general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing use cases. GitHub: https://github.com/KomputeProject
Image Added	INCUBATION	KServe provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high-abstraction interfaces for common ML frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX. It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting-edge serving features like GPU Autoscaling, Scale to Zero, and Canary Rollouts to your ML deployments. It enables a simple, pluggable, and complete story for Production ML Serving including prediction, pre-processing, post-processing and explainability. GitHub: https://github.com/kserve
Image Added	SANDBOX	LakeSoul is a cloud-native Lakehouse framework developed by the DMetaSoul team. It supports scalable metadata management, ACID transactions, efficient and flexible upsert operations, schema evolution, and unified streaming & batch processing. GitHub: https://github.com/meta-soul/LakeSoul
Image Added	INCUBATION	Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without the need to write code. All you need to provide is your data, a list of fields to use as inputs, and a list of fields to use as outputs; Ludwig will do the rest. Simple commands can be used to train models both locally and in a distributed way, and to use them to predict on new data. GitHub: https://github.com/uber/ludwig
Image Added	GRADUATE	Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. It maintains the provenance of how datasets are consumed and produced, provides global visibility into job runtime and frequency of dataset access, centralization of dataset lifecycle management, and much more. GitHub: https://github.com/MarquezProject
Image Added	INCUBATION	Milvus is an open source similarity search engine for massive-scale feature vectors. Built with heterogeneous computing architecture for the best cost efficiency. Searches over billion-scale vectors take only milliseconds with minimum computing resources. Milvus can be used in a wide variety of scenarios to boost AI development. GitHub: https://github.com/milvus-io
Image Added	INCUBATION	NNStreamer (Neural Network Support as GStreamer Plugins) is a set of GStreamer plugins that make it easy and efficient for GStreamer developers to adopt neural network models and for neural network developers to manage neural network pipelines and their filters. GitHub: https://github.com/nnstreamer
Image Added	SANDBOX	OpenBytes aims to facilitate wider sharing of, and collaboration with, data in the AI community through the promotion of data standards and formats and enabling contributions of data. The value of this project lies in its stimulus on academic interest and AI innovation by promoting high-quality datasets and pushing the boundaries of science further. GitHub: https://github.com/Project-OpenBytes

Image Added	SANDBOX	OpenDataology is an open source dataset license compliance analysis project. It enables users of publicly available datasets and users who curate datasets from multiple data sources (particularly for use as a part of machine learning models) to identify the potential license compliance risks. OpenDataology consists primarily of three key components: (1) a dataset license compliance analysis workflow that ascertains the final allowed rights and the required obligations associated with using a publicly available dataset or a dataset that is curated from multiple data sources for any purpose; (2) a growing database and web portal that documents the final rights and obligations (after the license compliance analysis is conducted) associated with the datasets and the data sources analyzed in the project, along with the metadata collected and used to conduct the compliance workflow; and (3) an online license generation toolkit that creators of datasets can use to generate custom licenses, depending on the exact rights and obligations that they want to allow (instead of having to rely on existing available and limited dataset-specific licenses). GitHub: https://github.com/OpenDataology/OpenDataology
Image Added	INCUBATION	OpenDS4All is a project created to accelerate the creation of data science curricula at academic institutions. Our goal is to provide recommendations, slide sets, sample Jupyter notebooks, and other materials for creating, customizing, and delivering data science and data engineering education. GitHub: https://github.com/odpi/OpenDS4All
Image Added	INCUBATION	OpenFL is a Python 3 library for federated learning that enables organizations to collaboratively train a model without sharing sensitive information. GitHub: https://github.com/intel/openfl
Image Added	GRADUATE	OpenLineage proposes an open standard and API for lineage collection that data processing engines can implement to publish, at run time, details of the data sources they are reading, the types of processing they are performing, and the destination of the results. GitHub: https://github.com/OpenLineage
Image Added	GRADUATE	ONNX is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them. ONNX is developed and supported by a community of partners. GitHub: https://github.com/onnx
Image Added	GRADUATE	Pyro is a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the backend. Pyro enables flexible and expressive deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling. GitHub: https://github.com/pyro-ppl/pyro
Image Added	SANDBOX	The Recommenders repository provides examples and best practices for building recommendation systems, provided as Jupyter notebooks. The module recommenders contains functions to simplify common tasks used when developing and evaluating recommender systems. GitHub: https://github.com/Microsoft/Recommenders
Image Added	SANDBOX	RosaeNLG is a template-based Natural Language Generation (NLG) tool that automates the production of relatively repetitive texts based on structured input data and textual templates, run by an NLG engine. Production usage is widespread in large corporations, especially in the financial industry. GitHub: https://github.com/RosaeNLG/
RWKV	INCUBATION	RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding. GitHub: https://github.com/rwkv/
Image Added	SANDBOX	Automatic machine learning, commonly referred to as AutoML, holds great promise in democratizing the utilization of machine learning (ML) by automating a large portion of the work typically performed by data scientists. However, the vast search space of potential pipelines poses a challenge, often resulting in suboptimal or no pipelines being generated, particularly when dealing with large and complex datasets. SapientML addresses this issue by leveraging a collection of pre-existing datasets and their human-created pipelines, enabling efficient generation of high-quality pipelines for new datasets with predictive tasks. With SapientML, data scientists can rapidly create and amend AI models, as the code is provided along with detailed explanations. Furthermore, citizen data scientists can easily create the desired AI models as well. GitHub: https://github.com/sapientml/sapientml
Image Added	SANDBOX	ShaderNN is a lightweight deep learning inference framework optimized for Convolutional Neural Networks. It provides high-performance inference for deep learning applications in image and graphics processing on mobile devices. GitHub: https://github.com/inferenceengine/shadernn
Image Added	INCUBATION	SOAJS is an open source microservices and API management platform. SOAJS eliminates the IT plumbing challenges, so you can deploy microservices significantly earlier and faster. IT initiatives such as digital transformation are simplified and accelerated, with reduced cost and mitigated risk. Our fully integrated, world-class API lifecycle management, multi-cloud orchestration, release management, and IT Ops automation capabilities eliminate your IT organization’s modernization pain. GitHub: https://github.com/soajs
Image Added	INCUBATION	Substra is a framework offering distributed orchestration of machine learning tasks among partners while guaranteeing secure and trustless traceability of all operations. It enables privacy-preserving federated learning projects, where multiple parties collaborate on a Machine Learning objective while each one keeps their private datasets behind their own firewall. GitHub: https://github.com/SubstraFoundation/substra
Image Added	INCUBATION	sparklyr is an R package that lets you analyze data in Spark while using familiar tools in R. sparklyr supports a complete backend for dplyr, a popular tool for working with data frame objects both in memory and out of memory. You can use dplyr to translate R code into Spark SQL. GitHub: https://github.com/sparklyr/sparklyr

INCUBATION

Trust Mark works toward translating ethical principles specific to conversational AI into action and risk mitigation for developers and enterprise users, through principles, LF EdX education, a maturity model, and specifications.

Image Added

SANDBOX

Xtreme1 is the next-generation open source platform for multi-sensory training data. It accelerates the modeling process with advanced AI-powered tools, ontologies distilled from thousands of projects, and plentiful data curation features.

GitHub: https://github.com/basicai/xtreme1
