Help Us Improve the Wiki
This Wiki is owned by the LF AI Foundation Community. Contributions are always welcomed to help improve it! In the upper right of this page, select Log In to contribute. You will need a Linux Foundation ID (created at https://identity.linuxfoundation.org/) to log in. For a Confluence overview, click here.
Welcome to the LF AI & Data Foundation wiki, where you will find information with a cross project focus. For individual projects, follow the links below.
The LF AI & Data Foundation is a project of The Linux Foundation that supports open source innovation in artificial intelligence, machine learning, deep learning and data open source projects. The LF AI & Data Foundation was created to support numerous technical projects within this important space.
With the LF AI & Data Foundation, members are working to create a neutral space for harmonization and acceleration of separate technical projects focused on AI, ML, DL and Data technologies.
For more information, please view the How to Get Involved deck.
Questions? Please email info@lfaidata.foundation.
Web Site: https://lfaidata.foundation/
Landscape: https://landscape.lfaidata.foundation/
GitHub: https://github.com/lfai
Mail Lists: https://lists.lfaidata.foundation/g/main
Twitter: @LFAIData_Fdn
Artwork: https://artwork.lfaidata.foundation
Presentations: Google Slides or MS Powerpoint
Email: info@lfaidata.foundation
Project | Status | Description |
---|---|---|
|
SANDBOX | 1chipML is an open source library for basic numerical crunching and machine learning for microcontrollers. As the Internet of Things and Edge Computing become a ubiquitous reality, we need a reliable and open framework to use on limited and low-power hardware. GitHub: https://github.com/1chipML/1chipML |
|
GRADUATE | Acumos is an Open Source Platform, which supports design, integration and deployment of AI models. Furthermore, Acumos supports an AI marketplace that empowers data scientists to publish adaptive AI models, while shielding them from the need to custom develop fully integrated solutions. GitHub: https://github.com/acumos |
|
INCUBATION | Adlik is an end-to-end optimizing framework for deep learning models. The goal of Adlik is to accelerate the deep learning inference process in both cloud and embedded environments. GitHub: https://github.com/Adlik |
|
INCUBATION | Amundsen is a data discovery and metadata engine for improving the productivity of data analysts, data scientists and engineers when interacting with data. GitHub: https://github.com/amundsen-io |
|
GRADUATE | Angel is a high-performance distributed machine learning platform based on the philosophy of Parameter Server. It is tuned for performance with big data from Tencent and has a wide range of applicability and stability, demonstrating an increasing advantage in handling higher-dimension models. |
|
GRADUATE | Adversarial Robustness Toolbox (ART) provides tools that enable developers and researchers to evaluate, defend, certify and verify Machine Learning models and applications against adversarial threats. GitHub: https://github.com/Trusted-AI/adversarial-robustness-toolbox |
|
INCUBATION | AI Explainability 360 is an open source toolkit that can help users better understand the ways that machine learning models predict labels using a wide variety of techniques throughout the AI application lifecycle. |
|
INCUBATION | AI Fairness 360 is an extensible open source toolkit that can help users understand and mitigate bias in machine learning models throughout the AI application lifecycle. GitHub: https://github.com/Trusted-AI/AIF360 |
|
SANDBOX | Artigraph is a tool to improve the authorship, management, and quality of data. It emphasizes that the core deliverable of a data pipeline or workflow is the data, not the tasks. Artigraph aims to shift tooling focus towards managing the entire data lifecycle (lineage, metadata, schema, storage formats and systems, etc.). |
|
SANDBOX | BeyondML is a framework for developing sparse neural networks that can perform multiple tasks across multiple data domains. This framework provides value to the community by: - simplifying the development and deployment of advanced machine learning capabilities for use on low-end devices and in dynamic environments characteristic of the resource-constrained edge - reducing the complexity and cost of deploying ML models or systems of models to cloud platforms - reducing the carbon footprint of deployed ML models |
|
SANDBOX | Within the Bitol project, the primary objective is to tackle multiple challenges, such as data normalization, ensuring the relevance of documentation, establishing service-level expectations, simplifying data and tool integration, and promoting a data product-oriented approach. These efforts offer several advantages, including stimulating innovation and streamlining integration processes. GitHub: https://github.com/bitol-io |
|
SANDBOX | CLAIMED (Component Library for AI, Machine Learning, ETL and Data Science) is a runtime and programming language agnostic Data & AI component framework abstracting away all complexity for advanced MLOps and TrustedAI. Donated by the IBM Center for Open Source Data and AI Technologies, the project aims to democratize Data & AI by empowering Data Scientists to take on MLOps and TrustedAI, and Citizen Data Scientists to use no-code/low-code tooling. |
|
INCUBATION | Datashim enables and accelerates data access for Kubernetes/OpenShift workloads in a transparent and declarative way. It has been open source since September 2019 and is growing to support use cases related to data access in AI projects. GitHub: https://github.com/IBM/dataset-lifecycle-framework |
|
INCUBATION | DataPractices.org was pioneered by data.world as a “Manifesto for Data Practices” of four values and 12 principles that illustrate the most effective, ethical, and modern approach to data teamwork. As a member of the foundation, datapractices.org will expand to offer open courseware and establish a collaborative approach to defining and refining data best practices. GitHub: https://github.com/datadotworld/data-practices-site |
|
SANDBOX | DeepCausality is a hyper-geometric computational causality library that enables fast and deterministic context-aware causal reasoning over complex multi-stage causality models. DeepCausality adds only minimal overhead and thus is suitable for real-time applications without additional acceleration hardware. GitHub: https://github.com/deepcausality-rs/deep_causality |
|
SANDBOX | DeepRec is a high-performance recommendation deep learning framework based on TensorFlow. |
|
INCUBATION | DELTA is a deep learning based end-to-end natural language and speech processing platform. DELTA aims to provide easy and fast experiences for using, deploying, and developing natural language processing and speech models for both academia and industry use cases. DELTA is mainly implemented using TensorFlow and Python 3. GitHub: https://github.com/didi/delta |
|
INCUBATION | DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. GitHub: https://github.com/docarray |
|
INCUBATION | Elastic Deep Learning (EDL) optimizes the global utilization of clusters running deep learning jobs and the waiting time of job submitters. It includes two parts: a Kubernetes controller for the elastic scheduling of distributed deep learning jobs, and a fault-tolerant deep learning framework. GitHub: https://github.com/PaddlePaddle/edl |
|
GRADUATE | Egeria is an open source project dedicated to making metadata open and automatically exchanged between tools and platforms, no matter which vendor they come from. GitHub: https://github.com/odpi/egeria |
|
SANDBOX | Elyra is an open-source low code / no code framework for creating reproducible, scalable and component-based data science pipelines. It allows senior data scientists to create reusable components easily. Citizen data scientists can reuse their code without programming skills. MLOps engineers are provided with tested and maintainable deliverables, and can scale on Kubeflow, Airflow and others. |
|
INCUBATION | FATE (Federated AI Technology Enabler) is the world's first industrial grade federated learning open source framework to enable enterprises and institutions to collaborate on data while protecting data security and privacy. It implements secure computation protocols based on homomorphic encryption and multi-party computation (MPC). Supporting various federated learning scenarios, FATE now provides a host of federated learning algorithms, including logistic regression, tree-based algorithms, deep learning and transfer learning. GitHub: https://github.com/FederatedAI |
|
INCUBATION | Feast is an open source feature store for machine learning. It was developed as a collaboration between Gojek and Google in 2018. Feast aims to: -- Provide scalable and performant access to feature data for ML models during training or serving. -- Provide a consistent view of features for both training and serving. -- Enable re-use of features through discovery, documentation, and metadata tracking. -- Ensure model performance by tracking, validating, and monitoring features in production. |
|
SANDBOX | Feathr is an enterprise-grade, high-performance feature store. Feathr automatically computes your feature values and joins them to your training data, using point-in-time-correct semantics to avoid data leakage, and supports materializing and deploying your features for use online in production. GitHub: https://github.com/linkedin/feathr |
|
SANDBOX | FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models. Our goal is to support training, fine-tuning, and deployment of large-scale models on various downstream tasks with multi-modality. Currently, we are focusing on NLP models and tasks. In the near future, we will support other modalities. |
|
GRADUATE | Flyte is a production-grade, declarative, structured and highly scalable cloud-native workflow orchestration platform. It allows users to describe their ML/Data pipelines using Python, Java or (in the future) other languages, and Flyte manages the data flow, parallelization, scaling and orchestration of these pipelines. Flyte builds on top of Docker containers and Kubernetes. |
|
INCUBATION | ForestFlow is a scalable policy-based cloud-native machine learning model server. ForestFlow strives to strike a balance between the flexibility it offers data scientists and the adoption of standards while reducing friction between Data Science, Engineering and Operations teams. |
|
GRADUATE | Horovod, a distributed training framework for TensorFlow, Keras and PyTorch, improves speed, scale and resource allocation in machine learning training activities. Uber uses Horovod for self-driving vehicles, fraud detection, and trip forecasting. It is also being used by Alibaba, Amazon and NVIDIA. Contributors to the project outside Uber include Amazon, IBM, Intel and NVIDIA. GitHub: https://github.com/horovod/horovod |
|
SANDBOX | Intersectional Fairness (ISF) is a bias detection and mitigation technology for intersectional bias, which is caused by combinations of multiple protected attributes. |
|
INCUBATION | The goal of the Interoperability Initiative is to enable voice and conversational AI to work like the web. Before us is a future where users can freely find and make use of any conversational assistant and language model that addresses their goals, just as they do now with web pages. Our path toward achieving this goal is to define, develop, and promote standards – beginning with an open, universal application programming interface (API) – which will enable any conversational assistant that follows the standards to freely interoperate with other standards-using assistants to connect, communicate, and transfer content and control across assistants, platforms, and language models. We encourage you to learn more about our interoperability work by following the links below, where you can learn about our approach, process, future work, and how to get involved. |
|
INCUBATION | JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. |
|
INCUBATION | Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering best practice and applies them to machine-learning code; applied concepts include modularity, separation of concerns and versioning. GitHub: https://github.com/kedro-org |
|
INCUBATION | Kompute is a general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). It is blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing use cases. |
|
INCUBATION | KServe provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high abstraction interfaces for common ML frameworks like TensorFlow, XGBoost, ScikitLearn, PyTorch, and ONNX. It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting edge serving features like GPU Autoscaling, Scale to Zero, and Canary Rollouts to your ML deployments. It enables a simple, pluggable, and complete story for Production ML Serving including prediction, pre-processing, post-processing and explainability. GitHub: https://github.com/kserve |
|
SANDBOX | LakeSoul is a cloud-native Lakehouse framework developed by the DMetaSoul team. It supports scalable metadata management, ACID transactions, efficient and flexible upsert operations, schema evolution, and unified streaming & batch processing. |
|
INCUBATION | Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without the need to write code. All you need to provide is your data, a list of fields to use as inputs, and a list of fields to use as outputs; Ludwig will do the rest. Simple commands can be used to train models both locally and in a distributed way, and to use them to predict on new data. GitHub: https://github.com/uber/ludwig |
|
GRADUATE | Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. It maintains the provenance of how datasets are consumed and produced, provides global visibility into job runtime and frequency of dataset access, centralization of dataset lifecycle management, and much more. |
INCUBATION | Milvus is an open source similarity search engine for massive-scale feature vectors. It is built with a heterogeneous computing architecture for the best cost efficiency. Searches over billion-scale vectors take only milliseconds with minimum computing resources. Milvus can be used in a wide variety of scenarios to boost AI development. GitHub: https://github.com/milvus-io |
|
|
INCUBATION | NNStreamer (Neural Network Support as Gstreamer Plugins) is a set of Gstreamer plugins that support ease and efficiency for Gstreamer developers adopting neural network models and neural network developers managing neural network pipelines and their filters. GitHub: https://github.com/nnstreamer |
|
SANDBOX | OpenBytes aims to facilitate wider sharing of, and collaboration with, data in the AI community through the promotion of data standards and formats and enabling contributions of data. The value of this project lies in its stimulus on academic interest and AI innovation by promoting high-quality datasets and pushing the boundaries of science further. |
|
SANDBOX | OpenDataology is an open source dataset license compliance analysis project. It enables users of publicly available datasets and users who curate datasets from multiple data sources (particularly for use as a part of machine learning models) to identify the potential license compliance risks. OpenDataology consists primarily of three key components.
|
|
INCUBATION | OpenDS4All is a project created to accelerate the creation of data science curricula at academic institutions. Our goal is to provide recommendations, slide sets, sample Jupyter notebooks, and other materials for creating, customizing, and delivering data science and data engineering education. |
|
INCUBATION | OpenFL is a Python 3 library for federated learning that enables organizations to collaboratively train a model without sharing sensitive information. GitHub: https://github.com/intel/openfl |
|
GRADUATE | OpenLineage proposes an open standard and API for lineage collection that data processing engines can implement to publish, at run time, details of the data sources they are reading, the types of processing they are performing and the destination of the results. GitHub: https://github.com/OpenLineage |
|
GRADUATE | ONNX is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them. ONNX is developed and supported by a community of partners. GitHub: https://github.com/onnx |
|
GRADUATE | Pyro is a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the backend. Pyro enables flexible and expressive deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling. |
|
SANDBOX | The Recommenders repository provides examples and best practices for building recommendation systems, provided as Jupyter notebooks. The module recommenders contains functions to simplify common tasks used when developing and evaluating recommender systems. |
|
SANDBOX | RosaeNLG is a template-based Natural Language Generation (NLG) project. Template-based NLG automates the production of relatively repetitive texts based on structured input data and textual templates, run by an NLG engine. Production usage is widespread in large corporations, especially in the financial industry. GitHub: https://github.com/RosaeNLG/ |
RWKV |
INCUBATION | RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). It thus combines the best of RNN and transformer: great performance, fast inference, VRAM savings, fast training, "infinite" ctx_len, and free sentence embedding. GitHub: https://github.com/rwkv/ |
|
SANDBOX | Automatic machine learning, commonly referred to as AutoML, holds great promise in democratizing the utilization of machine learning (ML) by automating a large portion of the work typically performed by data scientists. However, the vast search space of potential pipelines poses a challenge, often resulting in suboptimal or no pipelines being generated, particularly when dealing with large and complex datasets. SapientML addresses this issue by leveraging a collection of pre-existing datasets and their human-created pipelines, enabling efficient generation of high-quality pipelines for new datasets with predictive tasks. With SapientML, data scientists can rapidly create and amend AI models, as the code is provided along with detailed explanations. Furthermore, citizen data scientists can easily create the desired AI models as well. |
|
SANDBOX | ShaderNN is a lightweight deep learning inference framework optimized for Convolutional Neural Networks. It provides high-performance inference for deep learning applications in image and graphics processing on mobile devices. GitHub: https://github.com/inferenceengine/shadernn |
INCUBATION | SOAJS is an open source microservices and API management platform. SOAJS eliminates the IT plumbing challenges, so you can deploy microservices significantly earlier and faster. IT initiatives such as digital transformation are simplified, accelerated, cost reduced, and risk mitigated. Our fully integrated, world-class API lifecycle management, multi-cloud orchestration, release management, and IT Ops automation capabilities eliminate your IT organization’s modernization pain. GitHub: https://github.com/soajs |
|
|
INCUBATION | Substra is a framework offering distributed orchestration of machine learning tasks among partners while guaranteeing secure and trustless traceability of all operations. It enables privacy-preserving federated learning projects, where multiple parties collaborate on a Machine Learning objective while each one keeps their private datasets behind their own firewall. GitHub: https://github.com/SubstraFoundation/substra |
|
INCUBATION | sparklyr is an R package that lets you analyze data in Spark while using familiar tools in R. sparklyr supports a complete backend for dplyr, a popular tool for working with data frame objects both in memory and out of memory. You can use dplyr to translate R code into Spark SQL. |
|
INCUBATION | Trust Mark works toward translating ethical principles specific to conversational AI into action and risk mitigation for developers, enterprise users – principles, LF EdX education, maturity model, and specifications. |
|
SANDBOX | Xtreme1 is the next generation open source platform for multi-sensory training data. It accelerates the modeling process with advanced AI-powered tools, ontologies distilled from thousands of projects, and plentiful data curation features. GitHub: https://github.com/basicai/xtreme1
|