DataCanvas Alaya

Large Model Matrix

DataCanvas Alaya is a “general + industry” white-box large-model matrix developed in-house by DataCanvas. As one of the core capabilities of DataCanvas AI Foundation Software, it follows an open, developer-friendly open-source philosophy that gives users more freedom to innovate with AI, with the aim of accelerating the adoption of large models across diverse business scenarios. DataCanvas provides a series of pre-trained large models in different configurations and parameter sizes, built with cutting-edge techniques. With these models, DataCanvas aims to reshape the AI software landscape in tasks such as text dialogue, image generation and, notably, the DataPilot domain.

Alaya-7B

The Alaya-7B Foundation Model is one of the open-source foundation models in the DataCanvas Alaya Large Model Matrix. It was pre-trained from scratch on a self-collected, carefully curated trillion-token dataset drawn from Chinese and English articles, news, encyclopedias, and other internet sources.

Alaya-7B Foundation Model: https://github.com/DataCanvasIO/alaya

The Alaya-7B Chat Model is the conversational version of Alaya-7B and is also a member of the DataCanvas Alaya Large Model Matrix. It was fine-tuned on a curated dataset that had been detoxified to remove drug-related, pornographic, and negatively biased content, which aligns the model with human values.

Alaya-7B Chat Model: https://github.com/DataCanvasIO/alaya

Highlights

Open-source License

“White-Box” Large Model
Apache 2.0 License
Supports Fine-Tuning

Multimodal

Supports Text & Images
Supports Sequential Data
Supports Structured Data

New Model Training Mechanism

Improved Attention Mechanism
Longer Context Window
Composable Fine-Tuning
Brand-New Masking Mechanism

Model Matrix Series

Model Scales from Small to Large
From General-Purpose to Vertical Industries

LLMOps Toolchain

The LLMOps toolchain is built for training and using large models and covers their entire lifecycle: training, fine-tuning, compression, deployment, inference, and monitoring. It gives data scientists and application developers a complete set of tools to process data and use it to develop, train, and deploy models of any size.

LMS

LMS (Large Model Serving) is aimed mainly at engineers and technical developers. It helps them deliver and operate large models: it speeds up delivery and improves its quality, lowers operation and maintenance costs, and meets the needs of production deployment and service operation at scale.

https://github.com/DataCanvasIO/LMS

LMPM

LMPM (Large Model Prompt Manager) is a tool for designing and building large-model prompts. It guides users toward better prompts so that models produce more accurate, reliable, and expected output. It offers a development toolkit for technical users and an interactive mode for non-technical users, covering the needs of the different groups who work with large models.

https://github.com/DataCanvasIO/LMPM

Applications

Enterprise Knowledge Steward Solution

The Enterprise Knowledge Steward Solution is a large-model application that integrates the DataCanvas Alaya large model, the DingoDB multimodal vector database, and the AIFS (AI Foundation Software) products. Enterprise users collect and process their data, write it into the vector database, integrate and fine-tune the large language model, deploy knowledge-assistant applications, and then refine the system through feedback and iterative optimization. The result is a highly automated, intelligent capability for knowledge management and exploratory analysis.
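The retrieval steps described above (write processed data into a vector store, retrieve relevant passages, hand them to the language model) can be illustrated with a minimal Python sketch. It is not the solution's actual implementation and does not use the DingoDB API: `embed`, `cosine`, `InMemoryVectorStore`, and `build_prompt` are hypothetical names, the "embedding" is a toy bag-of-words vector, and the final call to a chat model is left out.

```python
import math
import re
from collections import Counter


# Hypothetical toy embedding: a bag-of-words term-frequency vector.
# A real deployment would use a learned embedding model instead.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database such as DingoDB (not its API)."""

    def __init__(self) -> None:
        self._docs: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # "Write to the vector database": store each chunk with its vector.
        self._docs.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Return the k stored chunks most similar to the query.
        q = embed(query)
        ranked = sorted(self._docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, passages: list[str]) -> str:
    # Ground the model's answer in the retrieved passages.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    store = InMemoryVectorStore()
    for chunk in [
        "Expense reports must be submitted within 30 days.",
        "The VPN requires two-factor authentication.",
        "Annual leave requests go through the HR portal.",
    ]:
        store.add(chunk)
    question = "How do I submit expense reports?"
    passages = store.search(question, k=1)
    # In a full pipeline this prompt would be sent to a chat model.
    print(build_prompt(question, passages))
```

The bag-of-words vectors keep the sketch self-contained; in practice the embedding function, the store, and the model call would each be replaced by the corresponding production components, and user feedback would drive the iterative optimization step.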

Tech Advantages

Multimodal Support

Supports multiple data modalities
Provides semantic alignment across modalities

High-Precision Retrieval

Parses multiple data types
Preserves the original content of the text

High Availability and Scalability

Storage supports multiple replicas
Storage scales across multiple nodes

Security & Compliance

Data stays within the enterprise's private domain
Large models are deployed inside the enterprise

Intelligent Data Fusion

Unified analysis of structured and unstructured data
Integrates data from multiple business systems

Rich Scenarios

Knowledge question answering
Multimodal data retrieval
Natural-language analysis and decision-making