Ant Group Achieves Major Breakthrough in AI Model Training Using Domestic Chips and MoE Architecture
According to a March 24 Bloomberg report citing people familiar with the matter, Ant Group has successfully trained a 290-billion-parameter large language model named "Ling-Plus." The training relied on domestic chips from companies such as Alibaba and Huawei, combined with a Mixture of Experts (MoE) architecture. The milestone highlights Ant Group's technological progress and demonstrates that domestic chips are viable for high-end AI computing.
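The report does not describe Ling-Plus's internals, but the core idea of an MoE architecture is that a router activates only a small subset of "expert" sub-networks per token, so total parameter count can grow far beyond the compute spent on any single token. Below is a minimal sketch of a standard top-k MoE layer in PyTorch; the class name, dimensions, and routing details are illustrative assumptions, not Ant Group's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Minimal top-k Mixture of Experts layer (illustrative only)."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # router scores per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); route each token to its top-k experts
        scores = self.gate(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # k best experts per token
        weights = F.softmax(weights, dim=-1)         # normalize routing weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot+1] * expert(x[mask])
        return out

# Only k of n_experts run per token, which is how MoE models reach
# hundreds of billions of parameters while keeping per-token compute modest.
x = torch.randn(16, 512)
layer = TopKMoELayer(d_model=512, d_ff=2048)
print(layer(x).shape)  # torch.Size([16, 512])
```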
The performance of "Ling-Plus" is reportedly on par with similar models trained on Nvidia's H800 chips, while training costs were roughly 20% lower. This cost advantage strengthens Ant Group's position in applied AI and offers the industry a more economical path to training large models.
A paper published by Ant Group's Ling team details the underlying technical work. Using hardware from domestic manufacturers such as Moore Threads, Tangent Intelligence, and Cambricon, the team reduced the cost of training on 1 trillion tokens from 6.35 million yuan to 5.08 million yuan. The paper attributes the savings both to the domestic chips themselves and to a set of training-method innovations.
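These figures are consistent with the 20% reduction cited earlier, as a quick back-of-the-envelope check (not taken from the paper) shows:

```python
baseline, domestic = 6.35, 5.08  # million yuan per 1 trillion training tokens
saving = (baseline - domestic) / baseline
print(f"{saving:.0%}")  # 20% -- matches the reduction reported above
```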
On the architecture and training-strategy side, Ant Group employed dynamic parameter allocation and mixed-precision scheduling to raise resource utilization. An upgraded anomaly-handling mechanism with an adaptive fault-tolerant recovery system sharply reduced interruption response times, keeping long training runs stable. An automated evaluation framework compressed the verification cycle by more than 50%, accelerating model optimization, and instruction fine-tuning guided by knowledge graphs improved the model's precision on complex tasks.
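The article does not specify how Ant Group's mixed-precision scheduling or fault-tolerant recovery work internally. As a rough illustration, the sketch below shows how mixed-precision training and checkpoint-based resume-on-failure are commonly combined in PyTorch; the toy model, checkpoint path, and save interval are hypothetical, and a CUDA device is assumed.

```python
import os
import torch
import torch.nn as nn

# Toy stand-in model; Ant Group's actual model and data pipeline are not public here.
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512)).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()          # loss scaling keeps fp16 gradients stable
CKPT = "ckpt.pt"                              # hypothetical checkpoint path

start_step = 0
if os.path.exists(CKPT):                      # recover by resuming, not restarting
    state = torch.load(CKPT)
    model.load_state_dict(state["model"])
    opt.load_state_dict(state["opt"])
    scaler.load_state_dict(state["scaler"])
    start_step = state["step"] + 1

for step in range(start_step, 1000):
    x = torch.randn(32, 512, device="cuda")   # stand-in for a real training batch
    with torch.autocast("cuda", dtype=torch.float16):  # mixed-precision forward pass
        loss = (model(x) - x).pow(2).mean()
    opt.zero_grad(set_to_none=True)
    scaler.scale(loss).backward()             # scaled backward in mixed precision
    scaler.step(opt)
    scaler.update()
    if step % 100 == 0:                       # periodic checkpoints bound lost work
        torch.save({"model": model.state_dict(), "opt": opt.state_dict(),
                    "scaler": scaler.state_dict(), "step": step}, CKPT)
```

The point of the checkpoint logic is that after a hardware fault, the run loses at most one save interval of work rather than the whole job, which is one common way "interruption response time" is kept short on less reliable hardware.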
Experimental results indicate that Ant Group's MoE model, at roughly 300 billion parameters, can be trained efficiently even on lower-performance devices equipped with domestic GPUs, with performance comparable to same-scale models, dense or MoE, trained entirely on Nvidia chips. This offers new methods for AI development in resource-constrained environments and makes large-model training more accessible.
This series of innovations not only demonstrates Ant Group's strength in AI technology but also contributes meaningfully to the advancement of the domestic chip industry. As the technology matures, there is reason to believe that domestic chips will play an increasingly important role in artificial intelligence.