The competition for domestic AI chips has just begun
Release time: 2023-10-30 14:25:40

AMD CEO Lisa Su stated that the MI300X offers 2.4 times the High Bandwidth Memory (HBM) density of the NVIDIA H100 and 1.6 times its HBM bandwidth. Wall Street analysts also generally believe that AMD's chip will pose a strong challenge to NVIDIA, which currently holds over 80% of the AI chip market, and that the MI300X accelerator is expected to displace NVIDIA's comparable products.

 

However, the market's response to the new product was not enthusiastic. As of the overnight close, AMD's stock had fallen more than 3.6%, while Nvidia, the company being challenged, rose instead, gaining 3.90% in a single day.

 

As for why AMD's stock fell that day, possible reasons include Nvidia's greater maturity in AI development, the fact that AMD's new products still need market validation, and customers' sensitivity to price. For reference, tight supply and demand have pushed the price of the Nvidia H100 to $40,000 per unit, while AMD has not disclosed MI300X pricing, making a direct comparison with the H100 difficult.


01

Nvidia, a rising star

The emergence of ChatGPT in late 2022 pushed the AI industry to a new peak. Generative AI requires training and inference over massive amounts of data, so high-performance GPU accelerator cards naturally became a market hotspot. Riding the AI wave, Nvidia took the throne as the "overlord of computing power".

 

Reportedly, Microsoft spent hundreds of millions of dollars and tens of thousands of Nvidia A100 chips to build a supercomputing platform just to provide more computing power for ChatGPT and the new version of Bing. Microsoft has also deployed hundreds of thousands of GPUs across more than 60 Azure data centers for ChatGPT inference. Tesla CEO Elon Musk has likewise purchased roughly 10,000 GPUs for one of the company's two data centers. Many other technology companies, such as Amazon, Alibaba, and Baidu, are racing to deploy AI chips.

 

This extreme supply-demand imbalance has left NVIDIA's GPUs in short supply and driven prices up. According to market sources, orders for NVIDIA's A100 and H100 AI GPUs keep growing, prices for the A800 and H800 have risen by 40%, and delivery of new orders may stretch into December.

 

Amid the AI frenzy, Nvidia has made a fortune: it expects sales of $11 billion for the quarter ending in July, more than 50% above Wall Street's earlier estimates. But with computing power serving as AI's infrastructure, one chipmaker's dominance is clearly not conducive to the industry's long-term development. The market urgently needs new competitors, and AMD's entry may help "share" the pressure in the AI market.

 

Meanwhile, domestic AI applications and AI chip startups in China have also flourished amid the AI craze and the attention of venture capitalists. So how is China's AI chip development progressing, and which companies can stand out?

02

What is the progress of domestic AI chips?

China's main AI chip companies include Cambricon, Huawei Ascend, Hygon Information, Muxi Technology, Biren Technology, Enflame Technology, and Iluvatar. As AI applications spread and prove effective, domestic AI chips are seeing broad growth, and several AI chip unicorns are likely to emerge.

Cambricon

In its cloud product line, Cambricon has launched four generations of chips, the MLU100, MLU270, MLU290, and MLU370 series, to support AI workloads of rapidly growing complexity and data throughput in cloud computing and data center scenarios. Cambricon also has a fifth product under development, the MLU590, which has not yet been officially released and is the most anticipated of the lineup.

 

The chip adopts the brand-new MLUarch05 architecture, and its measured training performance is significantly better than that of the MLU290 series, the flagship currently on sale, making it a contender for China's most advanced AI computing chip. The MLU590's overall computing performance is reportedly about 70% that of the A100, and it is expected to replace the NVIDIA A100 in some scenarios.


Huawei Ascend

Huawei Ascend comprises two main processors built on Huawei's own Da Vinci architecture: the Ascend 910 and the Ascend 310. The Ascend 310 is a low-power AI processor designed for edge scenarios, while the Ascend 910 is a high-performance AI processor for cloud and data centers, capable of supporting large-scale AI training tasks.

 

According to information released by Huawei, actual test results show that the Ascend 910 fully meets its design specifications for computing power, delivering 256 TFLOPS at half precision (FP16) and 512 TOPS at integer precision (INT8). Importantly, it does so at only 310 W, well below the 350 W design specification.
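The quoted figures also pin down the chip's energy efficiency; a quick check, using only the numbers reported above:

```python
# Quoted Ascend 910 figures: 256 TFLOPS FP16, 512 TOPS INT8, 310 W measured power.
fp16_tflops = 256
int8_tops = 512
power_w = 310

fp16_eff = fp16_tflops / power_w  # TFLOPS per watt at half precision
int8_eff = int8_tops / power_w    # TOPS per watt at integer precision

print(f"FP16 efficiency: {fp16_eff:.2f} TFLOPS/W")  # ≈ 0.83
print(f"INT8 efficiency: {int8_eff:.2f} TOPS/W")    # ≈ 1.65
```

Note that INT8 throughput is exactly twice FP16 throughput, a pattern typical of compute units that split one half-precision lane into two integer lanes.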

 

In practical applications, the Ascend 910 is reportedly more than 80% faster than comparable products in the industry. Huawei's Xu Zhijun said the Ascend 910's overall technical performance exceeded expectations and that it lives up to its billing as the most powerful AI processor.

 

However, the Ascend 910 also has significant limitations. It relies on Huawei's own software ecosystem, requires deep optimization and code porting by Huawei, and has relatively poor generality. For example, the Ascend reportedly cannot train a model like GPT-3 because it does not support 32-bit floating point, which nearly all large-model training currently requires.

Muxi Technology

Muxi has two main AI chip lines, Xisi and Xiyun; the Xiyun MXC series is a GPU the company developed for AI training and general-purpose computing.

 

The MXC500 is Muxi's computing chip benchmarked against the A100/A800, with FP32 floating-point performance of up to 15 TFLOPS; by comparison, the A100 delivers 19.5 TFLOPS at FP32. Beyond the close performance, the MXC500's complete software stack (MXMACA) is compatible with CUDA, and the chip is expected to ship in volume by the end of the year.
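The gap implied by the two FP32 figures is easy to quantify (numbers taken from the paragraph above):

```python
# Quoted FP32 peak throughput, in TFLOPS.
mxc500_fp32 = 15.0
a100_fp32 = 19.5

ratio = mxc500_fp32 / a100_fp32
print(f"MXC500 reaches {ratio:.0%} of the A100's FP32 peak")  # ≈ 77%
```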

 

In addition, Muxi's team has deep experience: some of its core members took part in developing AMD's MI100 and MI200, currently AMD's most mainstream GPGPU products.


Hygon Information

Hygon Information is a promising player. Compared with NVIDIA's A100 and AMD's MI100, many basic specifications of Hygon's DCU (coprocessor) series "Deep Computing No. 1" reach the level of comparable high-end international products. Although a significant gap in overall performance remains, it is already quite strong in the context of domestic substitution and has great development potential.

 

However, if Hygon wants to adopt a new-generation GPGPU architecture, it still needs AMD's authorization, which poses a problem for iteration.


Biren Technology

When Biren's BR100 was unveiled, its ultra-high specifications and claimed performance caused a strong sensation.

 

On paper, the BR100 series is built on a 7nm process and packs 77 billion transistors. The chip architecture, developed independently by Biren Technology, adopts advanced design, manufacturing, and packaging technologies such as chiplets and 2.5D CoWoS. It can be paired with 64 GB of HBM2E memory, carries over 300 MB of on-chip cache, and supports PCIe 5.0, the CXL interconnect protocol, and more.

 

Performance is the BR100's highlight: 1,024 TOPS at INT8, 512 TFLOPS at BF16, 256 TFLOPS at TF32+, and 128 TFLOPS at FP32. It can reach 2.3 TB/s of external I/O bandwidth and supports 64 channels of video encoding and 512 channels of decoding. Biren claims it surpasses the latest flagship from international vendors in dimensions such as FP32 (single-precision floating point) and INT8 (integer, commonly used in AI inference).
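The quoted throughputs halve each time the operand format widens, a pattern typical of GPUs whose vector units split one wide lane into several narrower ones. A sketch using the figures above:

```python
# Claimed BR100 peak throughput at each precision (figures as quoted above).
peak = {
    "INT8": 1024,   # TOPS
    "BF16": 512,    # TFLOPS
    "TF32+": 256,   # TFLOPS
    "FP32": 128,    # TFLOPS
}

# Each step toward wider operands halves the peak throughput.
values = list(peak.values())
ratios = [values[i] / values[i + 1] for i in range(len(values) - 1)]
print(ratios)  # → [2.0, 2.0, 2.0]
```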

 

The BR100 series of general-purpose GPU chips supports both cloud training and inference; it is in the final stage of development and expected to launch this year. Biren's second chip has entered architecture design, and the company will gradually roll out GPU chips for intelligent computing centers, cloud gaming, and edge computing.

 

However, since the BR100 has not yet shipped, its parameters remain laboratory figures, making its real-world commercial performance hard to gauge.


Alibaba T-Head

Alibaba's AI chips differ greatly from GPU architectures because they are built entirely on architectures optimized for AI algorithms.

 

Alibaba once stated that the Hanguang 800 was the world's strongest AI chip at the time, with the highest performance and energy-efficiency ratio: one Hanguang 800 matched the computing power of ten GPUs.

 

In the industry-standard ResNet-50 test, the Hanguang 800 reached an inference throughput of 78,563 IPS, four times the performance of the industry's best AI chip at the time, with an energy-efficiency ratio of 500 IPS/W, 3.3 times that of the runner-up.
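Taken together, the two quoted figures imply the chip's power draw during the benchmark; a quick derivation:

```python
# Quoted Hanguang 800 ResNet-50 results: 78,563 images/s at 500 images/s per watt.
throughput_ips = 78563
efficiency_ips_per_w = 500

# power = throughput / (throughput per watt)
implied_power_w = throughput_ips / efficiency_ips_per_w
print(f"Implied power draw: {implied_power_w:.1f} W")  # ≈ 157.1 W
```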

 

Within the industry, the chip released by T-Head is also highly regarded.

Enflame Technology

Enflame Technology is one of the few cloud AI chip startups. It completed its second iteration of AI training chips in just three years; its main product line is "Suisi".

 

The Suisi 2.0 released by Enflame reportedly measures 57.5 mm × 57.5 mm (an area of 3,306 mm²), reaching the limit of ASE's 2.5D packaging. Like its predecessor, it uses GlobalFoundries' 12nm FinFET process and integrates a total of nine dies. It delivers 40 TFLOPS at single precision (FP32), 160 TFLOPS with single-precision tensor (TF32), and 320 TOPS at integer precision (INT8). For comparison, NVIDIA's Ampere-based A100 GPU offers only 19.5 TFLOPS of single-precision floating-point computing power.

Kunlunxin

The Baidu Kunlun chip is a cloud AI general-purpose chip developed in-house by Baidu. At the Baidu AI Developer Conference in July 2018, chairman and CEO Robin Li officially announced that Baidu's self-developed AI chip was named Kunlun. Kunlun 1 taped out successfully in 2019 on Samsung's 14nm process; more than 20,000 units have since been mass-produced and widely deployed in Baidu search, Baidu AI Cloud, ecosystem partners, and other scenarios. Kunlun 2 reached mass production in the second half of 2021 on an advanced 7nm process, with performance three times that of Kunlun 1. Baidu is reportedly planning to produce its third-generation Kunlun chip by the end of the year.


Iluvatar

Iluvatar's products mainly include two AI chips, the Tiangai 100 and the Zhikai 100. The Tiangai 100 is a high-performance, cloud-oriented general-purpose parallel computing card based on a GPGPU-architecture chip. Iluvatar says the Tiangai 100 was designed and developed independently from the bottom-layer hardware to the upper-layer software, without taking the shortcut of licensing foreign GPU IP, ensuring fully independent intellectual property. Iluvatar subsequently released its second product, the Zhikai 100 ("Smart Armor 100"), billed as its flagship, which has drawn the attention of many industry users.

 

Iluvatar's GPGPU computing chips are designed mainly for cloud AI training and inference and for general-purpose cloud computing, making it one of the few high-end domestic computing chips compatible with heterogeneous computing ecosystems such as CUDA.


03

Computing power alone cannot make a "Plan B" to Nvidia

Nvidia's strength lies not only in its hardware but also in its software platform, where it has its own moat.

 

CUDA is NVIDIA's parallel computing platform and programming model for GPUs. It accelerates large-scale data-parallel computation, extending GPUs to a far wider range of scientific and engineering computing. CUDA's strong ecosystem has attracted numerous academic institutions and high-performance computing centers and gives Nvidia a powerful competitive advantage. AMD is now attempting the same thing, but with NVIDIA so far ahead, building such an ecosystem may be even harder for AMD.
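CUDA's core idea, one kernel function executed by many threads where each thread selects its slice of the data by index, can be mimicked in plain Python. This is an illustrative analogy only, not real CUDA code; the function and variable names are ours:

```python
# Toy analogy of the CUDA execution model: a "kernel" runs once per thread,
# and each thread picks its element via blockIdx * blockDim + threadIdx.
def vector_add_kernel(a, b, out, block_idx, block_dim, thread_idx):
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(a):                          # guard against out-of-range threads
        out[i] = a[i] + b[i]

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(a)

block_dim = 2  # threads per block
num_blocks = (len(a) + block_dim - 1) // block_dim  # ceiling division, as in CUDA launches

# A GPU would run all of these threads in parallel; here we loop sequentially.
for block_idx in range(num_blocks):
    for thread_idx in range(block_dim):
        vector_add_kernel(a, b, out, block_idx, block_dim, thread_idx)

print(out)  # → [11.0, 22.0, 33.0, 44.0, 55.0]
```

The hard part of a compatibility layer is not this indexing scheme but reproducing the vast CUDA library surface (cuBLAS, cuDNN, and so on) that applications actually call.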

 

CUDA's importance is self-evident, but providing a CUDA compatibility layer requires real R&D capability. Among the companies discussed above, those compatible with CUDA include Muxi, Hygon, Biren, and Iluvatar. CUDA compatibility has thus become one yardstick for measuring AI chip companies.

 

Opinions differ on whether CUDA compatibility is necessary. Experts say CUDA remains important for small models but matters less for large models: if the future Chinese market is dominated by small models, CUDA will retain great influence, whereas if large models dominate, dependence on CUDA will keep shrinking.

 

In short, prioritizing software adaptation and development is crucial.

 

Suggestions for building a domestic IT system include: setting reasonable performance requirements and validation goals for domestic systems and chips, and piloting domestic chips in non-critical applications; strengthening software adaptation and development to ensure compatibility, stability, and performance across different systems; establishing and increasing investment in domestic foundational IT software and hardware vendors to secure influence over their product roadmaps; and prioritizing domestic supply chains and mature platforms while actively adopting innovative semiconductor technologies.


04

The gap between domestic chips and Nvidia will gradually narrow

AI chips have become one of the most promising fields in the semiconductor industry. As the core market driving the chip industry's development, AI chips carry incalculable industry value. As AI chip technology matures, its application scenarios are penetrating all kinds of intelligent terminals, occupying an increasingly important position in China's technological development.

 

According to Gartner, Chinese GPUs hold only about 1% of the global semiconductor market. Of the $600 billion in global semiconductor procurement in 2022, Chinese companies purchased $149 billion of chips, about a quarter, while the Chinese factories of multinational companies purchased $213 billion, about 35%.

 

China's chip industry still has enormous growth potential. It will continue to increase investment, and the gap between domestic companies and NVIDIA will gradually narrow.

 

Note: Reprinted from "Semiconductor Industry Vertical and Horizontal".