Preface
Artificial intelligence (AI) is rapidly changing our world. Chatbots, voice assistants, and self-driving vehicles all rely on powerful AI training and inference technologies. But not all AI models are trained the same way: some companies use state-of-the-art hardware, while others try to achieve similar results with fewer resources.
DeepSeek, OpenAI, and Anthropic are three major competitors in the AI field, and each has a different training strategy. DeepSeek chose the older but less expensive A100 GPU, OpenAI relies on NVIDIA's newer H100, and Anthropic uses Google TPUs to optimize AI training. This article digs into the AI training and inference strategies of these three companies and analyzes their impact on the AI industry. Let's read on!
AI training and AI inference: the two key stages of building AI models
Artificial intelligence (AI) has become a core battleground in the technology world, and the development of an AI model involves two key stages: AI training and AI inference.
- AI training is like learning a new skill: it requires constant practice and absorption of knowledge, just like a student preparing for an exam by reading, taking notes, and doing practice questions.
- AI inference is like the exam itself: the knowledge already learned must be applied quickly to answer questions, and the results must be both fast and accurate.
At present, OpenAI (GPT-4), Anthropic (Claude), and DeepSeek are the three major players in the AI training market. Traditionally, OpenAI and Anthropic have relied on NVIDIA H100 GPUs or Google TPUs to train their models, but DeepSeek takes a different approach and uses the older A100 GPU to reduce the cost of AI training.
How does DeepSeek use A100 GPUs to challenge high-end chips?
DeepSeek's unique training method
But why does DeepSeek use the older A100 GPU instead of the latest H100 or Blackwell? Does this really offer any advantages?
DeepSeek did not choose the most powerful GPU on the market to train its AI. Instead, it picked the A100 GPU and uses a Mixture of Experts (MoE) architecture to improve training efficiency.
The Mixture of Experts (MoE) model
So, how does the MoE model work? Why can it effectively reduce costs?
MoE is DeepSeek's core technology, and it works like a smart restaurant:
- Conventional AI training is like having every chef cook every dish: all GPUs run together and consume a lot of resources.
- MoE is like letting the chef who is best at each dish cook it: different expert networks handle different parts of the workload, reducing GPU operating costs and improving AI training efficiency.
Through MoE, DeepSeek activates only part of the expert network for each input rather than the entire model, making AI training more economical and putting the A100 GPU to effective use.
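To make the routing idea concrete, here is a minimal sketch of top-k expert gating in NumPy. This illustrates the general MoE technique, not DeepSeek's actual implementation; the expert count, dimensions, and gating function are all made-up assumptions.

```python
import numpy as np

# Hypothetical sizes -- not DeepSeek's real configuration.
DIM, NUM_EXPERTS, TOP_K = 16, 8, 2

rng = np.random.default_rng(0)
gate_w = rng.standard_normal((DIM, NUM_EXPERTS))        # gating network
experts = rng.standard_normal((NUM_EXPERTS, DIM, DIM))  # one weight matrix per expert

def moe_forward(x):
    """Route a token to its TOP_K best experts; all other experts stay idle."""
    logits = x @ gate_w                          # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]            # indices of the k highest scores
    w = np.exp(logits[top])
    w /= w.sum()                                 # softmax over the chosen experts only
    # Only TOP_K of NUM_EXPERTS weight matrices are touched -- the cost saving.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

token = rng.standard_normal(DIM)
print(moe_forward(token).shape)  # (16,)
```

Because only 2 of the 8 experts run per token in this toy setup, roughly three quarters of the expert compute is skipped, which is the same lever that lets MoE stretch older GPUs further.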
How cloud computing can maximize the performance of A100
But is it enough to rely solely on A100? How does DeepSeek ensure that the performance of the model does not degrade due to the use of older GPUs?
DeepSeek also uses cloud resource scheduling to allocate AI training resources more flexibly. This lets DeepSeek achieve efficient training even on older GPUs, much like ride sharing: every passenger reaches their destination without putting extra vehicles on the road.
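The source does not describe DeepSeek's scheduler, but as a hedged illustration of the general idea, here is a minimal greedy least-loaded scheduler: each job goes to whichever GPU frees up first, so a small pool stays busy instead of a large pool sitting idle. The job durations and GPU count are invented.

```python
import heapq

def schedule(job_hours, num_gpus):
    """Greedy least-loaded scheduling: send each job to the GPU that frees up first."""
    heap = [(0.0, gpu) for gpu in range(num_gpus)]   # (busy-until time, gpu id)
    plan = []
    for hours in job_hours:
        free_at, gpu = heapq.heappop(heap)           # least-loaded GPU right now
        plan.append((gpu, free_at, free_at + hours))
        heapq.heappush(heap, (free_at + hours, gpu))
    return plan

# Five hypothetical training jobs packed onto just 2 GPUs.
for gpu, start, end in schedule([4, 2, 3, 1, 2], num_gpus=2):
    print(f"GPU {gpu}: hour {start:g} -> hour {end:g}")
```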
Why did OpenAI and Anthropic choose H100 and TPU?
Choosing AI training hardware is not simply a matter of "faster is always better"; different companies have different strategic considerations.
OpenAI and Anthropic picked different AI training hardware, and those choices reflect far-reaching technical decisions and market-competition considerations.
Why does OpenAI's GPT-4 need H100?
Top-notch learning environment: H100 is like an elite school
If DeepSeek can train AI with the A100, why does OpenAI spend heavily on the H100? It is like students preparing for a big exam: some study on their own with ordinary reference books, while others attend the best cram school, with guidance from famous teachers, exclusive materials, and even personalized study plans, to make sure they rank at the top.
H100 is a "top school" in the field of AI training. It has stronger computing power and is suitable for large-scale AI training. This means that GPT-4 not only needs to "learn" language, but also needs to achieve language understanding and generation capabilities that surpass humans.
Why can H100 provide the performance OpenAI needs?
- Higher memory bandwidth: this lets GPT-4 process massive amounts of data at once, just like a student who can digest more information in one sitting.
- Built-in Transformer Engine: an acceleration technology designed specifically for AI that helps GPT-4 run its calculations faster, like an efficient note-taking method that makes studying more productive.
- Stronger parallel computing: the H100 completes training runs faster and avoids the performance bottlenecks of older GPUs.
In other words, H100 is like a "super elite learning environment" tailored for OpenAI, allowing GPT-4 to achieve the ultimate learning speed and accuracy.
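The Transformer Engine's core trick is computing in reduced precision. As a rough illustration (this is plain NumPy, not NVIDIA's API), halving the precision of a tensor halves the memory it occupies, which is part of why lower-precision hardware can move and crunch more data per second. The tensor shape is an arbitrary assumption.

```python
import numpy as np

# A hypothetical activation tensor: 1024 token embeddings, each 4096 wide.
acts_fp32 = np.ones((1024, 4096), dtype=np.float32)
acts_fp16 = acts_fp32.astype(np.float16)   # same values, half the bytes

print(f"float32: {acts_fp32.nbytes / 2**20:.0f} MiB")  # 16 MiB
print(f"float16: {acts_fp16.nbytes / 2**20:.0f} MiB")  # 8 MiB
# The Transformer Engine pushes this further with 8-bit floats, trading a
# little precision for much more bandwidth and throughput during training.
```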
Why did Anthropic choose TPU for its Claude model?
Different strategies: TPU is like a special class for Olympic mathematics competitions
Anthropic did not follow in OpenAI's footsteps; instead, it uses Google TPUs to train Claude.
The TPU is an AI chip developed by Google and optimized specifically for AI training. It is like a training center tailor-made for math-olympiad contestants, providing an optimized learning environment so students can achieve the best results in competition.
Why is TPU suitable for Claude?
- Faster matrix operations: The core of AI training is matrix calculation, and TPU is optimized for this feature, just like providing more efficient computing tools to math competition students.
- Seamless integration with the Google ecosystem: Anthropic mainly uses Google Cloud to train Claude. TPU can perform at its best in such an environment and reduce the delay of data transmission.
- Reduce NVIDIA dependency: If the AI training market is completely monopolized by NVIDIA, costs will be difficult to control. Anthropic chose TPU not only for technical considerations, but also for strategic independence.
In other words, Claude's training focuses on computational efficiency and flexibility, and the TPU provides a relatively independent, efficient environment that suits Claude's development needs.
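The list above notes that matrix calculation is the core of AI training; here is a minimal sketch of why. A dense neural-network layer is just a matrix multiply plus a bias, and a forward pass is a chain of them, so hardware that accelerates matmuls (TPUs, GPU tensor cores) accelerates nearly everything. All shapes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((32, 512))         # batch of 32 inputs, 512 features each

# Two dense layers; each is dominated by one matrix multiplication.
w1, b1 = rng.standard_normal((512, 1024)), np.zeros(1024)
w2, b2 = rng.standard_normal((1024, 256)), np.zeros(256)

h = np.maximum(x @ w1 + b1, 0)             # matmul + ReLU activation
y = h @ w2 + b2                            # matmul again
print(y.shape)                             # (32, 256)
```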
Why do different AI companies choose different hardware?
Market positioning and strategy differences
The choice of AI training is actually like a sports competition. Different players will choose the most suitable training method based on their own strengths.
- OpenAI chooses H100, just like a sprinter chooses high-intensity burst training, ensuring you can cross the finish line as quickly as possible during the race.
- Anthropic chooses TPU, just like marathon runners choose long-term endurance training, ensuring the stability and continuous computing power of AI.
Such a choice is not just a technical issue, but also involves corporate strategy and market goals.
How will the competitive landscape develop?
As AI training technology evolves, different companies will choose the technology stack that best suits their own development.
- NVIDIA continues to launch more powerful GPUs, such as the H200 and Blackwell, which will attract AI training companies that need extreme performance.
- Google may further develop TPU technology to make it more competitive in specific applications.
- Other AI chip companies (such as Cerebras and Graphcore) may challenge the existing technology framework and provide new options.
Conclusion: Different training methods, same goal
Whether a company picks the H100 or the TPU, the goal behind every AI training strategy is the same: to let AI learn and infer faster, more accurately, and more efficiently, and so improve what its applications can do.
How does AI inference affect the final AI application?
Practical application scenarios of AI inference
Chatbots and Voice Assistants
When you ask ChatGPT or Siri a question, the AI must parse your meaning, retrieve the best answer, and compose a response within milliseconds. If this process is too slow, the conversation feels laggy, like chatting with a friend who is always a beat behind, and the experience suffers.
Image recognition and face unlocking
Today's smartphones ship with face-unlock. When you raise the phone to your face, the AI must compare your facial features in a fraction of a second; otherwise unlocking slows down or fails outright, and users may well go back to typing a passcode.
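As a hedged sketch of what "comparing facial features" means computationally (actual phone pipelines are proprietary), face systems typically reduce each face to an embedding vector and compare embeddings against a tuned similarity threshold. The dimensions, vectors, and threshold here are all invented for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(2)
enrolled = rng.standard_normal(128)                      # hypothetical stored face embedding
probe_same = enrolled + 0.1 * rng.standard_normal(128)   # same face, slightly different shot
probe_other = rng.standard_normal(128)                   # a different face entirely

THRESHOLD = 0.8  # illustrative; real systems tune this on labeled data
for name, probe in [("same face", probe_same), ("other face", probe_other)]:
    sim = cosine_similarity(enrolled, probe)
    print(f"{name}: similarity={sim:.2f}, unlock={sim > THRESHOLD}")
```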
Real-time decision making for autonomous driving systems
The most extreme application of AI inference is autonomous driving. Imagine a self-driving car traveling at 100 kilometers per hour when someone suddenly crosses the road ahead. The AI must decide whether to brake, swerve, or slow down in under 0.1 seconds, or a serious accident follows. If inference is too slow, the vehicle cannot respond in time, and the consequences are disastrous.
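A quick back-of-the-envelope calculation shows why that 0.1-second budget matters: before the brakes even engage, the car covers several meters while the model is still deciding.

```python
speed_ms = 100 * 1000 / 3600   # 100 km/h is about 27.8 m/s
decision_time = 0.1            # seconds spent on inference
print(f"{speed_ms:.1f} m/s -> {speed_ms * decision_time:.1f} m traveled before any action")
# 27.8 m/s -> 2.8 m traveled before any action
```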
The key to inference: balancing speed and accuracy
The speed and accuracy of inference are core issues in AI competition. In the past, many AI models emphasized accuracy, but if inference is too slow, even an accurate answer fails the needs of real-time applications. Striking the best balance between speed and accuracy has therefore become the central goal of AI inference technology.
This is why AI companies not only pursue more powerful computing power when developing models, but also need to optimize the inference architecture to ensure that AI can make efficient decisions in real time.
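One common lever for trading a little accuracy for a lot of speed is quantization: storing weights as 8-bit integers instead of 32-bit floats. The sketch below (a generic technique with illustrative values, not any particular company's pipeline) shows the 4x memory saving and the small numerical error it introduces.

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.standard_normal((256, 256)).astype(np.float32)  # hypothetical weight matrix

# Symmetric int8 quantization: map [-max|w|, +max|w|] onto [-127, 127].
scale = np.abs(w).max() / 127
w_int8 = np.round(w / scale).astype(np.int8)
w_restored = w_int8.astype(np.float32) * scale          # dequantize to compare

print(f"memory: {w.nbytes} -> {w_int8.nbytes} bytes (4x smaller)")
print(f"mean absolute error: {np.abs(w - w_restored).mean():.4f}")
```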
Competition in the future AI training and inference market
The technology for AI training is shifting, and DeepSeek’s strategy lowers costs, giving more companies a chance to compete. Does this mean that the AI market will usher in a new round of reshuffle?
The impact of low-cost AI training
In the past, AI training was like a luxury arms race that only a few large companies could afford. DeepSeek's combination of MoE (Mixture of Experts) and A100 GPUs makes AI training more like modified-car racing: with the right tuning, a team can compete at low cost.
This will lower the threshold for AI development and enable more companies to participate in the market without having to rely on expensive H100 GPUs, changing the situation that was previously dominated by only technology giants.
AI inference becomes the new battleground
As AI training costs fall, companies will pay more attention to inference performance. Training is like an athlete's preparation for a competition, while inference is the performance in the actual event. If AI training becomes commonplace, the real competitive advantage will shift to the speed and accuracy of inference technology.
Conclusion: The future direction of the AI market
The AI market is undergoing a transformation, with low-cost training and efficient inference becoming the core of competition. DeepSeek offers a more cost-effective AI training model, while OpenAI and Anthropic still stick to high-performance strategies.
This technological competition is still ongoing. In the next few years, the market landscape may change significantly, and those companies that can balance cost and efficiency will ultimately win!