DeepSeek Is Making Waves Worldwide—Here’s Why
The Chinese AI company DeepSeek has made a major impact on the tech industry by launching highly efficient AI models that rival advanced offerings from US firms like OpenAI and Anthropic.
Established in 2023, DeepSeek has reached its milestones using significantly less funding and computing power than its rivals.
Last week, the company unveiled its “reasoning” R1 model, sparking enthusiasm among researchers, surprising investors, and prompting reactions from major AI players. On January 28, DeepSeek took things further by introducing a model capable of processing both images and text.
So, what exactly has DeepSeek accomplished, and how did it achieve it?
In December, DeepSeek introduced its V3 model, a highly capable large language model that rivals OpenAI’s GPT-4o and Anthropic’s Claude 3.5 in performance.
Like other models, V3 can make mistakes or generate incorrect information, but it excels at tasks such as answering questions, writing essays, and producing computer code. In tests of problem-solving and mathematical reasoning, it has outperformed the average human in some cases.
Training V3 reportedly cost around $5.58 million—significantly less than GPT-4, which required over $100 million to develop.
DeepSeek claims to have trained V3 using around 2,000 specialized H800 GPUs from NVIDIA—far fewer than some competitors, which have reportedly used up to 16,000 of the more powerful H100 chips.
On January 20, the company introduced R1, a “reasoning” model designed to tackle complex problems step by step. These models excel at tasks requiring contextual understanding and interconnected reasoning, such as reading comprehension and strategic planning.
R1 is an enhanced version of V3, refined through reinforcement learning. Its performance appears comparable to OpenAI’s o1, released last year. DeepSeek also applied the same technique to create “reasoning” versions of smaller open-source models that can run on personal computers.
DeepSeek’s Impact
The R1 release has fueled intense interest in DeepSeek, boosting the popularity of its V3-powered chatbot app and shaking up the tech market. Investors reacted by selling tech stocks, with NVIDIA losing approximately $600 billion in market value at the time of writing.
DeepSeek’s key innovation lies in improving efficiency—achieving strong performance with fewer resources. The company has introduced two groundbreaking techniques that could influence AI research more broadly.
The first involves a mathematical concept known as “sparsity.” AI models contain vast numbers of parameters (V3 has about 671 billion), but only a small portion is used for any given input. Identifying which parameters are necessary is challenging, but DeepSeek developed a novel method to predict and train only the relevant ones, significantly reducing the required training resources.
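To make the idea of sparsity concrete, here is a minimal Python sketch of one common way to get it: a router scores a set of sub-networks ("experts") and only the top few are run for each input. This is a toy illustration under stated assumptions, not DeepSeek's actual method. The expert count, router scores, and the names top_k_route and sparse_layer are all hypothetical.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)  # renormalise over the chosen experts
    return [(i, probs[i] / norm) for i in chosen]

def sparse_layer(x, experts, router_scores, k):
    """Run only the selected experts; the others are never evaluated."""
    routes = top_k_route(router_scores, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out, [idx for idx, _ in routes]

# Hypothetical toy setup: 8 experts, each scaling the input by a different factor.
NUM_EXPERTS = 8
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(NUM_EXPERTS)]
router_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]  # made-up values

output, used = sparse_layer([1.0, 2.0], experts, router_scores, k=2)
active_fraction = len(used) / NUM_EXPERTS
print(f"experts used: {used}, fraction of experts active: {active_fraction:.2f}")
```

With k=2 out of 8 experts, only a quarter of the expert parameters take part in producing the output for this input. Because inactive experts also receive no gradient for that input, the same selection reduces the work done during training.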
Improved Data Storage and Compression in V3
The second breakthrough relates to how V3 manages data storage in computer memory. DeepSeek has devised an efficient compression technique that makes storing and retrieving essential information faster and more effective.
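The article does not spell out how the compression works. One general family of approaches is low-rank projection: instead of caching a full vector, store a shorter "latent" vector and expand it again when it is read. The Python sketch below illustrates only that general idea and should not be read as DeepSeek's implementation. The basis, the 4-to-2 dimension reduction, and the store/load helpers are hypothetical.

```python
# Hypothetical orthonormal basis spanning a 2-D subspace of a 4-D space.
# A real system would learn its projection during training.
BASIS = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def compress(vector):
    """Project a full-size vector down to one number per basis row."""
    return [dot(row, vector) for row in BASIS]

def decompress(latent):
    """Expand a latent vector back to full size as a weighted sum of basis rows."""
    size = len(BASIS[0])
    return [sum(z * row[i] for z, row in zip(latent, BASIS)) for i in range(size)]

cache = {}

def store(key, vector):
    cache[key] = compress(vector)  # only the short latent is kept in memory

def load(key):
    return decompress(cache[key])

original = [3.0, -1.0, 3.0, -1.0]  # chosen to lie inside the basis subspace
store("token_0", original)
restored = load("token_0")
print(len(original), "values stored as", len(cache["token_0"]))
print("restored:", restored)
```

Here four numbers are stored as two, halving memory use for the cache. The round trip is exact only because the example vector lies in the subspace spanned by the basis; vectors outside it would come back as approximations, which is the trade-off any such scheme has to manage.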
DeepSeek has released its models and techniques under the open MIT License, allowing anyone to download, modify, and use them freely.
While this move could challenge AI companies reliant on proprietary models for profit, it is a major win for the broader AI research community.
Currently, AI research often demands immense computing power, limiting the ability of university-based researchers and those outside major tech firms to conduct experiments. However, DeepSeek’s efficiency-focused methods could lower these barriers, making experimentation and development more accessible.
For consumers, AI access may also become more affordable. More models could run directly on personal devices like laptops and smartphones, reducing reliance on cloud-based services with subscription fees.
For well-funded research teams, greater efficiency may not be as transformative. It remains to be seen whether DeepSeek’s approach will lead to AI models with superior overall performance or simply ones that require fewer resources to train and run.
Read the original article on: Science Alert