How DeepSeek Built Its A.I. on a Tight Budget
When DeepSeek, a Chinese start-up, announced that it had built one of the most powerful artificial intelligence (A.I.) systems using far fewer computing resources than the industry's giants, it made waves in the tech world. DeepSeek said its system performed comparably to those trained on more than 16,000 specialized chips while using only about 2,000. The result suggests that cutting-edge A.I. does not always require enormous financial investment, which has important implications for how the technology will be developed.
Typically, A.I. systems require vast amounts of computing power to process large data sets. These systems are largely built on neural networks, which are mathematical structures that learn by analyzing data. Over the past decade, these networks have become increasingly resource-hungry, relying heavily on specialized chips such as GPUs (graphics processing units). Initially designed for rendering graphics in video games, GPUs have become essential for running A.I. algorithms. However, these chips are expensive, often costing upwards of $40,000, and they consume a large amount of electricity.
DeepSeek, however, found a way to sidestep these substantial costs. Rather than relying on a single, massive neural network, the company adopted a technique called "mixture of experts," which divides the workload among many smaller, specialized neural networks. Each "expert" focuses on a particular domain, such as biology, programming or poetry, and a generalist system routes each piece of a problem to the experts best suited to it, so only a fraction of the model is doing work at any given moment. That is what makes the architecture efficient yet powerful. The method is akin to a news editor managing a team of reporters, each with their own beat, and ensuring that they work in harmony.
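For readers who want a concrete picture, the short Python sketch below illustrates the general mixture-of-experts pattern under simple assumptions; it is not DeepSeek's own code, and the names (MixtureOfExperts, num_experts, top_k) are invented for the example. A small gating network scores the experts for each input, and only the top-scoring experts run, so most of the model sits idle for any given token.

    # Illustrative mixture-of-experts sketch in PyTorch (not DeepSeek's code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixtureOfExperts(nn.Module):
        def __init__(self, dim=64, num_experts=8, top_k=2):
            super().__init__()
            # Several small, specialized networks ("experts").
            self.experts = nn.ModuleList(
                [nn.Sequential(nn.Linear(dim, dim * 4), nn.ReLU(), nn.Linear(dim * 4, dim))
                 for _ in range(num_experts)]
            )
            # A small "generalist" router that decides which experts to use.
            self.gate = nn.Linear(dim, num_experts)
            self.top_k = top_k

        def forward(self, x):
            scores = F.softmax(self.gate(x), dim=-1)           # relevance of each expert
            weights, picked = scores.topk(self.top_k, dim=-1)  # keep only the best few
            out = torch.zeros_like(x)
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = picked[:, slot] == e                # inputs routed to expert e
                    if mask.any():
                        out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
            return out

    tokens = torch.randn(16, 64)              # a batch of 16 token embeddings
    print(MixtureOfExperts()(tokens).shape)   # torch.Size([16, 64])

Because each input activates only two of the eight experts here, most of the network's parameters are untouched on any single step, which is the source of the savings the technique promises.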
But DeepSeek's cost-cutting did not stop there. The company also applied a mathematical shortcut to reduce memory use during its calculations, one reminiscent of rounding numbers in school arithmetic. By halving the memory allotted to each number, DeepSeek gave up some precision, but the rounding error proved small enough that its neural network could still learn effectively, letting the system do the same work with far fewer resources.
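A small sketch makes the trade-off concrete. It uses NumPy rather than DeepSeek's training code, and the matrix size is arbitrary; the point is simply that storing each number in 16 bits instead of 32 halves the memory a block of values occupies, at the cost of a small rounding error.

    # Illustrative sketch of the memory trade-off (not DeepSeek's actual code).
    import numpy as np

    weights_32 = np.random.rand(1024, 1024).astype(np.float32)
    weights_16 = weights_32.astype(np.float16)   # half as many bits per number

    print(weights_32.nbytes)   # 4194304 bytes (about 4 MB)
    print(weights_16.nbytes)   # 2097152 bytes (about 2 MB)
    print(np.abs(weights_32 - weights_16.astype(np.float32)).max())  # small rounding error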
DeepSeek paired that compression with a complementary trick: although the numbers fed into each multiplication were stored in the compressed format, the results were stretched across more memory space, which restored precision to the final outputs. The idea sounds straightforward, but squeezing it out of the hardware required an advanced understanding of both the GPUs and the software that drives them.
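The sketch below, again an illustrative NumPy example rather than DeepSeek's implementation, shows the effect: when the inputs are kept in a compact 16-bit format but the result of the multiplication is written into a wider 32-bit format, the output tracks the exact answer far more closely than when the result is also squeezed into 16 bits.

    # Illustrative sketch of widening the result of a low-precision multiplication.
    import numpy as np

    a = np.random.rand(512, 512).astype(np.float16)   # inputs stored compactly
    b = np.random.rand(512, 512).astype(np.float16)

    narrow = a @ b                                        # result kept in 16 bits
    wide   = a.astype(np.float32) @ b.astype(np.float32)  # result widened to 32 bits

    exact = a.astype(np.float64) @ b.astype(np.float64)
    print(np.abs(narrow - exact).max())   # visibly larger error
    print(np.abs(wide - exact).max())     # much closer to the exact result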
These breakthroughs did not come easily. Developing the methods involved extensive experimentation and risk-taking, with millions of dollars spent on testing and refinement. As Tim Dettmers, an A.I. researcher at the Allen Institute for Artificial Intelligence, pointed out, A.I. innovation is inherently risky, and many organizations hesitate to invest in unproven approaches because of the potential for failure.
The roughly $6 million DeepSeek reported covered only the final training run of its system; the research and experimentation that led up to it required additional money. Even so, the figure is modest compared with other A.I. projects, and DeepSeek's ability to innovate on such a budget offers valuable lessons for the field. By sharing its methods with the broader community, DeepSeek could dramatically reduce the cost of building advanced A.I. systems, putting the technology within reach of start-ups and researchers with limited resources.