Why China's DeepSeek AI model matters

Making Silicon Valley look silly...
31 January 2025

Interview with Mike Wooldridge, University of Oxford

[Image: The Wall Street Bull]


The emergence of a Chinese-made artificial intelligence model called DeepSeek has shaken the tech industry, rattled global markets, knocked billions off the share prices of some of the world's leading tech giants, and bruised egos in Silicon Valley and the White House. Even US president Donald Trump got in on the action this week. But what exactly is DeepSeek? And why has it unsettled tech innovators and investors? Here's Mike Wooldridge, professor of computer science at the University of Oxford, where he specialises in artificial intelligence…

Mike - DeepSeek is an example of what's called a large language model. Large language models are the AI programs that have taken the world by storm over the last few years; the most famous of these is ChatGPT. They're very, very large-scale, very general artificial intelligence systems, and they were unexpectedly powerful, which is why ChatGPT got so much attention when it was released. But their capability came at a cost: they're extraordinarily expensive to develop and require almost unimaginable computational resources to build. We don't have precise figures, but building a GPT-class model would require something like 20,000 to 40,000 AI supercomputers running for months, and the cost of that runs into hundreds of millions of dollars. What that meant is that the number of organisations with those resources at hand is very, very small: basically the Silicon Valley giants and a very small number of state-level actors. Now, DeepSeek came along and announced a model which seems to be as capable as the current crop of large language models, up there with the best of them. And yet the claim was that it was built for a tiny fraction of the cost of those models, which means that the advantage Silicon Valley had, what's called the moat, the thing which kept everybody else out of this market, looked like it might be evaporating. And that panicked Silicon Valley.
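
As a rough sense-check on the scale Mike describes, here is a minimal back-of-the-envelope sketch in Python. All three inputs are assumptions: a GPU count from the middle of the 20,000 to 40,000 range he quotes, a three-month run standing in for "running for months", and the $2-per-GPU-hour rate he uses later in the interview.

```python
# Order-of-magnitude cost of a GPT-class training run, per the figures
# quoted above. Every input here is an illustrative assumption.
gpus = 30_000            # middle of the quoted 20,000-40,000 range
months = 3               # "running for months"
rate_per_hour = 2.0      # USD per GPU-hour, borrowed from the later estimate

gpu_hours = gpus * months * 30 * 24          # ~65 million GPU-hours
cost = gpu_hours * rate_per_hour             # ~$130 million
print(f"{gpu_hours/1e6:.0f}m GPU-hours, ~${cost/1e6:.0f}m")
```

On those assumptions the figure comes out in the low hundreds of millions of dollars, consistent with the range Mike gives.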

Chris - Do their claims stack up, not just in terms of performance, but also their claim that they can do this so cheaply? Do you think that was a bit of fanfare to get them traction, or is it the reality?

Mike - They have, it seems, done it comparatively cheaply, but not so cheaply that we're all going to be building large language models in the shed at the end of our garden. They've told us that they used a network of a couple of thousand GPUs, much smaller than the scale of supercomputer used to build ChatGPT and the like. But if you dig into the statistics, they required three million hours of processing: the equivalent of one AI supercomputer, one GPU, running for three million hours. You can only do that if you have thousands of them running in parallel. I did some back-of-the-envelope calculations on the cost of their cluster: it looks like something like $50 to $60 million to buy those GPUs, and three million GPU hours at $2 an hour puts the running cost at $6 million. That's where they get that headline figure from. But that scale still puts it out of reach for most: my research group has one GPU, and my students fight over getting access to it. There's no way we could replicate that. So they haven't exactly given this technology to the masses, but nevertheless, it looks like it's a real advance.
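
The same sort of sums can be run for DeepSeek's reported setup. The GPU count, the three million GPU-hours, and the $2-an-hour rate come from the interview; the per-GPU hardware price is an assumption chosen to match Mike's $50 to $60 million estimate.

```python
# Back-of-the-envelope check on the DeepSeek figures quoted above.
gpus = 2_000              # "a couple of thousand GPUs"
gpu_hours = 3_000_000     # quoted total processing time
rate_per_hour = 2.0       # quoted rate, USD per GPU-hour
price_per_gpu = 27_500    # ASSUMED hardware price, to match the $50-60m estimate

running_cost = gpu_hours * rate_per_hour     # $6m: the headline figure
cluster_cost = gpus * price_per_gpu          # ~$55m to buy the cluster
wall_clock_days = gpu_hours / gpus / 24      # ~62 days if fully parallel
print(f"run ~${running_cost/1e6:.0f}m, cluster ~${cluster_cost/1e6:.0f}m, "
      f"~{wall_clock_days:.0f} days wall-clock")
```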

Chris - There were claims made, though, that this was being done using lower-grade chips than the industry standard. And that's always been a political issue, hasn't it? Because China, and some other actors, have been prevented from having access to the sorts of technology that would enable them to make these very high-end computer chips, so they're using lower-grade architectures. Is that the case? Have they just sneakily got hold of what the world thought they didn't have? Or have they made a genuine breakthrough in being able to get the kind of computing power you're describing from lesser hardware?

Mike - There are export regulations, in particular from the US, about top-end silicon processors going to China. And indeed, it looks like they've made some advances in the core architecture, what's called the transformer architecture: the neural network architecture underneath ChatGPT and Gemini and Claude and all of these large language models. It looks like they've been able to optimise that architecture. One of the key ideas, for example, is what's called a mixture of experts approach: instead of having one big, very, very clever neural network, you break it down into a bunch of smaller networks, which is much more efficient. Now, this is not a new idea, it's been around for a while, but it seems like they might have made it work. And that's quite an interesting development.
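
To make the mixture-of-experts idea concrete, here is a minimal sketch in Python using PyTorch. It illustrates the general technique only: a small gating network scores the experts, and each token is processed by just its top two, so only a fraction of the parameters do work per token. The layer sizes, expert count, and top-2 routing are arbitrary choices for illustration, not DeepSeek's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """One mixture-of-experts layer: a gate routes each token to its top-k experts."""
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)   # router: scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                           # x: (num_tokens, dim)
        weights, picks = self.gate(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)        # normalise over the chosen experts only
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = picks[:, slot] == e          # tokens that routed to expert e
                if mask.any():                      # experts nobody picked do no work
                    out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
        return out

layer = TinyMoE()
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64]); each token used only 2 of 8 experts
```

The efficiency Mike mentions comes from the routing: all eight experts hold parameters, but each token only pays the compute cost of two of them.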

Chris - What about the quality and integrity of what it generates? Because concern has been expressed in a number of quarters that when you ask it questions about things China has sensitivities about, it doesn't answer.

Mike - Yeah. The first thing to say is that we're going to need time to evaluate the model; I think teams all over the world are busy doing that. But at first blush, it does indeed look like this is a model in the same territory as the other GPT-class models. On the censorship issue, I think it's pretty well documented now that the model appears to be censored in some of the answers it gives. My advice to anybody using any large language model, wherever it comes from, is: do not tell that model anything that you're not comfortable with your neighbours, or the world, knowing about, because you don't really know where that data is going to end up. Don't tell it about your relationship problems or complain about your boss or anything like that, because you just don't know where the text you give it will end up. Obviously, in this case there are additional sensitivities, because the model comes from China, and so that advice, I think, is particularly relevant.

Chris - Yeah, I had a look at the terms and conditions, and it says the data are held on secure servers in China. That is how you'd interpret it, isn't it? But are they basically creating a giant earpiece for the Chinese government with this?

Because any industry in China has board representation from the government; it has to. And therefore anything within that company can potentially be accessed by the government; the government can just request it. So, by getting people all over the world onto this, people who might not realise they need to be as careful as you're advocating, could it be hoovering up all kinds of interesting things and reinforcing its knowledge about all of us?

Mike - Indeed. And that's something people need to be mindful of, and something governments have very quickly woken up to. The advice, I think, is very, very clear: do not tell any large language model anything sensitive or private that you wouldn't want publicly known. Just don't.

Chris - Have you downloaded DeepSeek? Would you use it?

Mike - No and no. We would use it for experiments, I think, with caution. We would evaluate it. I wouldn't use it on my desktop machine.
