How the Science of Burning Buildings Paves the Way to Advances in AI

Behind the Builders: Rajiv Mongia and Intel’s thermal team are pushing boundaries to keep the heat off Moore’s Law and big AI chips that are getting bigger and more powerful.

“AI is already wreaking havoc on global power systems,” headlined Bloomberg last summer. And it’s not slowing down.

A recent report by the International Energy Agency predicted that “Data centre electricity consumption is set to more than double to around 945 TWh by 2030,” driven by AI. That growth is roughly equivalent to the annual output of five more of China’s Three Gorges Dams, all needed within the next five years.
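As a rough check on that comparison, the arithmetic can be sketched in a few lines. Only the 945 TWh figure comes from the IEA quote above. The 2024 baseline of about 415 TWh and the round figure of 100 TWh a year for Three Gorges are assumptions added here for illustration, not numbers from this article.

```python
# Back-of-envelope check of the "five Three Gorges Dams" comparison.
PROJECTED_2030_TWH = 945.0         # from the IEA quote above
BASELINE_2024_TWH = 415.0          # assumption: approximate data center demand today
THREE_GORGES_TWH_PER_YEAR = 100.0  # assumption: rough annual output of the dam


def dam_equivalents(projected_twh, baseline_twh, dam_twh_per_year):
    """Return how many dams' worth of yearly generation the added demand represents."""
    added_demand_twh = projected_twh - baseline_twh
    return added_demand_twh / dam_twh_per_year


added = PROJECTED_2030_TWH - BASELINE_2024_TWH
equivalents = dam_equivalents(
    PROJECTED_2030_TWH, BASELINE_2024_TWH, THREE_GORGES_TWH_PER_YEAR
)

print(f"Added demand by 2030: {added:.0f} TWh per year")
print(f"Three Gorges equivalents: {equivalents:.1f}")
```

Under those assumptions the added demand works out to a little over five dams' worth of generation, in line with the comparison above. A different baseline or dam output would shift the result somewhat.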

The power is needed not only to run fleets of powerful computers but to keep them from overheating. GPUs and accelerators filling today’s AI data centers can each generate well over 1 kilowatt of heat. The only consumer devices you might encounter with that kind of power are literally heaters – space heaters, hair dryers, microwaves or turbo electric kettles.

And high-powered AI chips are going to keep getting bigger and hotter. “As we hit 1 trillion transistors on a GPU and two to three kilowatts of power by 2030,” says Intel’s Rajiv Mongia, “it’s going to be a lot of fun to solve the thermal problem.”

How do we solve this energy dilemma and unlock the future potential of AI? Mongia’s answer to his “fun problem” is to cool AI chips in ways that – and this is where it seems like alchemy – boost performance and save electricity at the same time.

A Career Keeping the Heat off Moore’s Law

Mongia is a senior principal engineer and leader of the Thermal Core Competency Group in Intel’s Assembly Test Technology Development (ATTD) in Intel Foundry. This team “makes sure thermals don’t come in the way of Moore’s Law,” he explains.

In other words, ATTD creates new ways to combine more and more silicon dies into faster and more capable packages for Intel and its foundry customers, and Mongia and team figure out how to manage the resulting heat.

Prior to his time at Intel, Mongia worked on small gas turbines (turning heat into electricity) and as a consultant in failure analysis (focused on fires and explosions), work that included study of the World Trade Center towers’ collapse on Sept. 11, 2001. “I decided I’d had enough death and destruction – I want to create something,” he reflects, and joined Intel initially to help make laptops more hospitable to laps.

He’s spent most of the past 21 years devoted to that keep-Moore’s-Law-cool mission, with detours to help build Intel RealSense cameras and to support Intel’s mid-2010s push into the maker market.

“I’ve been in almost every major thermal role at Intel in some way, shape or form,” Mongia says. It may seem an odd pivot to go from burning buildings to cooling chips, but “it’s the same equations – there are different boundary conditions, but it’s still fluid mechanics, thermodynamics and heat transfer.”

Mongia took those less-cool gigs since he thought “thermals was no longer that challenging to figure out. For me, it’s all about having an interesting problem to solve, and the ability to try to make a difference somewhere, somehow.”

The Next Thermal Challenge: Cooling Stacked, Multi-Chip Packages

Between the rise of AI and the increasing ubiquity of large multi-chip packages – where several or even dozens of silicon dies are combined into a single device – the challenge is back.

“Now it turns out that this power thermal problem is getting pretty complex,” Mongia says. “There’s a lot we can do here.”

The menu of solutions starts with baking thermal considerations into chip design projects earlier. “We’ve revamped our tool flow to do a lot of co-design work earlier,” Mongia explains, an effort that now includes running nearly 100,000 thermal simulations every month.

His team developed what’s become the industry standard for modeling the heat characteristics of stacked high-bandwidth memory (HBM), and now it applies similar approaches to stacking all manner of chips. “Once you have multiple stacks and high power, it becomes all the more important to get the material thermal characteristics precisely figured out.”

In one recent example, the thermal squad rescued an Intel design win for a temperature-sensitive customer chip after the initial design came back too hot. In the space of two weeks, the cross-Intel team modeled hundreds of different design options, completely revamped the distribution of silicon intellectual property (IP) and the multi-chip layout, and ended up with a spec-beating design.

“What people forget is how interdependent everything is, from the silicon all the way through the system, to make sure you’re co-optimizing across that full spectrum,” Mongia notes. As counterintuitive as it sounds, “I could actually increase the power on a part and make it easier to cool.”
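The article does not spell out which mechanism Mongia has in mind. One hypothetical way to see how more power could be easier to cool is a toy model in which the hottest spot on a chip depends on local heat flux rather than on total power. Everything in the sketch below is an illustrative assumption, not Intel data: the two layouts, their wattages, the coolant temperature and the area-specific thermal resistance. It also ignores lateral heat spreading.

```python
# Toy 1D model: hotspot temperature = coolant temperature + peak heat flux * resistance.
# All numbers are made-up illustrative values, not measurements from the article.
from dataclasses import dataclass


@dataclass
class Layout:
    name: str
    total_power_w: float        # total package power, in watts
    peak_flux_w_per_cm2: float  # heat flux at the hottest spot, in W/cm^2


COOLANT_TEMP_C = 30.0            # assumption: coolant temperature, in deg C
RESISTANCE_K_CM2_PER_W = 0.5     # assumption: area-specific thermal resistance


def hotspot_temp_c(layout):
    """Estimate the hottest-spot temperature for a layout under the toy model."""
    return COOLANT_TEMP_C + layout.peak_flux_w_per_cm2 * RESISTANCE_K_CM2_PER_W


# Hypothetical layout A: less total power, but concentrated in one small region.
concentrated = Layout("concentrated", total_power_w=500.0, peak_flux_w_per_cm2=150.0)
# Hypothetical layout B: more total power, spread more evenly across the package.
spread_out = Layout("spread out", total_power_w=600.0, peak_flux_w_per_cm2=100.0)

for layout in (concentrated, spread_out):
    print(
        f"{layout.name}: {layout.total_power_w:.0f} W total, "
        f"hotspot about {hotspot_temp_c(layout):.0f} C"
    )
```

In this sketch the 600-watt layout ends up with the cooler hotspot, because rearranging where the heat is generated lowers the peak flux even as total power rises. A real package would need full simulations of the kind the team runs to confirm any such trade.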

Swapping Metal Lids for Liquid-Cooled Plates – and More ‘Exotic’ Solutions

The rest of the thermals menu includes a growing suite of technologies to directly improve the act of cooling.

Say, for instance, replacing the standard heat spreader or metal lid that covers the silicon chips mounted on the package with something Mongia calls an “integrated cold plate” – basically a tiny radiator with dozens of internal fins and liquid running through it.

Early testing suggests a big GPU with an integrated cold plate can run as much as 20% cooler (or 15% faster) compared to one with a regular cold plate – a significant improvement that’s getting attention from potential foundry customers, Mongia says.

Beyond that, “it’s getting pretty exotic,” he says. “We’re looking into things like how to bring liquid into the silicon stack itself.” Imagine liquid not merely within the lid but swirling around inside the 3D silicon stack. How cool is that?

About Rajiv Mongia: Builder in Brief

Home site: Hillsboro, Oregon

Title: Senior principal engineer

Team: Thermal Core Competency Group, Assembly Test Technology Development

Years at Intel: 21

Most important skill: “Being comfortable with discomfort, and to challenge what’s made us successful in the past – to be brave enough to question the plan and the status quo.”

Go-to exercise or entertainment for relaxation: Spending half a weekend day taking pictures of wildlife or nightscapes.

Current book on the nightstand: “Einstein’s Fridge” by Paul Sen. “A wonderful history of the motivation and development of the field of thermodynamics from the industrial revolution to Stephen Hawking.”