Amazon says it recently achieved a major breakthrough in networking design—and has been quietly deploying the new technology in its data centers since late last year. The company claims it has significantly increased data speeds while reducing energy use, potentially giving the tech giant an edge as companies race to build ever-faster systems in the cloud.
The new technology hinges on a “quasi-random” design that combines elements of traditional, structured data networks with the performance advantages of more random architectures. Researchers have explored random networks for decades, but the technology has never been successfully scaled. Now, Amazon thinks it has cracked the code.
The fact that Amazon is using this in the real world is “remarkable,” says Brighten Godfrey, a computer science professor at the University of Illinois at Urbana-Champaign and an expert in networking, who was not involved in Amazon’s research. Godfrey co-authored a seminal 2012 paper on random network graphs, which he says are a “mind-bending problem to solve, in general.”
A team of engineers and researchers at Amazon Web Services, including several recruited from academia, has been working on the random networking problem since 2023. Amazon also designed a new piece of data center equipment it dubbed the ShuffleBox, which automatically shuffles the cables required for this kind of networking.
“By essentially flattening the network, we eliminated the bottlenecks that come with traditional networking designs,” Matt Rehder, vice president of AWS Network Engineering, said in an exclusive interview with WIRED. “We think we’re the only ones who have done this at scale.”
Network Effects
Amazon detailed its new networking design in a paper published last month titled “RNG: Flat Datacenter Networks at Scale.” RNG stands for “resilient network graphs,” which are neither entirely structured nor entirely random.
Interestingly, the Amazon team behind RNG isn’t making this networking pitch around generative AI. This is about making the company’s everyday data center architecture more efficient. “RNG is a great fit for our core demands, but AI training data patterns are far more coordinated and centrally orchestrated, so they don’t approximate a random graph,” Rehder says.
Since the mid-1980s, communications networks, from telecom to data centers, have been predominantly designed with a “fat-tree” topology, which includes two or three vertical layers of switches and routers. These are connected by “fat” nodes at the top of the structure, where there are multiple routers of the same type, and thinner branches towards the bottom. Put very simply, in a fat-tree network, data moves up and down the stack. The increased bandwidth near the top of the structure, where the data bisects, helps eliminate bottlenecks.
Over time, the tech industry has developed and deployed variations on the fat-tree architecture. But the design has room for improvement. It’s generally reliable, but it’s also rigid and inefficient, and it requires complex cabling. As in, actual physical cables.
If you’ve ever been in a data center or an office building’s server room, you’ve likely seen nests of colorful cables spilling out of metal racks. Cabling is one of the greatest costs in networking, Rehder says, and Amazon’s global data centers are currently connected with 20 million kilometers of fiber optic cables. That’s roughly the distance it would take to travel from Earth to the moon and back 25 times.
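For a quick sanity check on that comparison, the arithmetic can be sketched in a few lines of Python. The 20 million kilometer figure comes from Amazon; the average Earth-moon distance of roughly 384,400 kilometers is an assumption added here, and the variable names are purely illustrative.

```python
FIBER_KM = 20_000_000    # fiber length quoted by Amazon
EARTH_MOON_KM = 384_400  # assumed average Earth-moon distance

# One round trip covers the Earth-moon distance twice.
round_trips = FIBER_KM / (2 * EARTH_MOON_KM)
print(f"{round_trips:.1f} round trips")
```

Under that assumption the result lands at about 26 round trips, in line with the article's rough figure of 25.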
In 2012, as the demand for cloud computing services was exploding, a group of researchers at the University of Illinois Urbana-Champaign, including Godfrey, introduced a concept known as Jellyfish. Fixed network designs in use at the time were struggling to meet growing demand, so the researchers proposed a “high-capacity network interconnect which, by adopting a random graph topology, yields itself naturally to incremental expansion.” They believed this random approach could be more efficient and scalable than networks built using the fat-tree architecture.
“We gave it the name Jellyfish because it’s fluid,” Godfrey says. “You can connect the routers and switches randomly and it becomes this flexible pool of network capacity, which is very efficient.”
However, Jellyfish also introduced new challenges in layout, data routing, and cabling. Routing in random graphs is trickier, Godfrey says, because there are many more, and more diverse, paths that data can take from its source to its destination. Cabling is harder because the endpoints of the cables are chosen randomly.
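To give a rough sense of what “connecting switches randomly” can mean, here is a minimal Python sketch that wires a small set of switches so that each one uses the same number of ports. It is not the construction from the 2012 Jellyfish paper, and it is not Amazon’s RNG design; it is a simple shuffle-and-retry approach, and the function name `random_switch_topology` and its parameters are hypothetical.

```python
import random


def random_switch_topology(num_switches, ports_per_switch, seed=None, max_attempts=10_000):
    """Wire switches at random so each one uses exactly `ports_per_switch` links.

    Hypothetical helper for illustration: shuffle every port, pair them off,
    and start over if a pairing creates a self-loop or a duplicate link.
    """
    if ports_per_switch >= num_switches:
        raise ValueError("each switch needs enough distinct peers")
    if (num_switches * ports_per_switch) % 2 != 0:
        raise ValueError("total number of ports must be even")

    rng = random.Random(seed)
    for _ in range(max_attempts):
        # One list entry per physical port, labeled with its switch.
        ports = [s for s in range(num_switches) for _ in range(ports_per_switch)]
        rng.shuffle(ports)
        links = {s: set() for s in range(num_switches)}
        valid = True
        # Pair neighboring entries in the shuffled list: each pair is one cable.
        for a, b in zip(ports[0::2], ports[1::2]):
            if a == b or b in links[a]:
                valid = False  # self-loop or duplicate cable, so retry
                break
            links[a].add(b)
            links[b].add(a)
        if valid:
            return links
    raise RuntimeError("no valid wiring found; try more attempts")


topology = random_switch_topology(num_switches=16, ports_per_switch=3, seed=7)
for switch, peers in sorted(topology.items()):
    print(switch, sorted(peers))
```

The printed wiring has no obvious pattern to it, which hints at the cabling problem Godfrey describes: any switch can end up connected to any other, so the physical cable runs are hard to plan in advance.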
A couple of years later, Google began toying with another solution: It started integrating optical circuit switching, or OCS, into its network designs. This approach uses tiny mirrors to reflect light from an input port to an output port, which lets Google reconfigure optical cabling in real time. But, again: This adds a certain amount of engineering complexity, as well as cost.
So Random
Amazon, meanwhile, was searching for the “holy grail,” says Giacomo Bernardi, who is one of the lead authors on the new paper, along with Amazon Scholars Ratul Mahajan and C.S. Seshandhri. In an ideal world, a data network would be flat and efficient, resilient to hardware failures, random enough to maximize performance, and scalable enough to grow without becoming unwieldy. It would also rely on simpler, streamlined cabling rather than increasingly complex fiber-optic systems.
When he and his colleagues began trying to build such a network, Bernardi says he had already become obsessed with Penrose tiling, a kind of aperiodic tiling named after the British physicist Roger Penrose. (Other researchers have been so inspired by Penrose tilings that they’ve tried to translate the patterns into error-correcting code in quantum computers.) Bernardi wondered if Amazon could use a similar construction and create a flat “mesh” by following a repeating pattern. He and his team tried building a simulation of what that might look like.
It didn’t work. Penrose tiling was promising on paper, Bernardi says, but the simulated data network was unreliable, and the researchers didn’t achieve the efficiency gains they had hoped for. They achieved better outcomes, he realized, when they replaced the more structured parts of the network design with randomness. “We ‘embraced the chaos,’ and adopted a quasi-random approach,” Bernardi explains.
A critical component of this design is the ShuffleBox, a new optical device Amazon developed that mixes connections between routers internally. During a short tour of one of Amazon’s networking labs in Cupertino, WIRED was able to view the chaotic bundle of cables running between routers in a traditional fat-tree structure, and compare it side by side with the tidy waves of cables being run through ShuffleBoxes in the new design.
Rehder says Amazon’s RNG design has made the company’s data centers both more efficient and more resilient. Compared to traditional networks, he claims it uses 69 percent fewer routers and switches, delivers 33 percent higher data throughput, cuts network power consumption by 40 percent, and lowers operating costs by 27 percent.
The first instance of RNG was unleashed in a Dublin data center in 2024. Amazon then expanded the technology to data centers in Germany and Spain. The company says that now, most newly built data centers are being outfitted with the RNG networking design.




