Nvidia unveils new kind of Ethernet for AI, Grace Hopper ‘Superchip’ in full production

Nvidia CEO Jensen Huang showed off the first iteration of Spectrum-X, the Spectrum-4 chip, with one hundred billion transistors in a 90-millimeter by 90-millimeter die.


Nvidia CEO Jensen Huang, delivering the opening keynote of the Computex computer technology conference on Monday in Taipei, Taiwan, unveiled a host of new products, including a new kind of Ethernet switch dedicated to moving high volumes of data for artificial intelligence tasks. 

“How do we introduce a new Ethernet, that is backward compatible with everything, to turn every data center into a generative AI data center?” posed Huang in his keynote. 
“For the very first time we are bringing the capabilities of high performance computing into the Ethernet market,” he said. 

Spectrum-X, as the family of Ethernet technology is known, is “the world’s first high-performance Ethernet for AI,” according to Nvidia. A key feature of the technology is that it “doesn’t drop packets,” said Gilad Shainer, Nvidia’s senior vice president of networking, in a media briefing. 

The first iteration of Spectrum-X is Spectrum-4, said Nvidia, which it called “the world’s first 51Tb/sec Ethernet switch built specifically for AI networks.” The switch works in conjunction with Nvidia’s BlueField data processing unit, or DPU, chips that handle data fetching and queueing, and with Nvidia fiber-optic transceivers. The switch can route 128 ports of 400-gigabit Ethernet, or 64 ports of 800-gigabit Ethernet, from end to end, the company said.
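As a quick sanity check on those figures (a sketch only: the port counts and the 51Tb/sec capacity come from Nvidia's announcement, and this simply multiplies ports by line rate):

```python
# Aggregate bandwidth of the quoted Spectrum-4 port configurations.
# Figures are Nvidia's; the arithmetic just multiplies them out.
port_configs = {
    "128 x 400 GbE": 128 * 400,  # total gigabits per second
    "64 x 800 GbE": 64 * 800,
}

for name, total_gbps in port_configs.items():
    # Networking uses decimal units: 1 Tb/sec = 1,000 Gb/sec.
    print(f"{name}: {total_gbps / 1000:.1f} Tb/sec")
```

Both configurations work out to 51.2 Tb/sec of aggregate bandwidth, consistent with the "51Tb/sec" capacity the company cites for the switch.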

Huang held up the silver Spectrum-4 Ethernet switch chip on stage, noting that it’s “gigantic,” consisting of one hundred billion transistors on a 90-millimeter by 90-millimeter die built with Taiwan Semiconductor Manufacturing’s “4N” process technology. The part runs at 500 watts, said Huang. 





Spectrum-4 is the first in the Spectrum-X line of chips, a new kind of Ethernet purpose-built to provide lossless packet transmission for AI workloads.


Nvidia’s chip, and the switch housing it, have the potential to change the Ethernet networking market. The vast majority of switch silicon today is supplied by chip maker Broadcom, whose switches are sold to networking equipment makers Cisco Systems, Arista Networks, Extreme Networks, Juniper Networks, and others. Those companies have been expanding their equipment to better handle AI traffic. 

The Spectrum-X family is built to address the bifurcation of data centers into two forms. One is what Huang called “AI factories”: facilities that cost hundreds of millions of dollars for the most powerful GPUs, based on Nvidia’s NVLink and InfiniBand, used for AI training and serving a small number of very large workloads.

The other is the AI cloud: multi-tenant, based on Ethernet, handling hundreds and hundreds of customer workloads simultaneously, and focused on tasks such as serving up predictions to consumers of AI. That second market is the one Spectrum-X is meant to serve. 

Spectrum-X, said Shainer, is able to “spread traffic across the network in the best way,” using “a new mechanism for congestion control” that averts the pile-up of packets that can happen in the memory buffers of network routers.

“We use advanced telemetry to understand latencies across the network to identify hotspots before they cause anything, to keep it congestion-free,” said Shainer.

Nvidia said in prepared remarks that “the world’s top hyperscalers are adopting NVIDIA Spectrum-X, including industry-leading cloud innovators.”

Nvidia said it is building a test-bed computer at its Israel offices, called Israel-1, a “generative AI supercomputer,” using Dell PowerEdge XE9680 servers equipped with H100 GPUs running data across Spectrum-4 switches.

All the news at Computex is available in Nvidia’s newsroom.

In addition to the new ethernet technology, Huang’s keynote featured a new model in the company’s “DGX” series of computers for AI, the DGX GH200, which the company bills as “a new class of large-memory AI supercomputer for giant generative AI models.” 

Generative AI refers to programs that produce more than a score or a classification: output that is sometimes text, sometimes images, sometimes other artifacts, as with OpenAI’s ChatGPT bot.

The DGX GH200 is the first system to ship with what the company calls its “superchip,” the Grace Hopper board, which combines on a single circuit board a Hopper GPU and the Grace CPU, a processor based on the ARM instruction set that is meant to compete with x86 CPUs from Intel and Advanced Micro Devices.


Nvidia’s Grace Hopper “superchip,” a board containing its Grace CPU, left, and Hopper GPU, is now in full production, the company said.


The first iteration of Grace Hopper, the GH200, is “in full production,” said Huang. Nvidia said in a press release that “global hyperscalers and supercomputing centers in Europe and the U.S. are among several customers that will have access to GH200-powered systems.”

The DGX GH200 combines 256 of the superchips, said Nvidia, to achieve a combined 1 exaflops (10 to the power of 18, or a billion billion, floating-point operations per second), utilizing 144 terabytes of shared memory. The computer is 500 times as fast as the original DGX A100 machine released in 2020, according to Nvidia.
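The headline numbers divide out as follows (a back-of-the-envelope sketch: the totals are Nvidia's, but the even per-superchip split is an illustrative assumption, not a figure the company gave):

```python
# Rough split of the DGX GH200 totals across its 256 superchips.
# Even division is assumed here purely for illustration.
superchips = 256
total_flops = 1e18        # 1 exaflops = 10**18 FLOP/sec
shared_memory_tb = 144    # terabytes of shared memory

flops_per_chip = total_flops / superchips              # ~3.9 petaflops each
memory_gb_per_chip = shared_memory_tb * 1000 / superchips  # 562.5 GB each

print(f"{flops_per_chip:.2e} FLOP/sec and {memory_gb_per_chip} GB per superchip")
```

That works out to roughly 3.9 petaflops and about 560 GB of the shared pool per superchip, under the even-split assumption.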

The keynote also unveiled MGX, a reference architecture that system makers can use to quickly and cost-effectively build more than 100 server variations. The first partners to use the spec are ASRock Rack, ASUS, GIGABYTE, Pegatron, QCT, and Supermicro, with QCT and Supermicro first to market with systems in August, said Nvidia.

The entire keynote can be viewed as a replay on Nvidia’s website.


MGX is a reference architecture for computer system makers to quickly and cost-effectively build over 100 server variations using Nvidia chips. QCT and Supermicro will be the first to market with systems, in August, said Nvidia.

