[Figure: performance highlights vs. NVIDIA H100 Tensor Core GPU: X LLM Inference, X LLM Training, X Energy Efficiency, X Data Processing]
LLM inference and energy efficiency: TTL (token-to-token latency) = 50 milliseconds (ms) real time, FTL (first-token latency) = 5 s, 32,768 input / 1,024 output tokens, NVIDIA HGX™ H100 scaled over InfiniBand (IB) vs. GB200 NVL72. LLM training: 1.8T-parameter MoE, 4,096x HGX H100 scaled over IB vs. 456x GB200 NVL72 scaled over IB. Cluster size: 32,768. Data processing: a database join and aggregation workload with Snappy/Deflate compression derived from the TPC-H Q4 query, with custom query implementations for x86, a single H100 GPU, and a single GPU from GB200 NVL72, vs. Intel Xeon 8480+. Projected performance subject to change.
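
Those latency targets pin down the serving scenario: at FTL = 5 s and TTL = 50 ms, a full 1,024-token response completes in roughly 5 + 1,023 × 0.05 ≈ 56 s.

The data-processing comparison is a join-and-aggregation shaped like TPC-H Q4: count the orders per priority, within a date window, that have at least one late lineitem. As orientation only, here is a minimal pandas sketch of that query shape; the TPC-H table and column names are standard, the tiny in-memory frames are illustrative, and the benchmark itself uses custom implementations over Snappy/Deflate-compressed data (on the GPU side, a comparable approach would typically go through RAPIDS cuDF, whose API mirrors pandas).

```python
# Minimal sketch of the TPC-H Q4 query shape (illustrative data, not the
# benchmark's compressed dataset): a semi-join of orders against late
# lineitems, followed by a group-by aggregation.
import pandas as pd

orders = pd.DataFrame({
    "o_orderkey": [1, 2, 3, 4],
    "o_orderdate": pd.to_datetime(
        ["1993-07-05", "1993-08-10", "1993-09-01", "1994-01-15"]),
    "o_orderpriority": ["1-URGENT", "2-HIGH", "1-URGENT", "3-MEDIUM"],
})
lineitem = pd.DataFrame({
    "l_orderkey": [1, 2, 2, 3],
    "l_commitdate": pd.to_datetime(
        ["1993-07-10", "1993-08-20", "1993-08-25", "1993-09-05"]),
    "l_receiptdate": pd.to_datetime(
        ["1993-07-15", "1993-08-18", "1993-08-30", "1993-09-04"]),
})

# Q4's EXISTS(...) clause: orders with at least one lineitem received
# after its commit date.
late_keys = lineitem.loc[
    lineitem["l_commitdate"] < lineitem["l_receiptdate"], "l_orderkey"
].unique()

q4 = (
    orders[
        (orders["o_orderdate"] >= "1993-07-01")
        & (orders["o_orderdate"] < "1993-10-01")  # a three-month window
        & orders["o_orderkey"].isin(late_keys)
    ]
    .groupby("o_orderpriority")
    .size()
    .rename("order_count")
    .sort_index()
)
print(q4)
```
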
| | GB200 NVL72 | GB200 Grace Blackwell Superchip |
|---|---|---|
| Configuration | 36 Grace CPUs : 72 Blackwell GPUs | 1 Grace CPU : 2 Blackwell GPUs |
| FP4 Tensor Core² | 1,440 PFLOPS | 40 PFLOPS |
| FP8/FP6 Tensor Core² | 720 PFLOPS | 20 PFLOPS |
| INT8 Tensor Core² | 720 POPS | 20 POPS |
| FP16/BF16 Tensor Core² | 360 PFLOPS | 10 PFLOPS |
| TF32 Tensor Core² | 180 PFLOPS | 5 PFLOPS |
| FP32 | 6,480 TFLOPS | 180 TFLOPS |
| FP64 | 3,240 TFLOPS | 90 TFLOPS |
| FP64 Tensor Core | 3,240 TFLOPS | 90 TFLOPS |
| GPU Memory | Up to 13.5 TB HBM3e | Up to 384 GB HBM3e |
| GPU Memory Bandwidth | 576 TB/s | 16 TB/s |
| NVLink Bandwidth | 130 TB/s | 3.6 TB/s |
| CPU Core Count | 2,592 Arm® Neoverse V2 cores | 72 Arm Neoverse V2 cores |
| LPDDR5X Memory | Up to 17 TB LPDDR5X | Up to 480 GB LPDDR5X |
| LPDDR5X Memory Bandwidth | Up to 18.4 TB/s | Up to 512 GB/s |
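
Because the rack is exactly 36 of these superchips, the two columns cross-check with simple arithmetic. A minimal sanity-check sketch, assuming the linear 36x scaling the table implies:

```python
# Cross-check the table: a GB200 NVL72 rack is 36 GB200 Grace Blackwell
# Superchips (36 Grace CPUs : 72 Blackwell GPUs), so each rack-level figure
# should be ~36x the per-superchip figure.
SUPERCHIPS_PER_RACK = 36

# Per-superchip figures from the table: name -> (value, unit).
superchip = {
    "FP4 Tensor Core":       (40,    "PFLOPS"),
    "FP8/FP6 Tensor Core":   (20,    "PFLOPS"),
    "INT8 Tensor Core":      (20,    "POPS"),
    "FP16/BF16 Tensor Core": (10,    "PFLOPS"),
    "TF32 Tensor Core":      (5,     "PFLOPS"),
    "FP32":                  (180,   "TFLOPS"),
    "FP64":                  (90,    "TFLOPS"),
    "HBM3e bandwidth":       (16,    "TB/s"),
    "NVLink bandwidth":      (3.6,   "TB/s"),   # 36 x 3.6 = 129.6, quoted as 130 TB/s
    "LPDDR5X bandwidth":     (0.512, "TB/s"),   # 512 GB/s; 36x = 18.432, quoted as 18.4 TB/s
}

for name, (value, unit) in superchip.items():
    print(f"{name}: {value:g} {unit} x {SUPERCHIPS_PER_RACK} "
          f"= {value * SUPERCHIPS_PER_RACK:g} {unit} per NVL72")
```

The memory capacities scale the same way: 384 GB of HBM3e per superchip × 36 = 13,824 GB, which matches the quoted "up to 13.5 TB" in binary units (13,824 / 1,024 = 13.5), and 480 GB of LPDDR5X × 36 = 17,280 GB ≈ the quoted "up to 17 TB".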
