10/08/2023
AI GPU listed for $46,000 on eBay
The H100, Nvidia's most powerful chip for training AI, is being listed for $46,000 (about one billion Vietnamese dong) on eBay amid surging demand.
According to John Carmack, former consulting chief technology officer at Meta, some Nvidia H100 graphics chips are being listed on eBay for $39,995 to $46,000. By comparison, some retailers had previously quoted prices of $30,000 to $36,000.
The H100 is Nvidia's newest and most powerful AI chip, with 80 billion transistors inside. It is the successor to the A100, which is priced at about $10,000 and has been dubbed the "workhorse" of the artificial intelligence industry.
Nvidia Reveals Hopper H100 GPU With 80 Billion Transistors
Today, at its GPU Technology Conference (GTC), Nvidia revealed details of its Hopper architecture and the Nvidia H100 GPU. We've known for some time that Nvidia has been working on next-generation GPUs, but now we have some concrete specs. The Hopper architecture and H100 GPU are not to be confused with Ada, the consumer-focused architecture that will power future GeForce cards; Nvidia hasn't revealed any details on Ada yet. Hopper H100 will supersede the Ampere A100, which itself replaced the Volta V100. These are all datacenter parts, and with stiffer competition from the likes of AMD's Instinct MI250/250X and the newly announced Instinct MI210, Nvidia is looking to retake the lead in HPC.
As you'd expect given its legacy, H100 was designed for supercomputers with a focus on AI capabilities. It includes numerous updates and upgrades compared to the current A100, all designed to reach new levels of performance and efficiency. Hopper packs in 80 billion transistors, and it's built using a custom TSMC 4N process (that is, 4nm for Nvidia, not to be confused with the generic N4 4nm process that TSMC also offers). For those keeping score, the A100 GPU 'only' had 54 billion transistors.
Nvidia didn't reveal core counts or clocks, but it did give some other details. H100 supports Nvidia's fourth-generation NVLink interface, which can deliver up to 900 GB/s of bandwidth. For systems that don't use NVLink, it also supports PCIe 5.0, which tops out at 128 GB/s. The updated NVLink connection provides 1.5X the bandwidth of the A100's, while PCIe 5.0 delivers double the bandwidth of PCIe 4.0.
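The interconnect figures above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not anything published by Nvidia. It assumes an x16 link, per-lane rates of 32 GT/s for PCIe 5.0 and 16 GT/s for PCIe 4.0, and 128b/130b line encoding; none of those values appear in the article, and the function name is made up for illustration.

```python
# Assumptions (not from the article): x16 link, 32 GT/s per lane for PCIe 5.0,
# 16 GT/s per lane for PCIe 4.0, and 128b/130b line encoding.

def pcie_bandwidth_gb_s(gt_per_s: float, lanes: int = 16, bidirectional: bool = True) -> float:
    """Return usable PCIe bandwidth in GB/s after 128b/130b encoding overhead."""
    per_direction = gt_per_s * lanes * (128 / 130) / 8  # bits -> bytes
    return per_direction * (2 if bidirectional else 1)

pcie5 = pcie_bandwidth_gb_s(32.0)
pcie4 = pcie_bandwidth_gb_s(16.0)

# The article gives 900 GB/s for H100 NVLink and a 1.5X ratio over the A100,
# which implies an A100 NVLink figure of 900 / 1.5 GB/s.
h100_nvlink = 900.0
implied_a100_nvlink = h100_nvlink / 1.5

print(f"PCIe 5.0 x16, both directions: {pcie5:.1f} GB/s")
print(f"PCIe 4.0 x16, both directions: {pcie4:.1f} GB/s")
print(f"PCIe 5.0 / PCIe 4.0: {pcie5 / pcie4:.1f}x")
print(f"Implied A100 NVLink bandwidth: {implied_a100_nvlink:.0f} GB/s")
print(f"NVLink (900 GB/s) / PCIe 5.0: {h100_nvlink / pcie5:.1f}x")
```

Under these assumptions the usable PCIe 5.0 figure comes out slightly below the nominal 128 GB/s, because the nominal number ignores encoding overhead and counts both directions together.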
The H100 will also support 80GB of HBM3 memory by default, with 3 TB/s of bandwidth, which is 1.5X the bandwidth of the A100's HBM2E. The A100 was available in 40GB and 80GB models, with the latter coming later in the life cycle. Both the H100 and A100 still use up to six HBM stacks, apparently with one stack disabled (i.e., using a dummy stack). Generally speaking, the H100 has 50% more memory bandwidth and interconnect bandwidth than its predecessor.
That's a nice improvement, to be sure, but other aspects of Hopper involve even larger increases. H100 can deliver up to 2,000 TFLOPS of FP16 compute and 1,000 TFLOPS of TF32 compute, as well as 60 TFLOPS of general-purpose FP64 compute. That's triple the performance of the A100 in all three cases. Hopper also adds FP8 support with up to 4,000 TFLOPS of compute, six times the A100's throughput (the A100 had to rely on FP16, as it lacked native FP8 support). To help optimize performance, Nvidia also has a new transformer engine that will automatically switch between FP8 and FP16 formats, based on the workload.
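As a quick consistency check on these numbers, the sketch below divides each quoted H100 figure by its stated multiplier to get the A100 baseline that the article implies. The baselines are derived purely from the ratios in the text, not taken from an A100 datasheet, and the variable names are illustrative.

```python
# Figures quoted in the article (TFLOPS) and the stated H100-vs-A100 multipliers.
h100_tflops = {"FP64": 60, "TF32": 1000, "FP16": 2000, "FP8": 4000}
stated_ratio_vs_a100 = {"FP64": 3, "TF32": 3, "FP16": 3, "FP8": 6}

# Implied A100 baseline for each format: H100 figure divided by the multiplier.
implied_a100 = {
    fmt: h100_tflops[fmt] / stated_ratio_vs_a100[fmt] for fmt in h100_tflops
}

for fmt, baseline in implied_a100.items():
    print(f"{fmt}: H100 {h100_tflops[fmt]} TFLOPS, implied A100 baseline {baseline:.0f} TFLOPS")

# Each halving of the format width doubles the quoted H100 throughput.
tf32_to_fp16 = h100_tflops["FP16"] / h100_tflops["TF32"]
fp16_to_fp8 = h100_tflops["FP8"] / h100_tflops["FP16"]
print(f"FP16 / TF32: {tf32_to_fp16:.0f}x, FP8 / FP16: {fp16_to_fp8:.0f}x")
```

The implied FP8 baseline matches the implied FP16 baseline, which fits the statement that the A100 falls back to FP16: a 3X gain at FP16 plus a 2X gain from moving to FP8 gives the quoted 6X.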
Nvidia will also add new DPX instructions that are designed to accelerate dynamic programming. These can help with a broad range of algorithms, including route optimization and genomics. Nvidia claims performance in these algorithms is up to 7X faster than on its previous-generation GPUs, and up to 40X faster than CPU-based implementations. Hopper also includes changes to improve security, and the multi-instance GPU (MIG) feature now allows for seven secure tenants running on a single H100 GPU.
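For readers unfamiliar with dynamic programming, the sketch below shows the kind of recurrence involved, using a Smith-Waterman-style local alignment score as a representative genomics example. The choice of algorithm and the scoring parameters are assumptions for illustration. This is plain CPU Python and does not use DPX in any way; it only shows the add-then-max pattern in the inner loop that this class of algorithm repeats for every table cell.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Three candidate moves, each an addition...
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # ...followed by a max, clamped at zero for local alignment.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))
print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
print(smith_waterman_score("AAAA", "TTTT"))
```

Keeping only two rows of the table keeps memory use linear in the length of the second string. The work is still proportional to the product of the two lengths, which is why hardware support for the inner add-and-compare step matters at scale.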
All of these changes are important for Nvidia's supercomputing and AI goals. However, the changes aren't all for the better. Despite the shift to a smaller manufacturing node, the H100 TDP for the SXM variant has been increased to 700W, compared to 400W for the A100 SXM modules. That's 75% more power, for improvements that seem to range between 50% and 500%, depending on the workload. In general, we expect performance will be two to three times faster than the Nvidia A100, so there should still be a net improvement in efficiency, but it's further evidence of the slowing down of Moore's Law.
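The efficiency argument can be made concrete with the TDP figures quoted above. This is a rough sketch that assumes both GPUs run at their full rated TDP, which real workloads may not do. The function name is illustrative.

```python
H100_TDP_W = 700.0  # SXM variant, from the article
A100_TDP_W = 400.0  # SXM variant, from the article

power_ratio = H100_TDP_W / A100_TDP_W  # 1.75, i.e. 75% more power


def efficiency_gain(speedup: float) -> float:
    """Performance-per-watt ratio (H100 / A100) for a given speedup, assuming both run at TDP."""
    return speedup / power_ratio


# A 50% to 500% improvement corresponds to a 1.5x to 6x speedup.
for speedup in (1.5, 2.0, 3.0, 6.0):
    print(f"{speedup:.1f}x faster -> {efficiency_gain(speedup):.2f}x performance per watt")

print(f"Break-even speedup: {power_ratio:.2f}x")
```

Under this assumption, any workload that speeds up by less than 1.75X is less efficient on the H100 than on the A100, while the expected two-to-three-times range yields roughly 1.1X to 1.7X better performance per watt.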
Overall, Nvidia claims the H100 scales better than A100, and can deliver up to 9X more throughput in AI training. It also delivers 16X to 30X more inference performance using Megatron 530B throughput as a benchmark. Finally, in HPC apps like 3D FFT (fast Fourier transform) and genome sequencing, Nvidia says H100 is up to 7X faster than A100.