Nvidia's Pascal to use stacked memory, proprietary NVLink interconnect

by Scott Wasson — 6:50 PM on March 25, 2014

GTC — Today during his opening keynote at the Nvidia GPU Technology Conference, CEO Jen-Hsun Huang offered an update to Nvidia's GPU roadmap. The big reveal was about a GPU code-named Pascal, which will be a generation beyond the still-being-introduced Maxwell architecture in the firm's plans.

Pascal's primary innovation will be the integration of stacked "3D" memory situated on the same substrate with the GPU, providing substantially higher bandwidth than traditional DRAMs mounted on the same circuit board.

If all of this info sounds more than a little familiar, perhaps you'll recall that Nvidia also announced a future, post-Maxwell GPU at GTC 2013. It was code-named Volta and was also slated to feature stacked memory on package. So what happened?

Turns out Volta remains on the roadmap, but it comes after Pascal and will evidently include more extensive changes to Nvidia's core GPU architecture.

Nvidia has inserted Pascal into its plans in order to take advantage of stacked memory and other innovations sooner. (I'm not sure we can say that Volta has been delayed, since the firm never pinned down that GPU's projected release date.) That makes Pascal intriguing even though its SM will be based on a modified version of the one from Maxwell. Memory bandwidth has long been one of the primary constraints for GPU performance, and bringing DRAM onto the same substrate opens up the possibility of substantial performance gains.

The picture above includes a single benchmark result, as projected for Pascal, in the bandwidth-intensive SGEMM matrix multiplication test. As you can see, Pascal nearly triples the performance of today's Kepler GPUs and nearly doubles the throughput of the upcoming Maxwell chips. This comparison is made at the same power level for each GPU, so Pascal should also represent a nice increase in energy efficiency.

Compared to today's GPU memory subsystems, Huang claimed Pascal's 3D memory will offer "many times" the bandwidth, two and a half times the capacity, and four times the energy efficiency. The Pascal chip itself will not participate in the 3D stacking, but it will have DRAM stacks situated around it on the same package. Those DRAM stacks will be of the HBM type being developed at Hynix. You can see the DRAM stacks cuddled up next to the GPU in the picture of the Pascal test module below.

The other item of note in Pascal's feature set is a new, proprietary chip-to-chip interconnect known as NVLink. This interconnect is a higher-bandwidth alternative to PCI Express 3.0 that Nvidia claims will be substantially more power-efficient. In many ways, NVLink looks very similar to PCI Express. It uses differential signaling with an embedded clock, and it will support the PCI Express programming model, including "DMA+", so driver support should be straightforward. Nvidia expects NVLink to act as a GPU-to-GPU connection and, in some cases, as a GPU-to-CPU link. To that end, the second generation of NVLink will be capable of maintaining cache coherency between multiple chips.

NVLink was created chiefly for use in supercomputing clusters and other enterprise-class deployments where many GPUs may be installed into a single server. Interestingly, as part of today's announcements, IBM revealed that it will incorporate NVLink into future CPUs. We don't have any details yet about which CPUs or what proportion of the Power CPU lineup will use NVLink, though.

Huang claimed NVLink will offer five to 12 times the bandwidth of PCIe. That may be a bit of CEO math. The first generation of NVLink will feature eight lanes per block or "brick" of connectivity. Each of those lanes will be capable of transporting 20Gbps of data, so the aggregate bandwidth of a brick should be 20GB/s. By contrast, PCIe 3.0 transfers 8Gbps per lane and 8GB/s across eight lanes, and the still-in-the-works PCIe 4.0 standard is targeting double that rate.
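The arithmetic behind that comparison is easy to check. Here's a quick Python sketch that uses the per-lane figures quoted above. It assumes those rates are raw, per-direction numbers and ignores line-encoding and protocol overhead. The helper name `link_bandwidth_gbytes` is purely illustrative.

```python
# Back-of-the-envelope link bandwidth from the round per-lane figures quoted above.
# Assumption: rates are raw per-direction numbers; encoding/protocol overhead is ignored.

BITS_PER_BYTE = 8


def link_bandwidth_gbytes(lanes: int, gbits_per_lane: float) -> float:
    """Hypothetical helper: aggregate raw bandwidth in GB/s for a multi-lane link."""
    return lanes * gbits_per_lane / BITS_PER_BYTE


nvlink_brick = link_bandwidth_gbytes(lanes=8, gbits_per_lane=20)  # first-gen NVLink brick
pcie3_x8 = link_bandwidth_gbytes(lanes=8, gbits_per_lane=8)       # PCIe 3.0, eight lanes
pcie4_x8 = link_bandwidth_gbytes(lanes=8, gbits_per_lane=16)      # PCIe 4.0 target: double 3.0

print(f"NVLink brick: {nvlink_brick:.0f} GB/s")                  # 20 GB/s
print(f"PCIe 3.0 x8:  {pcie3_x8:.0f} GB/s")                      # 8 GB/s
print(f"PCIe 4.0 x8:  {pcie4_x8:.0f} GB/s")                      # 16 GB/s
print(f"Brick vs PCIe 3.0 x8: {nvlink_brick / pcie3_x8:.2f}x")   # 2.50x
print(f"Brick vs PCIe 4.0 x8: {nvlink_brick / pcie4_x8:.2f}x")   # 1.25x
```

By this measure, a single first-generation brick offers about 2.5 times the raw bandwidth of an eight-lane PCIe 3.0 link and about 1.25 times what PCIe 4.0 is targeting. That is well short of the five-to-12x figure.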

NVLink apparently gets some of its added bandwidth by imposing stricter limits on trace lengths across the motherboard. The company also says that a "fundamental breakthrough" in energy efficiency, which came out of Nvidia's own research, differentiates NVLink from PCIe. NVLink will not be an open standard, though, so we may not see the full spec made public.

The module pictured above will be the basic building block of many solutions based on the Pascal GPU. Each module has two "bricks" of NVLink connectivity onboard, and the board will connect to the host system via a mezzanine-style NVLink connector. The combination of connector and NVLink protocol should allow for some nice, dense, and high-integrity server systems built around Nvidia GPUs—and it will also ensure that those systems can only play host to Nvidia silicon. This proprietary hook is surely another motivation for the creation of NVLink, at the end of the day.

Huang said he wants the Pascal module to be the future of not just supercomputers but all sorts of visual computing systems, including gaming PCs. Mezzanine-style modules do have size and signal-integrity advantages over traditional expansion cards with edge connectors. Another benefit of this module is that it can deliver more power without auxiliary power cables. Nvidia's current Tesla GPUs draw between 225 and 300W, and the firm apparently expects to supply that much power solely through the module's mezzanine connector. We'll have to work to tease out exactly what Huang's statement means for future consumer PCs, but Nvidia admits it doesn't expect PCIe cards to go away any time soon.
