• What features of GPUs allow them to perform computations faster than a typical CPU?

    GPUs have a massively parallel architecture: thousands of small, simple cores designed to run many tasks at once. On NVIDIA hardware, CUDA (Compute Unified Device Architecture) is the programming model that exposes those cores as thread processors for data-intensive computation. Threads running on those processors can exchange data, synchronize, and share memory. The design favors throughput over latency: a GPU runs a very large number of concurrent threads, each at a lower speed, rather than running a single thread as fast as possible. A CPU, by contrast, has only a handful of cores optimized for serial processing and is much weaker at parallel work.
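    The throughput-versus-latency trade-off can be illustrated with a toy model. This is only a sketch: the lane counts and per-item times below are hypothetical numbers chosen for illustration, not measurements of any real CPU or GPU, and `batch_time` is a made-up helper.

```python
import math

def batch_time(n_items: int, lanes: int, time_per_item: float) -> float:
    """Time to process n_items independent items on `lanes` parallel lanes.

    Items are processed in waves of up to `lanes` items; each wave takes
    `time_per_item` time units.
    """
    waves = math.ceil(n_items / lanes)
    return waves * time_per_item

# Hypothetical machines (assumed numbers, for illustration only):
# a few fast lanes versus many lanes that are each 4x slower.
CPU = {"lanes": 8, "time_per_item": 1.0}
GPU = {"lanes": 4096, "time_per_item": 4.0}

def cpu_time(n_items: int) -> float:
    return batch_time(n_items, **CPU)

def gpu_time(n_items: int) -> float:
    return batch_time(n_items, **GPU)

if __name__ == "__main__":
    for n in (1, 100, 1_000_000):
        print(f"n={n:>9}  cpu={cpu_time(n):>10.1f}  gpu={gpu_time(n):>8.1f}")
```

    Under these assumed numbers the CPU wins for a single item, because its lane is faster, while the GPU wins by a wide margin once there are enough independent items to fill its lanes.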

• What is the biggest limiting factor for training large models with current generation GPUs?

Training a large model means working with a very large amount of data, and GPU memory capacity is the biggest limiting factor. It prevents a GPU from handling terabyte-scale data directly. Because the PCIe bus has limited bandwidth and high latency, performance drops sharply once the data no longer fits in GPU memory: transfers to the device become the primary bottleneck.
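A back-of-the-envelope estimate shows why the transfer dominates. The figures below are assumptions for illustration: roughly 16 GB/s for a PCIe 3.0 x16 link and a placeholder 800 GB/s for on-device memory bandwidth. Real numbers depend on the hardware generation.

```python
GB = 1e9  # decimal gigabyte, in bytes

def transfer_seconds(n_bytes: float, bandwidth_gb_per_s: float) -> float:
    """Ideal time to move n_bytes at the given bandwidth (ignores latency)."""
    return n_bytes / (bandwidth_gb_per_s * GB)

# Assumed bandwidths (illustrative, not measured):
PCIE_GB_PER_S = 16.0         # approx. PCIe 3.0 x16, one direction
DEVICE_MEM_GB_PER_S = 800.0  # placeholder for on-device memory bandwidth

dataset_bytes = 1000 * GB    # a 1 TB dataset, far larger than GPU memory

# One pass over the data if it must be streamed from the host each time...
over_pcie = transfer_seconds(dataset_bytes, PCIE_GB_PER_S)
# ...versus reading the same volume from memory already on the device.
on_device = transfer_seconds(dataset_bytes, DEVICE_MEM_GB_PER_S)

if __name__ == "__main__":
    print(f"over PCIe: {over_pcie:.2f} s, on device: {on_device:.2f} s")
    print(f"slowdown factor: {over_pcie / on_device:.0f}x")
```

With these assumed bandwidths, streaming the data over the bus is about 50 times slower than reading it from device memory, before any latency is counted.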

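  • The structure of a GPU core is: SM (streaming multiprocessor) --> multiple SPs (streaming processors) --> shared memory, with one block of shared memory per SM. The SIMT (single instruction, multiple threads) design is built around a scalable array of multithreaded streaming multiprocessors (SMs), which gives MIMD-style (multiple instruction, multiple data) asynchronous parallelism; each multiprocessor contains multiple scalar processors (SPs). The thread hierarchy is grid --> block --> thread. Each thread has its own local memory, and data is shared with the CPU through global memory, constant memory, and texture memory. As a result, multiple graphics cards cannot share memory directly, and global memory is itself a slow path. Cards can exchange memory contents with each other, but doing so is slow and runs against what the GPU was designed for.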
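    The grid --> block --> thread hierarchy can be sketched by emulating a one-dimensional kernel launch in plain Python. This is an illustration only: `launch` and `vector_add` are hypothetical helpers that run sequentially, whereas a real GPU runs the threads in parallel. The index formula `block_idx * block_dim + thread_idx` is the usual way a CUDA thread computes its global position.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate a 1-D launch of grid_dim blocks x block_dim threads."""
    for block_idx in range(grid_dim):         # blocks in the grid
        for thread_idx in range(block_dim):   # threads in each block
            kernel(block_idx, block_dim, thread_idx, *args)

def vector_add(block_idx, block_dim, thread_idx, a, b, out):
    """Each emulated thread handles exactly one element."""
    i = block_idx * block_dim + thread_idx    # global thread index
    if i < len(out):                          # guard: last block may overshoot
        out[i] = a[i] + b[i]

a = list(range(10))
b = [10 * x for x in a]
out = [0] * len(a)

block_dim = 4
# Round up so that every element is covered by some thread.
grid_dim = (len(a) + block_dim - 1) // block_dim

launch(vector_add, grid_dim, block_dim, a, b, out)

if __name__ == "__main__":
    print(grid_dim, out)
```

    With 10 elements and 4 threads per block, 3 blocks are launched, and the bounds check keeps the 2 surplus threads in the last block from writing out of range.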

  • In deep learning, is training efficiency generally limited by GPU memory size rather than by the number of stream processors?

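    This is hard to say. The GPU is designed as SIMD (single instruction, multiple data): instructions are simplified and there are more data streams, and MIMD is achieved through SIMT. The SPs then have to handle both instructions and tasks, and the GPU computes in parallel, with many SPs working at the same time. If there are too few of them, efficiency suffers as well. Overall, though, memory size matters more.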
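    One way to see why memory tends to bind first is to estimate the footprint of the model state alone. The sketch below assumes 32-bit training with an optimizer that keeps two extra values per parameter (as Adam does), and it ignores activations and framework overhead; `training_bytes` and `fits` are hypothetical helpers, and the 16 GB device is an assumed example.

```python
GB = 1e9  # decimal gigabyte, in bytes

def training_bytes(n_params: int, bytes_per_value: int = 4,
                   optimizer_states: int = 2) -> int:
    """Bytes for weights + gradients + optimizer states (no activations)."""
    values_per_param = 1 + 1 + optimizer_states  # weight, gradient, states
    return n_params * bytes_per_value * values_per_param

def fits(n_params: int, device_memory_gb: float) -> bool:
    """True if the model state alone fits in the given device memory."""
    return training_bytes(n_params) <= device_memory_gb * GB

if __name__ == "__main__":
    for n in (100_000_000, 1_000_000_000, 2_000_000_000):
        gb = training_bytes(n) / GB
        print(f"{n:>13,} params -> {gb:5.1f} GB, fits in 16 GB: {fits(n, 16)}")
```

    Under these assumptions a 2-billion-parameter model needs about 32 GB for its state alone, so it cannot train on a 16 GB card no matter how many stream processors the card has.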
