▶ Supplement on texture memory access (see the texture memory blog post http://www.cnblogs.com/cuancuancuanhao/p/7809713.html)

▶ Compute capability

● Feature support on hardware of different compute capabilities.

  

● Technical specifications of hardware of different compute capabilities (important).

  

● Floating-point standard compliance (quoted from the original documentation)

■ All compute devices follow the IEEE 754-2008 standard for binary floating-point arithmetic with the following deviations:

  There is no dynamically configurable rounding mode; however, most of the operations support multiple IEEE rounding modes, exposed via device intrinsics;

  There is no mechanism for detecting that a floating-point exception has occurred and all operations behave as if the IEEE-754 exceptions are always masked, and deliver the masked response as defined by IEEE-754 if there is an exceptional event; for the same reason, while SNaN encodings are supported, they are not signaling and are handled as quiet;

  The result of a single-precision floating-point operation involving one or more input NaNs is the quiet NaN of bit pattern 0x7fffffff;

  Double-precision floating-point absolute value and negation are not compliant with IEEE-754 with respect to NaNs; these are passed through unchanged;

Code must be compiled with -ftz=false, -prec-div=true, and -prec-sqrt=true to ensure IEEE compliance (this is the default setting; see the nvcc user manual for description of these compilation flags).

Regardless of the setting of the compiler flag -ftz,

  Atomic single-precision floating-point adds on global memory always operate in flush-to-zero mode, i.e., behave equivalent to FADD.F32.FTZ.RN,
  Atomic single-precision floating-point adds on shared memory always operate with denormal support, i.e., behave equivalent to FADD.F32.RN.

■ In accordance with IEEE-754R, the CUDA device functions fminf(), fmin(), fmaxf(), and fmax() return the other argument when exactly one of the two arguments is NaN.

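■ When a floating-point value is converted to an integer and falls outside the range of the integer format, IEEE-754 leaves the result undefined, whereas CUDA devices clamp the value to the end of that range (the largest or smallest value the format can represent).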

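■ For integer division by zero and integer overflow, IEEE-754 leaves the behavior undefined; CUDA devices do not raise an exception and instead produce an unspecified value.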
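
The fmin/fmax rule and the float-to-integer rule above can be modelled on the host without a GPU. The following is a minimal sketch in standard C++, not device code: `std::fmin` follows the same "NaN is treated as missing" rule, while `min_ignoring_nan` and `saturating_float_to_int` are hypothetical helper names introduced only for illustration. The clamping helper assumes a 32-bit `int`, and mapping NaN to 0 is an assumption of this sketch rather than something stated in the text.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: smallest non-NaN element of a buffer,
// relying on the fmin rule for a single NaN argument.
float min_ignoring_nan(const float* data, int n) {
    float result = std::numeric_limits<float>::quiet_NaN();
    for (int i = 0; i < n; ++i) {
        // When exactly one argument is NaN, fmin returns the other one.
        result = std::fmin(result, data[i]);
    }
    return result;  // still NaN if every element was NaN or n == 0
}

// Hypothetical helper: host-side model of a clamping float-to-int conversion.
// Assumes a 32-bit int; returning 0 for NaN is an assumption of this sketch.
int saturating_float_to_int(float x) {
    if (std::isnan(x)) return 0;
    const float two_pow_31 = 2147483648.0f;  // 2^31, exactly representable as float
    if (x >= two_pow_31) return std::numeric_limits<int>::max();
    if (x <= -two_pow_31) return std::numeric_limits<int>::min();
    return static_cast<int>(x);  // in range: truncates toward zero
}
```

The explicit range checks are needed on the host because an out-of-range float-to-int cast is undefined behavior in standard C++; the text above describes the device as performing the clamp itself.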

● New features introduced with each generation of CC devices

■ Adjusting the balance between shared memory and L1 cache

// driver_types.h
enum __device_builtin__ cudaFuncCache
{
    cudaFuncCachePreferNone   = 0,  // default cache configuration, no preference
    cudaFuncCachePreferShared = 1,  // prefer larger shared memory and smaller L1 cache
    cudaFuncCachePreferL1     = 2,  // prefer larger L1 cache and smaller shared memory
    cudaFuncCachePreferEqual  = 3   // prefer equally sized shared memory and L1 cache
};

// cuda_runtime_api.h
extern __host__ cudaError_t CUDARTAPI cudaDeviceSetCacheConfig(enum cudaFuncCache cacheConfig);
