Read this alongside the following diagram: https://dl.dropboxusercontent.com/u/32077444/nsight.pdf

1) The bars you see in the Summary Page of the Profiler show the percentage of time each unit was a bottleneck for the selected draw call(s). This gives you a feel for which part of the pipeline to go after for optimization opportunities, rather than just trying things and seeing if the FPS changes. So, in your case, you are showing ~75-80%, which means you can try to improve your shader source, and that should help the performance of the 5 draw calls in your selected Draw Call Group. Note that a unit doesn’t have to be a 100% bottleneck to be worth investigating. Even if it is a bottleneck only 10% of the time, it still prevented you from achieving the optimal throughput for a given call, so if you are, say, 20% texture bound you can still investigate the standard optimizations like filtering and mipmapping to see how they impact performance.

2) The gaps in the Frame Timings graph are sometimes uncontrollable. It can be helpful to run an analysis session on your frames to get a feel for how full your command buffers might be and what might cause the gap (such as resource uploads, etc.). We don’t really give out more details in that screen and without a repro it is hard to tell exactly what caused the gap.

3) You asked about the 3 timing values in the Frame Timings and what might be considered “good”. The 3 values represent 2 ways to measure the draw call timing and 1 calculated value:

a. EPC/Empty Pipeline Cost: This measures each draw call, one at a time, as it flows from the top to the bottom of the pipe. We add a flush before and after each call, so you can consider this an absolute cost for the draw call that does not take anything else, such as pipeline width or resource contention (both positive and negative), into account. This is helpful for knowing how much each draw call costs in isolation.

b. FPC/Full Pipeline Cost: We measure this value with all draw calls in flight but bookended by pipeline reports that give us the time from the start of each draw call (first vertex being processed) to its end (last fragment being retired to the frame buffer). This means that any resource contention is taken into account, such as hitting the texture unit and either warming or dirtying the cache, or having so many threads around that the shader units are fully occupied and cannot start on new work. This gives you a “real world” cost for every draw call.

c. IDC/Incremental Draw Cost: This is a calculated value that takes into account any overlap you might see between draw calls. Say you have 2 identical draw calls, and each one takes up roughly ½ of the full pipeline width. Each one’s EPC and FPC are likely to be very close, but because each takes up only ½ of the width, the incremental (additional) cost of the second call might actually be 0: it can be executed fully in parallel with the first call. So the FPS would be the same with 1 draw or 2, and the IDC would be the full cost for the first call and 0 for the second.
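To make the overlap idea behind IDC concrete, here is a minimal Python sketch. It is not how the tool actually computes IDC; it assumes a simplified model in which each draw call is a hypothetical (start, end) time interval in submission order, and the incremental cost of a call is how far it extends the latest end time of the calls before it. The function name `incremental_draw_costs` is made up for illustration.

```python
from typing import List, Tuple


def incremental_draw_costs(intervals: List[Tuple[float, float]]) -> List[float]:
    """Return, for each (start, end) interval in submission order, the extra
    time it adds beyond the work already in flight.

    Assumption (illustrative model only): a call costs nothing for any part
    of its lifetime that overlaps with earlier calls.
    """
    costs = []
    frontier = float("-inf")  # latest end time among earlier calls
    for start, end in intervals:
        # Only the part of this call that runs past the frontier is "new" time.
        extra = max(0.0, end - max(frontier, start))
        costs.append(extra)
        frontier = max(frontier, end)
    return costs


# Two identical calls running fully in parallel: the second one is free.
parallel = incremental_draw_costs([(0.0, 2.0), (0.0, 2.0)])

# The same two calls running back to back: each pays its full cost.
serial = incremental_draw_costs([(0.0, 2.0), (2.0, 4.0)])

# Partial overlap: the second call only pays for the part that sticks out.
partial = incremental_draw_costs([(0.0, 2.0), (1.0, 3.0)])
```

In this model the two-identical-calls case from the text comes out as full cost for the first call and 0 for the second, while the serial case charges both calls in full.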

4) On the Memory Screen, you asked if there was a breakdown per shader or draw request. This is what we have the state buckets for. By pressing the button on the toolbar, you can group draw calls by shared state (in this case, the shader in question) and then you will see the stats for just those draw calls. You can also group them by performance markers, so you can group them pretty much however you want.
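Conceptually, a state bucket is just a group-by over the draw calls. The Python sketch below shows the idea on made-up data; the record fields (`id`, `shader`, `marker`, `bytes`) and the function `group_by_state` are hypothetical and do not reflect any export format or API of the tool.

```python
from collections import defaultdict


def group_by_state(draw_calls, key):
    """Bucket draw calls by a shared attribute and sum their memory traffic.

    `draw_calls` is a list of dicts with hypothetical fields; `key` names the
    attribute to bucket on (for example "shader" or "marker").
    """
    buckets = defaultdict(lambda: {"calls": [], "bytes": 0})
    for call in draw_calls:
        bucket = buckets[call[key]]
        bucket["calls"].append(call["id"])
        bucket["bytes"] += call["bytes"]
    return dict(buckets)


# Made-up draw calls, purely for illustration.
draw_calls = [
    {"id": 114, "shader": "blur_ps", "marker": "post", "bytes": 1000},
    {"id": 115, "shader": "lit_ps", "marker": "scene", "bytes": 4000},
    {"id": 116, "shader": "blur_ps", "marker": "post", "bytes": 3000},
]

by_shader = group_by_state(draw_calls, "shader")
by_marker = group_by_state(draw_calls, "marker")
```

Switching the `key` from `"shader"` to `"marker"` mirrors switching between state-based and performance-marker-based grouping.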

5) On the Memory Screen, you asked if the 330k was read or write and it is the sum. We don’t yet break out read vs write but could consider it for a future enhancement.

6) Your other question on the Memory screen was what the 3.6GB of bandwidth between L2 and Memory was and that is the number of bytes written. I must confess that I am puzzled by the number because it should be basically the sum of write operations that go through the L2 and most of them should come via the Framebuffer unit. If I can get access to your app it would help me understand if we have a bug there or just a number that isn’t reported.

7) On the Bottleneck screen, you asked about drilling into the shader bottleneck information. We don’t currently support this but it is a feature that we have considered and already laid some of the ground work for in our CUDA tools. I will add you to the list of requestors for that capability.

8) You asked how the Framebuffer could be a bottleneck when rendering a full screen quad. That is because, in NVIDIA language, the Framebuffer basically represents the memory controller. All requests for memory, whether from the blending unit, texture unit, shader, etc., go through the Framebuffer unit. Are you doing lots of lookups in draw call 116?

9) Utilization is generally trying to show you how much of the available horsepower you used for the amount of time the draw call took. To give more details I would need to know what your workload was and possibly sample additional data, but it is possible the shader unit is underutilized because it was stalled inside the shader unit itself, waiting on things like L1 values to return, local memory, or other resource contention.
