The unstable gradient problem: The fundamental problem here isn't so much the vanishing gradient problem or the exploding gradient problem. It's that the gradient in early layers is the product of terms from all the later layers. When there are many layers, that's an intrinsically unstable situation. The only way all layers can learn at close to the same speed is if all those products of terms come close to balancing out. Without some mechanism or underlying reason for that balancing to occur, it's highly unlikely to happen simply by chance. In short, the real problem here is that neural networks suffer from an unstable gradient problem. As a result, if we use standard gradient-based learning techniques, different layers in the network will tend to learn at wildly different speeds.
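To make this instability concrete, here is a minimal sketch (my own illustration, not code from the chapter) of what happens to a product of per-layer terms w·σ′(z) when the weights are drawn at random. The depth and the two weight scales are arbitrary choices; the point is only that the product almost never lands anywhere near 1:

    import numpy as np

    def sigmoid_prime(z):
        """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
        s = 1.0 / (1.0 + np.exp(-z))
        return s * (1.0 - s)

    rng = np.random.default_rng(0)
    depth = 30
    for sigma_w in (1.0, 20.0):  # two arbitrary weight scales, for illustration
        w = rng.normal(0.0, sigma_w, size=depth)  # one weight per later layer
        z = rng.normal(0.0, 1.0, size=depth)      # weighted inputs, also random
        factors = w * sigmoid_prime(z)            # the per-layer terms
        print(f"weight scale {sigma_w:4.1f}: "
              f"|product of {depth} terms| = {abs(np.prod(factors)):.1e}")
    # Typical result: the product is astronomically small at scale 1.0
    # (vanishing) and astronomically large at scale 20.0 (exploding).
    # Landing near 1 would require the terms to balance almost exactly.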

Again, early hidden layers learn much more slowly than later hidden layers. In this case, the first hidden layer is learning roughly 100 times more slowly than the final hidden layer. No wonder we were having trouble training these networks earlier!
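One way to see this for yourself is to compute the gradient of the cost with respect to each layer's biases and compare the norms layer by layer; those norms are a natural measure of each layer's speed of learning. The following rough sketch (my own illustration: the architecture, initialization, and data are all made up) does this for a single randomly-initialized sigmoid network:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        s = sigmoid(z)
        return s * (1.0 - s)

    rng = np.random.default_rng(1)
    sizes = [784, 30, 30, 30, 30, 10]  # an assumed architecture
    weights = [rng.normal(0.0, 1.0, (m, n))
               for n, m in zip(sizes[:-1], sizes[1:])]
    biases = [rng.normal(0.0, 1.0, (m, 1)) for m in sizes[1:]]

    x = rng.normal(0.0, 1.0, (sizes[0], 1))  # a fake input
    y = np.zeros((sizes[-1], 1))             # a fake one-hot label
    y[3] = 1.0

    # Forward pass, remembering weighted inputs and activations.
    activations, zs = [x], []
    for w, b in zip(weights, biases):
        zs.append(w @ activations[-1] + b)
        activations.append(sigmoid(zs[-1]))

    # Backward pass for the quadratic cost C = ||a - y||^2 / 2.
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    deltas = [delta]
    for l in range(len(weights) - 1, 0, -1):
        delta = (weights[l].T @ delta) * sigmoid_prime(zs[l - 1])
        deltas.insert(0, delta)

    # delta at layer l equals dC/db for that layer, so its norm measures
    # how fast the layer learns.  With this initialization the earlier
    # layers typically come out far smaller.
    for l, d in enumerate(deltas[:-1], start=1):
        print(f"hidden layer {l}: ||dC/db|| = {np.linalg.norm(d):.2e}")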

We have here an important observation: in at least some deep neural networks, the gradient tends to get smaller as we move backward through the hidden layers. This means that neurons in the earlier layers learn much more slowly than neurons in later layers. And while we've seen this in just a single network, there are fundamental reasons why this happens in many neural networks. The phenomenon is known as the vanishing gradient problem.*

*See Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, by Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber (2001). This paper studied recurrent neural nets, but the essential phenomenon is the same as in the feedforward networks we are studying. See also Sepp Hochreiter's earlier Diploma Thesis, Untersuchungen zu dynamischen neuronalen Netzen (1991, in German).

Why does the vanishing gradient problem occur? Are there ways we can avoid it? And how should we deal with it in training deep neural networks? In fact, we'll learn shortly that it's not inevitable, although the alternative is not very attractive, either: sometimes the gradient gets much larger in earlier layers! This is the exploding gradient problem, and it's not much better news than the vanishing gradient problem. More generally, it turns out that the gradient in deep neural networks is unstable, tending to either explode or vanish in earlier layers. This instability is a fundamental problem for gradient-based learning in deep neural networks. It's something we need to understand, and, if possible, take steps to address.
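A quick way to see how a gradient can explode is the standard thought experiment of choosing the parameters adversarially: set every weight to, say, w = 100, and pick the biases so that each neuron's weighted input is z = 0, where σ′(0) = 1/4. Each step backward then multiplies the gradient by wσ′(z) = 25. A tiny sketch, with illustrative numbers only:

    # Each backward step through a layer multiplies the gradient by
    # w * sigmoid'(z).  With w = 100 and z = 0 (so sigmoid'(z) = 1/4),
    # that factor is 25 per layer.
    w = 100.0
    sigmoid_prime_at_0 = 0.25  # sigmoid'(0) = 1/4
    grad = 1.0
    for layer in range(1, 6):
        grad *= w * sigmoid_prime_at_0
        print(f"{layer} layer(s) back: gradient scaled by {grad:.0f}")
    # Prints 25, 625, 15625, ...: exponential growth as we move backward.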

One response to vanishing (or unstable) gradients is to wonder if they're really such a problem. Momentarily stepping away from neural nets, imagine we were trying to numerically minimize a function f(x) of a single variable. Wouldn't it be good news if the derivative f′(x) was small? Wouldn't that mean we were already near an extremum? In a similar way, might the small gradient in early layers of a deep network mean that we don't need to do much adjustment of the weights and biases?
