Dropout & Maxout

This is the 8th post in a series I planned as a journal of my study of deep learning in Professor Bhiksha Raj's course and deep learning lab. I decided to write these posts as notes on my learning process, and I hope they can help others with a similar background.
Back to Content Page
--------------------------------------------------------------------
PDF Version Available Here
--------------------------------------------------------------------
In the last post, when we looked at techniques for convolutional neural networks, we mentioned dropout as a technique to control sparsity. Here, let's look at it in detail, along with another related technique called maxout. Again, these techniques are not restricted to convolutional neural networks; they can be applied to almost any deep network, or at least to feedforward deep networks.

Dropout

Dropout is famous, powerful, and simple. Despite the fact that it is widely used and very effective, the idea is straightforward: randomly drop out some of the units while training. One case is shown in the following figure.

Figure 1. An illustration of the idea of dropout

To state this a little more formally: on each training case, each hidden unit is randomly omitted from the network with probability p. One thing to notice, though: the dropped units are selected independently for each training case, which is why dropout is more a training technique than an architectural change.
As stated in the original paper by Hinton et al., another way to look at dropout makes the idea even more interesting: dropout can be seen as an efficient way to perform model averaging over a large number of different neural networks, so overfitting can be reduced at a much lower computational cost.
In the paper, dropout is mainly discussed with p=0.5, but of course p can basically be set to any probability.
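To make this concrete, here is a minimal NumPy sketch of dropout at training time. It uses the "inverted dropout" convention of scaling the kept activations by 1/(1-p) during training, so no rescaling is needed at test time (the original paper instead halves the outgoing weights at test time when p=0.5); the function name and this scaling choice are my own illustration, not taken from the paper.

    import numpy as np

    def dropout_forward(h, p=0.5, training=True):
        # Drop each hidden unit independently with probability p while training.
        # Kept units are scaled by 1/(1-p) ("inverted dropout"), so the layer
        # needs no rescaling at test time.
        if not training or p == 0.0:
            return h
        mask = (np.random.rand(*h.shape) >= p) / (1.0 - p)
        return h * mask

    # Example: a batch of 4 training cases with 6 hidden units each.
    h = np.random.randn(4, 6)
    h_train = dropout_forward(h, p=0.5, training=True)   # about half the units are zeroed
    h_test  = dropout_forward(h, p=0.5, training=False)  # unchanged at test time

Note that a fresh mask is drawn for every training case (or mini-batch), which matches the point above that the dropped units differ across training instances.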

Maxout

Maxout is an idea derived from dropout. It is simply an activation function that takes the max of its inputs, but when it works together with dropout, it reinforces the properties dropout has: it improves the accuracy of dropout's fast approximate model averaging and it facilitates optimization.
Different from max-pooling, maxout is based on a whole hidden layer that is built on top of the layer we are interested in, so it acts more like a layer-wise activation function. As stated in the original paper by Goodfellow et al., even with hidden layers that only take the max of their inputs, the network retains the power of universal approximation. The reasoning is not very different from what we did in the 3rd post of this series on universal approximation power.
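To illustrate, here is a minimal NumPy sketch of a maxout layer with k linear pieces per output unit; the function name and array shapes are my own illustration of the idea, not code from the paper.

    import numpy as np

    def maxout_forward(x, W, b):
        # x: (batch, d_in), W: (d_in, d_out, k), b: (d_out, k).
        # Each of the d_out output units computes k affine responses and
        # keeps only the maximum one, giving an output of shape (batch, d_out).
        z = np.einsum('bd,dok->bok', x, W) + b
        return z.max(axis=-1)

    # Example: 4 inputs of dimension 10 feeding 5 maxout units with k = 3 pieces each.
    x = np.random.randn(4, 10)
    W = 0.1 * np.random.randn(10, 5, 3)
    b = np.zeros((5, 3))
    h = maxout_forward(x, W, b)   # shape (4, 5)

Because the max is taken over learned linear pieces rather than over spatial neighbors of the input, this is a per-layer nonlinearity rather than a pooling operation.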
 
Although maxout is derived from dropout and works well with it, maxout can only be implemented in feedforward neural networks such as multi-layer perceptrons or convolutional neural networks. In contrast, dropout, though simple, is a fundamental idea that can work with basically any network. Dropout is more like bagging, both in the sense of bagging's ability to increase accuracy through model averaging and in the sense of bagging's wide applicability: it can be integrated with almost any machine learning algorithm.
 
In this post we have talked about two simple and powerful ideas that can help increase accuracy through model averaging. In the next post, let's move back to the track of network architectures and start talking about the architectures of generative models.
----------------------------------------------
If you find this helpful, please cite:
Wang, Haohan, and Bhiksha Raj. "A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas." arXiv preprint arXiv:1510.04781 (2015).
----------------------------------------------

By Haohan Wang
Note: I am still a student learning everything, so there may be mistakes due to my limited knowledge. Please feel free to point out anything you find incorrect or unclear. Thank you.

Main Reference:

  1. Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012).
  2. Goodfellow, Ian J., et al. "Maxout networks." arXiv preprint arXiv:1302.4389 (2013).
