Sparse Coding

  • Sparse coding is a class of unsupervised methods for learning sets of over-complete bases to represent data efficiently. The aim of sparse coding is to find a set of basis vectors  such that an input vector  can be represented as a linear combination of these basis vectors:

  • The advantage of having an over-complete basis is that our basis vectors are better able to capture structures and patterns inherent in the input data 
  • However, with an over-complete basis, the coefficients are no longer uniquely determined by the input vector. Therefore, in sparse coding, we introduce the additional criterion of sparsity to resolve the degeneracy introduced by over-completeness.
  • The sparse coding cost function is defined on a set of m input vectors as: 

    where is a sparsity function which penalizes  for being far from zero. We can interpret the first term of the sparse coding objective as a reconstruction term which tries to force the algorithm to provide a good representation of , and the second term as a sparsity penalty which forces our representation of  (i.e., the learned features) to be sparse. The constant  is a scaling constant to determine the relative importance of these two contributions.

  • Although the most direct measure of sparsity is the  norm, it is non-differentiable and difficult to optimize in general. In practice, common choices for the sparsity cost  are the penalty and the log sparsity
  • It is also possible to make the sparsity penalty arbitrary small by scaling down  and scaling up  by some large constant. To prevent this from happening, we will constrain  to be less than some constant . The full sparse coding cost function hence is:


    where the constant is usually set

  • One problem is that the constraint cannot be forced using simple gradient-based methods. Hence, in practice, this constraint is weakened to a "weight decay" term designed to keep the entries of  small:
  • Another problem is that the L1 norm is not differentiable at 0, and hence poses a problem for gradient-based methods. We will "smooth out" the L1 norm using an approximation which will allow us to use gradient descent. To "smooth out" the L1 norm, we use  in place of , where is a "smoothing parameter" which can also be interpreted as a sort of "sparsity parameter" (to see this, observe that when is large compared to , the is dominated by , and taking the square root yield approximately .
  • Hence, the final objective function is:
  • The set of basis vectors are called "dictionary" ().  is "adapted" to if it can represent it with a few basis vectors, that is, there exists a sparse vector in  such that . We call  the sparse code. It is illustrated as follows: 

Learning

  • Learning a set of basis vectors using sparse coding consists of performing two separate optimizations (i.e., alternative optimization method):

    • The first being an optimization over coefficients  for each training example
    • The second being an optimization over basis vectors across many training examples at once.
  • However, the classical optimization alternates between D and  can achieve good results, but very slow.
  • A significant limitation of sparse coding is that even after a set of basis vectors have been learnt, in order to "encode" a new data example, optimization must be performed to obtain the required coefficients. This significant "runtime" cost means that sparse coding is computationally expensive to implement even at test time, especially compared to typical feed-forward architectures.

Remarks

  • From my view, due to the sparseness enforced in the dictionary learning (i.e., sparse code), the restored matrix is able to remove noise of original matrix, i.e., having some effect of denoising. Hence, Sparse coding could be used to denoise images.

References

最新文章

  1. MVC、MVP、MVVM、Angular.js、Knockout.js、Backbone.js、React.js、Ember.js、Avalon.js、Vue.js 概念摘录
  2. CentOS评估磁盘I/O性能读写极限测试
  3. QQ摄像头读取条码
  4. 全新重装win8.1系统后 配置开发及办公环境步骤
  5. AppCan相关网站
  6. 一个简单的以User权限启动外部应用程序(用NetUserAdd函数和USER_INFO_1结构体动态添加用户,然后用CreateProcessWithLogonW启动程序)
  7. mysql免安装版使用
  8. POJ3080 Blue Jeans
  9. 设计模式14---设计模式之命令模式(Command)(行为型)
  10. 获取中央气象台API 完整城市列表简单方式
  11. iOS7 UIKit动力学-碰撞特性UICollisionBehavior 上
  12. 《JAVA程序设计》_第七周学习总结
  13. JavaScript--图片放大镜
  14. docx httpheader头设置
  15. oracle--Tracing PL/SQL subprogram calls with parameters values--Mahmoud Hatem,
  16. 获取进程ID,父进程ID,进程完整路径
  17. MySQL性能优化方法三:索引优化
  18. 升级Xcode10报错问题修复
  19. Centos7 Zabbix添加主机、图形、触发器
  20. Java JVM技术

热门文章

  1. BZOJ 2882: 工艺( 后缀自动机 )
  2. c++ 容器、继承层次、句柄类
  3. bootstrap基础知识点YI
  4. Introduction to REST #Reprinted#
  5. Windows Phone 8初学者开发—第12部分:改进视图模型和示例数据
  6. Linux的环境变量总结
  7. VS2010/MFC对话框一:创建对话框模板和修改对话框属性
  8. 查看一个int数组里边的每个数字出现过几次
  9. USACO Hamming Codes DFS 构造
  10. DW8051调试终结