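A worked sparse autoencoder exercise. The task is roughly as follows: from a set of natural images, sample 10,000 small patches of size 8*8, then train a sparse autoencoder on them and inspect the features learned by its hidden layer. The network has 3 layers: 64 input units, 25 hidden units, and, because an autoencoder reconstructs its input, 64 output units.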
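
For reference, the objective that such a network minimizes can be written as below. This is a sketch in the notation of the UFLDL lecture notes, lined up with the three terms (squared error, weight decay, sparsity penalty) that sparseAutoencoderCost computes; the symbols m (number of patches), s_2 (number of hidden units, 25 here) and \hat\rho_j (average activation of hidden unit j) are taken from those notes rather than from this post.

```latex
\begin{aligned}
J_{\text{sparse}}(W,b) &= \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\bigl\| h_{W,b}(x^{(i)}) - x^{(i)} \bigr\|^2
  + \frac{\lambda}{2}\sum_{l=1}^{2}\sum_{i}\sum_{j}\bigl(W^{(l)}_{ji}\bigr)^2
  + \beta\sum_{j=1}^{s_2}\mathrm{KL}\bigl(\rho \,\|\, \hat\rho_j\bigr) \\
\mathrm{KL}\bigl(\rho \,\|\, \hat\rho_j\bigr) &= \rho\log\frac{\rho}{\hat\rho_j} + (1-\rho)\log\frac{1-\rho}{1-\hat\rho_j},
\qquad
\hat\rho_j = \frac{1}{m}\sum_{i=1}^{m} a^{(2)}_j\bigl(x^{(i)}\bigr)
\end{aligned}
```

In the code, these three terms appear as Jcost, lambda*Jweight and beta*Jsparse, and rho in the code plays the role of \hat\rho while sparsityParam is \rho.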

The main script runs in five steps; the implementation of each function is listed further below.

 %%======================================================================
%% STEP 0: Here we provide the relevant parameters values that will
% allow your sparse autoencoder to get good filters; you do not need to
% change the parameters below.
visibleSize = 8*8; % number of input units
hiddenSize = 25; % number of hidden units
sparsityParam = 0.01; % desired average activation of the hidden units.
% (This was denoted by the Greek alphabet rho,
% which looks like a lower-case "p",
% in the lecture notes).
lambda = 0.0001; % weight decay parameter
beta = 3; % weight of sparsity penalty term (value used in the UFLDL starter code)

%%======================================================================
%% STEP 1: Implement sampleIMAGES
%
% After implementing sampleIMAGES, the display_network command should
% display a random sample of patches from the dataset
patches = sampleIMAGES;
display_network(patches(:,randi(size(patches,2),200,1)),8);

% Obtain random parameters theta
theta = initializeParameters(hiddenSize, visibleSize);

%%======================================================================
%% STEP 2: Implement sparseAutoencoderCost
%
% You can implement all of the components (squared error cost, weight decay term,
% sparsity penalty) in the cost function at once, but it may be easier to do
% it step-by-step and run gradient checking (see STEP 3) after each step. We
% suggest implementing the sparseAutoencoderCost function using the following steps:
%
% (a) Implement forward propagation in your neural network, and implement the
% squared error term of the cost function. Implement backpropagation to
% compute the derivatives. Then (using lambda=beta=0), run Gradient Checking
% to verify that the calculations corresponding to the squared error cost
% term are correct.
%
% (b) Add in the weight decay term (in both the cost function and the derivative
% calculations), then re-run Gradient Checking to verify correctness.
%
% (c) Add in the sparsity penalty term, then re-run Gradient Checking to
% verify correctness.
%
% Feel free to change the training settings when debugging your
% code. (For example, reducing the training set size or
% number of hidden units may make your code run faster; and setting beta
% and/or lambda to zero may be helpful for debugging.) However, in your
% final submission of the visualized weights, please use parameters we
% gave in Step 0 above.

[cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
lambda, sparsityParam, beta, patches);

%%======================================================================
%% STEP 3: Gradient Checking
%
% Hint: If you are debugging your code, performing gradient checking on smaller models
% and smaller training sets (e.g., using only 10 training examples and 1-2 hidden
% units) may speed things up.

% First, let's make sure your numerical gradient computation is correct for a
% simple function. After you have implemented computeNumericalGradient.m,
% run the following:
checkNumericalGradient();

% Now we can use it to check your cost function and derivative calculations
% for the sparse autoencoder.
numgrad = computeNumericalGradient( @(x) sparseAutoencoderCost(x, visibleSize, ...
hiddenSize, lambda, sparsityParam, beta, patches), theta);

% Use this to visually compare the gradients side by side
disp([numgrad grad]);

% Compare numerically computed gradients with the ones obtained from backpropagation
diff = norm(numgrad-grad)/norm(numgrad+grad);
disp(diff); % Should be small. In our implementation, these values are
% usually less than 1e-9.
% When you got this working, Congratulations!!!

%%======================================================================
%% STEP 4: After verifying that your implementation of
% sparseAutoencoderCost is correct, You can start training your sparse
% autoencoder with minFunc (L-BFGS).

% Randomly initialize the parameters
theta = initializeParameters(hiddenSize, visibleSize);

% Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost
% function. Generally, for minFunc to work, you
% need a function pointer with two outputs: the
% function value and the gradient. In our problem,
% sparseAutoencoderCost.m satisfies this.
options.maxIter = 400; % Maximum number of iterations of L-BFGS to run
options.display = 'on';
[opttheta, cost] = minFunc( @(p) sparseAutoencoderCost(p,visibleSize, hiddenSize, ...
lambda, sparsityParam, beta, patches),theta, options);
%%======================================================================
%% STEP 5: Visualization

W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
display_network(W1', 12);

print -djpeg weights.jpg % save the visualization to a file

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Corresponds to STEP 1 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Three functions: sampleIMAGES, normalizeData, initializeParameters
function patches = sampleIMAGES()
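load IMAGES; % load the 10 original 512*512 images

patchsize = 8; % side length of each sampled patch
numpatches = 10000;

% Initialize patches with zeros: a 64*10000 matrix with one patch per column.
patches = zeros(patchsize*patchsize, numpatches);

% IMAGES is a 3D array holding 10 images. IMAGES(:,:,6) is the 512x512 array
% of pixels of the sixth image, and the command
% "imagesc(IMAGES(:,:,6)), colormap gray;" visualizes it.
% (The contrast looks a bit off because the images were whitened in preprocessing.)
% IMAGES(21:30,21:30,1) is the patch of image 1 covering pixels (21,21) to (30,30).

% Randomly pick 1000 patches from each image, 10000 patches in total.
for imageNum = 1:10
[rowNum colNum] = size(IMAGES(:,:,imageNum));
% take 1000 patches from this image
for patchNum = 1:1000
% top-left corner of the patch
xPos = randi([1, rowNum-patchsize+1]);
yPos = randi([1, colNum-patchsize+1]);
% store the patch as one column of the matrix
patches(:,(imageNum-1)*1000+patchNum) = ...
reshape(IMAGES(xPos:xPos+patchsize-1,yPos:yPos+patchsize-1,imageNum),patchsize*patchsize,1);
end
end
% The autoencoder uses a sigmoid activation, so its outputs lie in [0,1]. For
% the network to achieve h_{W,b}(x) = x, the input x must lie in the same
% range, so the patches have to be normalized.
patches = normalizeData(patches);
end

% Normalization function. (Author's note: the 3-sigma rule used here is not fully clear to me.)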
function patches = normalizeData(patches)
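% Remove the mean of each patch (mean(patches) is taken per column)
patches = bsxfun(@minus, patches, mean(patches));
% s = std(X) for a vector X returns the standard deviation (note that the
% denominator is n-1, not n), i.e. the square root of the unbiased variance
% estimate when X holds i.i.d. samples. For a matrix X it returns a row
% vector with the standard deviation of each column. Here std(patches(:))
% is taken over all pixels at once.
% Truncate to +/- 3 standard deviations and scale to [-1, 1]
pstd = 3 * std(patches(:));
patches = max(min(patches, pstd), -pstd) / pstd;
% Rescale from [-1,1] to [0.1,0.9]
patches = (patches + 1) * 0.4 + 0.1;
end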
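
On the 3-sigma question: the sketch below runs the same three steps on synthetic data so the effect of each step can be seen in isolation. The random matrix stands in for real patches (an assumption, so that IMAGES.mat is not needed), and the variable names clipped and normalized are made up for this illustration.

```matlab
% Sketch: the three normalization steps applied to synthetic data.
patches = randn(64, 1000);                 % 1000 fake 8x8 patches, one per column

% 1. Remove the mean of each patch (column).
patches = bsxfun(@minus, patches, mean(patches));

% 2. Clip at +/- 3 standard deviations and divide, giving values in [-1, 1].
pstd = 3 * std(patches(:));
clipped = max(min(patches, pstd), -pstd) / pstd;

% 3. Map [-1, 1] linearly onto [0.1, 0.9] (0 maps to 0.5).
normalized = (clipped + 1) * 0.4 + 0.1;

fprintf('fraction of values clipped: %.4f\n', mean(abs(patches(:)) > pstd));
fprintf('range after normalization: [%.3f, %.3f]\n', min(normalized(:)), max(normalized(:)));
```

For roughly Gaussian data only a small fraction of values lies beyond three standard deviations, so the clipping mostly removes outliers. Mapping to [0.1, 0.9] rather than [0, 1] presumably keeps the targets away from the saturated ends of the sigmoid output.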
% First, initialize the parameters
function theta = initializeParameters(hiddenSize, visibleSize)
% Initialize parameters randomly based on layer sizes.
% we'll choose weights uniformly from the interval [-r, r]
r = sqrt(6) / sqrt(hiddenSize+visibleSize+1);
% rand(a,b) returns an a*b matrix of uniformly distributed values in (0, 1).
W1 = rand(hiddenSize, visibleSize) * 2 * r - r;
% rand(a,b)*2*r lies in (0, 2r), so rand(a,b)*2*r - r lies in (-r, r)
W2 = rand(visibleSize, hiddenSize) * 2 * r - r;
b1 = zeros(hiddenSize, 1); % biases feeding the hidden units
b2 = zeros(visibleSize, 1); % biases feeding the output layer
% Concatenate everything into a single column vector
theta = [W1(:) ; W2(:) ; b1(:) ; b2(:)];
% end of parameter initialization
end
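
The layout of theta matters because sparseAutoencoderCost has to slice it apart again in exactly the same order. A minimal round-trip sketch for the 64-25-64 network, assuming the sqrt(6) initialization range of the UFLDL starter code; the names nW, W1u, W2u, b1u and b2u are hypothetical and used only here:

```matlab
% Sketch: pack the parameters into theta, then unpack them again.
visibleSize = 64;
hiddenSize  = 25;
r  = sqrt(6) / sqrt(hiddenSize + visibleSize + 1);
W1 = rand(hiddenSize, visibleSize) * 2 * r - r;
W2 = rand(visibleSize, hiddenSize) * 2 * r - r;
b1 = zeros(hiddenSize, 1);
b2 = zeros(visibleSize, 1);
theta = [W1(:); W2(:); b1(:); b2(:)];      % same order as initializeParameters

nW  = hiddenSize * visibleSize;            % number of entries in each weight matrix
W1u = reshape(theta(1:nW), hiddenSize, visibleSize);
W2u = reshape(theta(nW+1:2*nW), visibleSize, hiddenSize);
b1u = theta(2*nW+1 : 2*nW+hiddenSize);
b2u = theta(2*nW+hiddenSize+1 : end);

fprintf('numel(theta) = %d\n', numel(theta));   % 2*64*25 + 25 + 64 = 3289
fprintf('round trip ok: %d\n', isequal(W1u, W1) && isequal(W2u, W2));
```

Because W1(:) stacks columns and reshape fills columns, the round trip returns the original matrices without any transposition.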
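%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Corresponds to STEP 2 %%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% Returns the value and the gradient of the sparse cost function %%%%%
function [cost,grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...
lambda, sparsityParam, beta, data)
% visibleSize: number of input units
% hiddenSize: number of hidden units
% lambda: weight decay coefficient
% sparsityParam: the desired average activation rho of the hidden units
% beta: weight of the sparsity penalty term
% data: 64x10000 training matrix; data(:,i) is the i-th training example.
% The parameters arrive concatenated in one vector theta because L-BFGS
% (minFunc) optimizes over a vector.
% Unpack the long vector into the weight matrix and bias vector of each layer.
% Entries 1..hiddenSize*visibleSize of theta hold W1; reshape them into a matrix.
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
% The remaining blocks follow in the same way.
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

% Cost and the gradient arrays matching each parameter
cost = 0;
W1grad = zeros(size(W1));
W2grad = zeros(size(W2));
b1grad = zeros(size(b1));
b2grad = zeros(size(b2));

Jcost = 0; % reconstruction error
Jweight = 0; % weight decay penalty
Jsparse = 0; % sparsity penalty
[n m] = size(data); % m is the number of examples, n the number of features

% Forward pass: compute the weighted input z and the activation a of every unit.
% W1 is hiddenSize x visibleSize
% data is visibleSize x m (one training example per column)
% repmat(b1,1,m) replicates the column vector b1 into a hiddenSize x m matrix
% Formula: z^(l) = W^(l-1)*a^(l-1) + b^(l-1)
% z2 holds the hidden-layer inputs for all 10000 examples: hiddenSize x m, one column per example
z2 = W1*data + repmat(b1,1,m); % input to layer 2
a2 = sigmoid(z2); % sigmoid of z2 gives a2, the hidden-layer output
z3 = W2*a2 + repmat(b2,1,m); % input to the output layer
a3 = sigmoid(z3); % output of the output layer

% Compute the reconstruction error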
% Corresponds to J(W,b): the outer sum runs over all examples, the inner sum over all output components
Jcost = (0.5/m)*sum(sum((a3-data).^2));
% Weight decay term; the coefficient lambda is applied later, in the total cost
Jweight = (1/2)*(sum(sum(W1.^2))+sum(sum(W2.^2)));
% Sparsity term: sum(matrix,2) sums along each row, so this averages each
% hidden unit's activation over all examples.
% rho is a hiddenSize x 1 vector
rho = (1/m).*sum(a2,2); % average activation of each hidden unit
% Sparsity penalty (KL divergence summed over the hidden units)
Jsparse = sum(sparsityParam.*log(sparsityParam./rho)+(1-sparsityParam).*log((1-sparsityParam)./(1-rho)));
% Total cost: reconstruction error + weight decay + sparsity penalty
cost = Jcost + lambda*Jweight + beta*Jsparse;
% Error term delta3 of layer l = 3 (the output layer); in an autoencoder the target output equals the input, h_{W,b}(x) = x
delta3 = -(data-a3).*sigmoidInv(z3);
% The sparsity penalty adds an extra term to the hidden-layer derivative; sterm is a hiddenSize x 1 vector
sterm = beta*(-sparsityParam./rho+(1-sparsityParam)./(1-rho));
% W2 is 64x25; delta3 is the 64x10000 error term of layer 3, one column per example x^(i); W2' is the transpose of W2
% repmat(sterm,1,m) replicates sterm into a matrix with m identical columns
% delta2 is hiddenSize x 10000
delta2 = (W2'*delta3+repmat(sterm,1,m)).*sigmoidInv(z2);

% Gradient of W1
% data' is 10000x64, so delta2*data' is 25x64
W1grad = W1grad+delta2*data';
W1grad = (1/m)*W1grad+lambda*W1;

% Gradient of W2
W2grad = W2grad+delta3*a2';
W2grad = (1/m).*W2grad+lambda*W2;

% Gradient of b1
b1grad = b1grad+sum(delta2,2);
b1grad = (1/m)*b1grad; % the gradient of b is a vector, so each row is summed over the examples

% Gradient of b2
b2grad = b2grad+sum(delta3,2);
b2grad = (1/m)*b2grad;
% Done; flatten the gradients back into one vector
grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];
end

%-------------------------------------------------------------------
% Here's an implementation of the sigmoid function, which you may find useful
% in your computation of the costs and the gradients. This inputs a (row or
% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)).
function sigm = sigmoid(x)
sigm = 1 ./ (1 + exp(-x));
end

% Derivative of the sigmoid function (despite the name, this is not its inverse)
function sigmInv = sigmoidInv(x)
sigmInv = sigmoid(x).*(1-sigmoid(x));
end
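
The sparsity part of the cost function is the easiest to get wrong, so it can help to look at it for a single hidden unit. The sketch below evaluates the KL penalty and the derivative that sterm is built from (without the factor beta), then checks the derivative with a central difference. The anonymous functions kl and dkl, the step h and the sample activations in rhoHat are made up for this illustration.

```matlab
% Sketch: the KL sparsity penalty and its derivative for one hidden unit.
rho = 0.01;                          % target average activation (sparsityParam)
kl  = @(rhoHat) rho .* log(rho ./ rhoHat) + (1 - rho) .* log((1 - rho) ./ (1 - rhoHat));
dkl = @(rhoHat) -rho ./ rhoHat + (1 - rho) ./ (1 - rhoHat);   % analytic d(kl)/d(rhoHat)

rhoHat = [0.005; 0.01; 0.05; 0.2];   % example average activations (made up)
disp([rhoHat, kl(rhoHat)]);          % penalty is 0 where rhoHat == rho, positive elsewhere

% Central-difference check of the derivative, as in gradient checking.
h = 1e-6;
numeric = (kl(rhoHat + h) - kl(rhoHat - h)) / (2 * h);
relErr = norm(numeric - dkl(rhoHat)) / norm(numeric + dkl(rhoHat));
fprintf('relative error of derivative: %g\n', relErr);
```

The derivative is negative for rhoHat below rho and positive above it, which is what pushes the average hidden activation toward sparsityParam during training.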
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Corresponds to STEP 3 %%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Three functions: checkNumericalGradient, simpleQuadraticFunction, computeNumericalGradient
function [] = checkNumericalGradient()
x = [4; 10];
% Exact value and exact gradient of the simple function at x
[value, grad] = simpleQuadraticFunction(x);
% Numerically estimate the gradient at x. ("@simpleQuadraticFunction" denotes a pointer to a function.)
numgrad = computeNumericalGradient(@simpleQuadraticFunction, x);
% disp() prints its argument
disp([numgrad grad]);
fprintf('The above two columns you get should be very similar.\n(Left-Your Numerical Gradient, Right-Analytical Gradient)\n\n');
% For a vector, norm(X) equals sqrt(sum(X.^2)). With a correct implementation and EPSILON = 0.0001, diff should be about 2.1452e-12
diff = norm(numgrad-grad)/norm(numgrad+grad);
disp(diff);
fprintf('Norm of the difference between numerical and analytical gradient (should be < 1e-9)\n\n');
end

% This simple function is used to verify the computeNumericalGradient implementation
function [value,grad] = simpleQuadraticFunction(x)
% this function accepts a 2D vector as input.
% Its outputs are:
% value: h(x1, x2) = x1^2 + 3*x1*x2
% grad: A 2x1 vector that gives the partial derivatives of h with respect to x1 and x2
% Note that when we pass @simpleQuadraticFunction(x) to computeNumericalGradients, we're assuming
% that computeNumericalGradients will use only the first returned value of this function.
value = x(1)^2 + 3*x(1)*x(2);
grad = zeros(2, 1);
grad(1) = 2*x(1) + 3*x(2);
grad(2) = 3*x(1);
end

% Gradient-checking function
function numgrad = computeNumericalGradient(J, theta)
% theta: the parameters, a column vector (a scalar also works)
% J: a function that outputs a real number. Calling y = J(theta) returns the function value at theta.

% Initialize numgrad with zeros, same dimensions as theta
numgrad = zeros(size(theta));
EPSILON = 1e-4;
% theta is a column vector, so size(theta,1) is its number of entries (rows)
n = size(theta,1);
% n x n identity matrix
E = eye(n);
for i = 1:n
% (n,:) selects row n, all columns
% (:,n) selects all rows, column n
% E is the identity, so delta is zero except for EPSILON in entry i
delta = E(:,i)*EPSILON;
% central-difference estimate of the i-th partial derivative
numgrad(i) = (J(theta+delta)-J(theta-delta))/(EPSILON*2.0);
end
end
%% ---------------------------------------------------------------
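
computeNumericalGradient uses a two-sided (central) difference rather than a one-sided one. The sketch below compares the two on the same quadratic that checkNumericalGradient uses, with the same relative-error measure; the variable names central, forward and relErr are made up for this illustration.

```matlab
% Sketch: central vs. forward differences on h(x1,x2) = x1^2 + 3*x1*x2.
h    = @(x) x(1)^2 + 3*x(1)*x(2);
grad = @(x) [2*x(1) + 3*x(2); 3*x(1)];     % analytic gradient
x    = [4; 10];
EPSILON = 1e-4;

central = zeros(2, 1);
forward = zeros(2, 1);
for i = 1:2
    e = zeros(2, 1);
    e(i) = EPSILON;                         % perturb only coordinate i
    central(i) = (h(x + e) - h(x - e)) / (2 * EPSILON);
    forward(i) = (h(x + e) - h(x)) / EPSILON;
end

relErr = @(a, b) norm(a - b) / norm(a + b);
fprintf('central: %g\n', relErr(central, grad(x)));
fprintf('forward: %g\n', relErr(forward, grad(x)));
```

The central difference has an error of order EPSILON^2 (and is exact up to rounding for a quadratic), while the forward difference has an error of order EPSILON, which is why the tight 1e-9 threshold is only realistic with the two-sided formula.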
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Visualization: the display function %%%%%%%%%%%%%%%%%%%%%%%%%%%
function [h, array] = display_network(A, opt_normalize, opt_graycolor, cols, opt_colmajor)
% This function visualizes filters in matrix A. Each column of A is a
% filter. We will reshape each column into a square image and visualizes
% on each cell of the visualization panel.
% All other parameters are optional, usually you do not need to worry
% about it.
% opt_normalize: whether we need to normalize the filter so that all of
% them can have similar contrast. Default value is true.
% opt_graycolor: whether we use gray as the heat map. Default is true.
% cols: how many columns are there in the display. Default value is the
% squareroot of the number of columns in A.
% opt_colmajor: you can switch convention to row major for A. In that
% case, each row of A is a filter. Default value is false.
warning off all

if ~exist('opt_normalize', 'var') || isempty(opt_normalize)
opt_normalize= true;
end

if ~exist('opt_graycolor', 'var') || isempty(opt_graycolor)
opt_graycolor= true;
end

if ~exist('opt_colmajor', 'var') || isempty(opt_colmajor)
opt_colmajor = false;
end

% rescale
A = A - mean(A(:));

if opt_graycolor, colormap(gray); end

% compute rows, cols
[L M]=size(A);
sz=sqrt(L);
buf=1;
if ~exist('cols', 'var')
if floor(sqrt(M))^2 ~= M
n=ceil(sqrt(M));
while mod(M, n)~=0 && n<1.2*sqrt(M), n=n+1; end
m=ceil(M/n);
else
n=sqrt(M);
m=n;
end
else
n = cols;
m = ceil(M/n);
end

array=-ones(buf+m*(sz+buf),buf+n*(sz+buf));

if ~opt_graycolor
array = 0.1.* array;
end

if ~opt_colmajor
k=1;
for i=1:m
for j=1:n
if k>M,
continue;
end
clim=max(abs(A(:,k)));
if opt_normalize
array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz))=reshape(A(:,k),sz,sz)/clim;
else
array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz))=reshape(A(:,k),sz,sz)/max(abs(A(:)));
end
k=k+1;
end
end
else
k=1;
for j=1:n
for i=1:m
if k>M,
continue;
end
clim=max(abs(A(:,k)));
if opt_normalize
array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz))=reshape(A(:,k),sz,sz)/clim;
else
array(buf+(i-1)*(sz+buf)+(1:sz),buf+(j-1)*(sz+buf)+(1:sz))=reshape(A(:,k),sz,sz);
end
k=k+1;
end
end
end

if opt_graycolor
h=imagesc(array,'EraseMode','none',[-1 1]);
else
h=imagesc(array,'EraseMode','none',[-1 1]);
end
axis image off

drawnow;

warning on all
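
The core of display_network is the index arithmetic that places each filter in a grid. The stripped-down sketch below tiles 25 filters of size 8x8 with a 1-pixel border; it is not a replacement for display_network, the random W1 is only a stand-in for trained weights, and the names tiles, rows and cols are made up for this illustration.

```matlab
% Sketch: tile 25 filters of size 8x8 into one image with a 1-pixel border.
W1  = randn(25, 64);            % stand-in for trained weights (assumption)
A   = W1';                      % one filter per column, as display_network expects
sz  = sqrt(size(A, 1));         % 8: side length of each filter
n   = ceil(sqrt(size(A, 2)));   % 5: filters per row and per column
buf = 1;                        % border width in pixels

tiles = -ones(buf + n*(sz + buf));          % square canvas, background value -1
for k = 1:size(A, 2)
    i = floor((k - 1) / n);                 % zero-based tile row
    j = mod(k - 1, n);                      % zero-based tile column
    rows = buf + i*(sz + buf) + (1:sz);
    cols = buf + j*(sz + buf) + (1:sz);
    % scale each filter by its own largest magnitude so contrasts are comparable
    tiles(rows, cols) = reshape(A(:, k), sz, sz) / max(abs(A(:, k)));
end
disp(size(tiles));
% imagesc(tiles, [-1 1]); colormap(gray); axis image off;   % uncomment to plot
```

Scaling each filter separately corresponds to opt_normalize = true in display_network; dividing by max(abs(A(:))) instead would preserve the relative magnitudes of the filters.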
