http://www.stanford.edu/~boyd/papers/admm/lasso/lasso_example.html
http://www-stat.stanford.edu/~susan/courses/b494/index/node35.html
http://www-stat.stanford.edu/~susan/courses/s227/node5.html
http://www-stat.stanford.edu/~susan/courses/b494/index/node29.html
lasso
Regularized least-squares regression using lasso or elastic net algorithms
Syntax
B = lasso(X,Y)
[B,FitInfo] = lasso(X,Y)
[B,FitInfo] = lasso(X,Y,Name,Value)
Description
B = lasso(X,Y) returns fitted least-squares regression coefficients for a set of regularization coefficients Lambda.
[B,FitInfo] = lasso(X,Y) returns a structure containing information about the fits.
[B,FitInfo] = lasso(X,Y,Name,Value) fits regularized regressions with additional options specified by one or more Name,Value pair arguments.
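The two simplest call forms above can be tried on synthetic data. The following is a minimal sketch; the data, the seed, and the variable names n, p, and trueBeta are assumptions made for illustration and do not come from the reference text.

```matlab
% Minimal sketch: synthetic data with 3 informative predictors out of 10.
rng(0);                                  % fix the seed so the run is repeatable
n = 100; p = 10;                         % illustrative sizes
X = randn(n, p);
trueBeta = [2; -1; 0.5; zeros(p-3, 1)];  % hypothetical true coefficients
Y = X * trueBeta + 0.5 * randn(n, 1);

B = lasso(X, Y);                         % coefficients only
[B, FitInfo] = lasso(X, Y);              % coefficients plus fit information

% B has one column per fitted Lambda value.
fprintf('B is %d-by-%d\n', size(B, 1), size(B, 2));
fprintf('Nonzero coefficients at the smallest Lambda: %d\n', FitInfo.DF(1));
```

Each column of B corresponds to one entry of FitInfo.Lambda, so the number of columns can be smaller than the default NumLambda of 100.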
Input Arguments
X
Numeric matrix with n rows and p columns. Each row represents one observation, and each column represents one predictor (variable).
Y
Numeric vector of length n, where n is the number of rows of X. Y(i) is the response to row i of X.
Name-Value Pair Arguments
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
'Alpha'
Scalar value from 0 to 1 (excluding 0) representing the weight of lasso (L1) versus ridge (L2) optimization. Alpha = 1 represents lasso regression, Alpha close to 0 approaches ridge regression, and other values represent elastic net optimization. See Definitions.
Default: 1
'CV'
Method lasso uses to estimate mean squared error:
- K, a positive integer: lasso uses K-fold cross validation.
- cvp, a cvpartition object: lasso uses the cross-validation method expressed in cvp. You cannot use a 'leaveout' partition with lasso.
- 'resubstitution': lasso uses X and Y to fit the model and to estimate the mean squared error, without cross validation.
Default: 'resubstitution'
'DFmax'
Maximum number of nonzero coefficients in the model. lasso returns results only for Lambda values that satisfy this criterion.
Default: Inf
'Lambda'
Vector of nonnegative Lambda values. See Definitions.
- If you do not supply Lambda, lasso calculates the largest value of Lambda that gives a nonnull model. In this case, LambdaRatio gives the ratio of the smallest to the largest value of the sequence, and NumLambda gives the length of the vector.
- If you supply Lambda, lasso ignores LambdaRatio and NumLambda.
Default: Geometric sequence of NumLambda values, the largest just sufficient to produce B = 0
'LambdaRatio'
Positive scalar, the ratio of the smallest to the largest Lambda value when you do not set Lambda. If you set LambdaRatio = 0, lasso generates a default sequence of Lambda values, and replaces the smallest one with 0.
Default: 1e-4
'MCReps'
Positive integer, the number of Monte Carlo repetitions for cross validation.
- If CV is 'resubstitution' or a cvpartition of type 'resubstitution', MCReps must be 1.
- If CV is a cvpartition of type 'holdout', MCReps must be greater than 1.
Default: 1
'NumLambda'
Positive integer, the number of Lambda values lasso uses when you do not set Lambda. lasso can return fewer than NumLambda fits if the residual error of the fits drops below a threshold fraction of the variance of Y.
Default: 100
'Options'
Structure that specifies whether to cross validate in parallel, and specifies the random stream or streams. Create the Options structure with statset. Option fields:
- UseParallel: Set to true to compute in parallel. Default is false.
- UseSubstreams: Set to true to compute in parallel in a reproducible fashion. To compute reproducibly, set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'. Default is false.
- Streams: A RandStream object or cell array consisting of one such object. If you do not specify Streams, lasso uses the default stream.
'PredictorNames'
Cell array of strings representing names of the predictor variables, in the order in which they appear in X.
Default: {}
'RelTol'
Convergence threshold for the coordinate descent algorithm (see Friedman, Tibshirani, and Hastie [3]). The algorithm terminates when successive estimates of the coefficient vector differ in the L2 norm by a relative amount less than RelTol.
Default: 1e-4
'Standardize'
Boolean value specifying whether lasso scales X before fitting the models.
Default: true
'Weights'
Observation weights, a nonnegative vector of length n, where n is the number of rows of X. lasso scales Weights to sum to 1.
Default: 1/n * ones(n,1)
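Several of the name-value pairs described above can be combined in one call. The sketch below is illustrative only: the data, the seed, the predictor names x1 to x8, and the particular option values are assumptions chosen for the example, not recommended settings.

```matlab
% Illustrative data: 8 predictors, of which only columns 1 and 4 matter.
rng(1);
n = 60; p = 8;
X = randn(n, p);
Y = X(:, 1) - 2 * X(:, 4) + 0.3 * randn(n, 1);

% Elastic net (Alpha = 0.5) with 5-fold cross validation on a shorter
% Lambda sequence than the default.
[B, FitInfo] = lasso(X, Y, ...
    'Alpha', 0.5, ...
    'CV', 5, ...
    'NumLambda', 25, ...
    'LambdaRatio', 1e-3, ...
    'PredictorNames', {'x1','x2','x3','x4','x5','x6','x7','x8'});

% Supplying Lambda directly; LambdaRatio and NumLambda are then ignored.
lambdas = [0.01 0.1 1];                  % hypothetical values
B2 = lasso(X, Y, 'Lambda', lambdas, 'Standardize', false);

fprintf('Cross-validated fit: %d Lambda values\n', size(B, 2));
fprintf('User-supplied fit:   %d Lambda values\n', size(B2, 2));
```

Because Name,Value pairs can appear in any order, the call reads the same if, for example, 'CV' is listed before 'Alpha'.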
Output Arguments
B
Fitted coefficients, a p-by-L matrix, where p is the number of predictors (columns) in X, and L is the number of Lambda values.
FitInfo
Structure containing information about the model fits.
Field in FitInfo | Description
Intercept | Intercept term β0 for each linear model, a 1-by-L vector
Lambda | Lambda parameters in ascending order, a 1-by-L vector
Alpha | Value of Alpha parameter, a scalar
DF | Number of nonzero coefficients in B for each value of Lambda, a 1-by-L vector
MSE | Mean squared error (MSE), a 1-by-L vector
If you set the CV name-value pair to cross validate, the FitInfo structure contains additional fields.
Field in FitInfo | Description
SE | The standard error of MSE for each Lambda, as calculated during cross validation, a 1-by-L vector
LambdaMinMSE | The Lambda value with minimum MSE, a scalar
Lambda1SE | The largest Lambda such that MSE is within one standard error of the minimum, a scalar
IndexMinMSE | The index of Lambda with value LambdaMinMSE, a scalar
Index1SE | The index of Lambda with value Lambda1SE, a scalar
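The cross-validation fields listed above are typically used to pick one column of B. The following sketch assumes synthetic data and a 10-fold setting chosen for illustration; the names idx, coef, coef0, and Yhat are introduced here and are not part of the lasso interface.

```matlab
% Illustrative data: 6 predictors, of which the first two matter.
rng(2);
n = 80; p = 6;
X = randn(n, p);
Y = 3 * X(:, 1) + 1.5 * X(:, 2) + randn(n, 1);

[B, FitInfo] = lasso(X, Y, 'CV', 10);

idx   = FitInfo.Index1SE;        % sparsest model within one SE of the minimum MSE
coef  = B(:, idx);               % coefficients of the selected model
coef0 = FitInfo.Intercept(idx);  % matching intercept

Yhat = X * coef + coef0;         % fitted values from the selected model

fprintf('Lambda1SE = %.4f, nonzero coefficients = %d\n', ...
    FitInfo.Lambda1SE, FitInfo.DF(idx));
```

Using Index1SE rather than IndexMinMSE trades a slightly higher estimated error for a sparser model; either index can be substituted for idx.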
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% File: Lasso.m
% Author: Jinzhu Jia
%
% One simple (coordinate descent) implementation of the Lasso
% The objective function is:
% 1/2*sum((Y - X * beta - beta0).^2) + lambda * sum(abs(beta))
% Comparison with CVX code is done
% Reference: To be uploaded
%
% Version 4.23.2010
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [beta0,beta] = Lasso(X,Y,lambda)
n = size(X,1);
p = size(X,2);
if(size(Y,1) ~= n)
    error('Dimensions of X and Y do not match.');
end
beta0 = 0;
beta = zeros(p,1);
crit = 1;
step = 0;
% Iterate until the relative change in the objective is small
while(crit > 1E-5)
    step = step + 1;   % iteration counter
    obj_ini = 1/2*sum((Y - X * beta - beta0).^2) + lambda * sum(abs(beta));
    % Update the intercept with beta held fixed
    beta0 = mean(Y - X*beta);
    for j = 1:p
        a = sum(X(:,j).^2);
        % Inner product of column j with the partial residual; the
        % intercept is subtracted so the update matches the objective
        b = sum((Y - X*beta - beta0) .* X(:,j)) + beta(j) * a;
        % Soft-threshold b at lambda
        if lambda >= abs(b)
            beta(j) = 0;
        else
            if b > 0
                beta(j) = (b - lambda) / a;
            else
                beta(j) = (b + lambda) / a;
            end
        end
    end
    obj_now = 1/2*sum((Y - X * beta - beta0).^2) + lambda * sum(abs(beta));
    crit = abs(obj_now - obj_ini) / abs(obj_ini);
end
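The three-way branch in the inner loop of Lasso.m is the soft-thresholding operator, S(b, lambda) = sign(b) * max(|b| - lambda, 0), applied before the division by a. The sketch below checks that equivalence on a few values; softThreshold, bVals, and branch are names introduced here for illustration, and the test values are arbitrary.

```matlab
% Soft-thresholding written as a single expression.
softThreshold = @(b, lambda) sign(b) .* max(abs(b) - lambda, 0);

lambda = 1;
bVals  = [-3 -1 -0.5 0 0.5 1 3];   % arbitrary test values

% The same computation in branch form, as in the inner loop of Lasso.m
% (without the division by a).
branch = zeros(size(bVals));
for k = 1:numel(bVals)
    b = bVals(k);
    if lambda >= abs(b)
        branch(k) = 0;
    elseif b > 0
        branch(k) = b - lambda;
    else
        branch(k) = b + lambda;
    end
end

% Both forms agree on every test value.
assert(isequal(branch, softThreshold(bVals, lambda)));
disp(branch)   % values: -2 0 0 0 0 0 2
```

Writing the update as one vectorizable expression makes it easier to see why coefficients whose |b| does not exceed lambda are set exactly to zero.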
————————————————————————-
%% Lasso regularization example
% Copyright (c) 2011, The MathWorks, Inc.

%% Introduction to using LASSO
% This demo explains how to start using the lasso functionality introduced
% in R2011b. It is motivated by an example in Tibshirani's original paper
% on the lasso.
%
% Tibshirani, R. (1996). Regression shrinkage and selection via the lasso.
% J. Royal. Statist. Soc B., Vol. 58, No. 1, pages 267-288.
%
% The data set that we're working with in this demo is a wide
% dataset with correlated variables. This data set includes 8 different
% variables and only 20 observations. 5 out of the 8 variables have
% coefficients of zero. These variables have zero impact on the model. The
% other three variables have nonzero coefficients and impact the model.

%% Clean up workspace and set random seed
clear all
clc

% Set the random number stream
rng(1981);

%% Creating data set with specific characteristics
% Create eight X variables
% The mean of each variable will be equal to zero
mu = [0 0 0 0 0 0 0 0];

% The variables are correlated with one another
% The covariance matrix is specified as
i = 1:8;
matrix = abs(bsxfun(@minus,i',i));
covariance = repmat(.5,8,8).^matrix;

% Use these parameters to generate a set of multivariate normal random numbers
X = mvnrnd(mu, covariance, 20);

% Create a hyperplane that describes Y = f(X)
Beta = [3; 1.5; 0; 0; 2; 0; 0; 0];
ds = dataset(Beta);

% Add in a noise vector
Y = X * Beta + 3 * randn(20,1);

%% Use linear regression to fit the model
b = regress(Y,X);
ds.Linear = b;

%% Use a lasso to fit the model
[B, Stats] = lasso(X,Y, 'CV', 5);
disp(B)
disp(Stats)

%% Create a plot showing MSE versus lambda
lassoPlot(B, Stats, 'PlotType', 'CV')

%% Identify a reasonable set of lasso coefficients
% View the regression coefficients associated with Index1SE
ds.Lasso = B(:,Stats.Index1SE);
disp(ds)

%% Create a plot showing coefficient values versus L1 norm
lassoPlot(B, Stats)

%% Run a Simulation
% Preallocate some variables
MSE = zeros(100,1);
mse = zeros(100,1);
Coeff_Num = zeros(100,1);
Betas = zeros(8,100);
cv_Reg_MSE = zeros(1,100);

for i = 1 : 100
    X = mvnrnd(mu, covariance, 20);
    Y = X * Beta + randn(20,1);
    [B, Stats] = lasso(X,Y, 'CV', 5);
    % Pick a Lambda index halfway between IndexMinMSE and Index1SE
    Shrink = Stats.Index1SE - ceil((Stats.Index1SE - Stats.IndexMinMSE)/2);
    Betas(:,i) = B(:,Shrink) > 0;
    Coeff_Num(i) = sum(B(:,Shrink) > 0);
    MSE(i) = Stats.MSE(:, Shrink);
    % Cross-validated MSE of ordinary least squares on the same data
    regf = @(XTRAIN, ytrain, XTEST)(XTEST*regress(ytrain,XTRAIN));
    cv_Reg_MSE(i) = crossval('mse',X,Y,'predfun',regf, 'kfold', 5);
end

Number_Lasso_Coefficients = mean(Coeff_Num);
disp(Number_Lasso_Coefficients)

MSE_Ratio = median(cv_Reg_MSE)/median(MSE);
disp(MSE_Ratio)
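The covariance matrix built in the demo with bsxfun and repmat has entries 0.5^|i-j|, so neighboring variables are correlated at 0.5 and the correlation halves with each further step. The sketch below reproduces that matrix and compares it with an alternative construction using toeplitz; the toeplitz form and the name covToeplitz are not part of the demo and are shown only to make the structure visible.

```matlab
% Construction used in the demo: entry (i,j) equals 0.5^|i-j|.
i = 1:8;
matrix = abs(bsxfun(@minus, i', i));     % |i - j| for every pair of variables
covariance = repmat(.5, 8, 8).^matrix;

% Alternative construction (an assumption for illustration, not from the demo).
covToeplitz = toeplitz(0.5 .^ (0:7));

% First four entries of row 1: 1, 0.5, 0.25, 0.125
disp(covariance(1, 1:4))

% The two constructions agree up to rounding.
assert(max(abs(covariance(:) - covToeplitz(:))) < 1e-12);
```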