SPLS-package {SPLS}    R Documentation

Sparse partial least-squares regression for high-throughput data analysis

Description

SPLS performs sparse partial least-squares regression for high-throughput data. Two approaches are provided: SPLS and SPLS2.

Usage

SPLS(X, Y, penalty="HL", nc=1, lambda=0.01)
SPLS2(X, Y, penalty="HL", nc=1, lambda=0.01)

Arguments

X

n-by-p data matrix of p predictors measured on n samples.

Y

n-by-q multivariate response data matrix with q variables from the same n samples.

penalty

"HL" is the unbounded penalty proposed by Lee and Oh (2009) and "L1" is the L1 penalty.

nc

Number of latent components.

lambda

Tuning parameter for the sparsity.

Details

SPLS and SPLS2 are new formulations of the sparse PLS procedure that allow both sparse variable selection and dimension reduction. Both methods support the standard L1 penalty and the unbounded penalty of Lee and Oh (2009). The computing algorithm for SPLS and SPLS2 is a modified version of the nonlinear iterative partial least-squares (NIPALS) algorithm.
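To illustrate how a penalty induces sparsity within such an algorithm, the following minimal sketch (not the package's exact implementation) applies L1-type soft-thresholding to a PLS direction vector in a single NIPALS-style step; the toy data, threshold value, and normalization are assumptions made for the example.

## Minimal sketch of a sparse NIPALS-style direction step (illustrative only)
soft <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)  # soft-thresholding
set.seed(1)
X <- matrix(rnorm(40 * 10), 40, 10)   # toy predictors: n = 40, p = 10
Y <- matrix(rnorm(40 * 2), 40, 2)     # toy responses: q = 2
M <- crossprod(X, Y)                  # p-by-q cross-product matrix X'Y
u <- svd(M)$v[, 1]                    # leading right singular vector of M
w <- soft(M %*% u, lambda = 5)        # penalized direction: some entries exactly 0
w <- w / sqrt(sum(w^2))               # renormalize to unit length
t1 <- X %*% w                         # sparse latent component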

Value

W

A matrix whose columns are the direction vectors with respect to the original predictors X.

R

A matrix whose columns are the direction vectors with respect to the residual matrix.

T

Latent component matrix.

beta

Estimates of the regression coefficients.

lambda

Tuning parameter used in the fit.
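A hypothetical usage sketch of the returned components (assuming X and Y are centered, as in the Examples below, and that beta is a p-by-q coefficient matrix; verify these conventions against the package):

fit <- SPLS(X, Y, penalty = "HL", nc = 2, lambda = 0.01)
head(fit$T)               # latent components, one column per component
Yhat <- X %*% fit$beta    # fitted responses from the returned coefficients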

Author(s)

Donghwan Lee, Woojoo Lee, Youngjo Lee and Yudi Pawitan

Maintainer: Woojoo Lee <lwj221@gmail.com> and Donghwan Lee <liebe02@snu.ac.kr>

References

Lee, D., Lee, W., Lee, Y. and Pawitan, Y. (2011). Sparse partial least-squares regression and its applications to high-throughput data analysis, Chemometrics and Intelligent Laboratory Systems, 109, 1-8.

Examples


## Generate X and Y
n <- 40; p <- 100; q <- 10; var.h <- 25; nsr <- 0.1; p1 <- 5

set.seed(12345)
err.x <- rnorm(n * p, mean = 0, sd = 1)
err.y <- rnorm(n * q, mean = 0, sd = sqrt(nsr * 25 * var.h))

## Centered noise matrix for the predictors
X <- matrix(err.x, n, p); X <- scale(X, center = TRUE, scale = FALSE)

## Three centered latent factors
h1 <- rnorm(n, mean = 0, sd = sqrt(var.h)); h1 <- c(scale(h1, center = TRUE, scale = FALSE))
h2 <- rnorm(n, mean = 0, sd = sqrt(var.h)); h2 <- c(scale(h2, center = TRUE, scale = FALSE))
h3 <- rnorm(n, mean = 0, sd = sqrt(var.h)); h3 <- c(scale(h3, center = TRUE, scale = FALSE))

## Columns 1..p1 load on h1, columns p1+1..2*p1 on h2, the rest on h3
X[, 1:p1] <- X[, 1:p1] + h1
X[, (p1 + 1):(2 * p1)] <- X[, (p1 + 1):(2 * p1)] + h2
X[, (2 * p1 + 1):p] <- X[, (2 * p1 + 1):p] + h3

## Centered responses
Y <- matrix(err.y, n, q); Y <- scale(Y, center = TRUE, scale = FALSE)
Y <- Y + 3 * h1 - 4 * h2
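## Since Y depends only on h1 and h2, the truly relevant predictors are the
## first 2*p1 = 10 columns of X; a sparse fit should select (roughly) these.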

## SPLS approach
## 1. Find the optimal tuning parameters by 10-fold cross-validation
splsHL.cv <- cv.SPLS(X, Y, penalty = "HL", fold = 10, nc = 1:3,
                     lambda = seq(0.01, 0.1, length.out = 10), plot.it = FALSE)
## 2. Fit SPLS with the optimal tuning parameters
splsHL.opt <- SPLS(X, Y, penalty = "HL", nc = splsHL.cv$nc.opt,
                   lambda = splsHL.cv$lambda.opt)

## Estimates of regression coefficients
splsHL.opt$beta
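
## Predictors selected by SPLS: rows of beta with any nonzero coefficient
## (illustrative check; assumes beta is the p-by-q coefficient matrix)
which(rowSums(abs(splsHL.opt$beta)) != 0)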


## SPLS2 approach
## 1. Find the optimal tuning parameters by 10-fold cross-validation
spls2.cv <- cv.SPLS2(X, Y, penalty = "HL", fold = 10, nc = 1:3,
                     lambda = seq(1, 10, length.out = 10), plot.it = FALSE)
## 2. Fit SPLS2 with the optimal tuning parameters
spls2.opt <- SPLS2(X, Y, penalty = "HL", nc = spls2.cv$nc.opt,
                   lambda = spls2.cv$lambda.opt)

## Estimates of regression coefficients
spls2.opt$beta


[Package SPLS version 1.2 Index]