library(RRRR)

## Introduction

The R package RRRR provides methods for estimating online Robust Reduced-Rank Regression.

If you use any code from the RRRR package in a publication, please use the following citation:

Yangzhuoran Yang and Ziping Zhao (2020). RRRR: Online Robust Reduced-Rank Regression Estimation. R package version 1.1.0. https://pkg.yangzhuoranyang.com/RRRR/.

This vignette aims to provide illustrations to estimate and update (Online) Reduced-Rank Regression using various methods contained in the package.

## Formulation

The formulation of the reduced-rank regression is as follow: $y = \mu +AB' x + D z+innov,$ where for each realization

• $$y$$ is a vector of dimension $$P$$ for the $$P$$ response variables,
• $$x$$ is a vector of dimension $$Q$$ for the $$Q$$ explanatory variables that will be projected to reduce the rank,
• $$z$$ is a vector of dimension $$R$$ for the $$R$$ explanatory variables that will not be projected,
• $$\mu$$ is the constant vector of dimension $$P$$,
• $$innov$$ is the innovation vector of dimension $$P$$,
• $$D$$ is a coefficient matrix for $$z$$ with dimension $$P*R$$,
• $$A$$ is the so called exposure matrix with dimension $$P*r$$, and
• $$B$$ is the so called factor matrix with dimension $$Q*r$$.

The matrix resulted from $$AB'$$ will be a reduced rank coefficient matrix with rank of $$r$$. The function estimates parameters $$\mu$$, $$A$$, $$B$$, $$D$$, and $$Sigma$$, the covariance matrix of the innovation’s distribution.

## Simulation

To simulate example data that can be used to estimate reduced-rank regression, use function RRR_sim.

data <- RRR_sim()
data
#> Simulated Data for Reduced-Rank Regression
#> ------------
#> Specifications:
#>    N    P    Q    R    r
#> 1000    3    3    1    1
#>
#> Coefficients:
#>          mu         A         B         D    Sigma1    Sigma2    Sigma3
#> 1  0.100000 -0.912728 -1.053396 -0.959006  1.000000  0.000000  0.000000
#> 2  0.100000 -0.458783 -0.055563  0.348002  0.000000  1.000000  0.000000
#> 3  0.100000  1.145210  0.414796 -1.057068  0.000000  0.000000  1.000000

A number of parameters can be specified. See ?RRR_sim. The default arguments are set in such a way that the matrix resulted from $$AB'$$ will be a reduced rank coefficient matrix with rank of $$r$$.

str(data)
#> List of 4
#>  $spec:List of 11 #> ..$ P    : num 3
#>   ..$N : num 1000 #> ..$ Q    : num 3
#>   ..$R : num 1 #> ..$ A    : num [1:3, 1] -0.913 -0.459 1.145
#>   ..$B : num [1:3, 1] -1.0534 -0.0556 0.4148 #> ..$ mu   : num [1:3] 0.1 0.1 0.1
#>   ..$D : num [1:3, 1] -0.959 0.348 -1.057 #> ..$ r    : num 1
#>   ..$Sigma: num [1:3, 1:3] 1 0 0 0 1 0 0 0 1 #> ..$ innov: num [1:1000, 1:3] 1.2326 -0.451 0.0953 -1.0221 -0.0941 ...
#>  $y : num [1:1000, 1:3] 0.257 -3.282 -0.313 -0.23 -2.116 ... #>$ x   : num [1:1000, 1:3] -0.432 -0.935 -0.864 -1.721 -0.428 ...
#>  $z : num [1:1000] 0.681 2.106 -0.564 -1.52 1.567 ... #> - attr(*, "class")= chr [1:2] "RRR_data" "list" The returned list of RRR_sim contains the input specifications and the data points $$y$$, $$x$$ and $$z$$. ## Reduced-Rank Regression using Gaussian MLE : RRR The Gaussian Maximum Likelihood Estimation method is described in Johansen, S. (1991). This method is not robust in the sense that it assumes a Gaussian distribution for the innovations which does not take into account the heavy-tailedness of the true distribution and outliers. res_gmle <- RRR(y=data$y, x=data$x, z = data$z)
res_gmle
#> Reduced-Rank Regression
#> ------------
#> Specifications:
#>    N    P    Q    R    r
#> 1000    3    3    1    1
#>
#> Coefficients:
#>          mu         A         B         D    Sigma1    Sigma2    Sigma3
#> 1  0.126372  1.024873  0.891214 -0.934592  2.622808  0.066015  0.114690
#> 2  0.057212  0.524635  0.058963  0.250266  0.066015  2.964781 -0.117965
#> 3  0.076175 -1.379085 -0.421274 -1.049623  0.114690 -0.117965  2.837209

The matrix $$z$$ and the constant $$\mu$$ term are optional.

res_gmle <- RRR(y=data$y, x=data$x, z=data$z, mu = FALSE) res_gmle <- RRR(y=data$y, x=data$x, z=NULL, mu = TRUE) res_gmle <- RRR(y=data$y, x=data$x, z=NULL, mu = FALSE) ## Robust Reduced-Rank Regression using Cauchy distribution and Majorisation-Minimisation: RRRR The Majorisation-Minimisation estimation method is partly described in Zhao, Z., & Palomar, D. P. (2017). This method is robust in the sense that it assumes a heavy-tailed Cauchy distribution for the innovations. As before the matrix $$z$$ and the constant term $$\mu$$ are optional. res_mm <- RRRR(y=data$y, x=data$x, z = data$z,
itr = 100,
earlystop = 1e-4)
res_mm
#> Robust Reduced-Rank Regression
#> ------
#> Majorisation-Minimisation
#> ------------
#> Specifications:
#>    N    P    Q    R    r
#> 1000    3    3    1    1
#>
#> Coefficients:
#>           mu          A          B          D     Sigma1     Sigma2     Sigma3
#> 1  0.1075724  0.5155024  1.8281610 -1.0163317  0.7221273 -0.0031805  0.0329807
#> 2  0.1157375  0.2259829  0.0738263  0.3250376 -0.0031805  0.7492083 -0.0456303
#> 3  0.0917840 -0.6470619 -0.7948998 -1.0679659  0.0329807 -0.0456303  0.7439557

Additional arguments that are worth noticing are itr, which control the maximum number of iteration, and earlystop, which is the criteria to stop the algorithm early. The algorithm will stop if the improvement on objective value is small than earlystop $$\times\ objective\_from\_last\_iteration$$.

This method is an iterative optimization algorithm so we can use the inbuilt plot.RRRR method to see the convergence plot of the algorithm.

plot(res_mm, aes_x = "iteration", xlog10 = TRUE) Argument aes_x can set the x axis to be the number of iteration or the run time. Argument xlog10 can indicate whether the scale of x axis is log 10 transformed.

## Online Robust Reduced-Rank Regression: ORRRR

The description of the generic Stochastic Successive Upper-bound Minimisation method and the Sample Average Approximation can be found in Razaviyayn, M., Sanjabi, M., & Luo, Z. Q. (2016).

There are two major estimation methods supported:

• SMM: Stochastic Majorisation-Minimisation
• SAA: Sample Average Approximation

The algorithm is online in the sense that the data is continuously incorporated and the algorithm can update the parameters accordingly. As before the matrix $$z$$ and the constant term $$\mu$$ are optional.

At each iteration of SAA, a new realisation of the parameters is achieved by solving the minimisation problem of the sample average of the desired objective function using the data currently incorporated. This can be computationally expensive when the objective function is highly nonconvex. The SMM method overcomes this difficulty by replacing the objective function by a well-chosen majorising surrogate function which can be much easier to optimise.

### SMM: Stochastic Majorisation-Minimisation

By default the function ORRRR uses SMM.

res_smm <- ORRRR(y=data$y, x=data$x, z=data$z, initial_size = 100, addon = 10) #> Loading required namespace: lazybar res_smm #> Online Robust Reduced-Rank Regression #> ------ #> Stochastic Majorisation-Minimisation #> ------------ #> Specifications: #> N P R r initial_size addon #> 1000 3 1 1 100 10 #> #> Coefficients: #> mu A B D Sigma1 Sigma2 Sigma3 #> 1 0.1064044 0.5191792 1.8197789 -1.0168206 0.7329679 -0.0037616 0.0338109 #> 2 0.1152912 0.2256964 0.0738946 0.3243414 -0.0037616 0.7623266 -0.0474689 #> 3 0.0927323 -0.6516321 -0.7918953 -1.0663292 0.0338109 -0.0474689 0.7554183 The simulated data set is of size 1000. In the above command, in the first iteration 100 realisations are used in the estimation with 10 more data points in each of the following iteration. Because of the increasing data size, the estimation will be slower the longer the algorithm run. Therefore, the estimated time left in the progress bar is not very accurate in this sense. The output from ORRRR can also plotted using plot.RRRR. plot(res_smm) ### SAA: Sample Average Approximation When using SAA, there are two sub solver supported in each iteration. • optim: the optim function from the stats package, and • MM: Majorisation-Minimisation method with ten subiterations by default. res_saa_optim <- ORRRR(y=data$y, x=data$x, z=data$z,
method = "SAA", SAAmethod = "optim")
res_saa_mm <- ORRRR(y=data$y, x=data$x, z=data$z, method = "SAA", SAAmethod = "MM") optim is a general purpose solver which means it will be quite slow for this specific problem, especially when the number of parameters is large. Embedding majorisation-minimisation into subiteration of SAA is more like a heuristic without solid theory backing up its efficiency. Due to the time constraint we do not show the estimated result here. ## Truly online: update.RRRR With the result from ORRRR, user can still update it with newly achieved data using function update. Note the result from RRRR can also be updated where it simply takes the result from RRRR as the starting point in online estimation. newdata <- RRR_sim() res2_smm <- update(res_smm, newy=newdata$y, newx=newdata$x, newz=newdata$z)
res2_smm
#> Online Robust Reduced-Rank Regression
#> ------
#> Stochastic Majorisation-Minimisation
#> ------------
#> Specifications:
#>            N            P            R            r initial_size        addon
#>         2000            3            1            1         1010           10
#>
#> Coefficients:
#>          mu         A         B         D    Sigma1    Sigma2    Sigma3
#> 1  0.111211  0.277957  2.000850 -0.293491  1.505942 -0.340514 -0.270444
#> 2  0.088274  0.198424  0.159417  0.017572 -0.340514  1.260649  0.260434
#> 3  0.093911 -0.299905 -0.782485 -0.833642 -0.270444  0.260434  1.733724

Without other arguments specified, update will just take the original specification in the model. If it applies to output of RRRR, then the default would be the default arguments in ORRRR, i.e., with method set to “SMM” and addon set to 10.