library(RRRR)

The R package *RRRR* provides methods for estimating online Robust Reduced-Rank Regression.

If you use any code from the `RRRR`

package in a publication, please use the following citation:

Yangzhuoran Yang and Ziping Zhao (2020). RRRR: Online Robust Reduced-Rank Regression Estimation. R package version 1.1.0. https://pkg.yangzhuoranyang.com/RRRR/.

This vignette aims to provide illustrations to estimate and update (Online) Reduced-Rank Regression using various methods contained in the package.

The formulation of the reduced-rank regression is as follow: \[y = \mu +AB' x + D z+innov,\] where for each realization

- \(y\) is a vector of dimension \(P\) for the \(P\) response variables,
- \(x\) is a vector of dimension \(Q\) for the \(Q\) explanatory variables that will be projected to reduce the rank,
- \(z\) is a vector of dimension \(R\) for the \(R\) explanatory variables that will not be projected,
- \(\mu\) is the constant vector of dimension \(P\),
- \(innov\) is the innovation vector of dimension \(P\),
- \(D\) is a coefficient matrix for \(z\) with dimension \(P*R\),
- \(A\) is the so called exposure matrix with dimension \(P*r\), and
- \(B\) is the so called factor matrix with dimension \(Q*r\).

The matrix resulted from \(AB'\) will be a reduced rank coefficient matrix with rank of \(r\). The function estimates parameters \(\mu\), \(A\), \(B\), \(D\), and \(Sigma\), the covariance matrix of the innovation’s distribution.

To simulate example data that can be used to estimate reduced-rank regression, use function `RRR_sim`

.

data <- RRR_sim() data #> Simulated Data for Reduced-Rank Regression #> ------------ #> Specifications: #> N P Q R r #> 1000 3 3 1 1 #> #> Coefficients: #> mu A B D Sigma1 Sigma2 Sigma3 #> 1 0.100000 -0.912728 -1.053396 -0.959006 1.000000 0.000000 0.000000 #> 2 0.100000 -0.458783 -0.055563 0.348002 0.000000 1.000000 0.000000 #> 3 0.100000 1.145210 0.414796 -1.057068 0.000000 0.000000 1.000000

A number of parameters can be specified. See `?RRR_sim`

. The default arguments are set in such a way that the matrix resulted from \(AB'\) will be a reduced rank coefficient matrix with rank of \(r\).

str(data) #> List of 4 #> $ spec:List of 11 #> ..$ P : num 3 #> ..$ N : num 1000 #> ..$ Q : num 3 #> ..$ R : num 1 #> ..$ A : num [1:3, 1] -0.913 -0.459 1.145 #> ..$ B : num [1:3, 1] -1.0534 -0.0556 0.4148 #> ..$ mu : num [1:3] 0.1 0.1 0.1 #> ..$ D : num [1:3, 1] -0.959 0.348 -1.057 #> ..$ r : num 1 #> ..$ Sigma: num [1:3, 1:3] 1 0 0 0 1 0 0 0 1 #> ..$ innov: num [1:1000, 1:3] 1.2326 -0.451 0.0953 -1.0221 -0.0941 ... #> $ y : num [1:1000, 1:3] 0.257 -3.282 -0.313 -0.23 -2.116 ... #> $ x : num [1:1000, 1:3] -0.432 -0.935 -0.864 -1.721 -0.428 ... #> $ z : num [1:1000] 0.681 2.106 -0.564 -1.52 1.567 ... #> - attr(*, "class")= chr [1:2] "RRR_data" "list"

The returned list of `RRR_sim`

contains the input specifications and the data points \(y\), \(x\) and \(z\).

`RRR`

The Gaussian Maximum Likelihood Estimation method is described in Johansen, S. (1991). This method is not robust in the sense that it assumes a Gaussian distribution for the innovations which does not take into account the heavy-tailedness of the true distribution and outliers.

res_gmle <- RRR(y=data$y, x=data$x, z = data$z) res_gmle #> Reduced-Rank Regression #> ------------ #> Specifications: #> N P Q R r #> 1000 3 3 1 1 #> #> Coefficients: #> mu A B D Sigma1 Sigma2 Sigma3 #> 1 0.126372 1.024873 0.891214 -0.934592 2.622808 0.066015 0.114690 #> 2 0.057212 0.524635 0.058963 0.250266 0.066015 2.964781 -0.117965 #> 3 0.076175 -1.379085 -0.421274 -1.049623 0.114690 -0.117965 2.837209

The matrix \(z\) and the constant \(\mu\) term are optional.

`RRRR`

The Majorisation-Minimisation estimation method is partly described in Zhao, Z., & Palomar, D. P. (2017). This method is robust in the sense that it assumes a heavy-tailed Cauchy distribution for the innovations. As before the matrix \(z\) and the constant term \(\mu\) are optional.

res_mm <- RRRR(y=data$y, x=data$x, z = data$z, itr = 100, earlystop = 1e-4) res_mm #> Robust Reduced-Rank Regression #> ------ #> Majorisation-Minimisation #> ------------ #> Specifications: #> N P Q R r #> 1000 3 3 1 1 #> #> Coefficients: #> mu A B D Sigma1 Sigma2 Sigma3 #> 1 0.1075724 0.5155024 1.8281610 -1.0163317 0.7221273 -0.0031805 0.0329807 #> 2 0.1157375 0.2259829 0.0738263 0.3250376 -0.0031805 0.7492083 -0.0456303 #> 3 0.0917840 -0.6470619 -0.7948998 -1.0679659 0.0329807 -0.0456303 0.7439557

Additional arguments that are worth noticing are `itr`

, which control the maximum number of iteration, and `earlystop`

, which is the criteria to stop the algorithm early. The algorithm will stop if the improvement on objective value is small than `earlystop`

\(\times\ objective\_from\_last\_iteration\).

This method is an iterative optimization algorithm so we can use the inbuilt `plot.RRRR`

method to see the convergence plot of the algorithm.

plot(res_mm, aes_x = "iteration", xlog10 = TRUE)

Argument `aes_x`

can set the x axis to be the number of iteration or the run time. Argument `xlog10`

can indicate whether the scale of x axis is log 10 transformed.

`ORRRR`

The description of the generic Stochastic Successive Upper-bound Minimisation method and the Sample Average Approximation can be found in Razaviyayn, M., Sanjabi, M., & Luo, Z. Q. (2016).

There are two major estimation methods supported:

- SMM: Stochastic Majorisation-Minimisation
- SAA: Sample Average Approximation

The algorithm is online in the sense that the data is continuously incorporated and the algorithm can update the parameters accordingly. As before the matrix \(z\) and the constant term \(\mu\) are optional.

At each iteration of SAA, a new realisation of the parameters is achieved by solving the minimisation problem of the sample average of the desired objective function using the data currently incorporated. This can be computationally expensive when the objective function is highly nonconvex. The SMM method overcomes this difficulty by replacing the objective function by a well-chosen majorising surrogate function which can be much easier to optimise.

By default the function `ORRRR`

uses SMM.

res_smm <- ORRRR(y=data$y, x=data$x, z=data$z, initial_size = 100, addon = 10) #> Loading required namespace: lazybar res_smm #> Online Robust Reduced-Rank Regression #> ------ #> Stochastic Majorisation-Minimisation #> ------------ #> Specifications: #> N P R r initial_size addon #> 1000 3 1 1 100 10 #> #> Coefficients: #> mu A B D Sigma1 Sigma2 Sigma3 #> 1 0.1064044 0.5191792 1.8197789 -1.0168206 0.7329679 -0.0037616 0.0338109 #> 2 0.1152912 0.2256964 0.0738946 0.3243414 -0.0037616 0.7623266 -0.0474689 #> 3 0.0927323 -0.6516321 -0.7918953 -1.0663292 0.0338109 -0.0474689 0.7554183

The simulated data set is of size 1000. In the above command, in the first iteration 100 realisations are used in the estimation with 10 more data points in each of the following iteration. Because of the increasing data size, the estimation will be slower the longer the algorithm run. Therefore, the estimated time left in the progress bar is not very accurate in this sense.

The output from `ORRRR`

can also plotted using `plot.RRRR`

.

plot(res_smm)

When using SAA, there are two sub solver supported in each iteration.

- optim: the
`optim`

function from the`stats`

package, and - MM: Majorisation-Minimisation method with ten subiterations by default.

res_saa_optim <- ORRRR(y=data$y, x=data$x, z=data$z, method = "SAA", SAAmethod = "optim") res_saa_mm <- ORRRR(y=data$y, x=data$x, z=data$z, method = "SAA", SAAmethod = "MM")

`optim`

is a general purpose solver which means it will be quite slow for this specific problem, especially when the number of parameters is large. Embedding majorisation-minimisation into subiteration of SAA is more like a heuristic without solid theory backing up its efficiency. Due to the time constraint we do not show the estimated result here.

`update.RRRR`

With the result from `ORRRR`

, user can still update it with newly achieved data using function `update`

. Note the result from `RRRR`

can also be updated where it simply takes the result from `RRRR`

as the starting point in online estimation.

newdata <- RRR_sim() res2_smm <- update(res_smm, newy=newdata$y, newx=newdata$x, newz=newdata$z) res2_smm #> Online Robust Reduced-Rank Regression #> ------ #> Stochastic Majorisation-Minimisation #> ------------ #> Specifications: #> N P R r initial_size addon #> 2000 3 1 1 1010 10 #> #> Coefficients: #> mu A B D Sigma1 Sigma2 Sigma3 #> 1 0.111211 0.277957 2.000850 -0.293491 1.505942 -0.340514 -0.270444 #> 2 0.088274 0.198424 0.159417 0.017572 -0.340514 1.260649 0.260434 #> 3 0.093911 -0.299905 -0.782485 -0.833642 -0.270444 0.260434 1.733724

Without other arguments specified, `update`

will just take the original specification in the model. If it applies to output of `RRRR`

, then the default would be the default arguments in `ORRRR`

, i.e., with `method`

set to “SMM” and `addon`

set to 10.