A Multilayered-and-Randomized Latent Factor Model for High-Dimensional and Sparse Matrices

Efficiently extracting useful knowledge from a high-dimensional and sparse (HiDS) matrix is critical for many big data-related applications. A latent factor (LF) model has been widely adopted to address this problem. It commonly relies on an iterative learning algorithm such as stochastic gradient descent. However, an algorithm of this kind usually needs many iterations to converge, resulting in considerable time cost on large-scale datasets. Accelerating an LF model’s training process without accuracy loss therefore becomes a vital issue. To address it, this study proposes a multilayered-and-randomized latent factor (MLF) model. Its main idea is two-fold: a) adopting randomized learning to train LFs, which implements a ‘one-iteration’ training process and thus saves time; and b) structuring its LFs according to the principle of a generally multilayered structure, as in a deep forest or a multilayered extreme learning machine, thereby enhancing its representation learning ability. Empirical studies on six HiDS matrices from real applications demonstrate that, compared with state-of-the-art LF models, an MLF model achieves significantly higher computational efficiency with satisfactory prediction accuracy. It has the potential to handle LF analysis on a large-scale HiDS matrix with real-time requirements.
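The two ideas named above can be illustrated with a minimal sketch. It is not the algorithm of this study, whose details the abstract does not give; it only shows one plausible reading under stated assumptions. Assumed here: item factors are drawn at random and never updated, user factors are obtained in a single pass by a closed-form ridge solve over each user's observed entries (in the spirit of an extreme learning machine), and each additional layer is fitted to the residual left by the previous layers. All function names and parameters (`fit_layer`, `fit_mlf_sketch`, `d`, `lam`, `n_layers`) are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_layer(entries, n_items, d, lam, rng):
    """One randomized layer (assumption): item factors Q are random and fixed;
    each user's factors come from one closed-form ridge solve, so no
    gradient-style iterations are needed."""
    Q = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_items)]
    by_user = {}
    for u, i, r in entries:
        by_user.setdefault(u, []).append((i, r))
    P = {}
    for u, obs in by_user.items():
        # Normal equations restricted to the observed entries of user u:
        # (Q_u^T Q_u + lam * I) p_u = Q_u^T r_u
        A = [[lam if a == b else 0.0 for b in range(d)] for a in range(d)]
        rhs = [0.0] * d
        for i, r in obs:
            for a in range(d):
                rhs[a] += Q[i][a] * r
                for b in range(d):
                    A[a][b] += Q[i][a] * Q[i][b]
        P[u] = solve(A, rhs)
    return P, Q


def predict_layer(layer, u, i):
    """Prediction of a single layer; users unseen in training get 0."""
    P, Q = layer
    if u not in P:
        return 0.0
    return sum(p * q for p, q in zip(P[u], Q[i]))


def fit_mlf_sketch(entries, n_items, d=4, lam=0.1, n_layers=3, seed=0):
    """Stack layers (assumption): each new layer fits the current residual.
    Returns the layers and the training squared error after each layer."""
    rng = random.Random(seed)
    residual = list(entries)
    layers, sse_history = [], []
    for _ in range(n_layers):
        layer = fit_layer(residual, n_items, d, lam, rng)
        layers.append(layer)
        residual = [(u, i, r - predict_layer(layer, u, i))
                    for u, i, r in residual]
        sse_history.append(sum(r * r for _, _, r in residual))
    return layers, sse_history


def predict(layers, u, i):
    """Final prediction: sum of the per-layer predictions."""
    return sum(predict_layer(layer, u, i) for layer in layers)
```

Here `entries` is a list of `(user, item, value)` triples holding only the observed cells of the HiDS matrix, so the cost of a layer grows with the number of known entries rather than with the full matrix size. Because a zero factor vector is always a feasible ridge solution, the training squared error in this sketch cannot increase when a layer is added; whether the held-out accuracy improves is an empirical question that the sketch does not settle.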