Principal Component Analysis for Large Scale Problems with Lots of Missing Values

Raiko, T.; Ilin, A.; Karhunen, J.

doi:10.1007/978-3-540-74958-5_69

Published 2007 | Version v2

Conference paper Metadata-only

Description

Principal component analysis (PCA) is a well-known classical data analysis technique. There are a number of algorithms for solving the problem, some of which scale better than others to high-dimensional problems. They also differ in their ability to handle missing values in the data. We study a case where the data are high-dimensional and a majority of the values are missing. With very sparse data, overfitting becomes a severe problem even in simple linear models such as PCA. We propose an algorithm based on speeding up a simple principal subspace rule, and extend it to use regularization and variational Bayesian (VB) learning. Experiments with Netflix data confirm that the proposed algorithm is much faster than any of the compared methods, and that the VB-PCA method provides more accurate predictions for new data than traditional PCA or regularized PCA.
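
The description above gives no update equations, so the following is only a minimal illustrative sketch of the general idea it names: fitting a low-rank linear model to the observed entries of a sparse matrix, with an L2 penalty to limit overfitting. It uses plain gradient descent and is not the authors' sped-up subspace rule or their VB-PCA method. The function name masked_pca and all parameters (lr, reg, n_iters, seed) are assumptions introduced here for illustration.

```python
import numpy as np


def masked_pca(X, mask, n_components=2, lr=0.005, reg=0.1, n_iters=3000, seed=0):
    """Fit X ~= A @ S using only the entries where mask is True.

    Hypothetical helper: plain gradient descent on the squared error over
    observed entries, plus an L2 penalty (reg) on both factors.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    A = 0.1 * rng.standard_normal((d, n_components))  # loadings
    S = 0.1 * rng.standard_normal((n_components, n))  # components
    for _ in range(n_iters):
        # Residual on observed entries; missing entries contribute zero.
        E = np.where(mask, X - A @ S, 0.0)
        grad_A = -E @ S.T + reg * A
        grad_S = -A.T @ E + reg * S
        A -= lr * grad_A
        S -= lr * grad_S
    return A, S


if __name__ == "__main__":
    # Synthetic rank-2 data with roughly 30% of the entries hidden.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 30))
    mask = rng.random(X.shape) < 0.7
    A, S = masked_pca(X, mask)
    held_out = np.sqrt(np.mean((X - A @ S)[~mask] ** 2))
    print("RMSE on hidden entries:", held_out)
```

Evaluating the error on the hidden entries, as in the usage lines, mirrors the description's concern with predictions for new data: a model can fit the observed entries well and still overfit when most values are missing.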

Title	Principal Component Analysis for Large Scale Problems with Lots of Missing Values
Authors	Raiko, T.; Ilin, A.; Karhunen, J.
Publisher	Springer
Year of publication	2007

Describes: Conference paper: 10.1007/978-3-540-74958-5_69 (DOI)

