## The module "train_pca" of the Mastrave modelling library

**Daniele de Rigo**

#### Copyright and license notice of the function train_pca

Copyright © 2007,2008,2009,2010 Daniele de Rigo

The file train_pca.m is part of Mastrave.

Mastrave is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Mastrave is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Mastrave. If not, see http://www.gnu.org/licenses/.

#### Function declaration

[pc,mse,w,val2pc,pc2val] = train_pca(values)

#### Description

Training engine to model a given numeric matrix **values** by applying principal component analysis (PCA).

**values** is composed of N row-vectors representing N vectorial points in an n-dimensional space (so that the n adjacent columns of **values** refer to different dimension coordinates). The principal components **pc** are returned along with the coefficients **pc2val** to transform subsets of principal components into the corresponding approximations of the original **values**. The returned coefficients **val2pc** enable the inverse transformation from **values** to **pc**. The i-th element of **mse** is the mean square error associated with the use of the first i principal components. **w** is the vector of weights associated with each principal component, such that `prod(w) == 1`. **w** is proportional to the diagonal of the S matrix returned by `[U,S,V] = svd(values)` such that `values == U * S * V'`.
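The relations stated in this description can be reproduced with plain `svd`. The following is only a sketch under stated assumptions, not the Mastrave implementation of train_pca: it assumes that **val2pc** coincides with `V`, that **pc2val** coincides with `V'`, and that **w** is obtained by dividing the singular values by their geometric mean. The small test matrix is made up for illustration.

```octave
% Sketch only: reproduces the relations stated above with plain svd.
% It is NOT the Mastrave implementation of train_pca.
values = [ 2 0 1 ; 1 3 0 ; 0 1 4 ; 3 1 1 ; 1 2 2 ];   % 5 points in 3-D

[ U , S , V ] = svd( values , 'econ' );   % values == U * S * V'
s      = diag( S );        % singular values, sorted in decreasing order
val2pc = V;                % assumed map: values -> principal components
pc2val = V';               % assumed map: principal components -> values
pc     = values * val2pc;  % equals U * S up to rounding errors

% mse(i): mean squared residual when only the first i components are used
n   = numel( s );
mse = zeros( n , 1 );
for i = 1:n
   approx = pc(:,1:i) * pc2val(1:i,:);
   mse(i) = mean( ( values(:) - approx(:) ) .^ 2 );
end

% Weights proportional to diag(S), rescaled by their geometric mean so
% that prod(w) == 1 (requires all singular values to be nonzero)
w = s / exp( mean( log( s ) ) );
```

Under these assumptions, `mse(i)` equals `sum( s(i+1:end).^2 ) / numel( values )`, because the squared Frobenius error of a truncated SVD is the sum of the squared discarded singular values. Note that `svd` is applied to **values** directly, without subtracting column means, as the description states.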

#### Input arguments

**values** `::numeric,matrix::`

Numeric matrix, each row of which represents a vectorial point in an n-dimensional space (so that the n adjacent columns of **values** are expected to refer to different dimension coordinates).

#### Example of usage

```
% Small example on how to train part of the available data M
% to obtain both a pca decomposition pc_train for the training set and
% the coefficients v2p to validate the pca decomposition with the
% validation set of data.
N_train = 20, N_valid = 10, N = N_train + N_valid
rnd_id  = randperm( N );
[ x , y ] = mdeal( rand( N , 2 ) );
M       = [ sin(x) exp(y)-x x.*log(y+1) y cos(y).*x ];
[ M_train  , M_valid  ] = mdeal( M(rnd_id,:), [ N_train N_valid ] , 1 );
[ id_train , id_valid ] = mdeal( rnd_id(:),   [ N_train N_valid ] , 1 );
[ pc_train , mse_train , w , v2p , p2v ] = train_pca( M_train );
isequal( pc_train , M_train * v2p )

% Compute an algorithm equivalent (only less efficient) to that used
% by @train_pca to estimate for each k the mean square error which
% is associated with the use of the first k principal components to
% reconstruct values.
n        = numel( mse_train )
o        = ones( 1 , n );
M_aprox  = zeros( [ size(M_train) , n ] );
for k=1:n
   M_aprox(:,:,k) = pc_train(:,1:k) * p2v(1:k,:);
end
mse_t    = zeros( n , 1 );
mse_t(:) = mean( reshape( (M_train(:,:,o)-M_aprox).^2 , [] , 1 , n ) )
[ mse_train mse_t ]

% Finally, compute the mean square errors associated with the
% validation set.
pc_valid = M_valid * v2p;
M_aprox  = zeros( [ size(M_valid) , n ] );
for k=1:n
   M_aprox(:,:,k) = pc_valid(:,1:k) * p2v(1:k,:);
end
mse_v    = zeros( n , 1 );
mse_v(:) = mean( reshape( (M_valid(:,:,o)-M_aprox).^2 , [] , 1 , n ) )
[ mse_train mse_t mse_v ]
```
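An error vector such as `mse_train` can also guide the choice of how many principal components to keep. The following is a minimal sketch, assuming a hypothetical tolerance `tol` and made-up error values standing in for the **mse** output; neither comes from Mastrave.

```octave
% Made-up, decreasing errors standing in for the mse output of train_pca
mse = [ 0.5 ; 0.1 ; 0.01 ; 0.001 ; 0 ];
tol = 0.05;                       % hypothetical acceptable error

% Smallest number of leading components whose error is within tol
k = find( mse <= tol , 1 );
```

With these illustrative numbers the first three components are kept, since `mse(3)` is the first entry not exceeding `tol`. If no entry satisfies the tolerance, `find` returns an empty result, which the caller would need to handle.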

Memory requirements: `max( O( numel(values) * min( size(values) ) ) , O( @svd ) )`

See also: screed

Keywords: training engine, modeling, space transformation, principal component analysis

Version: 0.4.8

#### Support

The Mastrave modelling library is committed to providing reusable and general - but also robust and scalable - modules for research modellers dealing with computational science. You can help the Mastrave project by providing feedback on unexpected behaviours of this module. Despite all efforts, all of us - either developers or users - (should) know that errors are unavoidable. However, the free software paradigm successfully highlights that scientific knowledge freedom also implies an impressive opportunity to collectively evolve the tools and ideas upon which our daily work is based. Reporting a problem that you found using Mastrave may help the developer team to find a possible bug.

Please be aware that Mastrave is entirely based on voluntary efforts: in order for your help to be as effective as possible, please carefully read the section on reporting problems. Thank you for your collaboration.