Mastrave project

The module "train_pca" of the Mastrave modelling library

Daniele de Rigo

Copyright and license notice of the function train_pca

The file train_pca.m is part of Mastrave.

Mastrave is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Mastrave is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Mastrave. If not, see http://www.gnu.org/licenses/.

Function declaration

 [pc, mse, w, val2pc, pc2val] = train_pca( values )

Description

Training engine to model a given numeric matrix values by applying principal component analysis (PCA). values is composed of N row vectors representing N points in an n-dimensional space (so that the n adjacent columns of values refer to the coordinates along different dimensions). The principal components pc are returned along with the coefficients pc2val, which transform subsets of the principal components into the corresponding approximations of the original values. The returned coefficients val2pc enable the inverse transformation, from values to pc. The i-th element of mse is the mean square error associated with the use of the first i principal components. w is the vector of weights associated with each principal component, such that prod( w ) == 1. w is proportional to the diagonal of the matrix S returned by
[U,S,V] = svd( values ) such that
values == U * S * V'
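
The relation between these outputs and the singular value decomposition can be illustrated with plain svd calls. What follows is only a minimal sketch and not the train_pca source: it assumes that values is neither centred nor rescaled, that val2pc and pc2val coincide with V and V', and that w is obtained by dividing the singular values by their geometric mean (one possible way to get prod( w ) == 1). The small matrix and all intermediate names (s, approx) are hypothetical.

```octave
% Sketch only: SVD-based PCA under the assumptions stated above.
% This is NOT the implementation of train_pca.
values = [ 2 0 1 ; 0 1 3 ; 1 1 0 ; 4 2 2 ; 3 0 1 ];  % N = 5 points, n = 3

[ U , S , V ] = svd( values , 'econ' );  % values == U * S * V'
s      = diag( S );                      % singular values, non-increasing

val2pc = V;                              % assumed: values -> pc
pc2val = V';                             % assumed: pc -> values
pc     = values * val2pc;                % equals U * S up to round-off

% Weights proportional to diag( S ), rescaled by the geometric mean
% so that prod( w ) == 1 (assumption; requires all s > 0).
w      = s / exp( mean( log( s ) ) );

% mse(k): mean square error when only the first k principal
% components are used to reconstruct values.
n      = numel( s );
mse    = zeros( n , 1 );
for k = 1:n
   approx = pc(:,1:k) * pc2val(1:k,:);
   mse(k) = mean( ( values(:) - approx(:) ).^2 );
end
```

Under these assumptions mse never increases with k, and it drops to zero (up to round-off) once all the components are used, because each discarded component contributes its squared singular value divided by numel( values ).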

Input arguments


 values             ::numeric,matrix::
                    Numeric matrix, each row of which represents a
                    point in an n-dimensional space (so that the n
                    adjacent columns of  values  are expected to refer
                    to different dimension coordinates).

Example of usage


   % Small example on how to use part of the available data  M  as a
   % training set, to obtain both a pca decomposition  pc_train  for the
   % training set and the coefficients  v2p  needed to validate the pca
   % decomposition with the validation set of data.
   N_train   = 20,  N_valid = 10,  N = N_train + N_valid
   rnd_id    = randperm( N );
   
   [ x , y ] = mdeal( rand( N , 2 ) );
   M         = [ sin(x) exp(y)-x x.*log(y+1) y cos(y).*x ];
   [ M_train  , M_valid  ] = mdeal( M(rnd_id,:), [ N_train N_valid ] , 1 );
   [ id_train , id_valid ] = mdeal(   rnd_id(:), [ N_train N_valid ] , 1 );

   [ pc_train , mse_train , w , v2p , p2v ] = train_pca( M_train );
   isequal( pc_train , M_train * v2p )
   
   % Apply an algorithm equivalent to (only less efficient than) the one
   % used by @train_pca  to estimate, for each k, the mean square error
   % associated with the use of the first k principal components to
   % reconstruct  values .
   n         = numel( mse_train )
   o         = ones( 1 , n );

   M_aprox   = zeros( [ size(M_train) , n ] );
   for k=1:n
      M_aprox(:,:,k) = pc_train(:,1:k) * p2v(1:k,:);
   end
   mse_t     = zeros( n , 1 );
   mse_t(:)  = mean( reshape( (M_train(:,:,o)-M_aprox).^2 , [] , 1 , n ) )
   [ mse_train mse_t ]

   % Finally, compute the mean square errors associated with the
   % validation set.
   pc_valid  = M_valid * v2p;
   M_aprox   = zeros( [ size(M_valid) , n ] );
   for k=1:n
      M_aprox(:,:,k) = pc_valid(:,1:k) * p2v(1:k,:);
   end
   mse_v     = zeros( n , 1 );
   mse_v(:)  = mean( reshape( (M_valid(:,:,o)-M_aprox).^2 , [] , 1 , n ) )
   [ mse_train mse_t mse_v ]

Memory requirements:
   max(                                                                  ...
      O( numel(  values  ) * min( size(  values  ) ) )                 , ...
      O( @svd )                                                          ...
   )



See also:
   screed



Keywords:
   training engine, modeling, space transformation, principal component analysis



Version: 0.4.8

Support

The Mastrave modelling library is committed to providing reusable and general - but also robust and scalable - modules for research modellers dealing with computational science. You can help the Mastrave project by providing feedback on unexpected behaviours of this module. Despite all efforts, all of us - either developers or users - (should) know that errors are unavoidable. However, the free software paradigm successfully highlights that freedom of scientific knowledge also implies an impressive opportunity to collectively evolve the tools and ideas upon which our daily work is based. Reporting a problem that you found while using Mastrave may help the developer team to find a possible bug. Please be aware that Mastrave is entirely based on voluntary efforts: in order for your help to be as effective as possible, please read carefully the section on reporting problems. Thank you for your collaboration.

This page is licensed under a Creative Commons Attribution-NoDerivs 3.0 Italy License.

This document is also part of the book:
de Rigo, D. (2012). Semantic Array Programming with Mastrave - Introduction to Semantic Computational Modelling. http://mastrave.org/doc/MTV-1.012-1