Mastrave project

The unchecked module "central_value_nosparse_" of the Mastrave modelling library

Daniele de Rigo

Copyright and license notice of the function central_value_nosparse_

The file central_value_nosparse_.m is part of Mastrave.

Mastrave is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Mastrave is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Mastrave. If not, see http://www.gnu.org/licenses/.

Function declaration

[set_centered_Y, set_X] = central_value_nosparse_( X, Y , statistic , T )

Description

Module implementing a widely compatible version of the central_value algorithm, optimized for minimal memory usage without requiring sparse matrices.

Warning: this module is under revision. Its usage is temporarily not recommended until the revision is concluded.

Warning: central_value_nosparse_ should not be used in end-user code because it deliberately skips the checks on input and output arguments and any other pre- and post-condition testing. This function should only be called from code that already guarantees the constraints on the input arguments described below.

Given a vector X, the module clusters X by extracting its set of unique values, called set_X. For each i-th element of set_X, it selects the subset containing only the elements of X equal to set_X(i). The corresponding subsets of the signal matrix Y are then processed (subset by subset) according to the selected statistic, and the i-th result is stored in the i-th element of set_centered_Y.
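The clustering step described above can be sketched with standard GNU Octave/MATLAB functions. This is only an illustration under stated assumptions, not the module's actual implementation: the toy data are hypothetical, and unique and accumarray are used here as a stand-in for whatever the module does internally.

```octave
% Hypothetical toy data: five samples, three distinct X values.
X = [ 3  1  3  2  1 ];
Y = [ 30 10 31 20 12 ];

% set_X holds the sorted unique values of X;
% idx maps each element of X to the position of its cluster in set_X.
[ set_X, ~, idx ] = unique( X );

% Apply the 'min' statistic cluster by cluster
% (accumarray expects column vectors, hence the (:) reshapes).
set_centered_Y = accumarray( idx(:), Y(:), [], @min ).';
```

With these data, set_X is [1 2 3] and set_centered_Y is [10 20 30]: the two samples with X equal to 1 have Y values 10 and 12, so their minimum is 10, and so on for the other clusters.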

If X is a logical vector, then the sequences of contiguous true values are numbered and those numbers are regarded as the set set_X, thereby excluding from the statistic computation all the Y elements corresponding to the false values of X.
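One possible way to number the runs of contiguous true values is sketched below. It is a minimal illustration, assuming that runs are numbered from 1 in order of appearance; the variable names (run_start, run_id, keep, labels, run_mean) are hypothetical and do not come from the module.

```octave
% Hypothetical logical mask containing two runs of contiguous true values.
X = logical( [ 0 1 1 0 0 1 1 1 0 ] );
Y =          [ 5 1 3 7 9 2 4 6 8 ];

% A run starts wherever X switches from false to true.
run_start = diff( [ 0 X ] ) == 1;

% Number the runs; elements where X is false get the label 0.
run_id = cumsum( run_start ) .* X;

% Discard the elements with label 0, then take the mean of each run.
keep     = run_id > 0;
labels   = run_id( keep );
values   = Y( keep );
set_X    = unique( labels );
run_mean = accumarray( labels(:), values(:), [], @mean ).';
```

Here run_id is [0 1 1 0 0 2 2 2 0], so set_X is [1 2] and run_mean is [2 4]; the Y elements that correspond to false values of X never enter the computation.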

All the computations that implement the statistic are strictly vectorial, without any use of interpreted loops.
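As an illustration of a loop-free statistic, the sketch below computes per-cluster sample and population variances from three accumulated quantities. It assumes the usual conventions (division by n-1 for the sample variance, by n for the population variance) and uses a textbook one-pass formula, which is not necessarily what the module does; idx and y are hypothetical toy data.

```octave
% Hypothetical toy data: cluster index of each sample and one signal.
idx = [ 1 1 1 2 2 ].';
y   = [ 2 4 6 1 3 ].';

% Per-cluster count, sum and sum of squares, computed without loops.
n     = accumarray( idx, 1 );
s     = accumarray( idx, y );
sumsq = accumarray( idx, y.^2 );

% Sum of squared deviations from each cluster mean.
ssd = sumsq - s.^2 ./ n;

var_p = ssd ./ n;           % population variance (divide by n)
var_s = ssd ./ ( n - 1 );   % sample variance     (divide by n-1)
```

For the first cluster (2, 4, 6) this gives var_s = 4 and var_p = 8/3; for the second (1, 3) it gives var_s = 2 and var_p = 1. The one-pass formula can lose precision when the cluster mean is large compared with the spread, so it is shown only to make the vectorized pattern concrete.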

Input arguments


 X                 ::vector,numeric::
                   Array of independent variable elements (it must be a
                   single vector, or at least a scalar).

 Y                 ::matrix::
                   Array of dependent variable elements (it must have
                   the same number of rows as the length of  X : it can
                   be a matrix too, that is R->R^n functions are supported).

 statistic         ::string::
                   Name of the statistic to be applied to each cluster of
                   the  Y  matrix. Valid statistics are:

                     statistic  │    meaning
                   ─────────────┼───────────────────────────────────────
                     'mean'     │    mean
                     'median'   │    median
                     'min'      │    min
                     'max'      │    max
                     'count'    │    number of elements
                     'sum'      │    sum
                     'sumsq'    │    sum of squares 
                     'prod'     │    product
                     'var'      │    sample variance
                     'std'      │    sample standard deviation
                     'var_p'    │    population variance
                     'std_p'    │    population standard deviation

 T                 ::scalar_positive::
                   Optional cyclostationarity period of the independent
                   variable. If passed,  X  is treated as periodic, so that
                      X(i) + T == X(i)
                   holds for each  i -th element of  X . If omitted,
                   T  = Inf is assumed (no periodicity).
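
The effect of the period  T  can be sketched by folding  X  with mod before clustering, which is also what the classical approach in Example 2 does. The data below are hypothetical, and the use of unique and accumarray is an assumption made for illustration only.

```octave
% Hypothetical data with an assumed period of 7.
T = 7;
X = [ 1  8  15 3  10 ];
Y = [ 10 12 14 20 22 ];

% Fold X into [0, T), so that X(i) + T and X(i) fall into the same cluster.
X_folded = mod( X, T );

% Cluster the folded values and average Y within each cluster.
[ set_X, ~, idx ] = unique( X_folded );
set_centered_Y    = accumarray( idx(:), Y(:), [], @mean ).';
```

The values 1, 8 and 15 all fold to 1, while 3 and 10 fold to 3, so set_X is [1 3] and set_centered_Y is [12 21].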

Example of usage


% Example 1: 

  n = 20;  N = 40;
  x = rand( 1, n );   x = x( ceil( rand( 1, N ) * n ) );
  y = [ sin(x*2*pi)+10; sin(x*2*pi) ] + randn( 2, N ) * .3;
  [ cy, cx ] = central_value_nosparse_( x, y, 'min' )
  plot( x, y, 'o', cx, cy ); pause;
  [ cy, cx ] = central_value_nosparse_( x, y, 'std' )
  plot( x, y, 'o', cx, cy ); pause;


% Example 2:

  T  = 365.25;   n = floor(T);   N = 80 * T;   k = repmat( [1.7 3]', 1, N );
  xx = rand( 1, n ) * 4*T;   xx = xx( ceil( rand( 1, N ) * n ) );
  yy = exp( randn(2,N).*k ) + [ sin(xx/T*2*pi)*2+3; cos(xx/T*2*pi)*20+130 ];

  t_central_value = cputime;                         % start speed test
     [mean_yy, cxx] = central_value_nosparse_(xx,yy,'mean',T);
     median_yy      = central_value_nosparse_(xx,yy,'median',T);
  t_central_value = cputime-t_central_value;         % end speed test
 
  semilogy( xx, yy, '.', cxx, [mean_yy;median_yy] )

% Comparison with the classical approach:
   
  t_classical = cputime;                             % start speed test
     mxx        = mod(  xx, T );
     cxx2       = sort( mxx   ); 
     cxx2       = cxx2( find( [1 diff(cxx2)] ) );
     mean2_yy   = zeros( size(yy,1), size(cxx2,2) );
     median2_yy = zeros( size(yy,1), size(cxx2,2) ); 
     for i=1:size( cxx2, 2 ) 
        for j=1:size( yy, 1 ) 
           mean2_yy(  j,i) = mean(   yy( j, mxx==cxx2(i) ) ); 
           median2_yy(j,i) = median( yy( j, mxx==cxx2(i) ) ); 
        end
     end
  t_classical = cputime - t_classical;               % end speed test

  all(all( mean2_yy   == mean_yy   ))
  all(all( median2_yy == median_yy ))

  ratio = t_classical / t_central_value;
  fprintf(1, '\n\tThe central_value approach is %4.2g times faster\n\n', ratio )


% Example 3 (speed test):

  T = 200;     n0      = floor(T);   N = 20*T;
  n = n0:n0:N; samples = 10;
  t_central_value      = zeros( samples,size(n,2) );
  t_classical          = zeros( samples,size(n,2) );
  for i=1:length(n)
     fprintf( 1,'\nnon duplicated data ratio: %g  ', n(i)/N )
     for s = 1:samples
        xx = rand(1,n(i))*4*T;  xx = xx( ceil( rand(1,N)*n(i) ) );
        yy = [ sin(xx/T*2*pi)+10; sin(xx/T*2*pi) ] + randn(2,N)*.3;
     
        fprintf(1,' .' )
        t_central_value(s,i)  = cputime;                  % start speed test
           [ median_yy, cxx ] = central_value_nosparse_(xx,yy,'median',T);
        t_central_value(s,i)  = ...
           cputime-t_central_value(s,i);                  % end speed test
  
        t_classical(s,i) = cputime;                        % start speed test
           mxx        = mod(  xx, T );
           cxx2       = sort( mxx   ); 
           cxx2       = cxx2( find([1 diff(cxx2)]) );
           median2_yy = zeros( size(yy,1), size(cxx2,2) ); 
           for j=1:size(cxx2,2) 
              for k=1:size(yy,1)  
                 median2_yy(k,j) = median( yy( k, mxx==cxx2(j) ) ); 
              end
           end
        t_classical(s,i) = cputime - t_classical(s,i);    % end speed test
     end
  end
  plot(  n/N, [t_classical./t_central_value], 'ob' )
  xlabel( 'non duplicated data ratio'                         )
  ylabel( 'speed ratio: (classical approach)/(central value)' )

version: 0.2.6

Support

The Mastrave modelling library is committed to providing reusable and general (but also robust and scalable) modules for research modellers dealing with computational science. You can help the Mastrave project by providing feedback on unexpected behaviours of this module. Despite all efforts, all of us, whether developers or users, (should) know that errors are unavoidable. However, the free software paradigm successfully highlights that scientific knowledge freedom also implies an impressive opportunity to collectively evolve the tools and ideas upon which our daily work is based. Reporting a problem that you found using Mastrave may help the developer team to find a possible bug. Please be aware that Mastrave is entirely based on voluntary efforts: for your help to be as effective as possible, please carefully read the section on reporting problems. Thank you for your collaboration.

This page is licensed under a Creative Commons Attribution-NoDerivs 3.0 Italy License.

This document is also part of the book:
de Rigo, D. (2012). Semantic Array Programming with Mastrave - Introduction to Semantic Computational Modelling. http://mastrave.org/doc/MTV-1.012-1