The module "frequency_resampling" of the Mastrave modelling library

 

Daniele de Rigo

 


Copyright and license notice of the function frequency_resampling

 

 

Copyright © 2007,2008,2009,2010,2011,2012,2013,2014 Daniele de Rigo

The file frequency_resampling.m is part of Mastrave.

Mastrave is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Mastrave is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Mastrave. If not, see http://www.gnu.org/licenses/.

Function declaration

 

 

 [ resampled_freqs, resampled_obs, siz ] = 
    frequency_resampling( frequency_vals            ,
                          tot_observations     = [] ,
                          N_runs               = [] ,
                          rand_func            = [] ,
                          do_sparse_resampling = [] ,
                          use_binomial         = [] ,
                          do_obs_resampling    = [] )

Description

 

 

Module supporting the unbiased statistical resampling of binary observations (i.e. observations whose value may be true/positive or false/negative). Given an array frequency_vals recording the frequency of positive observations and an array tot_observations with the corresponding total number of recorded observations (irrespective of whether they are positive or negative), the module generates N_runs statistical resampling runs for both frequency_vals and tot_observations.

Each run randomly selects the observations following a uniform pseudo-random sequence, generated by default with @rand or with a custom randomisation function rand_func, which may be provided as an optional input. The observations are selected with repetition (i.e. a given observation may be selected multiple times). While the overall number of observations selected within each run is the same as that of tot_observations, this might not be true for the total number of positive observations.
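The selection with repetition described above can be sketched in plain Octave for a single run. This is only an illustration under stated assumptions, not the Mastrave source code: the expansion of the counts into one record per observation, the helper variables elem_id, is_pos and pick, and the calling convention rand_func(rows,cols) are all hypothetical.

```octave
% Illustrative sketch only (assumed logic, not the Mastrave source code).
frequency_vals   = [ 0 2 5 ];   % positive observations per element
tot_observations = [ 3 4 5 ];   % all observations per element
rand_func        = @rand;       % assumed callable as rand_func(rows,cols)

n_elem = numel( tot_observations );
n_obs  = sum( tot_observations(:) );

% Expand the counts into one record per observation:
% elem_id(k) is the element owning observation k, is_pos(k) its binary value.
elem_id = zeros( n_obs, 1 );
is_pos  = zeros( n_obs, 1 );
last    = 0;
for e = 1:n_elem
   idx = last + ( 1:tot_observations(e) );
   elem_id( idx )                       = e;
   is_pos( idx( 1:frequency_vals(e) ) ) = 1;
   last = last + tot_observations(e);
end

% Draw n_obs observation indices with repetition from a uniform sequence.
pick = min( floor( rand_func( n_obs, 1 ) * n_obs ) + 1, n_obs );

% Count the selected observations back into their elements.
resampled_obs   = accumarray( elem_id( pick ), 1             , [ n_elem 1 ] );
resampled_freqs = accumarray( elem_id( pick ), is_pos( pick ), [ n_elem 1 ] );
```

In this sketch sum(resampled_obs) always equals sum(tot_observations), while sum(resampled_freqs) changes from run to run, which is the behaviour the paragraph above describes.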

frequency_vals and tot_observations must have the same size siz. The resampled frequencies resampled_freqs and the correspondingly resampled total observations resampled_obs are returned as matrices where each column provides the output of a run. The number of elements in each column (i.e. the number of matrix rows) is the same as the number of elements of frequency_vals and tot_observations.
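A column of the output may be mapped back onto the shape of the input arrays by means of siz. The sketch below uses a stand-in matrix instead of a real call to the module. It also assumes that the elements of each column follow the usual column-major order of Octave, which the description does not state explicitly.

```octave
% Illustrative sketch with stand-in data (no call to Mastrave is made).
siz    = [ 2 3 ];               % assumed size of frequency_vals
N_runs = 4;

% Stand-in for resampled_freqs: prod(siz) rows, one column per run.
resampled_freqs = floor( 10 * rand( prod( siz ), N_runs ) );

% Recover the k-th run with the same shape as the input arrays
% (column-major ordering of the elements is an assumption).
k       = 2;
freqs_k = reshape( resampled_freqs( :, k ), siz );
```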

Input arguments

 

 


 frequency_vals         ::numel::
                        Array counting the positive occurrences (presences).
                        For each element of the array, the local cumulative
                        count of presences is provided. An element's value
                        of zero is interpreted as an absence of positive
                        observations, irrespective of whether within the
                        element there is a lack of observations or instead
                        the local available observations are all negative. 

 tot_observations       ::numel::
                        Array counting the total observations (including both
                        positive and negative occurrences). An element's value
                        of zero is interpreted as an absence of local
                        observations. Default: [] (empty array). If empty,
                        the same array of  frequency_vals  is used (i.e. all 
                        observations are considered as positive).

 N_runs                 ::scalar_index::
                        Number of runs of the statistical resampling.
                        Default: [] (empty array). If empty, only one run is
                        generated.

 rand_func              ::function_handle::
                        Optional handle to a custom randomisation function.
                        Default: [] (empty array). If empty, the function 
                        @rand is used.
                     
 do_sparse_resampling   ::scalar_binary::
                        Flag setting whether the returned values of 
                         resampled_freqs  and  resampled_obs  must be sparse
                        matrices. Default: [] (empty array). If empty, the 
                        flag is set as false.

 use_binomial           ::scalar_binary::
                        Flag setting whether the returned values of 
                         resampled_freqs  must be computed by explicit 
                        bootstrap or instead by exploiting a binomial Monte
                        Carlo extraction. The binomial method may be 
                        more efficient where a high number of observations
                        per element characterise  frequency_vals  and 
                         tot_observations . Default: [] (empty array). If 
                        empty, the flag is set as false.

 do_obs_resampling      ::scalar_binary::
                        Flag setting whether the returned values of 
                         resampled_obs  must be based on statistical
                        resampling as  resampled_freqs  is, or not. If yes,
                        the total number of observations in each resampled
                        run is respected. If not, each column (i.e. run)
                        of  resampled_obs  has the same elements as  
                         tot_observations . Default: [] (empty array). 
                        If empty, the flag is set as true.

{  frequency_vals  ,  tot_observations  }  ::same_size::
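
The binomial alternative selected by use_binomial can be illustrated for the case in which the local totals are kept fixed (do_obs_resampling set to false). The following is a minimal sketch under stated assumptions, not the Mastrave implementation: each element's positives are drawn as a binomial variate with n = tot_observations(e) trials and success probability p = frequency_vals(e)/n. The draw is obtained here by counting uniform trials, which shows the distribution but not the efficiency gain mentioned above; a dedicated binomial generator would be needed for that.

```octave
% Illustrative sketch only (assumed logic, not the Mastrave source code).
frequency_vals   = [ 0 2 5 10 ];
tot_observations = [ 3 4 5 40 ];
N_runs           = 1000;

n_elem          = numel( frequency_vals );
resampled_freqs = zeros( n_elem, N_runs );
for e = 1:n_elem
   n = tot_observations(e);
   if n > 0
      p = frequency_vals(e) / n;     % local frequency of positives
      % A Binomial(n,p) draw counts the successes among n uniform trials.
      resampled_freqs( e, : ) = sum( rand( n, N_runs ) < p, 1 );
   end
end

% The local totals are not resampled: each run repeats tot_observations.
resampled_obs = repmat( tot_observations(:), 1, N_runs );
```

Elements without positive observations stay at zero in every run, and elements where all observations are positive keep their full count, since p is then 0 or 1.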


Example of usage

 

 

  
   P       = 0:2:20                  % Positive observations
   PN      = ones(size(P))*max(P)    % All observ. (positive+negative ones)

   ni      = 10000                   % Number of resampling runs
   x       = linspace(0,1,ni);  

   % Bootstrap of local positive and negative observations
   % (the overall number of observations is preserved) 
   [fr,ob] = frequency_resampling( P, PN, ni );
   
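   % Binomial distribution 
   % (without bootstrapping the local number of observations) 
   % Note: binoinv may require a statistics package/toolbox
   bino = zeros( ni, numel(P) );      % preallocate: one column per element
   for i=1:numel(P), bino(:,i)=binoinv(rand(ni,1),PN(i),P(i)/PN(i)); end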
   
   figure(1);
   hold off; plot( x, sort(fr./ob,2).'       ); 
   hold on;  plot( x, sort(bino)./PN(1), 'o' );
   title( sprintf( [ ...
      '.\nthin: bootstrapped frequencies (P,PN, 10000 runs); \n' ...
      'thick: binomial with fixed PN'                          ] ) )

   % Bootstrap of only local positive observations
   % (the local number of observations is preserved) 
   [fr,ob] = frequency_resampling( P, PN, ni, [], [], [], false );

   figure(2);
   hold off; plot( x, sort(fr./ob,2).'       ); 
   hold on;  plot( x, sort(bino)./PN(1), 'o' );
   title( sprintf( [ ...
      '.\nthin: bootstrapped frequencies (only P, 10000 runs); \n' ...
      'thick: binomial with fixed PN'                            ] ) )


See also:
   rand_idx, frequency2index, mloop



Keywords:
   data-transformation, matrix, blocks



Version: 0.6.12

Support

 

 

The Mastrave modelling library is committed to providing reusable and general, but also robust and scalable, modules for research modellers dealing with computational science.  You can help the Mastrave project by providing feedback on unexpected behaviours of this module.  Despite all efforts, all of us, whether developers or users, (should) know that errors are unavoidable.  However, the free software paradigm successfully highlights that scientific knowledge freedom also implies an impressive opportunity to collectively evolve the tools and ideas upon which our daily work is based.  Reporting a problem that you found using Mastrave may help the developer team to find a possible bug.  Please be aware that Mastrave is entirely based on voluntary efforts: in order for your help to be as effective as possible, please read carefully the section on reporting problems.  Thank you for your collaboration.

Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Daniele de Rigo

This page is licensed under a Creative Commons Attribution-NoDerivs 3.0 Italy License.

This document is also part of the book:
de Rigo, D. (2012). Semantic Array Programming with Mastrave - Introduction to Semantic Computational Modelling. http://mastrave.org/doc/MTV-1.012-1

