## The module "frequency_resampling" of the Mastrave modelling library

Daniele de Rigo

The file frequency_resampling.m is part of Mastrave.

Mastrave is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Mastrave is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Mastrave. If not, see http://www.gnu.org/licenses/.

#### Function declaration

    [ resampled_freqs, resampled_obs, siz ] =
       frequency_resampling( frequency_vals            ,
                             tot_observations     = [] ,
                             N_runs               = [] ,
                             rand_func            = [] ,
                             do_sparse_resampling = [] ,
                             use_binomial         = [] ,
                             do_obs_resampling    = [] )



#### Description

Module supporting the unbiased statistical resampling of binary observations (i.e. observations whose value may be true/positive or false/negative). Given an array frequency_vals recording the frequency of positive observations and an array tot_observations with the corresponding total number of recorded observations (irrespective of whether they are positive or negative), the module generates a set of N_runs statistical resampling runs for both frequency_vals and tot_observations.

Each run randomly selects the observations following a uniform pseudo-random sequence (by default, generated with @rand) or a custom randomisation function rand_func, which may be provided as an optional input. The observations are selected with repetition (i.e. a given observation may be selected multiple times). While the overall number of observations selected within each run is the same as the overall number in tot_observations, this may not hold for the total number of positive observations.
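The selection with repetition described above can be illustrated with a minimal sketch in GNU Octave/MATLAB syntax. It is not the code of frequency_resampling: the toy counts P and PN and all variable names are assumptions made for illustration, and a single run is drawn with the default generator rand.

```octave
% Hypothetical toy data (not taken from the Mastrave sources)
P  = [2 0 3 1];      % positive observations per element
PN = [4 1 3 2];      % total observations per element

% Expand the counts into one record per observation:
%   elem(k)   -> index of the element owning the k-th observation
%   is_pos(k) -> true if the k-th observation is positive
n_elem = numel(PN);
n_obs  = sum(PN);
elem   = zeros(n_obs, 1);
is_pos = false(n_obs, 1);
k = 0;
for i = 1:n_elem
   elem(k+1 : k+PN(i))  = i;
   is_pos(k+1 : k+P(i)) = true;      % empty range when P(i) is zero
   k = k + PN(i);
end

% One run: select n_obs observations with repetition, uniformly at random
pick = floor(rand(n_obs, 1) * n_obs) + 1;

% Re-accumulate the selected observations per element
run_obs   = accumarray(elem(pick), 1,                    [n_elem 1]);
run_freqs = accumarray(elem(pick), double(is_pos(pick)), [n_elem 1]);
```

Because observations are drawn across all elements, sum(run_obs) always equals sum(PN) in this sketch, whereas sum(run_freqs) changes from run to run.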

frequency_vals and tot_observations must have the same size siz. The resampled frequencies resampled_freqs and the correspondingly resampled total observations resampled_obs are returned as matrices where each column provides the output of a run. The number of elements in each column (i.e. the number of matrix rows) is the same as the number of elements of frequency_vals and tot_observations.
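As a hedged illustration of this output layout, the sketch below builds a stand-in matrix with one row per element and one column per run, then maps a single run and a per-element statistic back to the original array shape. The matrix holds placeholder values rather than actual output of frequency_resampling, and the use of reshape with siz is an assumption about how the third output might typically be used.

```octave
% Stand-in for the documented layout (placeholder values only):
% one row per element of frequency_vals, one column per run
siz    = [2 3];                       % assumed size of frequency_vals
n_runs = 5;
resampled_freqs = floor(rand(prod(siz), n_runs) * 4);

% Each column is one run: reshape it to recover the original array layout
run_3 = reshape(resampled_freqs(:, 3), siz);

% Statistics across runs are taken along the second dimension
mean_freqs = reshape(mean(resampled_freqs, 2), siz);
```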

#### Input arguments


frequency_vals         ::numel::
Array counting the positive occurrences (presences).
For each element of the array, the local cumulated
amount of presences is provided. An element's value
of zero is interpreted as an absence of positive
observations, irrespective of whether within the
element there is a lack of observations or instead
the local available observations are all negative.

tot_observations       ::numel::
Array counting the total observations (including both
positive and negative occurrences). An element's value
of zero is interpreted as an absence of local
observations. Default: [] (empty array). If empty,
the same array as  frequency_vals  is used (i.e. all
observations are considered as positive).

N_runs                 ::scalar_index::
Number of runs of the statistical resampling.
Default: [] (empty array). If empty, only one run is
generated.

rand_func              ::function_handle::
Optional handle to a custom randomisation function.
Default: [] (empty array). If empty, the function
@rand is used.

do_sparse_resampling   ::scalar_binary::
Flag setting whether the returned values of
resampled_freqs  and  resampled_obs  must be sparse
matrices. Default: [] (empty array). If empty, the
flag is set as false.

use_binomial           ::scalar_binary::
Flag setting whether the returned values of
resampled_freqs  must be computed by explicit
bootstrap or instead by exploiting a binomial Monte
Carlo extraction. The binomial method may be
more efficient where a high number of observations
per element characterises  frequency_vals  and
tot_observations . Default: [] (empty array). If
empty, the flag is set as false.

do_obs_resampling      ::scalar_binary::
Flag setting whether the returned values of
resampled_obs  must be based on statistical
resampling, as  resampled_freqs  is, or not. If true,
the total number of observations in each resampled
run is preserved. If false, each column (i.e. run)
of  resampled_obs  has the same elements as
tot_observations . Default: [] (empty array).
If empty, the flag is set as true.

{  frequency_vals  ,  tot_observations  }  ::same_size::
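To make the binomial alternative mentioned for use_binomial more concrete, here is a minimal sketch in which the totals of each element are kept fixed (as documented for do_obs_resampling set to false) and the positives are drawn from a binomial distribution with probability frequency/total. It is an illustration under stated assumptions, not the module's implementation: the toy data and variable names are hypothetical, and each binomial draw is obtained by counting successes among independent uniform draws so that only base functions are needed.

```octave
P      = [2 0 3 1];      % hypothetical positive counts
PN     = [4 1 3 2];      % hypothetical total counts
n_runs = 1000;

% Fixed totals: every column (run) repeats the original totals
obs = repmat(PN(:), 1, n_runs);

% Binomial extraction of the positives, element by element
freqs = zeros(numel(P), n_runs);
for i = 1:numel(P)
   if PN(i) > 0
      p = P(i) / PN(i);               % local probability of a positive
      % A Binomial(PN(i), p) draw is the number of successes among
      % PN(i) independent trials, each succeeding with probability p
      freqs(i, :) = sum(rand(PN(i), n_runs) < p, 1);
   end
end
```

Counting successes costs time proportional to PN(i) per draw, so this form only shows the idea. Where the statistics package (Octave) or toolbox (MATLAB) is available, a dedicated generator such as binornd, or binoinv applied to uniform draws as in the usage example, is likely the more efficient route for large counts.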



#### Example of usage


P       = 0:2:20                  % Positive observations
PN      = ones(size(P))*max(P)    % All observ. (positive+negative ones)

ni      = 10000                   % Number of resampling runs
x       = linspace(0,1,ni);

% Bootstrap of local positive and negative observations
% (the overall number of observations is preserved)
[fr,ob] = frequency_resampling( P, PN, ni );

% Binomial distribution
% (without bootstrapping the local number of observations)
bino = zeros(ni,numel(P));        % Preallocate: one column per element
for i=1:numel(P), bino(:,i)=binoinv(rand(ni,1),PN(i),P(i)/PN(i)); end

figure(1);
hold off; plot( x, sort(fr./ob,2).'       );
hold on;  plot( x, sort(bino)./PN(1), 'o' );
title( sprintf( [ ...
'.\nthin: bootstrapped frequencies (P,PN, 10000 runs); \n' ...
'thick: binomial with fixed PN'                          ] ) )

% Bootstrap of only local positive observations
% (the local number of observations is preserved)
[fr,ob] = frequency_resampling( P, PN, ni, [], [], [], false );

figure(2);
hold off; plot( x, sort(fr./ob,2).'       );
hold on;  plot( x, sort(bino)./PN(1), 'o' );
title( sprintf( [ ...
'.\nthin: bootstrapped frequencies (only P, 10000 runs); \n' ...
'thick: binomial with fixed PN'                            ] ) )
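
Both figures plot the sorted resampled ratios against x = linspace(0,1,ni), which amounts to drawing the empirical quantile function of each element. As a minimal sketch (placeholder data rather than actual output of frequency_resampling; the nearest-rank rule and the variable names are assumptions), selected quantiles can be read directly from the sorted runs:

```octave
n_runs = 1000;
ratios = rand(1, n_runs);            % placeholder for one row of fr./ob
sorted = sort(ratios);               % ascending order across runs

q   = [0.05 0.50 0.95];              % requested probability levels
idx = max(1, ceil(q * n_runs));      % nearest-rank positions in 1..n_runs
quantiles = sorted(idx);             % approx. 5%, 50% and 95% quantiles
```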


See also:
rand_idx, frequency2index, mloop

Keywords:
data-transformation, matrix, blocks

Version: 0.6.12

#### Support

The Mastrave modelling library is committed to providing reusable and general - but also robust and scalable - modules for research modellers dealing with computational science.  You can help the Mastrave project by providing feedback on unexpected behaviours of this module.  Despite all efforts, all of us - either developers or users - (should) know that errors are unavoidable.  However, the free software paradigm successfully highlights that scientific knowledge freedom also implies an impressive opportunity to collectively evolve the tools and ideas upon which our daily work is based.  Reporting a problem that you found using Mastrave may help the developer team to find a possible bug.  Please be aware that Mastrave is entirely based on voluntary efforts: in order for your help to be as effective as possible, please read carefully the section on reporting problems.  Thank you for your collaboration.

Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Daniele de Rigo