de Rigo, D. (2012). Weighting data for reducing data-transformation bias in linear regressions: the module "mwlin" of the Mastrave modelling library. In: Semantic Array Programming with Mastrave - Introduction to Semantic Computational Modelling. http://mastrave.org/doc/mtv_m/mwlin

## Weighting data for reducing data-transformation bias in linear regressions: the module "mwlin" of the Mastrave modelling library

Daniele de Rigo

The file mwlin.m is part of Mastrave.

Mastrave is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Mastrave is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Mastrave. If not, see http://www.gnu.org/licenses/.

#### Function declaration

 [x, transformed_M, transformed_y, transformed_weights] =
mwlin( M                 ,
y                 ,
weights   = []    ,
func      = @(x)x ,
is_nonneg = false )



#### Description

Module for solving a linear system composed by a matrix of coefficients M and a vector of constant terms y , subject to a data-transformation func of both M and y . To each row of M and the corresponding element of y a given weight can be assigned so that the corresponding error is accordingly weighted. Weights can be provided within an array weights (with the same size of y ). If the flag is_nonneg is set to true, all elements of the solution x are constrained not to be negative.

The implemented algorithm is generally able to provide better solutions than the ones which can be achieved by solving the transformed system:

M1 = func( M )
y1 = func( y )
x1 = M1 \ y1

if weights are omitted or:

x1 = ( M1' * diag(weights) * M1 ) \ ( M1' * diag(weights) * y1 )

if them are passed as input argument. The improvement relies on complementing weights with the relative distance between the transformed values and the original ones so to take into account the modified effect of each error on the transformed values. The resulting transformed weights are returned within transformed_weights . In general, some original values might be tranformed in infinite or NaN values, which would numerically degrade the linear regression. Therefore, only transformed values which are @finite are considered when solving the transformed linear system.

#### Input arguments


M              ::matrix,numeric::
Matrix of coefficients of the linear system whose weighted
least squares solution  x  has to be computed after
transforming both  M  and  y  with the transformation
function  y_func .

y              ::matrix,numeric::
Constant terms.

weights        ::matrix,nonnegative::
Weights associated to the constant terms  y  and the
corresponding lines of  M .  If an empty matrix  []  is
passed, then all weights are considered as having the
same value.
If omitted, default value is  [] .

func           ::function_handle::
Handle to a data-transformation function to be applied
to both  M  and  y .
If omitted, the default value is the trivial function
@(x)x .

is_nonneg      ::scalar,logical::
Flag describing whether the returne solution  x  must be
of nonnegative elements, or not.
If omitted, the default value is  false .



#### Example of usage


[a, b] = deal( 0.01, 3 );
x      = rand( 1000, 1 )*4;
noise  = randn(1000,1)/20;
y_true = a*x.^b;

% Observations affected by growing noise
% ( noise also generates systematic overestimation for low values )
y1     = max( eps, y_true + noise/3 );
y2     = max( eps, y_true + noise   );
M      = [ x, exp(ones(1000,1)) ];

w1a    = log( M ) \ log( y1 )
w1b    = mwlin( M, y1, [], @log )

w2a    = log( M ) \ log( y2 )
w2b    = mwlin( M, y2, [], @log )

f      = @(x,w) exp( w(2) ) * x.^w(1)

figure( 1 )
plot( x, [ y1, y_true, f(x,w1a) f(x,w1b) ], '.' )
legend( 'noisy data', 'true data', 'log-regression', 'mwlin' )

figure( 2 )
plot( x, [ y2, y_true, f(x,w2a) f(x,w2b) ], '.' )
legend( 'noisy data', 'true data', 'log-regression', 'mwlin' )


See also:
train_pca

Keywords:
linear system, regression, weighting, data-transformation, nonnegative

Version: 0.2.9

#### Support

The Mastrave modelling library is committed to provide reusable and general - but also robust and scalable - modules for research modellers dealing with computational science.  You can help the Mastrave project by providing feedbacks on unexpected behaviours of this module.  Despite all efforts, all of us - either developers or users - (should) know that errors are unavoidable.  However, the free software paradigm successfully highlights that scientific knowledge freedom also implies an impressive opportunity for collectively evolve the tools and ideas upon which our daily work is based.  Reporting a problem that you found using Mastrave may help the developer team to find a possible bug.  Please, be aware that Mastrave is entirely based on voluntary efforts: in order for your help to be as effective as possible, please read carefully the section on reporting problems.  Thank you for your collaboration.

Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Daniele de Rigo