Mastrave project

The module "get_reference_data" of the Mastrave modelling library

Daniele de Rigo

Copyright and license notice of the function get_reference_data

The file get_reference_data.m is part of Mastrave.

Mastrave is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Mastrave is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Mastrave. If not, see http://www.gnu.org/licenses/.

Function declaration

 [data, url, reference, models, params] = get_reference_data( dataset_code )

Description

Utility to transparently download and import reference datasets. The supported datasets fulfill three general characteristics:

1. are well-established scientific datasets, well known to
have been used as reference data;

2. are concise datasets with few quantities and with an overall
amount of data which is suitable to be loaded and manipulated
within the RAM as a single vector, matrix or multi-dimensional
array;

3. have been made available online at stable URLs offering
free access to their content (their provenance is always
acknowledged by providing it as the reference output argument).

The data are returned as the output array data. Their stable web address is returned in url, and a reference to the original work from which the data are derived is returned in reference. Each dimension or quantity of the dataset is returned column-wise, so that each column of data contains a corresponding dimension or quantity.
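
As a minimal sketch of this column-wise layout, the snippet below uses a small stand-in matrix (invented values, not one of the supported datasets) in place of the first output of get_reference_data, and splits it into one vector per column. The names q1 and q2 are illustrative only.

```octave
% Stand-in for the "data" output (hypothetical values):
% 4 observations (rows) of 2 quantities (columns).
data = [ 10.1  1 ;
         19.8  2 ;
         30.2  3 ;
         39.9  4 ];

[ n_obs, n_dims ] = size( data );   % rows: observations, columns: quantities

% Split the matrix into one column vector per quantity.
cols       = num2cell( data, 1 );   % 1 x n_dims cell array of column vectors
[ q1, q2 ] = cols{:};               % q1 is data(:,1), q2 is data(:,2)
```

The same unpacking is done with mdeal in the usage example of this page.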

Some datasets are also provided with proposed models whose general formulation is returned in a cell-array models. If n models are proposed, models is a cell-array of n rows and 3 columns. Each row of models contains:

1. the function handle of the model;

2. a scalar index referring to the dimension (i.e. the column) of
data approximated by the model;

3. a row array of indexes referring to the dimensions (columns)
of data required as input arguments of the model.

If, for example, data is a 4-column matrix [x1,x2,x3,x4] (where x1, x2, x3, x4 denote the 4 column-vectors composing data) and a model f for estimating x3 is available as a function of x1 and x4, models would be:

{ f , 3 , [1,4] }

In case a given dataset is not provided with recommended models, models would be empty.
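
The three-column layout can be walked through row by row. The following is a minimal sketch under stated assumptions: the dataset values and the model f are invented for illustration and do not come from any supported dataset.

```octave
% Hypothetical 4-column dataset [x1, x2, x3, x4] (invented values).
data = [ 1 5 3.0 2 ;
         2 6 5.5 3 ;
         3 7 8.0 4 ];

% Hypothetical model of x3 as a function of x1 and x4: the first argument
% is the parameter vector, followed by one array per input column.
f = @( theta, x1, x4 ) theta(1) * x1 + theta(2) * x4;

% One proposed model gives one row with 3 columns.
models = { f, 3, [1,4] };

% The loop body is skipped when models is empty (no proposed models).
for i = 1:size( models, 1 )
   model_fun  = models{i,1};                          % function handle
   target_col = models{i,2};                          % column approximated
   input_cols = models{i,3};                          % columns used as inputs
   inputs     = num2cell( data(:,input_cols), 1 );    % one cell per input column
   fprintf( 'model %d: column %d from columns %s\n', ...
            i, target_col, mat2str( input_cols ) );
end
```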

When available, certified parameters for each model are returned in the cell-array of matrices params (otherwise params is empty). Each matrix in params refers to the corresponding row of models. The first column of the i-th matrix of params contains the certified parameters of the i-th model, while the second column (which may be omitted) contains the corresponding standard deviations. A model that is a function of m dimensions of data takes m+1 input arguments: the vector of parameters followed by the m arrays referring to the m data dimensions. In the previous example, the approximation of x3 offered by the model with the average values of the certified parameters would be:

f( params{1}, x1, x4 )
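
A generic evaluation of a model row can be sketched as follows. Everything here is hypothetical: the dataset, the model f and the parameter values are invented, and passing only the first column of params{i} is a choice made for this sketch (the call above passes params{1} as a whole).

```octave
% Hypothetical dataset [x1, x2, x3, x4]; here x3 = 2*x1 + 0.5*x4 exactly.
data = [ 1 5 3.0 2 ;
         2 6 5.5 3 ;
         3 7 8.0 4 ];

% Hypothetical model and its parameters (values, standard deviations).
f      = @( theta, x1, x4 ) theta(1) * x1 + theta(2) * x4;
models = { f, 3, [1,4] };
params = { [ 2.0 0.01 ;
             0.5 0.02 ] };

i         = 1;
theta     = params{i}(:,1);                       % parameter values only
has_std   = size( params{i}, 2 ) >= 2;            % second column is optional
inputs    = num2cell( data(:,models{i,3}), 1 );   % one cell per input column

% Parameter vector plus two input arrays: three arguments in total.
x3_est    = models{i,1}( theta, inputs{:} );
residuals = data(:,models{i,2}) - x3_est;         % observed minus estimated
```

Expanding the cell-array inputs into a comma-separated list keeps the call independent of how many columns a given model requires.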

All data are downloaded in the form in which they have been made available online and are imported as arrays in the hope that they will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. In case of errors (e.g. incorrect stable URL or acknowledgment), please contact the Mastrave project to help keep this utility up to date.

References

NIST, NIST Statistical Reference Dataset.
Free access at
http://www.itl.nist.gov/div898/strd/general/dataarchive.html

Misra, D., NIST (1978). Dental Research Monomolecular Adsorption
Study. Free access at
http://www.itl.nist.gov/div898/strd/nls/data/misra1a.shtml

Chwirut, D., NIST (1979). Ultrasonic Reference Block Study.
Free access at
http://www.itl.nist.gov/div898/strd/nls/data/chwirut2.shtml
http://www.itl.nist.gov/div898/strd/nls/data/chwirut1.shtml

Lanczos, C. (1956). Applied Analysis. Englewood Cliffs, NJ:
Prentice Hall, pp. 272-280. Free access at
http://www.itl.nist.gov/div898/strd/nls/data/lanczos3.shtml

Rust, B., NIST (1996). Free access at
http://www.itl.nist.gov/div898/strd/nls/data/gauss1.shtml

Rust, B., NIST (1996). Free access at
http://www.itl.nist.gov/div898/strd/nls/data/gauss2.shtml

Daniel, C. and F. S. Wood (1980). Fitting Equations to Data,
Second Edition. New York, NY: John Wiley and Sons, pp. 428-431.
Free access at
http://www.itl.nist.gov/div898/strd/nls/data/daniel_wood.shtml

Misra, D., NIST (1978). Dental Research Monomolecular Adsorption
Study. Free access at
http://www.itl.nist.gov/div898/strd/nls/data/misra1b.shtml

Kirby, R., NIST (1979). Scanning electron microscope line width
standards. Free access at
http://www.itl.nist.gov/div898/strd/nls/data/kirby2.shtml

Hahn, T., NIST (1979). Copper Thermal Expansion Study.
Free access at
http://www.itl.nist.gov/div898/strd/nls/data/hahn1.shtml

Nelson, W. (1981). Analysis of Performance-Degradation Data. IEEE
Transactions on Reliability. Vol. 2, R-30, No. 2, pp. 149-155.
Free access at
http://www.itl.nist.gov/div898/strd/nls/data/nelson.shtml

Osborne, M. R. (1972). Some aspects of nonlinear least squares
calculations. In Numerical Methods for Nonlinear Optimization,
Lootsma (Ed). New York, NY: Academic Press, pp. 171-189.
Free access at
http://www.itl.nist.gov/div898/strd/nls/data/mgh17.shtml

Lanczos, C. (1956). Applied Analysis. Englewood Cliffs, NJ:
Prentice Hall, pp. 272-280. Free access at
http://www.itl.nist.gov/div898/strd/nls/data/lanczos1.shtml
http://www.itl.nist.gov/div898/strd/nls/data/lanczos2.shtml

Rust, B., NIST (1996). Free access at
http://www.itl.nist.gov/div898/strd/nls/data/gauss3.shtml

Misra, D., NIST (1978). Dental Research Monomolecular Adsorption
Study. Free access at
http://www.itl.nist.gov/div898/strd/nls/data/misra1c.shtml

Misra, D., NIST (1978). Dental Research Monomolecular Adsorption
Study. Free access at
http://www.itl.nist.gov/div898/strd/nls/data/misra1d.shtml

Roszman, L., NIST (1979). Quantum Defects for Sulfur I Atom.
Free access at
http://www.itl.nist.gov/div898/strd/nls/data/roszman1.shtml

Kahaner, D., C. Moler, and S. Nash, (1989). Numerical Methods and
Software. Englewood Cliffs, NJ: Prentice Hall, pp. 441-445.
Free access at
http://www.itl.nist.gov/div898/strd/nls/data/enso.shtml

Kowalik, J.S., and M. R. Osborne (1978). Methods for Unconstrained
Optimization Problems. New York, NY: Elsevier North-Holland.
Free access at
http://www.itl.nist.gov/div898/strd/nls/data/mgh09.shtml

Thurber, R., NIST (1979). Semiconductor electron mobility modeling.
Free access at
http://www.itl.nist.gov/div898/strd/nls/data/thurber.shtml

Box, G. P., W. G. Hunter, and J. S. Hunter, (1978).
Statistics for Experimenters. New York, NY: Wiley, pp. 483-487.
Free access at
http://www.itl.nist.gov/div898/strd/nls/data/boxbod.shtml

Ratkowsky, D.A. (1983). Nonlinear Regression Modeling.
New York, NY: Marcel Dekker, pp. 61 and 88. Free access at
http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky2.shtml

Meyer, R. R. (1970). Theoretical and computational aspects of
nonlinear regression. In Nonlinear Programming, Rosen, Mangasarian
and Ritter (Eds). New York, NY: Academic Press, pp. 465-486.
Free access at
http://www.itl.nist.gov/div898/strd/nls/data/mgh10.shtml

Eckerle, K., NIST (1979). Circular Interference Transmittance
Study. Free access at
http://www.itl.nist.gov/div898/strd/nls/data/eckerle4.shtml

Ratkowsky, D.A. (1983). Nonlinear Regression Modeling. New York,
NY: Marcel Dekker, pp. 62 and 88. Free access at
http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml

Bennett, L., Swartzendruber L., and H. Brown, NIST (1994).
Superconductivity Magnetization Modeling. Free access at
http://www.itl.nist.gov/div898/strd/nls/data/bennett5.shtml

Norris, J., NIST. Calibration of Ozone Monitors.
Free access at
http://www.itl.nist.gov/div898/strd/lls/data/Norris.shtml

Pontius, P., NIST. Load Cell Calibration.
Free access at
http://www.itl.nist.gov/div898/strd/lls/data/Pontius.shtml

Eberhardt, K., NIST.
Free access at
http://www.itl.nist.gov/div898/strd/lls/data/NoInt1.shtml
http://www.itl.nist.gov/div898/strd/lls/data/NoInt2.shtml

Filippelli, A., NIST.
Free access at
http://www.itl.nist.gov/div898/strd/lls/data/Filip.shtml

Longley, J.W. (1967). An Appraisal of Least Squares Programs
for the Electronic Computer from the Viewpoint of the User.
Journal of the American Statistical Association, 62, pp. 819-841.
Free access at
http://www.itl.nist.gov/div898/strd/lls/data/Longley.shtml

Wampler, R.H. (1970). A Report of the Accuracy of Some
Widely-Used Least Squares Computer Programs. Journal of the
American Statistical Association, 65, pp. 549-565.
Free access at
http://www.itl.nist.gov/div898/strd/lls/data/Wampler1.shtml
http://www.itl.nist.gov/div898/strd/lls/data/Wampler2.shtml
http://www.itl.nist.gov/div898/strd/lls/data/Wampler3.shtml
http://www.itl.nist.gov/div898/strd/lls/data/Wampler4.shtml
http://www.itl.nist.gov/div898/strd/lls/data/Wampler5.shtml

Input arguments


 dataset_code      ::string::
                   Mnemonic string identifying one of the supported datasets.
                   Valid codes are:

                         code        │   reference data
                      ───────────────┼──────────────────────────────────────
                        'Misra1a'    │ Misra and NIST (1978).
                                     │ 14 x 2 data, 1 model with 2 params.
                      ───────────────┼──────────────────────────────────────
                        'Chwirut2'   │ Chwirut and NIST (1979).
                                     │ 54 x 2 data, 1 model with 3 params.
                      ───────────────┼──────────────────────────────────────
                        'Chwirut1'   │ Chwirut and NIST (1979).
                                     │ 214 x 2 data, 1 model with 3 params.
                      ───────────────┼──────────────────────────────────────
                        'Lanczos3'   │ Lanczos (1956).
                                     │ 24 x 2 data, 1 model with 6 params.
                      ───────────────┼──────────────────────────────────────
                        'Gauss1'     │ Rust and NIST (1996).
                                     │ 250 x 2 data, 1 model with 8 params.
                      ───────────────┼──────────────────────────────────────
                        'Gauss2'     │ Rust and NIST (1996).
                                     │ 250 x 2 data, 1 model with 8 params.
                      ───────────────┼──────────────────────────────────────
                        'DanWood'    │ Daniel and Wood (1980).
                                     │ 6 x 2 data, 1 model with 2 params.
                      ───────────────┼──────────────────────────────────────
                        'Misra1b'    │ Misra and NIST (1978).
                                     │ 14 x 2 data, 1 model with 2 params.
                      ───────────────┼──────────────────────────────────────
                        'Kirby2'     │ Kirby and NIST (1979).
                                     │ 151 x 2 data, 1 model with 5 params.
                      ───────────────┼──────────────────────────────────────
                        'Hahn1'      │ Hahn and NIST (1979).
                                     │ 236 x 2 data, 1 model with 7 params.
                      ───────────────┼──────────────────────────────────────
                        'Nelson'     │ Nelson (1981).
                                     │ 128 x 3 data, 1 model with 3 params.
                      ───────────────┼──────────────────────────────────────
                        'MGH17'      │ Osborne (1972).
                                     │ 33 x 2 data, 1 model with 5 params.
                      ───────────────┼──────────────────────────────────────
                        'Lanczos1'   │ Lanczos (1956).
                                     │ 24 x 2 data, 1 model with 6 params.
                      ───────────────┼──────────────────────────────────────
                        'Lanczos2'   │ Lanczos (1956).
                                     │ 24 x 2 data, 1 model with 6 params.
                      ───────────────┼──────────────────────────────────────
                        'Gauss3'     │ Rust and NIST (1996).
                                     │ 250 x 2 data, 1 model with 8 params.
                      ───────────────┼──────────────────────────────────────
                        'Misra1c'    │ Misra and NIST (1978).
                                     │ 14 x 2 data, 1 model with 2 params.
                      ───────────────┼──────────────────────────────────────
                        'Misra1d'    │ Misra and NIST (1978).
                                     │ 14 x 2 data, 1 model with 2 params.
                      ───────────────┼──────────────────────────────────────
                        'Roszman1'   │ Roszman and NIST (1979).
                                     │ 25 x 2 data, 1 model with 4 params.
                      ───────────────┼──────────────────────────────────────
                        'ENSO'       │ Kahaner et al. (1989).
                                     │ 168 x 2 data, 1 model with 9 params.
                      ───────────────┼──────────────────────────────────────
                        'MGH09'      │ Kowalik and Osborne (1978).
                                     │ 11 x 2 data, 1 model with 4 params.
                      ───────────────┼──────────────────────────────────────
                        'Thurber'    │ Thurber and NIST (1979).
                                     │ 37 x 2 data, 1 model with 7 params.
                      ───────────────┼──────────────────────────────────────
                        'BoxBOD'     │ Box et al. (1978).
                                     │ 6 x 2 data, 1 model with 2 params.
                      ───────────────┼──────────────────────────────────────
                        'Rat42'      │ Ratkowsky (1983).
                                     │ 9 x 2 data, 1 model with 3 params.
                      ───────────────┼──────────────────────────────────────
                        'MGH10'      │ Meyer (1970).
                                     │ 16 x 2 data, 1 model with 3 params.
                      ───────────────┼──────────────────────────────────────
                        'Eckerle4'   │ Eckerle and NIST (1979).
                                     │ 35 x 2 data, 1 model with 3 params.
                      ───────────────┼──────────────────────────────────────
                        'Rat43'      │ Ratkowsky (1983).
                                     │ 15 x 2 data, 1 model with 4 params.
                      ───────────────┼──────────────────────────────────────
                        'Bennett5'   │ Bennett et al. (1994).
                                     │ 154 x 2 data, 1 model with 3 params.
                      ───────────────┼──────────────────────────────────────
                        'Norris'     │ Norris and NIST.
                                     │ 36 x 2 data, 1 model with 2 params.
                      ───────────────┼──────────────────────────────────────
                        'Pontius'    │ Pontius and NIST.
                                     │ 40 x 2 data, 1 model with 3 params.
                      ───────────────┼──────────────────────────────────────
                        'NoInt1'     │ Eberhardt and NIST.
                                     │ 11 x 2 data, 1 model with 1 param.
                      ───────────────┼──────────────────────────────────────
                        'NoInt2'     │ Eberhardt and NIST.
                                     │ 3 x 2 data, 1 model with 1 param.
                      ───────────────┼──────────────────────────────────────
                        'Filip'      │ Filippelli and NIST.
                                     │ 82 x 2 data, 1 model with 11 params.
                      ───────────────┼──────────────────────────────────────
                        'Longley'    │ Longley (1967).
                                     │ 16 x 7 data, 1 model with 7 params.
                      ───────────────┼──────────────────────────────────────
                        'Wampler1'   │ Wampler (1970).
                                     │ 21 x 2 data, 1 model with 6 params.
                      ───────────────┼──────────────────────────────────────
                        'Wampler2'   │ Wampler (1970).
                                     │ 21 x 2 data, 1 model with 6 params.
                      ───────────────┼──────────────────────────────────────
                        'Wampler3'   │ Wampler (1970).
                                     │ 21 x 2 data, 1 model with 6 params.
                      ───────────────┼──────────────────────────────────────
                        'Wampler4'   │ Wampler (1970).
                                     │ 21 x 2 data, 1 model with 6 params.
                      ───────────────┼──────────────────────────────────────
                        'Wampler5'   │ Wampler (1970).
                                     │ 21 x 2 data, 1 model with 6 params.

Example of usage


   % Import a bi-dimensional dataset and perform
   % an easy nonlinear regression.
   [ d, url, ref, m, p ] = get_reference_data( 'Thurber' );
   [ y, x ] = mdeal( d );

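   % Standardise y and map it through tan(), so that the regressor
   % output can be mapped back with atan().
   my       = mean( y );
   sy       = std(  y );
   tan_y    = tan( ( y - my ) / sy );

   % Least-squares fit of a 5th-degree polynomial in x to tan_y
   % (theta is displayed).
   xi       = linspace( min(x), max(x), 1000 ).';
   M        = bsxfun( @power, x,  0:5 );
   Mi       = bsxfun( @power, xi, 0:5 );
   theta    = M \ tan_y
   y_est    = atan( M  * theta ) * sy + my;
   yi_est   = atan( Mi * theta ) * sy + my;

   % Compare the quick regressor with the certified model.
   certified_model = @(x) m{1,1}( p{1} , x );
   plot( x, y, 'o', xi, yi_est, '-g', xi, certified_model( xi ) )
   legend( 'data' , 'quick regressor' , 'certified model' )
   title( ref )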

See also:
   mdeal



Keywords:
   reference data, internet-based utilities 



Version: 0.4.6

Support

The Mastrave modelling library is committed to providing reusable and general - but also robust and scalable - modules for research modellers dealing with computational science. You can help the Mastrave project by providing feedback on unexpected behaviours of this module. Despite all efforts, all of us - either developers or users - (should) know that errors are unavoidable. However, the free software paradigm successfully highlights that scientific knowledge freedom also implies an impressive opportunity to collectively evolve the tools and ideas upon which our daily work is based. Reporting a problem that you found using Mastrave may help the developer team to find a possible bug. Please be aware that Mastrave is entirely based on voluntary efforts: in order for your help to be as effective as possible, please carefully read the section on reporting problems. Thank you for your collaboration.

This page is licensed under a Creative Commons Attribution-NoDerivs 3.0 Italy License.

This document is also part of the book:
de Rigo, D. (2012). Semantic Array Programming with Mastrave - Introduction to Semantic Computational Modelling. http://mastrave.org/doc/MTV-1.012-1