The module "mstream" of the Mastrave modelling library

 

Daniele de Rigo

 


Copyright and license notice of the function mstream

 

 

Copyright © 2008,2009,2010,2011,2012,2013,2014 Daniele de Rigo

The file mstream.m is part of Mastrave.

Mastrave is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Mastrave is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Mastrave. If not, see http://www.gnu.org/licenses/.

Function declaration

 

 

 answer = mstream( func               ,
                   in_streams         , 
                   out_streams        , 
                   block_size         , 
                   precision   = []   ,
                   skip        = []   ,
                   arch        = []   ,
                   header      = []   ,
                   header_fill = []   ,
                   same_size   = true )

Description

 

 

Utility to apply a data-transformation function func to a set of input streams in_streams and save the result inside a set of output streams out_streams .

The data-transformation function is identified by the function handle func and is expected to receive as input a matrix of block_size rows (or less) and n_in columns, where n_in is the number of input streams. The expected output of func is another matrix of block_size rows (or less, in accordance with the input matrix) and n_out columns, where n_out is the number of output streams.
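As an illustration of this contract, the following minimal sketch (plain Octave/MATLAB, not calling mstream) defines a handle that maps n_in = 4 input columns to n_out = 2 output columns while preserving the number of rows. The names row_stats and blk are hypothetical and are introduced here only for the example.

```octave
% Hypothetical data-transformation: 4 input columns -> 2 output columns
row_stats = @(x) [ sum(x,2), max(x,[],2) ];

% A block of 3 rows and n_in = 4 columns, as func might receive it
blk = reshape( 1:12, 3, 4 );
out = row_stats( blk );

% The output keeps the 3 rows of the block and has n_out = 2 columns
disp( size( out ) )
```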

The n_in input streams can be one of the following:
- the n_in columns of a matrix passed as in_streams argument;
- the n_in files identified by the corresponding file IDs passed as in_streams argument in the form of a cell-array of file IDs (which are integer numbers: see @fopen, @fread and @fwrite for more details);
- the n_in files identified by the corresponding file paths passed as in_streams argument in the form of a cell-array of strings.

The same alternatives apply to the n_out output streams.

func is iteratively applied to blocks of the input streams and the result is iteratively saved in corresponding blocks of the output streams. block_size identifies the number of rows to be processed per iteration.
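The iteration described above can be sketched on an in-memory matrix as follows. This is only an assumed outline of the block-wise logic written with plain Octave/MATLAB constructs, not the actual implementation of mstream. The variable names first, last and result are hypothetical.

```octave
% Sketch of block-wise processing on an in-memory matrix (not Mastrave code)
vals       = reshape( (1:40).', 10, 4 );   % 10 rows, 4 input streams
block_size = 3;
func       = @(x) 10*x;                    % row-preserving transformation

n_rows = size( vals, 1 );
result = zeros( n_rows, size( vals, 2 ) );
for first = 1:block_size:n_rows
   last = min( first + block_size - 1, n_rows );  % last block may be shorter
   result(first:last,:) = func( vals(first:last,:) );
end

% For a row-wise function the block size does not affect the result
assert( isequal( result, func( vals ) ) )
```

With 10 rows and a block size of 3, the loop processes three blocks of 3 rows and a final block of a single row, which is why func has to accept blocks of block_size rows or less.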

The optional input arguments precision , skip and arch have the same meaning as the third, fourth and fifth arguments, respectively, of @fread and @fwrite.
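To recall what these three arguments mean in @fread and @fwrite, the following sketch writes a few values to a temporary file and reads them back. It uses only the standard file functions, not mstream. The temporary file name and the sample values are assumptions made for the example.

```octave
% Write three single-precision values in little-endian byte order
fn  = tempname();
fid = fopen( fn, 'wb' );
fwrite( fid, [1.5 2.5 3.5], 'float', 0, 'ieee-le' );
fclose( fid );

% Read them back with the same precision, skip and arch
fid = fopen( fn, 'rb' );
x   = fread( fid, Inf, 'float', 0, 'ieee-le' );
fclose( fid );
delete( fn );
```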

The optional input argument header specifies the number of bytes to be skipped before reading/writing the first block of values, respectively from in_streams and to out_streams . This is useful for manipulating binary file formats in which the data are prefixed with an initial set of bytes of known size used as metadata (header). For example, the Binary Terrain file format (Discoe, 2007) uses a fixed header of 256 bytes, followed by single-precision binary data.
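The effect of skipping a header can be sketched with the standard file functions alone. The example below builds a toy file with a 256-byte header followed by single-precision values, then reads only the data. It is an assumed illustration of the idea and does not reproduce how mstream handles the header internally.

```octave
header = 256;                 % header size in bytes (as in the BT format)
fn     = tempname();

% Build a toy file: 256 header bytes followed by single-precision data
fid = fopen( fn, 'wb' );
fwrite( fid, zeros( 1, header ), 'uint8' );
fwrite( fid, [10 20 30], 'float', 0, 'ieee-le' );
fclose( fid );

% Skip the header, then read the data that follow it
fid  = fopen( fn, 'rb' );
fseek( fid, header, 'bof' );
data = fread( fid, Inf, 'float', 0, 'ieee-le' );
fclose( fid );
delete( fn );
```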

When files are written as requested with out_streams , the optional argument header_fill names a reference file whose header is copied into the files defined in out_streams .
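Copying a header from a reference file can be pictured with the following sketch, which uses a toy 16-byte header and the standard file functions. The file names ref and out and the header content are hypothetical. The sketch only illustrates the idea and is not the code used by mstream.

```octave
header = 16;                        % toy header size in bytes
ref    = tempname();                % hypothetical reference file
out    = tempname();                % hypothetical output file

% Reference file: a recognisable 16-byte header plus one value
fid = fopen( ref, 'wb' );
fwrite( fid, uint8( 'toy-header-0001!' ), 'uint8' );
fwrite( fid, 1, 'double' );
fclose( fid );

% Copy the header bytes of the reference file to the output file,
% then append the transformed data
fid = fopen( ref, 'rb' );
hdr = fread( fid, header, 'uint8=>uint8' );
fclose( fid );
fid = fopen( out, 'wb' );
fwrite( fid, hdr, 'uint8' );
fwrite( fid, 10, 'double' );
fclose( fid );

% The output file now starts with the same header
fid  = fopen( out, 'rb' );
hdr2 = fread( fid, header, 'uint8=>char' ).';
val  = fread( fid, 1, 'double' );
fclose( fid );
delete( ref ); delete( out );
```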

References
Discoe, B.: The BT (Binary Terrain) File Format, 2007
http://www.vterrain.org/Implementation/Formats/BT.html

Input arguments

 

 


 func               ::function_handle::
                    Function handle identifying the data-transformation
                    function to be applied to  in_streams  to transform
                    them into  out_streams .

 in_streams         ::matrix|cellnumstring-1::
                    Input streams.  If a cell-array of strings (file 
                    paths) is passed, the  corresponding files will be
                    opened in read-mode and closed at the end of the
                    data-transformation.

 out_streams        ::matrix|cellnumstring-1::
                    Output streams.  If a cell-array of strings (file 
                    paths) is passed, the  corresponding files will be
                    opened in write-mode and closed at the end of the
                    data-transformation.

 block_size         ::scalar_index::
                    Number of rows to be processed within each iteration.
                    See the description of the second argument of @fread.  

 precision          ::string::
                    Type of data to be read and written.
                    If  in_streams  (or  out_streams ) is a matrix, 
                     precision  has no effect on its manipulation.
                    See the description of the third argument of @fread
                    and @fwrite functions.  If empty, the default
                    value is 'double'.  
                    If omitted, the default value is [] (empty).

 skip               ::scalar_numel::
                    Number of bytes to skip before reading/writing each 
                    value respectively from  in_streams  and  out_streams .
                    If  in_streams  (or  out_streams ) is a matrix, 
                     skip  has no effect on its manipulation.
                    See the description of the fourth argument of @fread
                    and @fwrite functions.  If empty, the default
                    value is 0.
                    If omitted, the default value is [] (empty).

 arch               ::string::
                    Type of data-format of the files (order of bytes). 
                    If  in_streams  (or  out_streams ) is a matrix, 
                     arch  has no effect on its manipulation.
                    See the description of the fifth argument of @fread
                    and @fwrite functions.  If empty, the default
                    value is 'native'.
                    If omitted, the default value is [] (empty).

 header             ::scalar_numel::
                    Number of bytes to skip before reading/writing the
                    first block of values respectively from  in_streams 
                    and  out_streams .  Passing this optional argument
                    allows file formats containing data in binary
                    format to be manipulated even when the data are
                    prefixed with an initial set of bytes of known size
                    used as metadata (header).
                    If  in_streams  (or  out_streams ) is a matrix, 
                     header  has no effect on its manipulation.
                    If empty, the default value is 0.
                    If omitted, the default value is [] (empty).

 header_fill        ::string::
                    Reference file to be read so as for its header to be
                    copied as header in the files defined in  out_streams .
                    If empty or omitted, the headers in  out_streams  are
                    filled with spaces.

 same_size          ::scalar,logical::
                    If set to  true, this flag requires the size of the
                    output blocks to be the same as that of the input ones.
                    In particular, the number of rows of the input and 
                    output matrix is expected to be the same in each block.
                    Different blocks may still have different numbers of
                    rows. If omitted, the default value is true.
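The requirement expressed by same_size can be pictured with the following sketch, which compares the number of rows of an input block with that of the corresponding output block for two hypothetical transformations (keep and decimate). It uses plain Octave/MATLAB only. How mstream reacts when the requirement is violated is not described here, so the sketch merely computes the comparison.

```octave
blk      = reshape( 1:12, 3, 4 );         % one input block of 3 rows
keep     = @(x) x / 10;                   % keeps the number of rows
decimate = @(x) x(1:2:end,:);             % drops every second row

% Row counts that a same_size check would compare
same_keep     = size( keep( blk ), 1 )     == size( blk, 1 );
same_decimate = size( decimate( blk ), 1 ) == size( blk, 1 );
```

Under these assumptions, keep would be compatible with same_size set to true, while decimate would not, since it returns 2 rows for a block of 3 rows.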


Example of usage

 

 

   % Creating a series of files from a pre-existing matrix
   vals  = reshape( [1:40].', 10, 4 ) 
   func  = @(x)10*x
   mstream( func, vals, {'t1','t2','t3','t4'}, 3 )

   % Loading a series of files as a matrix
   vals2 = mstream( @(x)x, {'t1','t2','t3','t4'}, [], 3 )

   % Transforming a series of files into another series of files
   func2 = @(x)x/10
   mstream( func2, {'t1','t2','t3','t4'}, {'T1','T2','T3','T4'}, 3 )
   vals3 = mstream( @(x)x, {'T1','T2','T3','T4'}, [], 3 )

   % Functions changing the order of rows are allowed.  However,
   % for such functions the result of the data-transformation depends
   % on the value passed as  block_size 
   func2_rev = @(x)x(end:-1:1,:)/10
   mstream( func2_rev, {'t1','t2','t3','t4'}, {'T1','T2','T3','T4'}, 3 )
   vals3 = mstream( @(x)x, {'T1','T2','T3','T4'}, [], 3 )

   % Transforming a series of files into another file
   func3 = @(x)sum(x,2)
   mstream( func3, {'T1','T2','T3','T4'}, {'Tall'}, 3 )
  
   % Reading a file with the classic @fopen -> @fread -> @fclose
   fid   = fopen( 'Tall', 'rb' )
   vals4 = fread( fid, 'double' )
   fclose( fid )

   % Consistency check
   assert( vals4 == func3( vals3    ) )
   assert( vals4 == sum(   vals3, 2 ) )


   % Advanced usage
   % Importing/manipulating a Binary Terrain (Discoe, 2007) file. 
   % Obtaining a reference file (Island of Maui - USGS 10m DEM)
   url      = 'http://vterrain.org/repo/elev/maui_1k.zip'
   file     = 'maui_1k.bt'
   % Download the archive and decompress it on the fly into  file
   command  = [ 'wget -O - ' url ' | gunzip -c > ' file ]
   assert( unix( command ) == 0 )

   % Utility function to read the Binary Terrain file size
   function  siz = btsize( fn ) 
      fid        = fopen( fn, 'rb', 'ieee-le' ); fseek(  fid, 10, 'bof' ); 
      siz([2 1]) = fread( fid, 2, 'int' );       fclose( fid );
   end

   siz      = btsize( file )
   data     = mstream( @(x)x, {file}, [], 1e5, 'float', 0, 'ieee-le', 256 );
   data     = reshape( data, siz );
   imagesc( data(end:-1:1,:) )
   title( 'Island of Maui - USGS 10m DEM' )


   diff50   = @(x)reshape(                                       ...
      mblk_fun( reshape(x,siz(1),[]), @(x)x-mean(x), 50 ), [], 1 ...
   )
   fil2     = 'diff50.bt';
   % Using a reference input file to fill the output files' header
   mstream( diff50, {file}, {fil2}, siz(1)*50,                   ...
      'float', 0, 'ieee-le', 256, file                           ...
   );
   data     = mstream( @(x)x, {fil2}, [], 1e5, 'float', 0, 'ieee-le', 256 );
   data     = reshape( data, siz );
   imagesc( data(end:-1:1,:) )
   title( 'Island of Maui - USGS 10m DEM: difference from 50x50 mean' )


See also:
   mdeal, mbash



Keywords:
   data-transformation, streams, blocks



Version: 0.8.4

Support

 

 

The Mastrave modelling library is committed to providing reusable and general - but also robust and scalable - modules for research modellers dealing with computational science.  You can help the Mastrave project by providing feedback on unexpected behaviours of this module.  Despite all efforts, all of us - either developers or users - (should) know that errors are unavoidable.  However, the free software paradigm successfully highlights that scientific knowledge freedom also implies an impressive opportunity to collectively evolve the tools and ideas upon which our daily work is based.  Reporting a problem that you found using Mastrave may help the developer team to find a possible bug.  Please be aware that Mastrave is entirely based on voluntary efforts: in order for your help to be as effective as possible, please read carefully the section on reporting problems.  Thank you for your collaboration.

Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Daniele de Rigo

This page is licensed under a Creative Commons Attribution-NoDerivs 3.0 Italy License.

This document is also part of the book:
de Rigo, D. (2012). Semantic Array Programming with Mastrave - Introduction to Semantic Computational Modelling. http://mastrave.org/doc/MTV-1.012-1

