[% setvar title Efficient numerics with perl %]

This file is part of the Perl 6 Archive

Note: these documents may be out of date. Do not use as reference!

To see what is currently happening visit http://www.perl6.org/

TITLE

Efficient numerics with perl

VERSION

  Maintainer: pdl-porters team <pdl-porters@jach.hawaii.edu>
  Date: 16 Aug 2000
  Last Modified: 30 Sep 2000
  Mailing List: perl6-language-data@perl.org
  Number: 116
  Version: 3
  Status: Frozen

DISCUSSION

Not much happened. Most seemed to be dumbfounded by the density of the description. What we really need is a good manual or book on PDL as a reference. Short of this we have this RFC :(.

At the very least it should help to point out that good support for numerical functionality is more than just matrix multiplication on multidim arrays.

The RFCs on multi-dimensional arrays for Perl6 address some of the issues discussed in this RFC.

ABSTRACT

This RFC describes basic requirements of a numerical object/variable type for perl. A reference implementation (that could almost certainly do with some syntactical sweetening) is provided. It is currently available as the PDL distribution. A few key concepts of the reference implementation will be described. Any efforts to get numerical support into the perl core should provide means to either easily hook a numerical core into perl to achieve this or supply the features at the core design level. Current implementation aspects are sketched.

This RFC is not really asking for any feature in particular but rather a description of existing PDL features and implementation aspects to show what's possible and desired in a numerical module.

It is also the pdl-porters' contribution to overstuff everybody's brains [1].

DESCRIPTION

PDL provides an object class to compactly store and speedily manipulate the large N-dimensional data arrays which are the bread and butter of scientific computing.

Apart from the object representation itself PDL implements three key concepts which its users have found critical for efficient and enjoyable numerical computing with perl:

Compact typed multi-dimensional arrays

The basic data object of PDL is a zero- to multidimensional array of numbers, all of the same data type (for efficiency reasons, it is desirable to store the numbers in the native binary format and consecutively in memory instead of having a Perl SV for each one). Perl6 could come up with a representation for such types as part of its core (or be sufficiently flexible in its SV design).

It is also critical to be able to specify a range of types; for example, PDL offers support for single and double precision (float/double) as well as a variety of integer types ('byte', 'ushort', 'short', 'long'). A future implementation should support a complex type at the core level. In the current implementation this is difficult due to parser limitations of the code generator (PDL::PP) and C's lack of an intrinsic complex type.

Smart references and transformations: slicing and dicing

During data analysis it is often desirable to be able to refer to parts of the data, for example every other number in a vector, without allocating new storage: if the original vector is 100MB long, we don't want to allocate and copy the data for another 50MB vector. For this purpose PDL supports "virtual piddles", which are pointers to parts of other piddles (the implementation is transparent, virtual piddles appearing just like any other piddle object to the user).

  $a = zeroes(1000,1000,100); # more than 100MB worth of data
  $b = $a->slice('0:-1:2');   # virtual piddle of every 2nd element
  $b *= 2;                    # changes the elements in $a

slice is just one example of this type of operation. $b contains only information on how to access $a's data rather than a copy. Similarly, a transposed object with the first two dimensions exchanged is easily obtained:

  $c = $a->xchg(0,1); # exchange dimension 0 and 1

Again, $c only contains information on how to access $a's data.

Similarly we can create an index:

  $i = pdl(1,9,27); # a number of indices
  $c = $a->index($i);

Changing $c changes $a (and vice versa). However, in this case PDL performs no such optimisation, and manipulating $c uses memory for temporaries even though the changes are propagated back. The crucial point is that $c and $a are tightly linked objects that propagate changes in either to the other (in a way similar to dataflow). Find out more about this in the implementation section. An optimisation is in principle possible, and would be useful for all array types, not just PDL ones.

Similar optimizations for sparse and non-rectangular arrays would be desirable; this would be especially useful in a perl for numerics.

Signatures: threading over elementary operations

In a large number of practical situations encountered in scientific numerical programming, fundamental (simple) operations are iterated over many (small) chunks of (multi-dimensional) data.

A very simple example is the addition

  $c = $a + $b;

of two multi-dimensional arrays (PDL objects). The fundamental operation is the addition of two scalar elements. This fundamental operation is simply iterated over all pairs of elements of $a and $b, and the results are placed element by element into $c.

Based on this observation PDL introduced the notion of PDL threading to implement such functionality in a flexible and efficient way: it provides tools (PDL::PP) to define those optimized innermost loops (the fundamental operations) and establishes a framework to efficiently execute loops over these fundamental operations automatically as prescribed by the dimensionality of the input data. Most importantly, those loops over the innermost fundamental operations can generally be run in parallel (i.e. run in separate threads). PDL currently has experimental support to exploit this observation.
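To make the idea concrete, here is a deliberately naive sketch in Python (our illustration, not PDL's implementation -- PDL's inner loops are generated C code) of how a fundamental scalar operation is iterated over nested, multi-dimensional data:

```python
# Naive sketch of PDL threading: recursively iterate a fundamental
# scalar operation over nested lists standing in for N-dim arrays.
def thread(op, a, b):
    if isinstance(a, list):
        # this loop over chunks is what PDL "threads"; each iteration
        # is independent, so it could run in a separate OS-level thread
        return [thread(op, x, y) for x, y in zip(a, b)]
    return op(a, b)  # the fundamental operation on scalar elements

add = lambda x, y: x + y
c = thread(add, [[1, 2], [3, 4]], [[10, 20], [30, 40]])
```

The real machinery dispatches one compiled inner loop over whole chunks rather than recursing per element, but the decomposition into a fundamental operation plus automatic outer loops is the same.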

Two examples using the inner product as the fundamental operation may serve to illustrate the idea. To convert an RGB image into a greyscale image, an inner product of a triple of weights with each RGB triple is performed:

  $grey = inner $rgb, pdl(0.301,0.586,0.113);

The key here is that regardless of the number of trailing dimensions of the RGB object $rgb (which has a shape [3,...]), the fundamental operation (the inner product) is speedily iterated over all triples of the data. So $rgb could be a single RGB pixel (shape [3]), a line of RGB data (shape [3,x]), an RGB image (shape [3,x,y]), a stack of RGB images ([3,x,y,z]), and so on.

Each function that can be used in such a way in PDL has a signature that symbolically states the dimensions consumed and generated in a fundamental operation. Illustrated with the two functions used above, the signatures of addition and the inner product are

  a(); b(); [o] c()    # addition, we use ';'s as separators
  a(n); b(n); [o] c()  # inner product  

In this symbolic notation [o] flags arguments which are created (output arguments). The symbols in parentheses name any dimensions of the variables involved. Empty parentheses refer to scalars (0-dim). So addition takes two scalar input arguments and generates one output scalar. The inner product consumes, as expected, two one-dimensional input args (symbolized by a(n) and b(n)) to produce a scalar output ([o] c()).
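The notation is regular enough to parse mechanically. As an illustration (this little parser is our own sketch, not part of PDL::PP), a signature string can be turned into (name, dims, output?) triples:

```python
# Sketch of parsing a PDL-style signature string such as
# 'a(n); b(n); [o] c()' into (name, dims, is_output) triples.
import re

def parse_signature(sig):
    args = []
    for part in sig.split(';'):
        part = part.strip()
        if not part:
            continue
        # optional [o] output flag, then name, then dim names in parens
        m = re.match(r'(\[o\]\s*)?(\w+)\(([^)]*)\)', part)
        is_output = m.group(1) is not None
        name = m.group(2)
        dims = [d.strip() for d in m.group(3).split(',') if d.strip()]
        args.append((name, dims, is_output))
    return args
```

From such a parse the threading engine can deduce which dimensions belong to the fundamental operation and which are left over to be looped (threaded) over.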

To complete the picture, PDL has rules for how these fundamental operations consume chunks of data from the input args to produce chunks of output data (the possibly multi-dimensional result piddle). The user decides exactly how those operations are iterated over the data by manipulating the dimensions of the input data appropriately. This is illustrated in the promised second example.

Those familiar with linear algebra will know that a matrix product consists of the inner products between all pairs of row and column vectors of the two input matrices. Not surprisingly, we can therefore use the inner product to compute the matrix product of two PDL objects $a and $b:

  $c = inner $a->dummy(1), $b->xchg(0,1)->dummy(2);

dummy and xchg are dimension manipulation functions (of the smart reference type) that are used to change the dimensionality of the input piddles so that the inner product is iterated over the desired pairs of sub-vectors and placed into the resulting piddle $c. We do not expect the casual reader to follow the details of this example, but rather to appreciate the underlying idea.
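For readers who prefer the effect spelled out, here is what the construction arranges, written in plain Python (an illustration of the semantics we add here, not of PDL's implementation): the fundamental inner product is iterated over every (row of $a, column of $b) pair, which is exactly the matrix product.

```python
# Plain-Python spelling of what the dummy/xchg construction arranges.
def inner(u, v):
    """The fundamental operation: signature a(n); b(n); [o] c()."""
    return sum(x * y for x, y in zip(u, v))

def matmul(A, B):
    cols = list(zip(*B))  # xchg(0,1): view B by columns
    # the dummy dimensions make every row meet every column
    return [[inner(row, col) for col in cols] for row in A]
```

In PDL no such explicit double loop is written: dummy and xchg reshape the inputs so that the threading rules generate exactly these iterations.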

The essence of PDL threading is that (1) it is a useful feature in many practical situations, (2) it can be very efficiently implemented and (3) it exposes parallelism in the underlying algorithm.

One could go as far as saying that the ideas underlying threading are PDL's main programming paradigm and the user is expected to rethink his approach accordingly for optimal performance.

Defining new PDL functions -- Glue code generation

One of the foremost design criteria was to implement a small set of core structures, routines and code generators that would depend heavily on each other. The rest of the package (implementing the real numerical functionality) should only depend on the external interface of these core functions.

Encapsulation can be achieved by consciously looking for code that would be repeated in subroutines and either making a new subroutine or a code generator do the job for the programmer. This way, when the repeated code suddenly needs to be different, there is no painstaking need of changing all instances of it - simply changing the subroutine or code generator will do it. We emphasize this because this kind of thinking seems to be forgotten all too often.

PDL introduced the code generator PDL::PP to create subroutines from concise descriptions to allow the programmer to focus on the algorithmic side of things. In many ways PDL::PP can be thought of as the XS of PDL: it is complex to learn, powerful, allows easy interfacing at the C-level and is, once mastered, straightforward to use. Beyond that using PDL::PP ensures that new routines follow uniform calling conventions and support PDL threading, slicing, etc.

In the current implementation it suffers from problems similar to XS in perl5: the syntax is strange and hard to learn, and it is feature incomplete (i.e. certain things cannot be done right now). However, it implements a number of ideas that we find absolutely crucial to retain and develop further in a numerically friendly perl: algorithmic abstraction, encapsulation, and the provision of a tool for easy interfacing to the hundreds of existing numerical, graphics and IO libraries waiting to be tapped into.

As an illustration for the abstraction that is achieved, here is the definition for an inner product function. This definition tells the code generator (PDL::PP) to generate the appropriate XS code that will then be compiled as part of a new loadable PDL module:

   pp_def(
        'inner',
        Pars => 'a(n); b(n); [o]c(); ', # the signature, see above
        Code => 'double tmp = 0;
                 loop(n) %{ tmp += $a() * $b(); %}
                 $c() = tmp;' );

The above code specifies firstly that the first two arguments must have the same first dimension and that the third will not have that dimension. Inside the loop it is possible to use $a() without specifying the index because we have specified that we are looping over the dimension called n. Notice that we do not write any explicit check that $a and $b have the same sized first dimension - we simply tell PDL::PP that we expect this to be so, and the generated code checks whether data passed to the inner routine obeys the assumption and croaks if it does not.

After installation the user can then use

  $c = inner $a, $b;

to compute the inner product of two piddles.

The code clearly illustrates the abstraction achieved, making no reference to an actual data type or how the vectors are stored. This kind of abstraction is similar to the ideas used in the C++ Standard Template Library. As a bonus, the use of PP means a lot of the tedious housekeeping of programming can be automated. Almost all index troubles vanish when writing and using these subroutines: all the necessary index checks are done automatically by the code generator.

The code generator also ensures that calling the function with arguments that have more dimensions than the function expects works as well: the function is simply called repeatedly in a loop over all these extra dimensions (threaded over the dimensions, in PDL parlance), providing a fast way of performing many similar operations in a row without going back to the Perl level. (With the latest versions of PDL it is possible to use multiple OS-level threads for these loops, an easy way of gaining from a multiprocessor computer.) This threading allows many algorithms to be expressed concisely, and PDL scripts rarely have cause for loops over elements at the Perl level.

In a future perl this functionality might be integrated with a perl (JIT?) compiler. Code could be specified in perl(-like?) syntax and compiled into an optimized C level transformation as required.

If this sounds rather vague it is not by accident -- we still lack a clear idea how to optimally achieve this goal. A nice perlish syntax to replace the PP gibberish should certainly be the starting point.

IMPLEMENTATION

The PDL core currently suffers from problems similar to perl 5's: it works, but it is poorly documented and many hacks make it increasingly difficult to modify and maintain. But it works ;).

In any case, some of the perl6 guys have asked for a rundown of the principal guts of the implementation. Here it goes...

Compact, typed multi-dimensional arrays

On the Perl side, an opaque scalar reference turned out to be the most practical representation: Perl5 allows overloading of operators so $c=$a+$b can mean "add the objects (here representing matrices) $a and $b and store the result in the scalar $c with whatever type that the result has". As with any perl object, new objects can be easily derived from piddles for specialised purposes.

In Perl6 it may be desirable to use @ syntax for all arrays - whether compact, implementing smart references, or not. The use of $a for PDL arrays and @a for normal perl arrays is confusing; we would like to see some unification. Ideally perl arrays would support some, or all, of the PDL features whether they are multi-dimensional or not.

Smart references and transformations: slicing and dicing

Smart references and most other fundamental functions operating on piddles are implemented via transformations which are represented by the type pdl_trans in PDL.

A transformation links input and output piddles and contains all the infrastructure that defines how the output piddles are computed from the inputs (and, for flowing transformations, how changes flow back).

In general, executing a PDL function on a group of piddles results in the creation of a transformation of the requested type that links all input and output arguments (at least those that are piddles). In PDL functions that support data flow between input and output args (e.g. slice, index) this transformation links parent (input) and child (output) piddles permanently, until either the link is explicitly broken by user request (sever at the perl level) or all parents and children have been destroyed. In those cases the transformation is lazily evaluated, i.e. only executed when piddle values are actually accessed.

In non-flowing functions, for example addition (+) and the inner product (inner), the transformation is installed just as in flowing functions, but it is then immediately executed and destroyed (breaking the link between input and output args) before the function returns.
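The flowing case can be sketched in a few lines of Python (the class and method names here are ours, invented for illustration; PDL's C structures differ in detail):

```python
# Sketch of a "flowing" transformation: the child is a live, lazily
# evaluated view of the parent, and writes flow back to it.
class Piddle:
    def __init__(self, data):
        self.data = list(data)

class SliceTrans:
    def __init__(self, parent, idx):
        self.parent, self.idx = parent, idx   # link stays until severed
    def read(self):                            # lazy: run on access only
        return [self.parent.data[i] for i in self.idx]
    def writeback(self, values):               # flow changes to the parent
        for i, v in zip(self.idx, values):
            self.parent.data[i] = v

a = Piddle([0, 1, 2, 3, 4])
b = SliceTrans(a, [2, 3, 4])
b.writeback([v * 2 for v in b.read()])  # doubles a's elements 2..4
# a non-flowing op would execute the trans once here and drop the link
```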

It should be noted that the close link between input and output args of a flowing function requires that piddle objects that are linked in such a way be kept alive beyond the point where they have gone out of scope from the point of view of perl:

  $a = zeroes(20);
  $b = $a->slice('2:4');
  undef $a;    # last reference to $a is now destroyed

Although $a should now be destroyed according to perl's rules, the underlying pdl structure must actually only be freed when $b also goes out of scope (since $b still internally references some of $a's data). This example demonstrates that such a dataflow paradigm between PDL objects necessitates a special destruction algorithm that takes the links between piddles into account and couples the lifespans of those objects. This non-trivial algorithm is implemented in the function pdl_destroy in pdlapi.c.

What's in a pdl_trans

All transformations are implemented as structures

  struct XXX_trans {
        int magicno;    /* to detect memory overwrites */
        short flags;    /* state of the trans */
        pdl_transvtable *vtable;    /* the all-important vtable */
        void (*freeproc)(struct pdl_trans *);   /* call to free this trans
                (in case we had to malloc some stuff for this trans) */
        pdl *pdls[NP];  /* the pdls involved in the transformation */
        int __datatype; /* the type of the transformation */
        /* in general more members,
           depending on the actual transformation (slice, add, etc.) */
  };

The transformation identifies all pdls involved in the trans

  pdl *pdls[NP];

with NP depending on the number of piddle args of the particular trans. It records a state

  short flags;

and the datatype

  int __datatype;

of the trans (to which all piddles must be converted unless they are explicitly typed). Most important is the pointer to the vtable that contains the actual functionality

 pdl_transvtable *vtable;

The vtable structure in turn looks something like (slightly simplified from pdl.h for clarity)

  typedef struct pdl_transvtable {
        pdl_transtype transtype;
        int flags;
        int nparents;   /* number of parent pdls (input) */
        int npdls;      /* number of child pdls (output) */
        char *per_pdl_flags;    /* optimization flags */
        void (*redodims)(pdl_trans *tr);    /* figure out dims of children */
        void (*readdata)(pdl_trans *tr);    /* flow parents to children */
        void (*writebackdata)(pdl_trans *tr);   /* flow backwards */
        void (*freetrans)(pdl_trans *tr);   /* free the trans and its
                                               contents */
        pdl_trans *(*copy)(pdl_trans *tr);  /* full copy */
        int structsize;
        char *name; /* for debuggers, mostly */
  } pdl_transvtable;

We focus on the callback functions:

  	void (*redodims)(pdl_trans *tr);

redodims works out the dimensions of any piddles that need to be created; it is invoked from the API function that ensures the dimensions of a piddle are accessible (pdlapi.c):

   void pdl_make_physdims(pdl *it)

readdata and writebackdata are responsible for the actual computation of the child data from that of the parents, or of the parent data from that of the children, respectively (the dataflow aspect). The PDL core makes sure that these are called as needed when piddle data is accessed (lazy evaluation). The general API function to ensure that a piddle is up-to-date is

  void pdl_make_physvaffine(pdl *it)

which should be called before accessing piddle data from XS/C (see Core.xs for some examples).

freetrans frees dynamically allocated memory associated with the trans as needed and copy can copy the transformation.

The transformation and vtable code is hardly ever written by hand but rather generated by PDL::PP from concise descriptions (see the example of the definition of inner above).

Certain types of transformations can be optimized very efficiently, obviating the need for explicit readdata and writebackdata methods. These transformations are called pdl_vaffine. Most dimension-manipulating functions (e.g., slice, xchg) belong to this class.

The basic trick is that the parent and child of such a transformation work on the same (shared) block of data, which they just choose to interpret differently (by using different dims, dimincs and offs on the same data; compare the pdl structure above). Any operation on a piddle sharing data with another one in this way is therefore automatically propagated between child and parent -- they are reading and writing the same block of memory. This is currently not perl-thread safe -- no big loss, since the whole PDL core is not reentrant (perl threading != PDL threading).
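A strided-view sketch in Python (field names loosely modeled on PDL's dims/dimincs/offs; this is an illustration we add, not PDL's code) shows how one flat block can serve both parent and child:

```python
# One flat block, two interpretations: parent and vaffine child differ
# only in their dims, increments and offset into the shared data.
class View:
    def __init__(self, data, dims, incs, offs):
        self.data, self.dims, self.incs, self.offs = data, dims, incs, offs
    def flat(self, *idx):
        # map an N-dim index to a position in the shared block
        return self.offs + sum(i * inc for i, inc in zip(idx, self.incs))
    def get(self, *idx):
        return self.data[self.flat(*idx)]
    def set(self, *idx, value):
        self.data[self.flat(*idx)] = value

block = list(range(10))
parent = View(block, dims=(10,), incs=(1,), offs=0)
every_2nd = View(block, dims=(5,), incs=(2,), offs=0)  # like slice('0:-1:2')
every_2nd.set(1, value=99)  # writes block[2], visible through the parent
```

No readdata/writebackdata step is needed here: there is nothing to copy, because both views address the same memory.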

A more detailed description of vaffines will be added in a later version of this document.

Signatures: threading over elementary operations

Most of that functionality is implemented in the file pdlthread.c. The PDL::PP generated functions (in particular the readdata and writebackdata callbacks) use this infrastructure to make sure that the fundamental operation implemented by the trans is performed in agreement with PDL's threading semantics.

Defining new PDL functions -- Glue code generation

Please, see PDL::PP and examples in the PDL distribution. Implementation and syntax are currently far from perfect.

As the above example of the definition of inner suggests, the principal functionality is implemented through the function pp_def, which takes a hash of directives as a concise specification of the desired functionality. The Code and RedoDimsCode fields contain the algorithmic specification in a home-grown pseudo-code syntax that has some faint similarity to a mixture of perl, XS and C. Here we could do with some good ideas for improvement. From this description PDL::PP produces C and XS code (which will go away in a future perl?) by processing the hash through a translation table.

The ability to write the fundamental algorithm in a perlish way (or even in perl) that is optimized by (ideally JIT) compilation is what we are really after.

SUMMARY

We have explained the key ideas which make PDL work, and work well. It remains to be determined which of these might belong in the core of perl6, in order to make PDL work more seamlessly, and which belong in the module.

Our take on this: perl6 should support a compact, numerical array type which is overloadable and can be manipulated with nice syntax. A key point is to abolish the visual distinction between perl arrays (@x) and PDL arrays ($x) which is confusing but forced upon us by the current object paradigm. There is also considerable scope for improving syntax to make numerical Perl more user-friendly - this is explained in more detail by the RFCs on Ranges and Default Methods.

Compact arrays should support virtual slices when being manipulated.

It seems to us that the more detailed/generalised PDL threading paradigm is beyond the core and belongs in a module. Some syntactic hooks would be appreciated, though, so that one could at least write code like this at the Perl level:

        sub mysub : PDL {
                my($a, $b) = @_;
                signature('a(n), b(n)');
                loop(n) {   # n a bareword, loop has prototype
                        # Threaded code
                }
        }

SEE ALSO

PDL (pdl.perl.org, pdl.sourceforge.net)

perl6 RFCs on Ranges and Default Methods.

Matlab

IDL (the one from rsinc)

Yorick

Mathematica

NumPython (starship.python.net)

Hordes of other array-oriented languages; try SAL.KachinaTech.COM

RFCs on multidim arrays 202-207, 231

[1] LW:

    Let me just add that I don't mind the brainstorming at all.  To be a
    good language designer, you have to stuff your brain with what you
    *could* do before you can reasonably choose what you *will* do.  At the
    moment, I'm not only trying to follow along here; I'm also reading all
    the books on computer languages I can get my hands on--not just to look
    for ideas to steal, but also to remind myself of the mindset Perl was
    designed to escape.

    In fact, I'd go as far as to say that it's imperative that you
    overstuff your brain to the point where you no longer feel tempted to
    overstuff the language.  Hypotheticality is your friend.