[% setvar title hooks in pack/unpack %]
Note: these documents may be out of date. Do not use as reference! |
To see what is currently happening visit http://www.perl6.org/
hooks in pack/unpack
Maintainer: Ilya Zakharevich <ilya@math.ohio-state.edu> Date: 16 September 2000 Mailing List: perl6-language-data@perl.org Number: 250 Version: 1 Status: Developing
How to specify pack()/unpack() user-recipes.
The following enhancement covers almost all the of the remaining ways to store binary data, but it is substantially higher on the "bizzareness" scale:
'R[ID,TYPE]'
in a TEMPLATE: during unpack() extracts a value as TYPE
and uses it to choose between several choices of templates. Behaves as
if R[ID,TYPE]
is replaced by the chosen template. The templates to
chose from are looked in/via the array/hash/subroutine referenced by
$unpack::recipes[ID]
.
Similarly, 'R{ID,TYPE}'
uses $unpack::recipes{ID}
instead.
The returned template may be replaced by a reference to array of the form
[TEMPLATE, \&POSTPROCESS]
In such a case a value is extracted with TEMPLATE, then is postprocessed
by calling POSTPROCESS($extracted)
, the return value replaces the
extracted value.
Optional: one should be able to specify that some bits of the last
extracted value which are ignored: 'R[ID,FROM..TO,TYPE]'
uses bits
from FROM to TO (shifted right by FROM) as the index. 'R[ID,mod,TYPE]'
uses the last extracted value modulo the length of the array referenced by
$unpack::recipes[ID].
This extracts UTF8 chars of up to 2-byte encoded length:
sub utf8_2byte_postprocess { (($_[0] & 0x1F00) >> 2) | ($_[0] & 0x3F) } local $unpack::recipes{UTF8} = [ 'C', [ 'n', \&utf8_2byte_postprocess ] ]; $n = unpack 'R{UTF8,7..7,C}', $str;
Symmetrically, during pack() 'R[ID]'
etc. make ID lookup in %pack::recipes
or @pack::recipes. The resulting array/hash/subroutine reference is
indexed-by/called-with the next argument to pack. The result is appended
to the target string.
The symmetric example to the UTF8 example above:
sub utf8_2byte_save { return pack "C", $_[0] if $_[0] <= 127; pack 'n', 0x80C0 | (($_[0] & 0x7C0)<<2) | ($_[0] & 0x3F); }; local $pack::recipes{UTF8} = \&utf8_2byte_save; $str = pack 'R{UTF8}', $n;
Optionally, to allow a usage of the same TEMPLATE during pack() and during
unpack(), anything after the first comma in the argument to 'R'
is ignored:
$str = pack 'R{UTF8}', $n; $str = pack 'R{UTF8,7:7,C}', $n;
are equivalent.
The usage of pack() with only one 'R'
type inside is obviously an
overkill, but it comes very handy if 'R'
is a part of a more complicated
construct, as in
$str = pack 'N/R{UTF8,7:7,C}', @array;
or
$str = pack 'N/( \g[ N/R{UTF8,7:7,C} ] )', @array_of_arrays;
In addition to "funny ways to encode simple data", this same proposal allows handling of streams which consist of repeated blocks of the form
int type; union { struct type_0 t0; struct type_1 t1; ... }
as well as many other similar problems.
None.
Straightforward.
RFC 142: Enhanced Pack/Unpack
RFC 246: pack/unpack uncontrovercial enhancements
RFC 248: enhanced groups in pack/unpack