[% setvar title Access to optimisation information for regular expressions %]

This file is part of the Perl 6 Archive

Note: these documents may be out of date. Do not use as reference!

To see what is currently happening visit http://www.perl6.org/


Access to optimisation information for regular expressions


  Maintainer: Hugo van der Sanden (hv@crypt0.demon.co.uk)
  Date: 25 Sep 2000
  Last Modified: 30 Sep 2000
  Mailing List: perl6-language-regex@perl.org
  Number: 317
  Version: 2
  Status: Frozen


No comments to this except for a reference from RFC 72 (v4), which hopes that the concept will be extended to permit the caller to set study()-type information.

If/when time permits I'll try a patch to perl5 to see how easy it is and to discover whether anyone other than Peter and I want it.


Currently you can see optimisation information for a regexp only by running with -Dr in a debugging perl and looking at STDERR. There should be an interface that allows us to read this information programmatically and possibly to alter it.


At its core, the regular expression matcher knows how to check whether a pattern matches a string starting at a particular location. When the regular expression is compiled, perl may also look for optimisation information that can be used to rule out some or all of the possible starting locations in advance.

Currently you can find out about the optimisation information captured for a particular regexp only in a perl built with DEBUGGING, by turning on -Dr:

  % perl -Dr -e 'qr{test.*pattern}'
  Compiling REx `test.*pattern'
  size 8 first at 1
  rarest char p at 0
  rarest char s at 2
     1: EXACT <test>(3)
     3: STAR(5)
     4:   REG_ANY(0)
     5: EXACT <pattern>(8)
     8: END(0)
  anchored `test' at 0 floating `pattern' at 4..2147483647 (checking floating) minlen 11 
  Omitting $` $& $' support.
  Freeing REx: `test.*pattern'

For some purposes it would help to be able to get at this information programmatically: the test suite could take advantage of this (to test that optimisations occur as expected), and it could also be useful for enhanced development tools, such as a graphical regexp debugger.

Additionally there are times that the programmer is able to supply optimisation that the regexp engine cannot discover for itself. While we could consider making it possible to modify these values, it is important to remember that these are only hints: the regexp engine is free to ignore them. So there is a danger that people will misuse writable optimisation information to move part of the logic out of the regexp, and then blame us when it breaks.

Suggested example usage:

  % perl -wl
  use re;
  $a = qr{test.*pattern};
  print join ':', $a->fixed_string, $a->floating_string, $a->minlen;

.. but perhaps a single new method returning a hashref would be cleaner and more extensible:

  $opt = $a->optimisation;
  print join ':', @$opt{qw/ fixed_string floating_string minlen /};


Straightforward: add interface functions within the perl core to give access to read and/or write the optimisation values; add methods in re.pm that use XS code to reach the internal functions.


Prompted by discussion of RFC 72: The regexp engine should go backward as well as forward.