Note: these documents may be out of date. Do not use as reference! To see what is currently happening visit http://www.perl6.org/
Behavior of empty regex should be simple
Maintainer: Mark Dominus <mjd@plover.com>
Date: 24 Aug 2000
Last Modified: 11 Sep 2000
Mailing List: perl6-language-regex@perl.org
Number: 144
Version: 3
Status: Frozen
According to perlop:
If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead.
This behavior should be changed. If the PATTERN is empty, Perl should look for the empty string. (That is, if the PATTERN is empty, it should always match.)
Literal empty patterns, such as:
$s =~ // ;
are not the problem here. The real problem is that the special case is invoked for interpolated patterns also. For example,
chomp($pat = <STDIN>);
$s =~ /\Q$pat\E/;
looks to see if $pat is a substring of $s, unless $pat is empty, in which case it matches $s against the last regex that was matched successfully. That regex might be far away, in some other module. If the far-away regex happened to contain backreference groups, the backreference variables will be set accordingly.
To make this safe in Perl 5, the programmer has to write something peculiar like
$s =~ /(?=)\Q$pat\E/;
to ensure that the regex, after interpolation, is never empty.
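The trap and the workaround can be seen side by side in a short script. This is only a sketch: it assumes a Perl 5 interpreter that follows the perlop rule quoted above, and the strings "hello" and "abc" are made-up sample data, with an empty $pat standing in for an empty line read from STDIN.

```perl
use strict;
use warnings;

my $s = "abc";

# Establish a "last successfully matched" regex in this scope.
"hello" =~ /(l+)/ or die "setup match failed";

# Stands in for an empty line read from STDIN (assumption for the demo).
my $pat = "";

# Empty after interpolation, so Perl 5 reuses /(l+)/ instead.
# "abc" contains no "l", so this match fails.
my $unguarded = ($s =~ /\Q$pat\E/) ? 1 : 0;

# The (?=) prefix keeps the pattern non-empty after interpolation,
# so it really does look for the empty string and succeeds.
my $guarded = ($s =~ /(?=)\Q$pat\E/) ? 1 : 0;

print "unguarded=$unguarded guarded=$guarded\n";
```

Under the proposal in this RFC, both matches would succeed, and the (?=) prefix would no longer be needed.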
I propose that this 'last successful match' behavior be discarded entirely, and that an empty pattern always match the empty string.
The special behavior for empty patterns has never been particularly useful. For example, you could imagine code like this:
for $pat (@patterns) {
    if ($a =~ /$pat/ && $b =~ //) {
        # do something
    }
}
This would be more efficient than the equivalent
for $pat (@patterns) {
    if ($a =~ /$pat/ && $b =~ /$pat/) {
        # do something
    }
}
because $pat would be compiled only once per loop instead of twice. It is now more straightforward and efficient to do this sort of thing explicitly with the qr// operator:
@patterns = map qr/$_/, @patterns;
for $pat (@patterns) {
    if ($a =~ /$pat/ && $b =~ /$pat/) {
        # do something
    }
}
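A self-contained version of the qr// approach might look like the following sketch. The pattern sources and the two strings are invented for illustration, and $x and $y replace $a and $b only because the latter are reserved for sort under strict.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the RFC.
my @sources  = ('^foo', 'bar$', '\d+');
my @compiled = map { qr/$_/ } @sources;    # each pattern is compiled once here

my ($x, $y) = ('foobar', 'foo bar');

# Collect the source text of every pattern that matches both strings.
my @both;
for my $i (0 .. $#compiled) {
    my $pat = $compiled[$i];
    push @both, $sources[$i] if $x =~ $pat && $y =~ $pat;
}

print "matched both: @both\n";
```

Because each element of @compiled is already a regex object, neither match depends on any "last successful match" state.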
People sometimes propose the following use for the empty pattern special case: They have a pattern, and many strings, and they want to see if every string matches the pattern. This code works, but is inefficient:
sub match_all {
    my $pat = shift;
    for (@_) {
        return 0 unless /$pat/;
    }
    return 1;
}
This is because /$pat/ must be recompiled for each string, or checked to see whether recompilation is necessary.
This code does not work:
sub match_all {
    my $pat = shift;
    for (@_) {
        return 0 unless /$pat/o;
    }
    return 1;
}
because $pat changes with each call.
One solution is to use 'eval' here to generate the pattern matching code (with /o) at run time.
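The eval approach might be sketched as follows. The helper name make_matcher is hypothetical and does not come from the RFC. The sketch relies on the fact that each string eval compiles fresh code, so /o fixes the pattern once per generated subroutine rather than once for the whole program.

```perl
use strict;
use warnings;

# make_matcher is a hypothetical helper introduced for this sketch.
sub make_matcher {
    my $pat = shift;
    # Each string eval compiles a new match op, so /o locks in the
    # current value of $pat for this generated sub only.
    my $matcher = eval 'sub { $_[0] =~ /$pat/o }';
    die $@ unless $matcher;
    return $matcher;
}

sub match_all {
    my $pat     = shift;
    my $matcher = make_matcher($pat);    # one eval per call, not per string
    for (@_) {
        return 0 unless $matcher->($_);
    }
    return 1;
}
```

This works, but it pays for a string eval on every call to match_all and is harder to read than the alternatives discussed below.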
People have sometimes tried to use // here, but usually without success. The idea is:
sub match_all {
    my $pat = shift;
    # load $pat into 'last successfully matched' space
    for (@_) {
        return 0 unless //;
    }
    return 1;
}
The problem here is that there is no way to designate $pat as the last successfully matched regex without actually finding a string that matches it. In the past people attempting this strategy have appeared in comp.lang.perl.misc asking how to find a string that matches a given regex. As far as I know, no useful solutions have been offered. (In fact, there may not be any such string. Consider the pattern /a\bx/ for example: it can never match, because there is no word boundary between two adjacent word characters.)
A better, simpler solution to this problem is to use the qr operator:
sub match_all {
    my $pat = shift;
    $pat = qr($pat);
    for (@_) {
        return 0 unless /$pat/;
    }
    return 1;
}
Any code that contains the innocent-looking
if (/\Q$string\E/) { ... }
is potentially booby-trapped. Such code is common. An example of this type appears in perlfaq6.
Rather than eliminating the special case entirely, alternative changes are sometimes proposed.
One alternative is for the empty pattern to reuse the last pattern that was matched, whether or not that match succeeded. This behavior would be more useful than the current behavior.
For example, the match_all application discussed in the section 'The feature was not useful, II' above would be feasible if the empty pattern matched the last-matched pattern, because it would no longer be necessary to manufacture a matching string. But it is simpler and more straightforward to solve the problem with qr//.
Moreover, this alternative behavior retains all the drawbacks of the current behavior: It yields subtle and intermittent bugs and introduces strange action-at-a-distance effects.
In the past it has been proposed that the special behavior be discarded for patterns with interpolated variables, but retained for empty patterns that appear literally in the source. /\Q$string\E/ would look for $string, regardless of whether $string was empty, because /\Q$string\E/ is not a literal empty pattern. But
$s =~ // ;
would still retain the special behavior and match $s against whatever was the last pattern to be successfully matched.
Since the feature is of marginal usefulness (especially now that qr// is available) it should be eliminated anyway to reduce code and documentation bloat. However, see Translation Issues below.
No special implementation is necessary.
Old Perl 5 scripts that may depend on this feature may be hard to translate unless the feature is implemented in Perl 6 also.
If the literal empty pattern were to retain the special behavior, then Perl 5 code like this:
if (/$FOO/) { ... }
could be translated to Perl 6 code like this:
if (do { my $_tmp = $FOO; ($_tmp eq '') ? // : /$_tmp/ }) { ... }
However, it's not clear that such translation is actually desirable.
perlop discussion of empty patterns, quoted above.