
This file is part of the Perl 6 Archive

Note: these documents may be out of date. Do not use as reference!

To see what is currently happening visit http://www.perl6.org/


Numeric Value Ranges In Regular Expressions


  Maintainer: David Nicol <perl6rfc@davidnicol.com>
  Date: 5 Sep 2000
  Last Modified: 22 Sep 2000
  Mailing List: perl6-language-regex@perl.org
  Number: 197
  Version: 2
  Status: Frozen


Changes:

s/numberic/numeric/ in the title (oops).

Expanded the implementation/optimization section.


Round and square brackets, paired around two optional comma-separated numbers, match if and only if a gobbled number is within the described range.


the syntax of the numeric range regex element

Given a passage of regex text matching

	($B1,$N1,$N2,$B2) = /(\[|\()(\-?\d*\.?\d*),(\-?\d*\.?\d*)(\]|\))/
	and ($N1 eq '' or $N2 eq '' or $N1 <= $N2)

we have what we hereinafter call a "range".
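As a rough illustration, the detection rule above can be prototyped in Python. Python is used only so the sketch runs on its own; this is not how a Perl regex compiler would do it. `parse_range` and `RangeSpec` are hypothetical names. The sketch tests whether one candidate piece of regex text is a range and, as an added assumption not stated in the text, rejects captures such as `-` or `.` that are not numbers at all.

```python
import re
from typing import NamedTuple, Optional

# The detection pattern from the text, transcribed into Python's re syntax:
# an opening bracket, two optional numbers separated by a comma, a closing bracket.
RANGE_SPEC = re.compile(r"([\[(])(-?\d*\.?\d*),(-?\d*\.?\d*)([\])])")


class RangeSpec(NamedTuple):
    open_bracket: str   # "[" (inclusive) or "(" (exclusive)
    low: str            # "" means unbounded below
    high: str           # "" means unbounded above
    close_bracket: str  # "]" (inclusive) or ")" (exclusive)


def _as_float(text: str) -> Optional[float]:
    """Convert a captured number; None if it is not a number ("", "-", ".")."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_range(spec: str) -> Optional[RangeSpec]:
    """Return a RangeSpec if the whole of `spec` is a range, else None."""
    m = RANGE_SPEC.fullmatch(spec)
    if m is None:
        return None
    b1, n1, n2, b2 = m.groups()
    lo, hi = _as_float(n1), _as_float(n2)
    # Non-empty captures must be real numbers (an assumption of this sketch).
    if (n1 != "" and lo is None) or (n2 != "" and hi is None):
        return None
    # The ordering condition only applies when both ends are given.
    if lo is not None and hi is not None and lo > hi:
        return None
    return RangeSpec(b1, n1, n2, b2)
```

Because `[5,3]` fails the ordering test and `[3-5,9)` fails the pattern, both are left with their ordinary regex meaning.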

what the range matches

A range matches a passage of the target string of the form (\-?\d*\.?\d*), hereinafter a "number", if and only if that number is within the range, in the ordinary algebraic sense.

"within the range"

A square bracket means that end of the range includes the range-specifying number itself; a round parenthesis means that end of the range includes numbers up to (or down to) that number but not equal to it.


If either of the range-specifying numbers is the empty string, that end of the range is unbounded. If infinity and negative infinity are also defined on our numbers, the square/round distinction will come into play at an unbounded end as well, deciding whether infinity itself matches.
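A similar hedged Python sketch of the matching semantics follows. `within` applies the bracket rules and unbounded ends, and `match_range` gobbles numbers from a target string and returns the first one inside the range. Both names are hypothetical. Two assumptions go beyond the text: a number must contain at least one digit, and each maximal run of number characters is treated as one number, rather than letting the engine backtrack into shorter substrings such as `25` inside `250`. The text does not say which behaviour is intended. Infinity is not modelled, so the bracket at an empty end is ignored.

```python
import re
from typing import Optional

# A "number" per the text is -?\d*\.?\d*; this sketch additionally
# requires at least one digit (an assumption), so "", "-" and "." never match.
NUMBER = re.compile(r"-?(?:\d+\.?\d*|\.\d+)")


def within(x: float, open_b: str, low: str, high: str, close_b: str) -> bool:
    """Is x within the range described by the brackets and end numbers?

    "[" and "]" include the end number; "(" and ")" exclude it.
    An empty end number leaves that end unbounded.
    """
    if low != "":
        lo = float(low)
        if x < lo or (open_b == "(" and x == lo):
            return False
    if high != "":
        hi = float(high)
        if x > hi or (close_b == ")" and x == hi):
            return False
    return True


def match_range(target: str, open_b: str, low: str, high: str,
                close_b: str) -> Optional[str]:
    """Return the first number gobbled from target that lies in the range."""
    for m in NUMBER.finditer(target):
        if within(float(m.group()), open_b, low, high, close_b):
            return m.group()
    return None
```

For example, `match_range("x = 250, y = 150", "(", "37.3", "200", ")")` skips `250` and returns `"150"`.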

The range endpoints are literal numbers, although they leave plenty of room for optimization. No expression evaluation occurs within the range specifier beyond the normal rules of double-quote interpolation.


To keep a character class or group containing digits, commas, and parentheses from being read as a range, backslash the closing bracket or parenthesis, or the comma, or arrange things so the number left of the comma is greater than the one on the right; then this special case does not apply:

	/(37.3,200)/;	# matches any number x, 37.3 < x < 200
	/((37.3,200))/;	# matches any number x, 37.3 < x < 200 and saves it
	/([37,))/;	# matches and saves any number >= 37.
	/(37.3\,200)/;	# matches and saves the literal text '37.3,200'
	/[-35,9)]/;	# matches any number x, -35 <= x < 9; followed by a ]
	/[3-5,9)]/;	# not a range: matches a string containing any
			# of 3, 4, 5, ',', 9 or ')'
	/[$low,$high]/;	# matches a number x, $low <= x <= $high, provided
			# $low and $high are both numeric.
	/[$low,${\highf(@data)}]/;	# complex interpolation tricks

It is recommended that variables interpolated into range matches be tied to types that always produce numbers.


Yet more special cases for interpretation of ([)] in regular expressions.

We match the text of each regular expression against

	($B1,$N1,$N2,$B2) = /(\[|\()(\-?\d*\.?\d*),(\-?\d*\.?\d*)(\]|\))/
	and ($N1 eq '' or $N2 eq '' or $N1 <= $N2)

and mark matching passages as ranges.

When regular expressions are applied to numeric data, ranges may optimize away all of the digit-by-digit lookahead that we currently have to write to emulate them in Perl 5. In other words, if we know that a string literal containing interpolated numeric scalars is going to be matched by an expression containing ranges, we may be able to skip both the interpolation (number to string) and the deinterpolation (string back to number) and go straight to a multi-way numeric comparison.
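To make the optimization argument concrete, here is a Python sketch contrasting the digit-by-digit pattern one has to write today for the integers in `[37,200)` with the single pair of comparisons a range could compile down to. The pattern and function names are illustrative only. The pattern handles plain non-negative integers without leading zeros and nothing else; decimal endpoints such as `37.3` would make it considerably longer.

```python
import re

# Hand-built, digit-by-digit pattern for the integers in [37,200):
# 37-39, 40-99, or 100-199. Signs and decimals are not handled.
DIGIT_PATTERN = re.compile(r"3[7-9]|[4-9]\d|1\d\d")


def in_range_by_regex(text: str) -> bool:
    """Range test the way a Perl 5 style pattern must do it: digit by digit."""
    return DIGIT_PATTERN.fullmatch(text) is not None


def in_range_by_comparison(value: int) -> bool:
    """The same test on [37,200) as two numeric comparisons."""
    return 37 <= value < 200


if __name__ == "__main__":
    # The two tests agree on every integer rendered without leading zeros.
    for n in range(-50, 1000):
        assert in_range_by_regex(str(n)) == in_range_by_comparison(n)
    print("regex and comparison agree for -50..999")
```

Every new pair of endpoints needs a freshly hand-built alternation on the regex side, while the comparison side only swaps two constants.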

If infinity is defined, we will also have to look for its string representation in the target.

If a "simple, fast regex match mode" is defined, this pass could be switched in or out with it; then again, perhaps we want fast range matching.


The syntax described in this document may also help with slicing multidimensional containers (see RFC 191).