Simplifying split()
Maintainer: Sean M. Burke <sburke@cpan.org>
Date: 30 Sep 2000
Mailing List: perl6-language@perl.org
Number: 361
Version: 1
Status: Developing
Perl 5's split function is messy, and should be simplified.
Perl 5 split does five things that I think are just annoying, and which I suggest be removed:
(split '.', $foo doesn't split on dot -- it's currently the same as split /./, $foo.) I suggest that split be changed to treat only regexps as regexps, and everything else as literals.

The last three of the above points speak for themselves. I will focus on the first two.
Most notably, I suggest that Perl 6 split('|', ...) should work as most people expect -- splitting on a literal bar. (Under Perl 5, split('|', ...) is synonymous with split(/|/, ...) -- i.e., split on nullstring or nullstring [sic].)
So I suggest:
  Perl 5:  split /\|/, ...
be synonymous with (and be better written as)
  Perl 6:  split '|', ...   # although split /\|/, $bar... remains valid
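For readers who want to see the Perl 5 behavior described above, here is a minimal sketch. The variable names and sample strings are illustrative assumptions, not part of the proposal; the calls themselves are ordinary Perl 5 split and quotemeta.

```perl
use strict;
use warnings;

my $row = 'a|b|c';

# Perl 5 compiles the string '|' as the regexp /|/, an alternation of two
# empty patterns. It matches the empty string, so the input is split
# between every character instead of on the bar.
my @as_regexp = split '|', $row;

# Escaping the bar, by hand or with quotemeta, gives the literal split
# that most people expect.
my @escaped = split /\|/, $row;
my @quoted  = split quotemeta('|'), $row;

# The dot case is worse: /./ matches every character, every field is
# empty, and the trailing empty fields are then discarded.
my @dot = split '.', 'a.b.c';

print scalar(@as_regexp), " fields from split '|'\n";
print scalar(@escaped),   " fields from split /\\|/\n";
print scalar(@dot),       " fields from split '.'\n";
```

Under the proposal, the first and last calls would behave like the escaped forms without any extra quoting.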
And as to the second point, the removal of trailing blanks, I suggest:
  Perl 5:  @x = split /:/, $bar, -1;
be synonymous with
  Perl 6:  @x = split ':', $bar;
If you want to remove trailing fields, under Perl 6 you should have to do it explicitly:
  Perl 5:  @x = split /:/, $bar;
be synonymous with
  Perl 6:  @x = split ':', $bar;
           while(@x and !length $x[-1]) { pop @x }
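The difference between the two Perl 5 forms can be checked with a short script. This is only a sketch with an assumed sample string; it shows the default split, the split with a negative limit, and the explicit trimming loop from the example above.

```perl
use strict;
use warnings;

my $bar = 'a:b::';

# Default Perl 5 behavior: trailing empty fields are silently dropped.
my @default = split /:/, $bar;

# A negative limit keeps every field, including trailing empties.
my @all = split /:/, $bar, -1;

# Explicit removal of trailing empty fields, as the proposal would require.
my @x = split /:/, $bar, -1;
while (@x and !length $x[-1]) { pop @x }

# Removing every empty field, not just the trailing ones.
my @nonempty = grep { length($_) } split /:/, 'a::b::', -1;

print scalar(@default), " fields by default\n";
print scalar(@all),     " fields with limit -1\n";
```

With the sample string, the default call yields two fields and the limit -1 call yields four; the trimming loop brings the four back down to two.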
I believe that the current behavior of removing trailing empty fields is unintuitive and surprising to learners; nothing about the concept of splitting a string into a list suggests removing trailing empties. (Moreover, I find that when I need to remove empties, it's not just the trailing ones; so the current behavior is rarely just what I want.)
I'll leave the C-coding details to the usual, capable implementers.
But I will note one minor complication with my first suggestion (that literals and regexps be distinguished). Consider:
Perl 6: @x = split $foo, $bar;
I suggest that the correct approach is to treat $foo's value as a literal, unless it holds an object of class Regexp (or a class derived from it?), in which case it should be treated as if the above were:
Perl 6: @x = split qr/$foo/, $bar;
In other words, in such cases it is not possible to know at compile time whether a given "split" operator means literal-split or regexp-split. I note that such cases are rare.
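The run-time decision described here can be approximated in Perl 5 today. The following is a rough sketch, not an implementation of the proposal: clean_split is a hypothetical helper name, and it assumes that "an object of class Regexp" means a value produced by qr// (or blessed into a class derived from Regexp).

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper (not part of Perl): split on a literal string unless
# the separator is a compiled regexp object, and keep trailing empty fields.
sub clean_split {
    my ($sep, $string) = @_;

    # qr// values are blessed into class Regexp; anything else is
    # treated as a literal and escaped with quotemeta.
    my $pattern = (blessed($sep) && $sep->isa('Regexp'))
        ? $sep
        : quotemeta($sep);

    # The negative limit preserves trailing empty fields.
    return split /$pattern/, $string, -1;
}

my @literal = clean_split('|', 'a|b||');          # literal bar, empties kept
my @dotted  = clean_split('.', 'a.b.c');          # literal dot
my @regexp  = clean_split(qr/[|.]/, 'a|b.c');     # regexp object

print join(',', @literal), "\n";
print join(',', @dotted),  "\n";
print join(',', @regexp),  "\n";
```

The check happens on each call, which mirrors the point that the literal-versus-regexp choice cannot always be made at compile time.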
In conclusion, I'll note that there is a conservative alternative approach possible: if any of the above features of Perl 5 split seem really worth keeping, my suggestion for a "clean split" can be implemented as a separate operator called, for example, "cleave".
(Consider the precedent of Perl 5 chomp being added alongside Perl 4 chop, not replacing it.)
I would consider this suboptimal, though; I think that an operator with as straightforward and intuitive a name as "split" should behave in a straightforward and intuitive way.