[% setvar title Standardize input record separator (for portability) %]

This file is part of the Perl 6 Archive

Note: these documents may be out of date. Do not use as reference!

To see what is currently happening visit http://www.perl6.org/

TITLE

Standardize input record separator (for portability)

VERSION

  Maintainer: N. Hao Ching <spiderboy@cpan.org>
  Date: 10 Aug 2000
  Mailing List: perl6-language-io@perl.org
  Number: 69
  Version: 3
  Status: Developing

ABSTRACT

The default input record separator is not safe for all input files on all platforms. There should also be support for Unicode line separator (U+2028) and paragraph separator (U+2029).

DESCRIPTION

The input record separator should match the platform's C compiler mappings of "\r\n" (CRLF), "\n" (LF) and "\r" (CR), which are often (but not always, e.g., EBCDIC-based platforms [Peter Prymmer]):

    000D 000A
    000A
    000D

For Unicode-capable platforms, the input record separator should also match:

    2028
    2029

Given this input file:

    D O S CR LF    0044 004F 0053 000D 000A
    U n i x  LF    0055 006E 0069 0078 000A
    M a c CR       004D 0061 0063 000D
    l i n e  LS    006C 0069 006E 0065 2028
    p a r a  PS    0070 0061 0072 0061 2029
    l i n e        006C 0069 006E 0065

This should work as expected on as many platforms as possible:

    my @lines = <FH>;

The @lines array should contain six elements.

Bart Lateur has suggested differentiating between ASCII-compatible and UTF-16. Perhaps a flag?

    {
        local $utf16 = 1;
        my @lines = <FH>;
    }

The binmode function should treat data as binary and not translate line disciplines. (No one objects to this so far?)

Whether $/ will remain in Perl 6 is uncertain, so this is not necessarily about $/.

IMPLEMENTATION

Bart Lateur suggested using a dedicated DFA regex engine.

REFERENCES

    perlport: www.pudge.net

    perlunicode
www.activestate.com

    RFC 58: dev.perl.org

    Larry Wall's response to RFC 58 on 8 Aug 2000
    www.mail-archive.com
    www.mail-archive.com

    Simon Cozens' work on line disciplines in 5.6 binmode
www.xray.mpe.mpg.de