Note: these documents may be out of date. Do not use as reference!
To see what is currently happening visit http://www.perl6.org/
Standardize input record separator (for portability)
Maintainer: N. Hao Ching <spiderboy@cpan.org>
Date: 10 Aug 2000
Mailing List: perl6-language-io@perl.org
Number: 69
Version: 3
Status: Developing
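The default input record separator does not split lines correctly for every input file on every platform. For example, a file with CRLF line endings read on Unix leaves a stray CR at the end of each line, even after chomp. The separator should also recognize the Unicode LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029).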
The input record separator should match the platform's C compiler mappings of "\r\n" (CRLF), "\n" (LF) and "\r" (CR). These are often, though not always, the following values (EBCDIC-based platforms differ, for example [Peter Prymmer]):

    CRLF  "\r\n"  000D 000A
    LF    "\n"    000A
    CR    "\r"    000D
For Unicode-capable platforms, the input record separator should also match:
    LS  (LINE SEPARATOR)       2028
    PS  (PARAGRAPH SEPARATOR)  2029
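A minimal Python sketch of the full separator set follows. It assumes an ASCII/Unicode platform, so it ignores the EBCDIC caveat above. The pattern name SEPARATOR and the helper separators_in are hypothetical and serve only as illustration. The key point is that CRLF must be tried before a lone CR or LF; otherwise "\r\n" would be counted as two separators.

```python
import re

# Hypothetical pattern for the proposed separator set (ASCII/Unicode
# platforms only). Alternatives are tried left to right, so CRLF is
# matched as a single separator before a lone CR or LF is considered.
SEPARATOR = re.compile("\r\n|\n|\r|\u2028|\u2029")

def separators_in(text: str) -> list[str]:
    """Return each separator found in text as its code points in hex."""
    return [" ".join(f"{ord(c):04X}" for c in m.group())
            for m in SEPARATOR.finditer(text)]

print(separators_in("a\r\nb\nc\rd\u2028e\u2029f"))
# ['000D 000A', '000A', '000D', '2028', '2029']
```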
Given this input file:
    Text  Terminator  Code points
    DOS   CR LF       0044 004F 0053 000D 000A
    Unix  LF          0055 006E 0069 0078 000A
    Mac   CR          004D 0061 0063 000D
    line  LS          006C 0069 006E 0065 2028
    para  PS          0070 0061 0072 0061 2029
    line  (none)      006C 0069 006E 0065
This should work as expected on as many platforms as possible:
my @lines = <FH>;
The @lines array should then contain six elements, one per record: DOS, Unix, Mac, line, para, and the final unterminated line.
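The expected result for the example file can be checked with a short Python sketch. read_records is a hypothetical helper, not a proposed Perl API. It keeps each separator on its record, mimicking the way Perl 5's <FH> leaves the line ending in place until chomp removes it. The final unterminated "line" becomes its own record.

```python
import re

# Hypothetical separator pattern; CRLF is listed first so it wins over CR.
SEPARATOR = re.compile("\r\n|\n|\r|\u2028|\u2029")

def read_records(text: str) -> list[str]:
    """Split text into records, keeping each record's separator.

    A trailing record with no separator is kept as-is.
    """
    records = []
    start = 0
    for m in SEPARATOR.finditer(text):
        records.append(text[start:m.end()])
        start = m.end()
    if start < len(text):
        records.append(text[start:])
    return records

# The example input file from above, written as a Python string.
sample = "DOS\r\nUnix\nMac\rline\u2028para\u2029line"
lines = read_records(sample)
print(len(lines))  # 6
print([l.rstrip("\r\n\u2028\u2029") for l in lines])
# ['DOS', 'Unix', 'Mac', 'line', 'para', 'line']
```

Keeping the separator is a design assumption here; the proposal itself does not say whether records retain their terminators.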
Bart Lateur has suggested distinguishing between ASCII-compatible encodings and UTF-16. Perhaps a flag, as in:
{
    local $utf16 = 1;    # hypothetical flag: read FH as UTF-16
    my @lines = <FH>;
}
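The distinction matters because in UTF-16 every character occupies at least two bytes. LF is therefore stored as the byte pair 0A 00 (little-endian) or 00 0A (big-endian), and scanning the raw bytes for 0x0A cuts code units in half. The following Python sketch rests on several assumptions. read_lines and its utf16 parameter are hypothetical stand-ins for the proposed flag. The ASCII-compatible case is taken to be UTF-8. UTF-16 input is assumed to start with a byte order mark, which is what Python's "utf-16" codec writes.

```python
import re

SEPARATOR = re.compile("\r\n|\n|\r|\u2028|\u2029")

def read_lines(raw: bytes, utf16: bool = False) -> list[str]:
    # Hypothetical stand-in for the proposed $utf16 flag: pick the
    # decoding first, then split on characters rather than bytes.
    # Assumptions: the ASCII-compatible case is UTF-8, and UTF-16 input
    # begins with a byte order mark. Separators are dropped here.
    text = raw.decode("utf-16" if utf16 else "utf-8")
    return SEPARATOR.split(text)

data = "DOS\r\nUnix\nline\u2028end"
print(read_lines(data.encode("utf-8")))               # ['DOS', 'Unix', 'line', 'end']
print(read_lines(data.encode("utf-16"), utf16=True))  # ['DOS', 'Unix', 'line', 'end']

# Splitting the UTF-16 bytes on 0x0A directly leaves a stray 00 byte
# at the start of the next piece, misaligning every code unit after it:
print(data.encode("utf-16-le").split(b"\n")[1][:2])   # b'\x00U'
```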
The binmode function should treat data as binary and perform no line-ending translation. (No one has objected to this so far.)
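Python's io module draws a similar line and can illustrate the intended binmode behaviour. This is an analogy, not a Perl API. A text stream in universal-newlines mode translates CR and CRLF to LF on input, whereas a binary stream returns the bytes untouched. Python's universal newlines do not treat U+2028 or U+2029 as line ends when reading, so the analogy covers only part of what this proposal asks for.

```python
import io

raw = b"DOS\r\nUnix\nMac\r"

# Text mode with universal newlines (newline=None): CR and CRLF are
# translated to LF as the data is read.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
print(repr(text.read()))     # 'DOS\nUnix\nMac\n'

# Binary mode, the analogue of binmode: the bytes come back unchanged.
binary = io.BytesIO(raw)
print(binary.read() == raw)  # True
```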
Because it is uncertain whether $/ will remain in Perl 6, this proposal is not necessarily about $/ itself.
Bart Lateur suggested using a dedicated DFA (deterministic finite automaton) regex engine to match the separators.
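The separator set is small enough that a deterministic finite automaton needs only two states: ordinary text, and "just saw CR". The Python sketch below hand-codes that automaton instead of using a regex engine. It illustrates the idea behind Bart Lateur's suggestion, not his design. The function records and its state numbering are hypothetical. Because the automaton never looks back more than one character, it can consume a stream incrementally. The only decision it defers is whether a CR is followed by LF.

```python
from typing import Iterable, Iterator

CR, LF, LS, PS = "\r", "\n", "\u2028", "\u2029"

def records(chars: Iterable[str]) -> Iterator[str]:
    """Yield records (separator included) from a stream of characters.

    Hypothetical two-state machine. State 0: ordinary text.
    State 1: just saw CR, waiting to learn whether LF follows.
    """
    buf: list[str] = []
    state = 0
    for ch in chars:
        if state == 1:
            state = 0
            if ch == LF:           # CR LF: one two-character separator
                buf.append(ch)
                yield "".join(buf)
                buf = []
                continue
            yield "".join(buf)     # a bare CR ended the previous record
            buf = []
        buf.append(ch)
        if ch == CR:
            state = 1              # defer: the record may end in CRLF
        elif ch in (LF, LS, PS):
            yield "".join(buf)
            buf = []
    if buf:                        # trailing CR or unterminated record
        yield "".join(buf)

sample = "DOS\r\nUnix\nMac\rline\u2028para\u2029line"
print(len(list(records(sample))))  # 6
```

Fed the example input file from above, the automaton yields six records, each ending in its separator except the last.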
References:
perlport (www.pudge.net)
perlunicode (www.activestate.com)
RFC 58 (dev.perl.org)
Larry Wall's response to RFC 58, 8 Aug 2000 (two messages, www.mail-archive.com)
Simon Cozens' work on line disciplines in Perl 5.6 binmode (www.xray.mpe.mpg.de)