[% setvar title Internally, data is stored as UTF8 %]
|Note: these documents may be out of date. Do not use as reference!|
To see what is currently happening visit http://www.perl6.org/
Internally, data is stored as UTF8
Maintainer: Simon Cozens <firstname.lastname@example.org>
Date: 25 Sep 2000
Mailing List: email@example.com
Number: 294
Version: 1
Status: Developing
We need to settle on an internal data format; this RFC proposes that UTF8 should be that format.
Perl 5.6's Unicode support has been hampered by the fact that it was grafted onto the side of the old string support, and so it tried to handle both Unicode-encoded and non-Unicode data in the same structures; this made it an absolute swine to do any manipulation properly on these strings.
This could all be made a lot easier if we stuck to one single data format for internal representation, just as most other languages out there do. If we're going to have decent Unicode support, it naturally needs to be a UTF. So which one?
UTF32 is just not going to fly: at four bytes for every character, it's too big and bulky. UTF16 is sensible, but there's probably a lot more legacy ASCII data out there than anything else, and ASCII text is already valid UTF8 byte for byte, so it makes sense to propose UTF8 as a halfway house.
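To make the size trade-off concrete, here is a minimal sketch (in Python, purely for illustration; it is not part of this RFC or of any Perl implementation) that measures how many bytes the same text occupies in each candidate format. The helper name `encoded_sizes` and both sample strings are assumptions chosen for the example.

```python
# Illustrative sketch only: compare the storage cost of the three UTFs.
# `encoded_sizes` is a hypothetical helper, not an API from the RFC.

def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes of `text` under each UTF."""
    # The "-le" codecs are used so that no byte-order mark is counted.
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

# Sample inputs (assumptions for the example).
ascii_sample = "legacy ASCII data"              # 17 ASCII characters
mixed_sample = "na\u00efve \u65e5\u672c\u8a9e"  # 9 characters, some non-ASCII

ascii_sizes = encoded_sizes(ascii_sample)
mixed_sizes = encoded_sizes(mixed_sample)

# Pure ASCII bytes are already a valid UTF-8 encoding of the same text.
ascii_is_utf8 = ascii_sample.encode("ascii") == ascii_sample.encode("utf-8")

print(ascii_sizes)    # {'utf-8': 17, 'utf-16': 34, 'utf-32': 68}
print(mixed_sizes)    # {'utf-8': 16, 'utf-16': 18, 'utf-32': 36}
print(ascii_is_utf8)  # True
```

For the ASCII sample, UTF8 costs nothing extra, while UTF16 doubles the size and UTF32 quadruples it. For the mixed sample the gap between UTF8 and UTF16 narrows, which is roughly the sense in which UTF8 is a halfway house.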
We'll need to get data into Unicode, and I have an RFC about that; we need to handle data internally, and I have an RFC about that. This RFC merely proposes that we need a single internal data format for simplicity, and that it should be UTF8.
The Unicode FAQ on UTFs and BOMs (an excellent introduction to what UTFs are, what they look like and how they work): www.unicode.org
RFC 295: Normalisation and
RFC ??: When UTF8 leaks out
RFC 312: Unicode Combinatorix
RFC 296: Getting Data Into Unicode Is Not Our Problem
RFC ??: Unicode Locales
RFC ??: Abstract the Internal String Interaction