[% setvar title Heredoc contents %]

TITLE

Heredoc contents

VERSION

  Maintainer: Richard Proctor <richard@waveney.org>
  Date: 27 Aug 2000
  Last Modified: 1 Oct 2000
  Mailing List: perl6-language@perl.org
  Number: 162
  Version: 2
  Status: Frozen

ABSTRACT

The content of a Heredoc is normally included in the program verbatim. RFC 111 allows whitespace (and comments) on the terminator; this RFC covers the content. It introduces the <<< Enhanced Heredoc, which removes leading whitespace from the content, and discusses providing other dequoting options in the standard library and the documentation enhancements that should follow.

DESCRIPTION

Preamble

I originally wanted to remove leading whitespace from the lines in a heredoc. Several other people wanted to remove whitespace equivalent to the shortest span of whitespace at the start of the lines, or to the whitespace on the first line. TomC pointed out ways to remove the whitespace in current perl, and these work as long as the user is consistent about the use of spaces and tabs. I would like to make life easier, and this RFC attempts to bring all these ideas together.

Discussion of options

Several possible approaches have been discussed:

a) No Indenting - this is the current behaviour of <<.

b) Remove all leading whitespace from all lines of input.

This was not popular - no longer supported in this RFC.

c) Remove whitespace equivalent to the first line of the Heredoc

This was not popular - it did not fit many people's requirements.

d) Remove whitespace equivalent to the smallest leading whitespace on any line - a realistic option; this can be performed using regexes and the dequote function.

e) Remove whitespace equivalent to that on the terminator - a realistic option. This removes from each line of the content the amount of whitespace found on the terminator. (This is now proposed for <<<.)

f) Using a Heredoc and a regex to remove unwanted whitespace. TomC provided some examples showing how this would work, and how this could handle many of the options above.

g) Using a Heredoc and a function to handle the dequoting of the content. This is essentially the same as a regex, but allows common types of dequoting to be written once.
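As an illustration of option (d), here is a minimal sketch of a helper that removes the smallest leading whitespace shared by all non-blank lines. The name strip_min_indent is hypothetical and not part of this RFC, and the sketch assumes that tabs and spaces each count as one character.

```perl
use strict;
use warnings;

# Hypothetical helper (not defined by this RFC): strip the smallest run of
# leading spaces/tabs found on any non-blank line from every line.
sub strip_min_indent {
    my ($text) = @_;
    my $min;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;              # blank lines do not count
        my ($lead) = $line =~ /^([ \t]*)/;
        $min = length $lead
            if !defined $min || length($lead) < $min;
    }
    $min = 0 unless defined $min;
    $text =~ s/^[ \t]{0,$min}//gm;              # remove at most $min characters per line
    return $text;
}

my $text = strip_min_indent(<<"END");
    first
      second
    third
END
print $text;    # "first", "  second", "third": relative indentation is kept
```

Because the helper counts characters, it only behaves sensibly when the indentation uses spaces and tabs consistently.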

Agreements

Three things have been agreed:

Enhanced Heredoc

There will be two types of heredoc: the simple <<POD, which just includes the content up to the POD terminator, and an enhanced <<<POD, which removes whitespace equivalent to that on the terminator from each line of the content (case e above). (Note that the enhancements to the terminator in RFC 111 apply in both cases.)
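Since <<< does not exist in current perl, its intended effect can only be approximated. The following is a sketch under stated assumptions: strip_terminator_indent is a hypothetical helper, $indent stands in for the whitespace that would precede the terminator, and lines that do not begin with that exact whitespace are left untouched (the RFC does not say what should happen to them).

```perl
use strict;
use warnings;

# Hypothetical emulation of the proposed <<< behaviour in Perl 5.
# $indent plays the role of the whitespace found before the terminator.
sub strip_terminator_indent {
    my ($text, $indent) = @_;
    $text =~ s/^\Q$indent\E//gm;    # drop that exact prefix from each line that has it
    return $text;
}

# With <<<, the terminator would sit four spaces in; here it is passed explicitly.
my $body = strip_terminator_indent(<<"END", "    ");
    alpha
      beta
    gamma
END
print $body;    # "alpha", "  beta", "gamma"
```

The prefix is matched literally, so a tab on the terminator only matches a tab in the content.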

Distribute a collection of dequote() mutations with perl

These are a set of enhanced dequoting functions that can strip leading whitespace in any of the ways mentioned above, with treatment for variable expansion and perhaps procedure call expansion. These would be part of the standard library. Names and content to be discussed.

[ NOT as part of this RFC ]

Mention the s/// tricks in the documentation

In the discussion that followed this RFC, various regex-based techniques were shown that achieve most of what people want. Some of these should be included as examples in the documentation.

Tabs

Some debate took place on tabs in the whitespace. There were two considerations:

a) The problem comes with mixing editors - some use tabs for indented material, some don't, some compress files by using tabs, and so on. [I move between too many editors.] Perl should DWIM. I think that treating tabs=8 as the default would work for most people, even those who set tabs at other values, as long as they are consistent - those who mix tabs and spaces could use a "use tabs 4" to get the same behaviour.

b) Tabs are easy, don't expand them. Consider them as a literal character. This assumes that the code author is going to use the same keystrokes to indent their here-doc text as they use for the terminator, about as safe an assumption as any for tabs.

There was more support for the second case than the first.
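The difference between the two positions can be seen in a short sketch. It uses the core Text::Tabs module as a stand-in for the "use tabs" idea, which is an assumption made for illustration only; the RFC does not name any module.

```perl
use strict;
use warnings;
use Text::Tabs;    # core module; exports expand(), unexpand() and $tabstop

my $terminator_indent = "\t";               # one tab before the terminator
my $line              = "        text\n";   # eight spaces before the content

# Case (b): tabs are literal characters, so a tab does not match eight spaces.
my $literal_match = index($line, $terminator_indent) == 0 ? 1 : 0;

# Case (a): expand tabs to columns first, with tab stops every 8 columns.
$tabstop = 8;
my $expanded_indent = expand($terminator_indent);
my $expanded_match  = index(expand($line), $expanded_indent) == 0 ? 1 : 0;

print "literal: $literal_match, expanded: $expanded_match\n";
# prints "literal: 0, expanded: 1"
```

Under case (b) the content line above would keep its indentation, while under case (a) it would lose it.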

dequoting example

TomC in the debate provided this example, which works as long as there are no inconsistent tabs in the whitespace.

	$poem = dequote<<EVER_ON_AND_ON;
	       Now far ahead the Road has gone,
		  And I must follow, if I can,
	       Pursuing it with eager feet,
		  Until it joins some larger way
	       Where many paths and errands meet.
		  And whither then? I cannot say.
			--Bilbo in /usr/src/perl/pp_ctl.c
	EVER_ON_AND_ON
	print "Here's your poem:\n\n$poem\n";

    The following dequote function handles all these cases.  It
    expects to be called with a here document as its argument.  It
    looks to see whether each line begins with a common substring,
    and if so, strips that off.  Otherwise, it takes the amount of
    leading white space found on the first line and removes that
    much off each subsequent line.

	sub dequote {
	    local $_ = shift;
	    my ($white, $leader);  # common white space and common leading string
	    if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
		($white, $leader) = ($2, quotemeta($1));
	    } else {
		($white, $leader) = (/^(\s+)/, '');
	    }
	    s/^\s*?$leader(?:$white)?//gm;
	    return $_;
	}

example s/// tricks

     # /m lets ^ match at the start of every line, not only the first
     print <<"EOF" =~ /^\s*\| ?(.*\n)/gm;
         | Attention criminal slacker, we have yet
         | to receive payment for our legal services.
         |
         |     Love and kisses
         |
     EOF

     # /m is needed here too; \s+ also swallows the blank line
     print <<FOO =~ /^\s+(.*\n)/gm;
             Attention, dropsied weasel, we are
             launching our team of legal beagles
             straight for your scrofulous crotch.

                     xx oo
     FOO
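The two examples above use a match in list context. A substitution can do the same job in place; the sketch below is not from the list discussion and assumes the content is indented with exactly four spaces.

```perl
use strict;
use warnings;

# Assign the heredoc, then strip a fixed four-space indent from every line.
# The parentheses make s/// act on $flat; /m lets ^ match at each line start.
(my $flat = <<"EOF") =~ s/^ {4}//gm;
    one
        two
    three
EOF

print $flat;    # "one", "    two", "three"
```

Removing a fixed width rather than all leading whitespace keeps the relative indentation of the second line.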

CHANGES

RFC 162 V2 - Added a lot more material and the conclusions from the list

IMPLEMENTATION

This should be a relatively simple addition to perl.

In perl5 terms, <<< would need only a change to scan_heredoc in toke.c, plus documentation.

The dequote mutations would be in the standard library.

REFERENCES

RFC 111 - Here Docs Terminators

and lots of discussion on the list with significant input from Michael Schwern, Tom Christiansen, Eric Roode and others.