[% setvar title Implementation of hash iterators %]

This file is part of the Perl 6 Archive

Note: these documents may be out of date. Do not use as reference!

To see what is currently happening visit http://www.perl6.org/

TITLE

Implementation of hash iterators

VERSION

  Maintainer: Tom Hughes <tom@compton.nu>
  Date: 20 Aug 2000
  Last Modified: 28 Sep 2000
  Mailing List: perl6-internals@perl.org
  Number: 136
  Version: 3
  Status: Frozen

ABSTRACT

Perl 5 makes very limited use of iterators internally. It has long been suggested that certain common idioms could be implemented much more efficiently using iterators and this RFC suggests some possible mechanisms for achieving this goal.

DESCRIPTION

How iterators work in perl 5

In perl 5 every hash has exactly one iterator, which can be used explicitly by the programmer using the each function and which is also used by perl implicitly to implement the keys and values functions.

This is confusing and dangerous as you can easily get an action at a distance effect whereby using each/keys/values in a function affects the iteration being performed by the caller.

How iterators might work in perl 6

In perl 6 the keys and values functions should no longer use the same iterator as the each function - each use of keys and values should use it's own private iterator instead.

In fact those iterators should probably be lazy so that a piece of code like this:

  foreach (keys %x)
  {
    ...
  }

does not immediately iterate over the whole hash, pushing copies of all the keys onto the stack; but should instead create an iterator object of some sort that can be passed to the loop op which will then pull off one value for each pass of the loop.

This is significantly more efficient for large hashes, especially if the loop exits early.

Freezing state for keys and values efficiently

The only problem with this system is that the current implementation of keys and values causes the result to be frozen so that changes to the hash made in the loop body do not effect the values being iterated over.

I believe this problem can be solved by using the vtable for the hash to wrap any mutating functions with a completion routine that will advance the iterator to completion, creating a temporary list of copied keys/values that it can then continue to iterate over.

Obviously after this completion was done the wrapper routine would call the original vtable routine to update the hash. It would also remove itself from the vtable so you would only pay the cost of the wrapper on the first change to the hash. Equally you would only pay for building the temporary list when it was necessary - ie when you change the hash inside the loop.

For keys the iterator would need to trap insertions and deletions and for values it would need to trap these and also modifications of existing values.

Because this mechanism can sit above the main hash implementation it should work equally for both builtin hashes and for things like tied hashes where implementation details are unknown to the core.

Handling tied hashes

The current tied hash interface is only capable of supporting a single iterator via the FIRSTKEY and NEXTKEY functions.

In order to support the enhanced iteration described above it is suggested that the tied hash interface be extended but the precise details of that are a matter for a language RFC rather than this document.

Certain underlying structures with which tied hashes are used (for example some types of dbm file) only support a single iterator which makes support of an enhanced tied hash interface difficult.

It is however possible to work around this limitation using some form of buffering. This could be done either by the tied hash implementation or, more helpfully, at the perl level if the tied hash indicated that it could only support a single iterator. There are basically three ways that this buffering could work:

Overall I believe that the third of these provided the best results - it is free until a second iterator is created and then only buffers as much as is necessary to support the goal of multiple iteration.

CHANGES

REFERENCES

RFC 123: Builtin: lazy

perlfunc manpage for discussion of each, keys and values