[% setvar title Implementation of Threads in Perl %]
Note: these documents may be out of date. Do not use as reference! |
To see what is currently happening visit http://www.perl6.org/
Implementation of Threads in Perl
Maintainer: Bryan C. Warnock <bwarnock@gtemail.net> Date: 1 Aug 2000 Last Modified: 28 September 2000 Mailing List: perl6-internals@perl.org Number: 1 Version: 4 Status: Frozen
No significant changes. Freezing.
Perl 6 should be built around threads from the beginning.
Perl 5 attempted (with relatively good success) to implement threads atop the current architecture. It did, unfortunately, leave several gaps, traps, and "features" in heavy concurrency uses. These weaknesses could be fixed if Perl was built with threading from the start.
All Perl programs are threaded. Most just only have one.
Impatience, Hubris, and Laziness, in that order.
Attempt to build-in thread constructs for the internals, while allowing a Thread module to safely and robustly add user thread constructs, while not making things bad for the single-threaded folks.
The summary is based on the current Perl 5 architecture. As the internal structure changes, like using vtables, the thread design will have to change.
There has been very little direct discussion about threads or this RFC. The bulk of the discussion centered around whether thread-global or program-global global variables should be the default.
There has been a bit of indirect discussion on threads, like how various structures will fit into a multi-threaded program. No attempt has been made to cross-reference these fringe discussions.
Create an additional pseudo-global stash, one per thread created, that is
local to that thread. This stash would be the default space for non-
lexical variables. $main::foo
== $foo
within one thread, while
$main::foo
!= $main::foo
in different threads. There need be no way to
specify the particular thread-space, as it should be visible only to the
owning thread.
Rationale: This allows the bulk of the modules, which don't rely on sharing data across multiple threads, to work as written for a single-threaded program.
The Thread module should add a global
keyword or function that explicitly
accesses a variable in the program-global stash.
global $main::foo = $foo; # Let another thread know what my $foo is. global $main::foo = \$foo; # Share my local foo. Dangerous! $foo = global $main::foo; # Localize this instance of $main::foo.
Rationale: There needs to be some way to distinguish between the program- global stash and the thread-global stashes. It should undeniably mark the data that is being shared.
The Thread module should, on inclusion, also set the optree flag that triggers mutex locking on variables within the perl core itself. (As differentiated by a user-created and controlled mutex.) This is to guarantee that the above constructs will actually work - user created race conditions aside.
Rationale: We don't want to bog single-threaded programs with needless mutex operations, let alone attempt to do so on platforms that have no such beastie.
Populate the thread-space stash with the built-ins, vice the program global stash. Very few of the built-ins are meaningless in this threaded construct, most are truly independent, and those that aren't, like $^O, should probably be read-only anyway.
Threads are a runtime construct only. Lexical scoping and compile issues
are independent of any thread boundaries. The initial interpretation is
done by a single thread. use Threads
may set up the thread constructs,
but threads will not be spawned until runtime.
Rationale: It would probably be too difficult to have the interpreter try to figure out what thread is which, without actual threads. A cart and horse problem.
Modules should be loaded into the thread-global space when use
d. As this
occurs during the compile, there is only one thread in existence to receive
the module. Perl should continue to track these modules (as it currently
does to prevent multiple inclusions), although it may need its own bin.
Subsequent threads should then reslurp these modules back in on their start
up.
Discussion: This is a potential problem child. Your main thread use
s a
module, but may have its own data. It may or may not have changed the
module's data before a secondary thread was started. The second thread
can't simply copy the first thread's space, because it may have thread-
specific information contained there. But you can't blindly globalize
each module, because then you lose the ability for a standard module to
work without its own explicit and robust reentrent handling. Therefore,
each thread needs to reuse the original modules upon creation.
#!/your/path/to/perl -w use English; # Lexical feel. Will work across all threads. use Threads; use Foo; use Bar; # the main thread has all four above in its arena my $thread2 = Threads->new(\&start_thread2); ... sub start_thread2 { ... # Before this sub was called, the second thread was created, and it # reused English, Threads, Foo, and Bar, pulling them into its spaces. }
This, of course, could lead to massive program bloat, as each of ten threads
may each have their own copy of the same 50 modules. Here, however, I'm
hoping that better engineering will take place, and programmers won't
needlessly use
modules globally. See item 7.
Rationale: This is consistent with items 1, 4, and 7.
Threads should be able to use
and no
their own modules, outside of
the global ones. This allows each thread to only use the modules they need,
saving on the global system bloat described above, and giving each thread the
most control over its environment - such as letting two threads use two
different versions of the same module.
Discussion: Threads are mainly used in one of two ways: parallel processing
of the same dataspace, such as loop processing with non-iteration-dependent
data members; or "assembly line" parallel processing, with each thread
doing a different function on a database, like thread signals conversion,
with a buffer read, byte/nybble/bit swapping, data conversion, and buffer
write split across multiple threads. I won't pretend to know which is truly
more common, or more deserving of the threading model, but the second is
certainly easier to use from an end-user perspective, and the first can
be converted to the second with relative ease. This model allows different
threads to do different things with little impact on the footprint of the
program, by allowing each thread to use
its own modules.
use English; # In main thread at compilation use Threads; # In main thread at compilation Threads->use("Foo"); # Loads Foo into main thread at runtime. my $thread2 = Threads->new(\&start_thread2); ... sub start_thread2 { ... # Before this sub was called, the second thread was created, and it # reused English and Threads. Threads->use("Bar"); # Remember, this is this thread's Threads }
Rationale: Consistent with above, with some pleasant side-effects. With
inclusion rules being the same, a thread won't use
something it already
received globally, and modules themselves can use the original, standard
syntax, since the inclusion is deferred to runtime, a use Modules
will
load into the thread-space of the thread that invoked its use
method.
The downside, of course, is that module inclusion is deffered to downside.
Perhaps a separate RFC on program invariant checking is necessary....
BEGIN
and END
blocks will continue to be single-threaded compile time
constructs.
Automatic mutex locking will occur on variable manipulation only. Anything longer will require explicit locking.
use Thread
and the sematics it would add. See the
notes below about module inclusion. (Obviously, other changes to the language
notwithstanding.)RFC 86: IPC Mailboxes for Threads and Signals
RFC 178: Lightweight Threads
RFC 185: Thread Programming Model
RFC 293: MT-Safe Autovariables in perl 5.005 Threading