[% setvar title Improved Module Versioning And Searching %]
Improved Module Versioning And Searching
Maintainer: Steve Simmons <scs@ans.net>
Date: 8 Aug 2000
Mailing List: perl6-language@perl.org
Number: 78
Version: 1
Status: Developing
Modern production systems may have many versions of different modules in use simultaneously. Workarounds are possible, but they lead to a vast spaghetti of fragile installation webs. This proposal will attempt to redefine module versioning and its handling in a way that is fully upward compatible but solves the current problems.
An up-to-the-instant version of this RFC will be posted as HTML at www.nnaf.net as soon as I know the RFC number.
There are several classes of problem with the current module versioning and searching, which will be discussed separately. The solutions proposed overlap, and are discussed in IMPLEMENTATION below.
These problems are ones in which I would go so far as to say that the current (perl5) performance is actually broken.
Currently a statement
use foo 1.2;
can cause perl to search @INC until it finds a foo.pm file or exhausts the search path. When a foo.pm is found, its VERSION is checked against the one requested in the use statement. If the foo.pm version is less than 1.2, perl immediately gives an error message and halts the compilation. A satisfactory version may exist elsewhere in @INC, but it is not searched for.
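The difference between the behavior just described and an exhaustive search can be sketched in a few lines. This is only an illustration written in Python rather than perl; the function names, the dictionary standing in for the contents of each @INC directory, and the use of integer tuples for versions are all assumptions made for the example.

```python
def perl5_style_lookup(inc, installed, wanted):
    """Mimic the behavior described above: stop at the first copy found.

    inc       -- ordered list of directory names (stands in for @INC)
    installed -- dict mapping directory -> version tuple of the foo.pm there
    wanted    -- minimum version tuple requested by the use statement
    """
    for directory in inc:
        if directory in installed:
            found = installed[directory]
            if found < wanted:
                # The search ends here even if a later directory would do.
                raise RuntimeError(
                    f"foo version {found} in {directory} is below {wanted}")
            return directory
    raise RuntimeError("foo.pm not found")


def exhaustive_lookup(inc, installed, wanted):
    """Hypothetical alternative: keep searching until a copy is good enough."""
    for directory in inc:
        # A directory without foo.pm yields (), which never satisfies wanted.
        if installed.get(directory, ()) >= wanted:
            return directory
    raise RuntimeError("no satisfactory foo.pm on the search path")
```

With an old copy in the first directory and a newer one in the second, the first function fails while the second returns the later directory.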
I believe that when a programmer writes
use module;
`Do What I Mean' should find the newest version of the module present on the @INC search path. Instead, the very first module.pm file found is taken, regardless of the presence of others on the path.
Deployment of perl modules in high-reliability or widely shared environments often requires multiple versions of modules installed simultaneously. (Comments `but that's a bad idea' will be cheerfully ignored -- if I could control what other departments need, I would). This leads to an endless proliferation of use lib directories and ever-more-pervasive `silos of development.'
Part of the problem is the limitations of the current system in how modules are versioned and how perl decides which version to load. In the worst case, code such as
use lib '/path/to/department/module/versionX';
use module ; # To get version X for sure
no lib '/path/to/department/module/versionX';
has been found in production equipment. Why does such bogosity occur? It's an attempt to solve both the above problems and the deployment issues which follow below.
Working tools persist. An application which does its job well will live as long as the problem it addresses. This means old code may continue running for a long time.
For perl itself, most sites solve this problem by having the perl invocation include versioning:
#!/usr/bin/perl5.005
The indicated version will likely remain installed and stable for as long as the script which uses it, and the platform on which that script runs, remain in service.
The proliferation and increasing use of modules is generally a good thing. However, installation of new modules can and sometimes does break existing scripts. Workarounds for this problem are cumbersome at best, and we have existence proofs in other languages that this can be handled better (notably tcl, but there are probably more).
Mission-critical scripts often need a final test pass in which experimental versions are released onto production systems alongside the production versions.
The inflexibility of perl module versioning also contributes to difficulties in releasing systems for test. A new script may require significant changes to internals of one or more supporting modules. The changes need not be visible to existing scripts; if bugs are introduced then previously working systems may change or break in obscure ways.
Ideally, there would be a mechanism by which script.new or newscript could be released simultaneously with an appropriate version of module.new.pm or newmodule.pm while the previous version remains in place for older code. A more flexible mechanism for module version specification and searching can fix the problem.
I believe that relatively simple changes can be made to the version identification and module installation systems which will solve all the above problems. In addition, those changes should be largely upward compatible from current functioning; and if needed could be made 100% compatible.
Several changes, working together, should provide the flexibility needed to solve all the stated problems and deficiencies:
1. Clarification on how version numbers are formed (largely done as per perl5.6).
2. Well-defined rules for version number comparison.
3. Extensions to the use module version syntax to support better specification of version numbers.
4. A modification to the module installation mechanism to make version numbers more immediately recognizable without requiring parsing/compiling of module.pm files.
5. Modifications to the @INC path searching rules to reflect the changes in numbers 1-4 above.
We believe that most if not all of these changes can be made without requiring a change either to older scripts, existing modules, or items already in CPAN. New scripts and new modules should be able to take advantage of the changes with relatively minimal changes.
In brief, I propose the installation method for modules as provided by perl Makefile.PL be changed such that version numbers appear in the path of the module being installed. This would require that the Makefile.PL support functions open the module, extract the value of $VERSION (if any), and use that to build the pathnames to install the module.
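As a rough sketch of that install-time step, the following Python (used here only for illustration) pulls a $VERSION assignment out of module source text and derives an install path. The function name, the regular expression, and the foo.pm/foo-VERSION.pm layout are assumptions; the layout is just one of the candidates this RFC lists, and real $VERSION statements can take forms this simple pattern does not recognize.

```python
import re
from pathlib import PurePosixPath

# Matches simple assignments such as:  $VERSION = '1.2';   or   $VERSION = 1.2;
VERSION_RE = re.compile(r"""\$VERSION\s*=\s*['"]?([0-9]+(?:\.[0-9]+)*)['"]?\s*;""")


def versioned_install_path(lib_root, module_name, source_text):
    """Return the path a module would be installed to under the assumed layout."""
    base = PurePosixPath(lib_root) / f"{module_name}.pm"   # foo.pm becomes a directory
    match = VERSION_RE.search(source_text)
    if match is None:
        # No $VERSION found: install as a versionless module.
        return base / f"{module_name}.pm"
    return base / f"{module_name}-{match.group(1)}.pm"
```

The module author changes nothing; only the install tooling looks at $VERSION.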
This change has two huge wins:
Authors of modules would have to do literally nothing to use the new mechanism.
Having the version numbers embedded in the path means they could be reliably determined without having to actually open and parse each candidate .pm file.
Programs which request versions in their use module statements would be compiled with the ``best fit'' commensurate with their request and with the request of other modules.
Note that it may not be possible to satisfy conflicting requests. If module A and module B demand two different versions of the same module C, the compiler should halt and state the module conflicts.
``Best fit'' cannot reliably be determined without examining all the secondary modules required as a consequence of using some lower-level module and without processing the @INC changes introduced by use lib, etc. Thus the compiler might have to examine the internals of a number of versions of some modules before choosing which to use. But it would not have to do a full parse of those modules, and the section on Possible Optimization - Indexes suggests some further wins.
There are a variety of mechanisms which could embed the version number into the path name. This RFC does not strongly favor any one over any other. It does have some general suggestions, but is not imposing a particular solution.
Here are some guidelines for choosing a naming system:
* The .pm filename extension should be preserved. Thus it is probably better to embed the version number into the file name or a directory name immediately above it on the search path.
* If there is a mismatch between the version in the path name (eg, foo-1.0.pm) and the setting in the VERSION statement (eg, VERSION=1.1;), perl6 should issue a compile-time error which includes the full path to the module and the internal VERSION number. No recovery should be done.
Here are some possible implementations:
* Where a foo.pm file would currently be installed, replace it with a foo.pm directory. In that directory, versioned modules would be installed as foo-version.pm, and versionless as foo.pm, or as version.pm and none.pm.
* Use a foo.pm directory as above, but populate it with subdirectories for each installed version. A foo.pm file containing the module code would reside in that directory. Versionless modules could be installed into the foo.pm directory rather than in a subdirectory. This mechanism has some possible wins should it be appropriate to support simultaneous load of multiple versions.
A detailed definition of a version number appears immediately below. It is my belief that this definition and usage is an upward extension from current perl performance; and therefore simple (current) use of version numbers should work without requiring script code change under this proposal.
Note we are not requiring version numbers, just specifying format and comparison rules.
Version numbers consist of one or more version levels separated by dots.
Each version level must consist of a non-negative number expressed as a series of digits, ie [0-9]+.
The first version level and first dot are required.
There is no limit to how many levels a version may have.
If a version number ends in a dot, a final level of 0 is assumed.
Leading zeros are allowed in level numbers, but are ignored.
If a version level contains leading zeros, those zeros will be stripped in all cases except for version(s) 0.*.
Trailing zeros in version numbers, whether explicit or implied by a final dot, are trimmed from the version number internally when deriving paths. See above for pathname deriving.
The following example shows some valid and invalid version numbers:
use foo 1.;    # Valid, means '1.0'
use foo 01.;   # Valid, means '1.0'
use foo 1.1;   # Valid, means '1.1'
use foo 1.01;  # Valid, means '1.1'
use foo 1.01.; # Valid, means '1.1.0'
use foo 1.1e;  # Invalid, has non-digit
use foo .1;    # Invalid, must start with explicit level
use foo 0.1;   # Valid, means '0.1'
use foo 1;     # Invalid, must have at least one dot
use foo 1.-1;  # Invalid, no negative numbers (and not a digit)
use foo ;      # Valid, means no version specified
Invalid version numbers cause a compile-time error on the module.
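The format rules above are small enough to sketch as a parser. The version below is an illustration in Python, not a proposed implementation: the name parse_version and the choice of an integer tuple as the internal form are assumptions, and the special treatment of leading zeros under 0.* is not modelled (leading zeros are simply ignored everywhere).

```python
import re

# One or more digit levels, the first level and first dot required,
# with an optional trailing dot after the last level.
_VERSION_RE = re.compile(r"[0-9]+\.([0-9]+(\.[0-9]+)*\.?)?")


def parse_version(text):
    """Turn a version string into a tuple of integer levels, or raise ValueError."""
    if not _VERSION_RE.fullmatch(text):
        raise ValueError(f"invalid version number: {text!r}")
    levels = text.split(".")
    if levels[-1] == "":
        levels[-1] = "0"          # a trailing dot implies a final level of 0
    # int() discards leading zeros, so '01' and '1' give the same level.
    return tuple(int(level) for level in levels)
```

Applied to the examples above, '1.01.' becomes (1, 1, 0), while '1', '.1', '1.1e' and '1.-1' are rejected. The versionless form of use never reaches the parser.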
The existing version request syntax is:
use module [ version ] [ qw(func1 func2 func3)] ;
Currently version is a single perl-style version number (whatever the heck that means). I propose we extend the allowable forms to allow ranges, lists, limits, and version limiting. Doing this properly requires some well-defined mechanisms for comparing disparate version numbers.
Version numbers may appear in the use statement of perl scripts and in the VERSION statement of perl modules. They may either be quoted strings or barewords. Usage in any other circumstance is not treated as a version number, but rather as the appropriate perl construct for the circumstances. If a bareword, it is almost certainly an error. I believe that this usage is consistent with current perl.
A program can specify a list of versions in no-preference order by listing them separated by whitespace:
use foo 1.0 1.1 1.3;
use foo 1.3 1.1 1.0;
These two requests are effectively identical, with the compiler accepting any version of foo.pm beginning with 1.0, 1.1 or 1.3.
A program can specify a list of versions in preference order by adding commas:
use foo 1.0, 1.1, 1.3;
use foo 1.3, 1.1, 1.0;
In these cases the compiler can proceed if any of the three versions is available. In the first case some version 1.0 is preferred; in the second, 1.3 is preferred.
The whitespace following the commas is optional.
In cases where there are requests for two different versions of module foo, both of which were first in the request orders, the highest-level module (closest to the original script) shall win.
If both requests are at the same level offset from the original script, the first requester shall win.
A program can indicate a minimum, maximum, exact, and super-exact version it will accept. The following syntax handles these requests:
The form
use foo <1.2;
indicates that any version prior to 1.2 is acceptable. This would mean any version with 1.1 or less for its first two levels.
The form
use foo <=1.2;
indicates that any version up to and including 1.2 is acceptable. This would mean any version with 1.2 or less for its first two levels.
The form
use foo >1.2
indicates that any version greater than 1.2 is acceptable. This would mean any version with 1.3 or more for its first two levels.
The form
use foo >=1.2
indicates that any version greater than or equal to 1.2 is acceptable. This would mean any version which has 1.2 or more as the first two levels.
The form
use foo =1.2
indicates that only versions which begin with 1.2 are acceptable.
These may be further tightened by ending the version number in a period. The period forces the rest of the version levels to always be treated as zeros. Thus the form
use foo =1.2.
indicates that only version 1.2.0 is acceptable, not 1.2.0.0...1.
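The limit forms can be read as a comparison on only as many levels as the request names, unless a trailing dot asks for the remaining levels to be treated as zeros. Here is a minimal sketch of that reading in Python; the function name satisfies is hypothetical, the candidate is assumed to be an already well-formed version with no trailing dot, and this is one interpretation of the rules above rather than a settled definition.

```python
import operator

_OPS = {"<=": operator.le, ">=": operator.ge,
        "<": operator.lt, ">": operator.gt, "=": operator.eq}


def _levels(text):
    return tuple(int(part) for part in text.split("."))


def _pad(levels, width):
    # Extend with zeros up to width; longer tuples are returned unchanged.
    return levels + (0,) * (width - len(levels))


def satisfies(candidate, limit):
    """True if an installed version such as '1.2.3' meets a limit such as '>=1.2'."""
    for symbol in ("<=", ">=", "<", ">", "="):    # two-character forms first
        if limit.startswith(symbol):
            compare, wanted = _OPS[symbol], limit[len(symbol):]
            break
    else:
        raise ValueError(f"no limit operator in {limit!r}")
    have = _levels(candidate)
    if wanted.endswith("."):
        # Trailing dot: every remaining level counts and is compared against zero.
        want = _levels(wanted[:-1])
        width = max(len(have), len(want))
        return compare(_pad(have, width), _pad(want, width))
    # Otherwise only the levels named in the request are compared.
    want = _levels(wanted)
    return compare(_pad(have, len(want))[:len(want)], want)
```

Under this reading '=1.2' accepts 1.2.5, while '=1.2.' accepts 1.2.0 but rejects 1.2.0.1.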
A program should be able to use two version numbers to indicate a range of acceptable version numbers. The separator between the two version numbers indicates preference order, with - meaning no preference, < meaning the right hand side is preferred, and > meaning the left hand side is preferred.
No whitespace is allowed between the separator and the version number; reasons for this will become apparent in the sections on complex lists.
Examples of ranges:
use foo 1.1-1.4
means that any version which begins with 1 and has 1 through 4 as the second level is acceptable.
use foo 1.1<1.4
means that any version which begins with 1 and has 1 through 4 as the second level is acceptable, with preference given to the highest version in the range.
use foo 1.1>1.4
means that any version which begins with 1 and has 1 through 4 as the second level is acceptable, with preference given to the lowest version in the range.
A terminating dot may be used as well, so that
use foo 1.1-1.4.
means that any version 1.1 through 1.3 is acceptable, but the only acceptable 1.4 version is 1.4.0.
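A small sketch may make the range forms concrete. It is written in Python for illustration only; pick_from_range and its helpers are hypothetical names, the list of available versions is assumed to be already discovered and well-formed, and "no preference" is modelled as taking the first match found.

```python
import re


def _levels(text):
    return tuple(int(part) for part in text.split("."))


def _pad(levels, width):
    # Extend with zeros up to width; longer tuples are returned unchanged.
    return levels + (0,) * (width - len(levels))


def _in_range(version, low_text, high_text):
    have = _levels(version)
    low = _levels(low_text.rstrip("."))
    if _pad(have, len(low))[:len(low)] < low:
        return False
    high = _levels(high_text.rstrip("."))
    if high_text.endswith("."):
        # Trailing dot on the upper bound: nothing above high.0.0... is allowed.
        width = max(len(have), len(high))
        return _pad(have, width) <= _pad(high, width)
    # Otherwise compare only the levels the upper bound names.
    return _pad(have, len(high))[:len(high)] <= high


def pick_from_range(spec, available):
    """Choose a version from available for a range such as '1.1<1.4'."""
    match = re.fullmatch(r"([0-9.]+)([-<>])([0-9.]+)", spec)
    if match is None:
        raise ValueError(f"not a range: {spec!r}")
    low_text, separator, high_text = match.groups()
    candidates = [v for v in available if _in_range(v, low_text, high_text)]
    if not candidates:
        return None
    if separator == "<":                 # right hand (highest) side preferred
        return max(candidates, key=_levels)
    if separator == ">":                 # left hand (lowest) side preferred
        return min(candidates, key=_levels)
    return candidates[0]                 # '-': no preference, first found
```

Given installed versions 1.1.0, 1.3.5, 1.4.0 and 1.4.2, the spec '1.1<1.4' selects 1.4.2, while '1.1<1.4.' selects 1.4.0.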
Lists and ranges may be combined in arbitrary ways to make complex preference sets. Thus
use foo 1.5 1.0-1.3;
means that any version 1.0, 1.1, 1.2, 1.3 or 1.5 is acceptable, without preference order. By contrast,
use foo 1.5, 1.0-1.3, 1.4;
means that 1.5 is preferred, then anything in the 1.0 to 1.3 range, then 1.4.
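One way to read the combined syntax is as preference tiers: commas separate tiers in decreasing preference, and whitespace separates equally acceptable items within a tier. The Python below sketches that reading. The names preference_tiers and choose are hypothetical, only plain versions and '-' ranges are handled, and installed versions are assumed to carry at least as many levels as the request names.

```python
def preference_tiers(spec):
    """Split '1.5, 1.0-1.3, 1.4' into [['1.5'], ['1.0-1.3'], ['1.4']]."""
    return [tier.split() for tier in spec.split(",") if tier.strip()]


def _levels(text):
    return tuple(int(part) for part in text.split("."))


def _matches(version, item):
    have = _levels(version)
    if "-" in item:
        low, high = (_levels(end) for end in item.split("-"))
        return low <= have[:len(low)] and have[:len(high)] <= high
    want = _levels(item)
    return have[:len(want)] == want       # a plain version is a prefix match


def choose(spec, available):
    """Return the first available version from the most preferred tier, or None."""
    for tier in preference_tiers(spec):
        for version in available:
            if any(_matches(version, item) for item in tier):
                return version
    return None
```

With 1.4.1 and 1.2.0 installed, the request '1.5, 1.0-1.3, 1.4' picks 1.2.0 because the 1.0-1.3 tier outranks 1.4, whereas the comma-free '1.5 1.0-1.3 1.4' takes whichever acceptable version is found first.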
It has been suggested that globs or even full-bore regular expressions be allowed for version specification. They have not been included for the following reasons:
* One could write
use foo 1.[023]
and indicate that there was some ordering to the sub-version. I am concerned that people would naively expect that the Do What I Mean principle would cause perl to assume that the following
use foo 1.[203]
is equivalent to
use foo 1.2, 1.0, 1.3
on the naive theory that the two regexps look different so they should do something different.
* Both regexps and globs have been suggested. Having more than one mechanism for this would lead intermediate perl programmers to assume that there was some subtle difference between the two.
Since regexps and globs bring little additional utility and introduce possible confusion, I have chosen not to put them in this suggestion.
When we permit modules to request only certain versions of other modules, we will find cases where no version of module foo is acceptable to all modules which wish to use it. In such a case, the compiler should give up with an error message stating that due to conflicting version requests, module foo could not be loaded. This could become The Error Message From Hell if sufficient detail was included. A utility (perl module?!) should be provided which would recursively examine the use lines of a perl script and the system configuration and produce the appropriately voluminous output report.
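Conflict detection amounts to intersecting what each requester will accept. The sketch below, in Python for illustration, assumes each requester's request has already been expanded into the set of installed versions it accepts; the function name resolve and the wording of the error message are hypothetical.

```python
def resolve(module, installed, requests):
    """Return the installed versions every requester accepts, in install order.

    module    -- name of the module being loaded
    installed -- list of installed version strings
    requests  -- dict mapping requester name -> set of versions it accepts
    """
    acceptable = [version for version in installed
                  if all(version in wanted for wanted in requests.values())]
    if acceptable:
        return acceptable
    # Nothing satisfies everyone: report who asked for what.
    detail = "; ".join(f"{who} accepts {sorted(wanted)}"
                       for who, wanted in sorted(requests.items()))
    raise LookupError(
        f"cannot load module {module}: conflicting version requests ({detail})")
```

The detail string here is deliberately short; the reporting utility suggested above would be the place for the full dependency trace.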
Modules load modules load modules, ad nauseam. It is quite possible that two or more different modules will request some other module. If there is only one version which satisfies all the requests, we don't have a problem.
If there is more than one version acceptable to all callers, we choose which to use based on the following rules:
If no preference was expressed, first acceptable version that was found is used.
If a preference was expressed, highest preference is given to the requests which come from the original script.
If no request came from the original script, highest preference is given to second-level requesters. If there is more than one second-level requester, the first requester's preferences are used. If there are no second-level requesters, the third level is used, and so on.
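These rules can be sketched as picking the preference list of the requester closest to the original script, breaking ties by request order. The Python below is an illustration only: pick_version is a hypothetical name, depth 1 is assumed to mean the original script, each preference is assumed to be already expanded into an ordered list of concrete versions, and the list of commonly acceptable versions is assumed to be non-empty.

```python
def pick_version(acceptable, requests):
    """Choose among versions that every requester accepts.

    acceptable -- non-empty list of versions, in the order they were found
    requests   -- list of (depth, preference) pairs in the order issued, where
                  depth 1 is the original script and preference is an ordered
                  list of versions, or None if no preference was expressed
    """
    ranked = [(depth, order, preference)
              for order, (depth, preference) in enumerate(requests)
              if preference]
    if not ranked:
        return acceptable[0]              # no preference: first acceptable found
    # Smallest depth wins; among equals, the earliest requester wins.
    _, _, preference = min(ranked, key=lambda entry: (entry[0], entry[1]))
    for version in preference:
        if version in acceptable:
            return version
    return acceptable[0]
```

For example, if a second-level module prefers 1.1 but the original script prefers 1.3 and both are acceptable to everyone, 1.3 is chosen.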
There were four problems identified with the current system. Implementation of this proposal solves those problems as follows:
The statement
use foo 1.2;
would cause perl to search @INC until it finds the first foo.pm file of version 1.2 or greater.
I believe that when a programmer writes
use module;
`Do What I Mean' should find the newest version of the module present in @INC. The current proposal does not cause this to occur, and thereby permits backwards-compatible behavior. However, the programmer can now write
use module >=0.0;
and accept any module, but give preference to the highest.
This proposal does not prevent new modules from breaking existing scripts. It does, however, permit those scripts to be repaired by the simple change of locking the script to the acceptable version(s) of the module. This is often significantly easier than updating the script, and avoids the possibility of introducing new bugs either due to modifications of the script or from bugs in the newer module.
In mission-critical environments, production versions of scripts could always be released to a version range of a module, reflecting the ranges of the module it had been tested and known to work against:
use foo 2.0-2.1; # Accept any 2.0 or 2.1 version
When a new version of foo.pm needs to be rolled out with the new version of the script, the version of foo.pm could be set to 2.2 and the new script released with
use foo =2.2; # Accept any 2.2 version
Now new and old versions of script and module can be released with no impact on existing production software. When it is decided that the new versions should become the standard versions, the new script is copied over the old and the modules are not touched.
A possible problem introduced with this proposal is an even greater increase in the amount of searching of directories that must be done. This is an often-expensive process, and can have a serious impact when even small scripts are run tens of thousands of times per day (as ours do).
This could be resolved by adding some sort of simple index files to the installation tree. The index files could simply be a list of all the files (pathnames) found under this particular branch of the file tree. Since those pathnames would contain the version numbers, examining any index file would be sufficient for determining what versions lay where. If later uses of use lib chose a subset of that tree, the index data would already be present. In an ideal situation, only one or two index files might be all that is needed to find all versions of all modules.
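To show how little such an index needs to contain, here is a sketch in Python that reads a plain list of pathnames and recovers which versions of which modules are present. The function name read_index, the one-path-per-line index format, and the foo.pm/foo-VERSION.pm naming are all assumptions made for the example; versionless files are simply skipped.

```python
import re
from collections import defaultdict

# Final path component of the form name-1.2.3.pm
_VERSIONED = re.compile(
    r"(?:^|/)(?P<name>[A-Za-z_]\w*)-(?P<version>[0-9]+(?:\.[0-9]+)*)\.pm$")


def read_index(lines):
    """Map module name -> {version string: pathname} from index file lines."""
    index = defaultdict(dict)
    for line in lines:
        path = line.strip()
        match = _VERSIONED.search(path)
        if match:
            index[match["name"]][match["version"]] = path
    return dict(index)
```

No candidate .pm file is opened at any point; the pathnames alone carry the version information.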
More complex indexes could be built which might include all the dependency information in a manner not dissimilar to the output of lorder(1).
Reliability would best be done by implementing index construction as an automatic part of module install. As above, this could be automated such that neither the module developer nor the system administrator would have to worry about it; the process would still be:
perl6 Makefile.PL
make
make install
and the make install updates the indexes. At this point it is not clear to me that such an index is required; in its absence perl6 should simply search the @INC directories as it does now.
There is nothing in this proposal which could not be implemented in perl5.X, and it would probably be a Good Thing if such were done.
Some languages allow multiple versions of a module to be loaded simultaneously. It is my opinion that In This Way Lies Madness, but perl has done stranger things. Should we decide to allow this, incorporating the version number into the namespace would allow the appropriate disambiguation.
Let us suppose that module foo requires module bar v1.0, while module baz requires module bar 2.0. Also assume that both versions of bar provide an op function. Then these two modules could do
module foo.pm            module baz.pm
use bar =1.0.;           use bar =2.0.;
# Uses bar1.0::op        # Uses bar2.0::op
bar::op();               bar::op();
and each time get the appropriate invocation of op.
Similarly, if modules foo and baz both create a bar object, the object should be blessed into the appropriate version of bar, so that
my $handle1 = new foo ;
my $handle2 = new baz ;
my $var1 = $handle1->op(); # Always gets op 1.0
my $var2 = $handle2->op(); # Always gets op 2.0
always invokes the appropriate version of op.
While I'm not seriously suggesting this dual loading be allowed, it should at least be considered by the folks who know more about objects than I do. Note, though, that this kind of feature might prove to be invaluable in testing new versions of modules. With appropriate aliasing added, a test script could do
use foo <3.0 as foo_old;
use foo =3.0. as foo_new;
# do something with foo_old
# do identical things with foo_new
# compare results
Again, I'm not seriously suggesting this feature. But if it comes up, all the module versioning rules above need to be revisited.
lorder(1) - Optimising .o file orders for UNIX loader ld(1)