
This file is part of the Perl 6 Archive

Note: these documents may be out of date. Do not use as reference!

To see what is currently happening visit http://www.perl6.org/

TITLE

Improved Module Versioning And Searching

VERSION

  Maintainer: Steve Simmons <scs@ans.net>
  Date: 8 Aug 2000
  Mailing List: perl6-language@perl.org
  Number: 78
  Version: 1
  Status: Developing

ABSTRACT

Modern production systems may have many versions of different modules in use simultaneously. Workarounds are possible, but they lead to a vast spaghetti of fragile installation webs. This proposal will attempt to redefine module versioning and its handling in a way that is fully upward compatible but solves the current problems.

An up-to-the-instant version of this RFC will be posted as HTML at www.nnaf.net as soon as I know the RFC number.

DESCRIPTION

There are several classes of problem with the current module versioning and searching, which will be discussed separately. The solutions proposed overlap, and are discussed in IMPLEMENTATION below.

Current Implementation Is Broken

These are problems for which I would go so far as to say that the current (perl5) behavior is actually broken.

Discovery Of Older Versions Aborts Searches

Currently a statement

  use  foo 1.2;

can cause perl to search @INC until it finds a foo.pm file or exhausts the search path. When a foo.pm is found, its VERSION is checked against the one requested in the use statement. If the foo.pm version is less than 1.2, perl immediately gives an error message and halts the compilation. A satisfactory version may exist elsewhere in @INC, but it is not searched for.

First Module Discovered May Not Be Newest

I believe that when a programmer writes

    use module;

`Do What I Mean' should find the newest version of the module present on the @INC search path. Instead, the very first module.pm file found is taken, regardless of the presence of others on the path.

Current Methods Are Insufficient for Complex Installations

Deployment of perl modules in high-reliability or widely shared environments often requires multiple versions of modules installed simultaneously. (Comments `but that's a bad idea' will be cheerfully ignored -- if I could control what other departments need, I would). This leads to an endless proliferation of use lib directories and ever-more-pervasive `silos of development.'

Part of the problem lies in the limitations of the current system: how modules are versioned and how perl decides which version to load. In the worst case, code such as

    use lib '/path/to/department/module/versionX';
    use module ;	# To get version X for sure
    no lib '/path/to/department/module/versionX';

has been found in production equipment. Why does such bogosity occur? It is an attempt to work around both of the problems above and the deployment issues described below.

New Module Releases Can Break Existing Scripts

Working tools persist. An application which does its job well will live as long as the problem it addresses. This means old code may continue running for a long time.

For perl itself, most sites solve this problem by having the perl invocation include versioning:

   #!/usr/bin/perl5.005

The indicated version will likely remain installed and stable as long as the script which uses it and the platform on which that script runs.

The proliferation and increasing use of modules is generally a good thing. However, installation of new modules can and sometimes does break existing scripts. Workarounds for this problem are cumbersome at best, and we have existence proofs in other languages that this can be handled better (notably tcl, but there are probably more).

Test Systems Need Test Modules

Mission-critical scripts often need a final test pass in which experimental versions are released onto production systems alongside the production versions.

The inflexibility of perl module versioning also makes it harder to release systems for test. A new script may require significant changes to the internals of one or more supporting modules. Those changes need not be visible to existing scripts, but if bugs are introduced, previously working systems may change or break in obscure ways.

Ideally, there would be a mechanism by which script.new or newscript could be released simultaneously with an appropriate version of module.new.pm or newmodule.pm while the previous version remains in place for older code. A more flexible mechanism for module version specification and searching can fix the problem.

Proposed Solution

I believe that relatively simple changes can be made to the version identification and module installation systems which will solve all the above problems. In addition, those changes should be largely upward compatible with current behavior, and if needed could be made 100% compatible.

IMPLEMENTATION

Several changes, working together, should provide the flexibility needed to solve all the stated problems and deficiencies:

  1. Clarification on how version numbers are formed (largely done as per perl5.6).

  2. Well-defined rules for version number comparison.

  3. Extensions to the use module version syntax to support better specification of version numbers.

  4. A modification to the module installation mechanism to make version numbers more immediately recognizable without requiring parsing/compiling of module.pm files.

  5. Modifications to the @INC path searching rules to reflect the changes in numbers 1-4 above.

    We believe that most if not all of these changes can be made without requiring changes to older scripts, existing modules, or items already in CPAN. New scripts and new modules should be able to take advantage of the changes with relatively minor modifications.

    Overview

    In brief, I propose the installation method for modules as provided by perl Makefile.PL be changed such that version numbers appear in the path of the module being installed. This would require that the Makefile.PL support functions open the module, extract the value of $VERSION (if any), and use that to build the pathnames to install the module.
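    A minimal sketch of that install step, in Perl, is given below. The helper names extract_version and install_path are hypothetical; the regular expression only recognises a simple $VERSION = '1.2'; style assignment (real modules set $VERSION in many other ways), and the Foo/Bar/1.2/Bar.pm layout is just one illustrative choice among the naming schemes this RFC leaves open.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: find a simple "$VERSION = '1.2';" assignment in the
# module source and return the version string, or undef if none is found.
sub extract_version {
    my ($source) = @_;
    for my $line (split /\n/, $source) {
        next if $line =~ /^\s*#/;    # skip comment lines
        return $1
            if $line =~ /\$(?:[\w:]*::)?VERSION\s*=\s*['"]?([0-9][0-9.]*)['"]?\s*;/;
    }
    return undef;
}

# Hypothetical layout: <libdir>/Foo/Bar/<version>/Bar.pm. A module without
# a $VERSION keeps today's <libdir>/Foo/Bar.pm location.
sub install_path {
    my ($libdir, $module, $version) = @_;
    my @parts = split /::/, $module;
    my $leaf  = $parts[-1];
    return File::Spec->catfile($libdir, @parts[0 .. $#parts - 1], "$leaf.pm")
        unless defined $version;
    return File::Spec->catfile($libdir, @parts, $version, "$leaf.pm");
}

my $source = "package Foo::Bar;\nour \$VERSION = '1.2';\n1;\n";
print install_path('lib', 'Foo::Bar', extract_version($source)), "\n";
```

    On a Unix system the example prints lib/Foo/Bar/1.2/Bar.pm; a module with no $VERSION keeps its current lib/Foo/Bar.pm location.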

    This change has two huge wins:

    Programs which request versions in their use module statements would be compiled with the ``best fit'' commensurate with their request and with the requests of other modules.

    Note that it may not be possible to satisfy conflicting requests. If module A and module B demand two different versions of the same module C, the compiler should halt and report the conflict.

    ``Best fit'' cannot reliably be determined without examining all the secondary modules required as a consequence of using some lower-level module and without processing the @INC changes introduced by use lib, etc. Thus the compiler might have to examine the internals of a number of versions of some modules before choosing which to use. But it would not have to do a full parse of those modules, and the section on Possible Optimization - Indexes suggests some further wins.

    Path/Module Renaming

    There are a variety of mechanisms which could embed the version number into the path name. This RFC does not strongly favor any one over any other. It does have some general suggestions, but is not imposing a particular solution.

    Here are some guidelines for choosing a naming system:

    Here are some possible implementations:

    Definition Of Version Numbers

    A detailed definition of a version number appears immediately below. It is my belief that this definition and usage is an upward extension of current perl behavior, and therefore simple (current) use of version numbers should work without requiring script code changes under this proposal.

    Note we are not requiring version numbers, just specifying format and comparison rules.

    The following example shows some valid and invalid version numbers:

        use   foo  1.;	# Valid, means '1.0'
        use   foo  01.;	# Valid, means '1.0'
        use   foo  1.1;	# Valid, means '1.1'
        use   foo  1.01;	# Valid, means '1.1'
        use   foo  1.01.;	# Valid, means '1.1.0'
        use   foo  1.1e;	# Invalid, has non-digit
        use   foo  .1;	# Invalid, must start with explicit level
        use   foo  0.1;	# Valid, means '0.1'
        use   foo  1;	# Invalid, must have at least one dot
        use   foo  1.-1;	# Invalid, no negative numbers (and not a digit)
        use   foo  ;	# Valid, means no version specified

    Invalid version numbers cause a compile-time error on the module.
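    The rules illustrated by these examples can be checked mechanically. The Perl sketch below is one possible reading of them, not a definitive parser: parse_version is a hypothetical name, an omitted version (the last example) is left to the caller, and the handling of a trailing dot (it appends a zero level and marks the number as exact) is inferred from the 1.01. example and from the section on version inequalities.

```perl
use strict;
use warnings;

# Hypothetical validator for the version-number rules above. Returns a hash
# ref { levels => [...], exact => 0|1 } or undef for an invalid number.
sub parse_version {
    my ($str) = @_;
    # A leading level, at least one dot, then dot-separated digit groups;
    # only the final dot may be "empty" (the trailing-dot form).
    return undef
        unless defined $str && $str =~ /\A[0-9]+\.(?:[0-9]+(?:\.[0-9]+)*\.?)?\z/;
    my $exact  = $str =~ /\.\z/ ? 1 : 0;
    my @levels = map { $_ + 0 } split /\./, $str;    # "01" becomes 1
    push @levels, 0 if $exact;                       # "1.01." is 1.1.0
    return { levels => \@levels, exact => $exact };
}

for my $v ('1.', '01.', '1.1', '1.01', '1.01.', '1.1e', '.1', '0.1', '1', '1.-1') {
    my $p = parse_version($v);
    printf "%-6s %s\n", $v, $p ? join('.', @{ $p->{levels} }) : 'invalid';
}
```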

    Usage in Programs

    The existing version request syntax is:

       use module [ version ] [ qw(func1 func2 func3)] ;

    Currently version is a single perl-style version number (whatever the heck that means). I propose we extend the allowable forms to allow ranges, lists, limits, and version limiting. Doing this properly requires some well-defined mechanisms for comparing disparate version numbers.

    Version numbers may appear in the use statement of perl scripts and in the VERSION statement of perl modules.

    They may either be quoted strings or barewords.

    In any other context the text is not treated as a version number, but as whatever perl construct is appropriate there. If it is a bareword, it is almost certainly an error.

    I believe that this usage is consistent with current perl.

    Ordered and Unordered Version Lists

    A program can specify a list of versions in no-preference order by listing them separated by whitespace:

        use   foo 1.0 1.1 1.3;
        use   foo 1.3 1.1 1.0;

    These two requests are effectively identical, with the compiler accepting any version of foo.pm beginning with 1.0, 1.1 or 1.3.

    A program can specify a list of versions in preference order by adding commas:

        use   foo 1.0, 1.1, 1.3;
        use   foo 1.3, 1.1, 1.0;

    In these cases the compiler can proceed if any of the three versions is available. In the first case some version 1.0 is preferred; in the second, 1.3 is preferred.

    The whitespace following the commas is optional.

    In cases where two different versions of module foo are requested, each of which is first in its requester's preference order, the request from the highest-level module (closest to the original script) shall win.

    If both requests are at the same level offset from the original script, the first requester shall win.
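    Taken together, these two tie-breaking rules amount to sorting the competing requests by distance from the original script and then by the order in which they were seen. A minimal Perl sketch, assuming hypothetical request records with from, depth, seen and wants fields (depth 0 is the script itself, depth 1 a module it uses directly); the module names in the example are made up:

```perl
use strict;
use warnings;

# Pick the request whose first choice should win: the requester closest to
# the original script wins; among equals, the one seen first wins.
sub winning_request {
    my @requests = @_;
    my ($winner) = sort {
        $a->{depth} <=> $b->{depth} || $a->{seen} <=> $b->{seen}
    } @requests;
    return $winner;
}

my $w = winning_request(
    { from => 'Deep::Helper', depth => 2, seen => 0, wants => '1.3' },
    { from => 'Report',       depth => 1, seen => 1, wants => '1.0' },
    { from => 'Mailer',       depth => 1, seen => 2, wants => '1.1' },
);
print "$w->{from} wins; foo $w->{wants} is loaded\n";
```

    Here the two depth-1 requesters outrank Deep::Helper, and Report wins because it was seen before Mailer.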

    Version Inequalities

    A program can indicate a minimum, maximum, exact, and super-exact version it will accept. The following syntax handles these requests:

    The form

        use foo <1.2;

    indicates that any version prior to 1.2 is acceptable. This would mean any version with 1.1 or less for its first two levels.

    The form

        use foo <=1.2;

    indicates that any version up to and including 1.2 is acceptable. This would mean any version with 1.2 or less for its first two levels.

    The form

        use foo >1.2;

    indicates that any version greater than 1.2 is acceptable. This would mean any version with 1.3 or more for its first two levels.

    The form

        use foo >=1.2;

    indicates that any version greater than or equal to 1.2 is acceptable. This would mean any version which has 1.2 or more as its first two levels.

    The form

        use foo =1.2;

    indicates that only versions which begin with 1.2 are acceptable.

    These may be further tightened by ending the version number in a period. The period forces the rest of the version levels to always be treated as zeros. Thus the form

        use foo =1.2.;

    indicates that only version 1.2.0 (with any further levels all zero) is acceptable, not 1.2.1 or 1.2.0.0.1.
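    One way to implement these inequalities is to compare only as many levels as the request gives, so that 1.2 stands for every 1.2.x, and to compare every level (with missing levels counted as zero) once the request ends in a period. The Perl sketch below does that; levels, vcmp and satisfies are hypothetical names, and version strings are assumed to have been validated already.

```perl
use strict;
use warnings;

# Split a dotted version into integer levels. A trailing dot adds a zero
# level and marks the number "exact" (remaining levels must all be zero).
sub levels {
    my ($str) = @_;
    my $exact = $str =~ /\.\z/ ? 1 : 0;
    my @l = map { $_ + 0 } split /\./, $str;
    push @l, 0 if $exact;
    return (\@l, $exact);
}

# Compare a candidate against a request (-1, 0 or 1). Without a trailing
# dot only the request's own levels are compared ("1.2" covers 1.2.x);
# with one, every level is compared and missing levels count as zero.
sub vcmp {
    my ($cand, $req) = @_;
    my ($c) = levels($cand);
    my ($r, $exact) = levels($req);
    my $n = @$r;
    $n = @$c if $exact && @$c > $n;
    for my $i (0 .. $n - 1) {
        my $d = ($c->[$i] // 0) <=> ($r->[$i] // 0);
        return $d if $d;
    }
    return 0;
}

my %op = (
    '<'  => sub { $_[0] <  0 },
    '<=' => sub { $_[0] <= 0 },
    '>'  => sub { $_[0] >  0 },
    '>=' => sub { $_[0] >= 0 },
    '='  => sub { $_[0] == 0 },
);

# satisfies('1.1.7', '<1.2') is true; satisfies('1.2.0.0.1', '=1.2.') is false.
sub satisfies {
    my ($cand, $spec) = @_;
    my ($opname, $req) = $spec =~ /\A(<=|>=|<|>|=)(.+)\z/
        or die "not an inequality: $spec\n";
    return $op{$opname}->(vcmp($cand, $req)) ? 1 : 0;
}
```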

    Ordered and Unordered Version Ranges

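    A program should be able to use two version numbers to indicate a range of acceptable version numbers. The separator between the two version numbers indicates preference order: a hyphen (-) expresses no preference, < gives preference to the highest version in the range, and > gives preference to the lowest.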

    No whitespace is allowed between the separator and the version number; reasons for this will become apparent in the sections on complex lists.

    Examples of ranges:

        use foo 1.1-1.4;

    means that any version which begins with 1 and has 1 through 4 as its second level is acceptable, with no preference order.

        use foo 1.1<1.4;

    means that any version which begins with 1 and has 1 through 4 as its second level is acceptable, with preference given to the highest version in the range.

        use foo 1.1>1.4;

    means that any version which begins with 1 and has 1 through 4 as its second level is acceptable, with preference given to the lowest version in the range.

    A terminating dot may be used as well, so that

        use foo 1.1-1.4.;

    means that any version 1.1 through 1.3 is acceptable, but the only acceptable 1.4 version is 1.4.0.
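    Ranges can reuse the same comparison: a candidate qualifies if it is at or above the lower bound and at or below the upper bound, and the separator only decides the order in which qualifying versions are tried. In the Perl sketch below (hypothetical helper names, validated input assumed), - keeps the order in which versions were found, < tries the highest first and > the lowest first.

```perl
use strict;
use warnings;

# Levels of a dotted version; a trailing dot adds a zero level and marks
# the number exact.
sub levels {
    my ($str) = @_;
    my $exact = $str =~ /\.\z/ ? 1 : 0;
    my @l = map { $_ + 0 } split /\./, $str;
    push @l, 0 if $exact;
    return (\@l, $exact);
}

# Prefix comparison against a request bound; full comparison if exact.
sub vcmp {
    my ($cand, $req) = @_;
    my ($c) = levels($cand);
    my ($r, $exact) = levels($req);
    my $n = ($exact && @$c > @$r) ? scalar @$c : scalar @$r;
    for my $i (0 .. $n - 1) {
        my $d = ($c->[$i] // 0) <=> ($r->[$i] // 0);
        return $d if $d;
    }
    return 0;
}

# Full comparison of two installed versions, used only for sorting.
sub full_cmp {
    my ($x, $y) = map { [ split /\./ ] } @_;
    my $n = @$x > @$y ? @$x : @$y;
    for my $i (0 .. $n - 1) {
        my $d = ($x->[$i] // 0) <=> ($y->[$i] // 0);
        return $d if $d;
    }
    return 0;
}

# Acceptable versions for "LO-HI", "LO<HI" or "LO>HI", in preference order.
sub range_candidates {
    my ($spec, @available) = @_;
    my ($lo, $sep, $hi) = $spec =~ /\A([0-9][0-9.]*)([-<>])([0-9][0-9.]*)\z/
        or die "not a range: $spec\n";
    my @ok = grep { vcmp($_, $lo) >= 0 && vcmp($_, $hi) <= 0 } @available;
    return reverse sort { full_cmp($a, $b) } @ok if $sep eq '<';
    return sort { full_cmp($a, $b) } @ok if $sep eq '>';
    return @ok;    # '-': no preference, keep the order found
}

my @installed = qw(1.0.5 1.4.1 1.1.2 1.5 1.3.9 1.4.0);
print join(' ', range_candidates('1.1-1.4.', @installed)), "\n";
```

    With the installed versions shown, the example prints 1.1.2 1.3.9 1.4.0: 1.4.1 is excluded by the terminating dot, and 1.0.5 and 1.5 fall outside the range.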

    Combining Lists and Ranges

    Lists and ranges may be combined in arbitrary ways to make complex preference sets. Thus

        use   foo 1.5 1.0-1.3;

    means that any version 1.0, 1.1, 1.2, 1.3 or 1.5 is acceptable, without preference order. By contrast,

        use   foo 1.5, 1.0-1.3, 1.4;

    means that 1.5 is preferred, then anything in the 1.0 to 1.3 range, then 1.4.
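    Read this way, commas separate preference tiers and whitespace separates equally acceptable items within a tier. The Perl sketch below assumes that reading (mixed forms such as 1.5 1.6, 1.0-1.3 are not spelled out above), handles only plain versions and - ranges, compares by prefix only (terminating dots are ignored), and within a tier simply takes the first installed version that matches. All names are hypothetical.

```perl
use strict;
use warnings;

# Prefix comparison: only as many levels as the request gives are compared.
sub vcmp {
    my ($cand, $req) = @_;
    my @c = split /\./, $cand;
    my @r = split /\./, $req;
    for my $i (0 .. $#r) {
        my $d = ($c[$i] // 0) <=> $r[$i];
        return $d if $d;
    }
    return 0;
}

# Assumed reading: commas separate preference tiers, whitespace separates
# items with no preference between them inside a tier.
sub parse_request {
    my ($spec) = @_;
    return map { [ split ' ', $_ ] } split /\s*,\s*/, $spec;
}

# An item is either a single version or a "LO-HI" range.
sub item_matches {
    my ($cand, $item) = @_;
    if (my ($lo, $hi) = $item =~ /\A(.+)-(.+)\z/) {
        return vcmp($cand, $lo) >= 0 && vcmp($cand, $hi) <= 0;
    }
    return vcmp($cand, $item) == 0;
}

# Try tiers in order; within a tier, take the first installed match.
sub choose {
    my ($spec, @available) = @_;
    for my $tier (parse_request($spec)) {
        my @ok = grep { my $v = $_; grep { item_matches($v, $_) } @$tier } @available;
        return $ok[0] if @ok;
    }
    return undef;
}

my @installed = qw(1.4.2 1.2.1);
print choose('1.5, 1.0-1.3, 1.4', @installed), "\n";    # 1.5 absent, so the 1.0-1.3 tier is used
```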

    Why Not Regexps?

    It has been suggested that globs or even full-bore regular expressions be allowed for version specification. They have not been included for the following reasons:

    Since regexps and globs bring little additional utility and introduce possible confusion, I have chosen not to put them in this suggestion.

    Resolving Version Request Problems

    When Nothing Works

    When we permit modules to request only certain versions of other modules, we will find cases where no version of module foo is acceptable to all modules which wish to use it. In such a case, the compiler should give up with an error message stating that, due to conflicting version requests, module foo could not be loaded. This could become The Error Message From Hell if sufficient detail were included.

    A utility (perl module?!) should be provided which would recursively examine the use lines of a perl script and the system configuration and produce the appropriately voluminous report.
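    At its core the check is an intersection: a version is usable only if every requester accepts it, and an empty intersection is the error case. The Perl sketch below is hypothetical throughout. Each request carries a human-readable spec for the report and an accepts code reference standing in for the matching rules described earlier; when several versions survive it takes the first one listed, a placeholder for whatever selection rule applies when more than one version is acceptable.

```perl
use strict;
use warnings;

# Return the first available version every request accepts, or die with a
# report naming each requester and what it asked for.
sub resolve {
    my ($module, $available, @requests) = @_;
    my @ok = grep {
        my $v = $_;
        !grep { !$_->{accepts}->($v) } @requests;    # nobody rejects $v
    } @$available;
    return $ok[0] if @ok;
    my @report = map { "  $_->{from} requested $module $_->{spec}" } @requests;
    die join("\n",
        "Conflicting version requests: module $module could not be loaded",
        @report), "\n";
}

# Hypothetical prefix matcher used by the example requests below.
sub prefix { my ($p) = @_; return sub { $_[0] =~ /\A\Q$p\E(?:\.|\z)/ } }

my @installed = qw(1.0.3 1.1.0 2.0.1);
my $got = eval {
    resolve('foo', \@installed,
        { from => 'Report', spec => '=1.1', accepts => prefix('1.1') },
        { from => 'Mailer', spec => '=2.0', accepts => prefix('2.0') });
};
print defined $got ? "loading foo $got\n" : $@;
```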

    When More Than One Acceptable Version Is Found

    Modules load modules load modules, ad nauseam. It is quite possible that two or more different modules will request some other module. If there is only one version which satisfies all the requests, we don't have a problem.

    If there is more than one version acceptable to all callers, we choose which to use based on the following rules:

    How This Solves The Problems

    There were four problems identified with the current system. Implementation of this proposal solves those problems as follows:

    Discovery Of Older Versions No Longer Aborts Searches

    The statement

      use  foo 1.2;

    can cause perl to search @INC until it finds the first foo.pm file of version 1.2 or greater.
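    The difference from perl5 is that a copy which is too old is skipped rather than being fatal. The Perl sketch below simulates the search over an in-memory list of (directory, version) pairs in @INC order instead of the real file system; find_module and at_least are hypothetical names, the directories are made up, and versions are compared by prefix as in the inequality rules.

```perl
use strict;
use warnings;

# True if $cand is at least $req, comparing only the levels $req gives.
sub at_least {
    my ($cand, $req) = @_;
    my @c = split /\./, $cand;
    my @r = split /\./, $req;
    for my $i (0 .. $#r) {
        my $have = $c[$i] // 0;
        return $have > $r[$i] ? 1 : 0 if $have != $r[$i];
    }
    return 1;
}

# Walk the candidates in @INC order; skip copies that are too old instead
# of giving up at the first foo.pm found, as perl5 does.
sub find_module {
    my ($min, @candidates) = @_;
    for my $cand (@candidates) {
        return $cand if at_least($cand->[1], $min);
    }
    return undef;
}

my @found = (['/usr/local/lib/perl', '1.1'], ['/opt/dept/lib', '1.3'], ['/usr/lib/perl', '2.0']);
my $hit = find_module('1.2', @found);
print "use foo 1.2 loads version $hit->[1] from $hit->[0]\n";
```

    Here the 1.1 copy found first is passed over and the example prints use foo 1.2 loads version 1.3 from /opt/dept/lib, where perl5 would have stopped with an error at the first directory.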

    Programmer Can Now Specify Newest Module

    I believe that when a programmer writes

        use module;

    `Do What I Mean' should find the newest version of the module present in @INC. The current proposal does not cause this to occur, and thereby permits backwards-compatible behavior.

    However, the programmer can now write

        use module >=0.0;

    and accept any module, but give preference to the highest.

    New Module Releases Can Break Existing Scripts

    This proposal does not prevent new modules from breaking existing scripts. It does, however, permit those scripts to be repaired by the simple change of locking the script to the acceptable version(s) of the module. This is often significantly easier than updating the script, and avoids the possibility of introducing new bugs either due to modifications of the script or from bugs in the newer module.

    Test Systems Need Test Modules

    In mission-critical environments, production versions of scripts could always be released against a range of module versions, reflecting the versions of the module they had been tested and are known to work against:

        use foo 2.0-2.1;	# Accept any 2.0 or 2.1 version

    When a new version of foo.pm needs to be rolled out with the new version of the script, the version of foo.pm could be set to 2.2 and the new script released with

        use foo =2.2;	# Accept any 2.2 version

    Now new and old versions of script and module can be released with no impact on existing production software. When it is decided that the new versions should become the standard versions, the new script is copied over the old and the modules are not touched.

    Possible Optimization - Indexes

    A possible problem introduced with this proposal is an even greater increase in the amount of searching of directories that must be done. This is an often-expensive process, and can have a serious impact when even small scripts are run tens of thousands of times per day (as ours do).

    This could be resolved by adding some sort of simple index files to the installation tree. The index files could simply be a list of all the files (pathnames) found under this particular branch of the file tree. Since those pathnames would contain the version numbers, examining any index file would be sufficient for determining what versions lay where. If later uses of use lib chose a subset of that tree, the index data would already be present. In an ideal situation, only one or two index files might be all that is needed to find all versions of all modules.
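    A minimal Perl sketch of such an index: build_index walks an installation tree with the core File::Find module and records every .pm path relative to the root, and versions_from_index reads a module's installed versions back out of those paths. Both names, and the Foo/Bar/<version>/Bar.pm layout they assume, are hypothetical, since the proposal deliberately leaves the naming scheme open.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# Hypothetical index builder: every .pm file under $root, relative to
# $root, sorted, one path per entry (ready to be written to an index file).
sub build_index {
    my ($root) = @_;
    my @files;
    find({
        no_chdir => 1,
        wanted   => sub {
            push @files, File::Spec->abs2rel($File::Find::name, $root)
                if -f $File::Find::name && $File::Find::name =~ /\.pm\z/;
        },
    }, $root);
    return sort @files;
}

# Versions of $module recorded in an index, assuming the hypothetical
# layout Foo/Bar/<version>/Bar.pm, in the order they appear.
sub versions_from_index {
    my ($module, @paths) = @_;
    my @parts = split /::/, $module;
    my $dir   = join '/', @parts;
    my $leaf  = $parts[-1];
    my @versions;
    for my $path (@paths) {
        push @versions, $1
            if $path =~ m{(?:\A|/)\Q$dir\E/([0-9]+(?:\.[0-9]+)*)/\Q$leaf\E\.pm\z};
    }
    return @versions;
}
```

    For example, versions_from_index('Foo::Bar', build_index($root)) lists every installed Foo::Bar version under $root without reading any module source; writing the build_index result to a file, one path per line, gives the simple index file described above.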

    More complex indexes could be built which might include all the dependency information in a manner not dissimilar to the output of lorder(1).

    Reliability would best be achieved by making index construction an automatic part of module installation. As above, this could be automated such that neither the module developer nor the system administrator would have to worry about it; the process would still be:

        perl6 Makefile.PL
        make
        make install

    and make install would update the indexes. It is not clear to me that such an index is required; in its absence perl6 should simply search the @INC directories as it does now.

    Adaptation to Perl5

    There is nothing in this proposal which could not be implemented in perl5.X, and it would probably be a Good Thing if such were done.

    Alternative Idea - Module Versioning In Namespaces

    Some languages allow multiple versions of a module to be loaded simultaneously. It is my opinion that In This Way Lies Madness, but perl has done stranger things. Should we decide to allow this, incorporating the version number into the namespace would allow the appropriate disambiguation.

    Let us suppose that module foo requires module bar 1.0, while module baz requires module bar 2.0. Also assume that both versions of bar provide an op function. Then these two modules could do

        module foo.pm                       module baz.pm

        use bar =1.0.;                      use bar =2.0.;
        # Uses bar1.0::op                   # Uses bar2.0::op
        bar::op();                          bar::op();

    and each time get the appropriate invocation of op.

    Similarly, if modules foo and baz both create a bar object, each object should be blessed into the appropriate version of bar, so that

        my $handle1 = foo->new();
        my $handle2 = baz->new();
        my $var1 = $handle1->op();	# Always gets op 1.0
        my $var2 = $handle2->op();	# Always gets op 2.0

    always invokes the appropriate version of op.

    While I'm not seriously suggesting this dual loading be allowed, it should at least be considered by the folks who know more about objects than I do. Note, though, that this kind of feature might prove to be invaluable in testing new versions of modules. With appropriate aliasing added, a test script could do

        use foo <3.0 as foo_old;
        use foo =3.0. as foo_new;
    
        # do something with foo_old
        # do identical things with foo_new
        # compare results

    Again, I'm not seriously suggesting this feature. But if it comes up, all the module versioning rules above need to be revisited.

    REFERENCES

        lorder(1) - Optimising .o file orders for UNIX loader ld(1)