Author: Bryan C. Mills (with substantial input from Russ Cox, Jay Conrod, and Michael Matloob)
Last updated: 2020-02-20
Discussion at https://golang.org/issue/36460.
We propose to change cmd/go
to avoid loading transitive module dependencies
that have no observable effect on the packages to be built.
The key insights that lead to this approach are:
If no package in a given dependency module is ever (even transitively)
imported by any package loaded by an invocation of the go command, then an
incompatibility between any package in that dependency and any other package
has no observable effect in the resulting program(s). Therefore, we can
safely ignore the (transitive) requirements of any module that does not
contribute any package to the build.
We can use the explicit requirements of the main module as a coarse filter
on the set of modules relevant to the main module and to previous
invocations of the go command.
Based on those insights, we propose to change the go command to retain more
transitive dependencies in go.mod files and to avoid loading go.mod files
for “irrelevant” modules, while still maintaining high reproducibility for build
and test operations.
In the initial implementation of modules, we attempted to make go mod tidy
prune out of the go.mod file any module that did not provide a transitive
import of the main module. However, that did not always preserve the remaining
build list: a module that provided no packages might still raise the minimum
requirement on some other module that did provide a package.
We addressed that problem in CL 121304 by explicitly retaining requirements on all modules that provide directly-imported packages, as well as a minimal set of module requirement roots needed to retain the selected versions of transitively-imported packages.
In #29773 and #31248, we realized that, because the go.mod file is pruned to
remove indirect dependencies already implied by other requirements, we must
load the go.mod file for all versions of dependencies, even if we know that
they will not be selected, and even including the main module itself!
In #30831 and #34016, we learned that following deep history makes problematic dependencies very difficult to completely eliminate. If the repository containing a module is no longer available and the module is not cached in a module mirror, then we will encounter an error when loading any module — even a very old, irrelevant one! — that required it.
In #26904, #32058, #33370, and #34417, we found that the need to
consider every version of a module separately, rather than only the selected
version, makes the replace directive difficult to understand, difficult to use
correctly, and generally more complex than we would like it to be.
In addition, users have repeatedly expressed the desire to avoid the cognitive
overhead of seeing “irrelevant” transitive dependencies (#26955, #27900,
#32380), reasoning about older-than-selected transitive dependencies
(#36369), and fetching large numbers of go.mod files (#33669, #29935).
In this proposal, we aim to achieve a property that we call lazy loading:
In the steady state, an invocation of the go command should not load any
go.mod file or source code for a module (other than the main module) that
provides no packages loaded by that invocation.
go command, the go command should not load a go.mod file or source
code for any other version of that module.
We also want to preserve reproducibility of go command invocations:
An invocation of the go command should either load the same version of
each package as every other invocation since the last edit to the go.mod
file, or should edit the go.mod file in a way that causes the next
invocation on any subset of the same packages to use the same versions.
We propose that, when the main module's go.mod file specifies go 1.15 or
higher, every invocation of the go command should update the go.mod file to
maintain three invariants.
(The import invariant.) The main module's go.mod file
explicitly requires the selected version of every module that contains one
or more packages that were transitively imported by any package in the main
module.
(The argument invariant.) The main module's go.mod file
explicitly requires the selected version of every module that contains one
or more packages that matched an explicit package pattern argument.
(The completeness invariant.) The version of every module that
contributed any package to the build is recorded in the go.mod file of
either the main module itself or one of the modules it requires explicitly.
The completeness invariant alone is sufficient to ensure reproducibility and
lazy loading. However, it is under-constrained: there are potentially many
minimal sets of requirements that satisfy the completeness invariant, and even
more valid solutions. The import and argument invariants guide us toward a
specific solution that is simple and intuitive to explain in terms of the go
commands invoked by the user.
If the main module satisfies the import and argument invariants, and all explicit module dependencies also satisfy the import invariant, then the completeness invariant is also trivially satisfied. Given those, the completeness invariant exists only in order to tolerate incomplete dependencies.
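The three invariants can be sketched as predicates over a simplified model of one invocation. This is an illustration only, not the cmd/go implementation: the Build type, its fields, and the function names are hypothetical, and "modules that contributed a package to the build" is approximated as the union of the imported and argument modules.

```go
package main

import "fmt"

// Build is a hypothetical summary of one go command invocation.
// Maps are keyed by module path; values are versions.
type Build struct {
	Explicit    map[string]string            // requirements in the main module's go.mod
	DepRequires map[string]map[string]string // requirements in each explicit dependency's go.mod
	Selected    map[string]string            // selected version of every module in the build list
	Imported    []string                     // modules providing packages transitively imported by the main module
	Arguments   []string                     // modules providing packages matched by pattern arguments
}

// explicitAtSelected reports whether the main module's go.mod requires the
// selected version of every module in paths.
func explicitAtSelected(b Build, paths []string) bool {
	for _, p := range paths {
		v, ok := b.Selected[p]
		if !ok || b.Explicit[p] != v {
			return false
		}
	}
	return true
}

func importInvariant(b Build) bool   { return explicitAtSelected(b, b.Imported) }
func argumentInvariant(b Build) bool { return explicitAtSelected(b, b.Arguments) }

// completenessInvariant reports whether the selected version of every
// contributing module is recorded either in the main module's go.mod or in
// the go.mod of a module that the main module requires explicitly.
func completenessInvariant(b Build) bool {
	contributed := append(append([]string(nil), b.Imported...), b.Arguments...)
	for _, p := range contributed {
		v, ok := b.Selected[p]
		if !ok {
			return false
		}
		recorded := b.Explicit[p] == v
		for dep := range b.Explicit {
			if b.DepRequires[dep][p] == v {
				recorded = true
			}
		}
		if !recorded {
			return false
		}
	}
	return true
}

// exampleBuild models a main module that requires a and b explicitly and
// imports packages from a, b, and c, where c is required only by a's go.mod.
func exampleBuild() Build {
	v := "v0.1.0"
	return Build{
		Explicit: map[string]string{"example.com/a": v, "example.com/b": v},
		DepRequires: map[string]map[string]string{
			"example.com/a": {"example.com/b": v, "example.com/c": v},
		},
		Selected: map[string]string{"example.com/a": v, "example.com/b": v, "example.com/c": v},
		Imported: []string{"example.com/a", "example.com/b", "example.com/c"},
	}
}

func main() {
	b := exampleBuild()
	fmt.Println("import:", importInvariant(b), "completeness:", completenessInvariant(b))
	// Recording c explicitly restores the import invariant.
	b.Explicit["example.com/c"] = "v0.1.0"
	fmt.Println("import:", importInvariant(b), "completeness:", completenessInvariant(b))
}
```

In this model the completeness invariant already holds before c is recorded explicitly, because a's go.mod pins c; the import invariant is the stricter condition that the go command would restore by editing the main module's go.mod.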
If the import invariant or argument invariant holds at the start of a go
invocation, we can trivially preserve that invariant (without loading any
additional packages or modules) at the end of the invocation by updating the
go.mod file with explicit versions for all module paths that were already
present, in addition to any new main-module imports or package arguments found
during the invocation.
At the start of each operation, we load all of the explicit requirements from
the main module's go.mod file.
If we encounter an import from any module that is not already explicitly
required by the main module, we perform a deepening scan. To perform
a deepening scan, we read the go.mod file for each module explicitly required
by the main module, and add its requirements to the build list. If any
explicitly-required module uses go 1.14 or earlier, we also read the go.mod
files for all of that module's (transitive) module dependencies.
(The deepening scan allows us to detect changes to the import graph without loading the whole graph explicitly: if we encounter a new import from within a previously-irrelevant package, the deepening scan will re-read the requirements of the module containing that package, and will ensure that the selected version of that import is compatible with all other relevant packages.)
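The deepening scan described above can be sketched over an in-memory module graph. This is a minimal illustration under stated assumptions, not the cmd/go code: the ModVer and GoMod types, the Lazy flag (standing in for "specifies go 1.15 or higher"), and the deepeningScan function are hypothetical names, and the sketch only collects the module versions encountered rather than selecting among them.

```go
package main

import (
	"fmt"
	"sort"
)

// ModVer identifies one version of one module.
type ModVer struct{ Path, Version string }

// GoMod is a simplified, hypothetical view of a go.mod file.
type GoMod struct {
	Lazy    bool              // true if the file specifies go 1.15 or higher
	Require map[string]string // module path -> required version
}

// deepeningScan returns the module versions found by reading the go.mod file
// of each module explicitly required by the main module. Requirements of a
// lazy (go 1.15+) module are added without reading their own go.mod files;
// requirements of an older module are followed transitively.
func deepeningScan(main GoMod, files map[ModVer]GoMod) []string {
	seen := map[ModVer]bool{}     // every module version encountered
	expanded := map[ModVer]bool{} // modules whose full transitive graph was walked

	var walkAll func(m ModVer)
	walkAll = func(m ModVer) {
		if expanded[m] {
			return
		}
		expanded[m] = true
		for p, v := range files[m].Require {
			dep := ModVer{p, v}
			seen[dep] = true
			walkAll(dep)
		}
	}

	for p, v := range main.Require {
		m := ModVer{p, v}
		seen[m] = true
		if files[m].Lazy {
			for dp, dv := range files[m].Require {
				seen[ModVer{dp, dv}] = true
			}
		} else {
			walkAll(m)
		}
	}

	out := make([]string, 0, len(seen))
	for m := range seen {
		out = append(out, m.Path+"@"+m.Version)
	}
	sort.Strings(out)
	return out
}

// exampleFiles models the chain a -> b -> c, with every module either lazy
// (go 1.15) or not (go 1.14), depending on the argument.
func exampleFiles(lazy bool) map[ModVer]GoMod {
	return map[ModVer]GoMod{
		{"example.com/a", "v0.1.0"}: {Lazy: lazy, Require: map[string]string{"example.com/b": "v0.1.0"}},
		{"example.com/b", "v0.1.0"}: {Lazy: lazy, Require: map[string]string{"example.com/c": "v0.1.0"}},
		{"example.com/c", "v0.1.0"}: {Lazy: lazy},
	}
}

func main() {
	mainMod := GoMod{Lazy: true, Require: map[string]string{"example.com/a": "v0.1.0"}}
	fmt.Println("go 1.15 dependencies:", deepeningScan(mainMod, exampleFiles(true)))
	fmt.Println("go 1.14 dependencies:", deepeningScan(mainMod, exampleFiles(false)))
}
```

With lazy dependencies the scan stops at b, because b's own go.mod is never read; with go 1.14 dependencies the scan continues through b and reaches c.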
As we load each imported package, we also read the go.mod file for the module
containing that package and add its requirements to the build list, even if
that version of the module was already explicitly required by the main module.
(This step is theoretically redundant: the requirements of the main module will
already reflect any relevant dependencies, and the deepening scan will catch
any previously-irrelevant dependencies that subsequently become relevant.
However, reading the go.mod file for each imported package makes the go
command much more robust to inconsistencies in the go.mod file, including
manual edits, erroneous version-control merge resolutions, incomplete
dependencies, and changes in replace directives and replacement directory
contents.)
If, after the deepening scan, the package to be imported is still not found in
any module in the build list, we resolve the latest version of a module
containing that package and add it to the build list (following the same search
procedure as in Go 1.14), then perform another deepening scan (this time
including the newly added module) to ensure consistency.
all pattern and mod subcommands
In module mode in Go 1.11–1.14, the all package pattern matches each package
reachable by following imports and tests of imported packages recursively,
starting from the packages in the main module. (It is equivalent to the set of
packages obtained by iterating go list -deps -test ./... over its own output
until it reaches a fixed point.)
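The fixed-point definition above can be sketched as a worklist over a package import graph. This is an illustration only: the Pkg type and oldAll function are hypothetical, and the graph is supplied in memory rather than produced by go list.

```go
package main

import (
	"fmt"
	"sort"
)

// Pkg is a hypothetical record of one package's imports.
type Pkg struct {
	Imports     []string // imports of the package's non-test files
	TestImports []string // additional imports of the package's test files
}

// oldAll computes the Go 1.11-1.14 meaning of the all package pattern:
// starting from the main module's packages, repeatedly add the imports and
// the test imports of every package found so far until nothing new appears.
func oldAll(pkgs map[string]Pkg, mainPkgs []string) []string {
	all := map[string]bool{}
	queue := append([]string(nil), mainPkgs...)
	for len(queue) > 0 {
		p := queue[0]
		queue = queue[1:]
		if all[p] {
			continue
		}
		all[p] = true
		queue = append(queue, pkgs[p].Imports...)
		queue = append(queue, pkgs[p].TestImports...)
	}
	out := make([]string, 0, len(all))
	for p := range all {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

// exampleGraph has a main package that imports a/x, whose test imports b,
// whose test in turn imports c.
func exampleGraph() map[string]Pkg {
	return map[string]Pkg{
		"example.com/lazy": {Imports: []string{"example.com/a/x"}},
		"example.com/a/x":  {TestImports: []string{"example.com/b"}},
		"example.com/b":    {TestImports: []string{"example.com/c"}},
		"example.com/c":    {},
	}
}

func main() {
	fmt.Println(oldAll(exampleGraph(), []string{"example.com/lazy"}))
}
```

Under this definition c is part of all even though it is reachable only through the test of a test dependency, which is the case the rest of this section sets out to narrow.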
go mod tidy adjusts the go.mod and go.sum files so that the main module
transitively requires a set of modules that provide every package matching the
all package pattern, independent of build tags. After go mod tidy, every
package matching the all package pattern is provided by some module matching
the all module pattern.
go mod tidy also updates a set of // indirect comments indicating versions
added or upgraded beyond what is implied by transitive dependencies.
go mod download downloads all modules matching the all module pattern,
which normally includes a module providing every package in the all package
pattern.
In contrast, go mod vendor copies in only the subset of packages transitively
imported by the packages and tests in the main module: it does not scan the
imports of tests outside of the main module, even if those tests are for
imported packages. (That is: the packages directly reported by
go list -deps -test ./... are the only ones that go mod vendor covers.)
As a result, when using -mod=vendor the all and ... patterns may match
substantially fewer packages than when using -mod=mod (the default) or
-mod=readonly.
all package pattern and go mod tidy
We would like to preserve the property that, after go mod tidy, invocations of
the go command, including go test, are reproducible (without changing
the go.mod file) for every package matching the all package pattern. The
completeness invariant is what ensures reproducibility, so go mod tidy must
ensure that it holds.
Unfortunately, even if the import invariant holds for all of the dependencies
of the main module, the current definition of the all pattern includes
dependencies of tests of dependencies, recursively. In order to establish the
completeness invariant for distant test-of-test dependencies, go mod tidy
would sometimes need to record a substantial number of dependencies of tests
found outside of the main module in the main module's go.mod file.
Fortunately, we can omit those distant dependencies a different way: by changing
the definition of the all pattern itself, so that test-of-test dependencies
are no longer included. Feedback from users (in #29935, #26955, #32380,
#32419, #33669, and perhaps others) has consistently favored omitting those
dependencies, and narrowing the all pattern would also establish a nice new
property: after running go mod vendor, the all package pattern with
-mod=vendor would now match the all pattern with -mod=mod.
Taking those considerations into account, we propose that the all package
pattern in module mode should match only the packages transitively imported by
packages and tests in the main module: that is, exactly the set of packages
preserved by go mod vendor. Since the all pattern is based on package
imports (more-or-less independent of module dependencies), this change should be
independent of the go version specified in the go.mod file.
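The narrower definition proposed above can be sketched as follows. This is an illustration only: the Pkg type and newAll function are hypothetical, and the import graph is supplied in memory. Test imports are followed only for packages in the main module; everywhere else only ordinary imports are followed.

```go
package main

import (
	"fmt"
	"sort"
)

// Pkg is a hypothetical record of one package's imports.
type Pkg struct {
	Imports     []string // imports of the package's non-test files
	TestImports []string // additional imports of the package's test files
}

// newAll computes the proposed meaning of the all package pattern: the
// packages transitively imported by the packages and tests of the main
// module. Tests of packages outside the main module are not considered.
func newAll(pkgs map[string]Pkg, mainPkgs []string) []string {
	all := map[string]bool{}
	var visit func(p string)
	visit = func(p string) {
		if all[p] {
			return
		}
		all[p] = true
		for _, imp := range pkgs[p].Imports {
			visit(imp)
		}
	}
	for _, p := range mainPkgs {
		visit(p)
		// Tests count only for packages in the main module.
		for _, imp := range pkgs[p].TestImports {
			visit(imp)
		}
	}
	out := make([]string, 0, len(all))
	for p := range all {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

// exampleGraph has a main package that imports a/x, whose test imports b,
// whose test in turn imports c.
func exampleGraph() map[string]Pkg {
	return map[string]Pkg{
		"example.com/lazy": {Imports: []string{"example.com/a/x"}},
		"example.com/a/x":  {TestImports: []string{"example.com/b"}},
		"example.com/b":    {TestImports: []string{"example.com/c"}},
		"example.com/c":    {},
	}
}

func main() {
	fmt.Println(newAll(exampleGraph(), []string{"example.com/lazy"}))
}
```

For this graph the result contains only the main package and a/x: b and c are reachable only through tests outside the main module, so they drop out of all.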
The behavior of go mod tidy should change depending on the go version. In a
module that specifies go 1.15 or later, go mod tidy should scan the packages
matching the new definition of all, ignoring build tags. In a module that
specifies go 1.14 or earlier, it should continue to scan the packages matching
the old definition (still ignoring build tags). (Note that both of those sets
are supersets of the new all pattern.)
all and ... module patterns and go mod download
In Go 1.11–1.14, the all module pattern matches each module reachable by
following module requirements recursively, starting with the main module and
visiting every version of every module encountered. The module pattern ... has
the same behavior.
The all module pattern is important primarily because it is the default set of
modules downloaded by the go mod download subcommand, which sets up the local
cache for offline use. However, it (along with ...) is also currently used by
a few other tools (such as go doc) to locate “modules of interest” for other
purposes.
Unfortunately, these patterns as defined in Go 1.11–1.14 are not compatible
with lazy loading: they examine transitive go.mod files without loading any
packages. Therefore, in order to achieve lazy loading we must change their
behavior.
Since we want to compute the list of modules without loading any packages or
irrelevant go.mod files, we propose that when the main module's go.mod file
specifies go 1.15 or higher, the all and wildcard module patterns should
match only those modules found in a deepening scan of the main module's
dependencies. That definition includes every module whose version is
reproducible due to the completeness invariant, including modules needed by
tests of transitive imports.
With this redefinition of the all module pattern, and the above redefinition
of the all package pattern, we again have the property that, after
go mod tidy && go mod download all, invoking go test on any package within all
does not need to download any new dependencies.
Since the all pattern includes every module encountered in the deepening scan,
rather than only those that provide imported packages, go mod download may
continue to download more source code than is strictly necessary to build the
packages in all. However, as is the case today, users may download only that
narrower set as a side effect of invoking go list all.
go.mod size
Under this approach, the set of modules recorded in the go.mod file would in
most cases increase beyond the set recorded in Go 1.14. However, the set of
modules recorded in the go.sum file would decrease: irrelevant modules would
no longer be included.
The modules recorded in go.mod under this proposal would be a strict
subset of the set of modules recorded in go.sum in Go 1.14.
go command
would still not require a separate “manifest” file, and unlike a lock
file, the go.mod file would still be updated automatically to reflect
new requirements discovered during package loading.)
For modules with few test-of-test dependencies, the go.mod file after
running go mod tidy will typically be larger than in Go 1.14. For modules
with many test-of-test dependencies, it may be substantially smaller.
For modules that are tidy:
The module versions recorded in the go.mod file would be exactly those
listed in vendor/modules.txt, if present.
The module versions recorded in vendor/modules.txt would be the same
as under Go 1.14, although the ## explicit annotations could perhaps
be removed (because all relevant dependencies would be recorded
explicitly).
The module versions recorded in the go.sum file would be exactly those
listed in the go.mod file.
The go.mod file syntax and semantics proposed here are backward compatible
with previous Go releases: all go.mod files for existing go versions would
retain their current meaning.
Under this proposal, a go.mod file that specifies go 1.15 or higher will
cause the go command to lazily load the go.mod files for its requirements.
When reading a go 1.15 file, previous versions of the go command (which do
not prune irrelevant dependencies) may select higher versions than those
selected under this proposal, by following otherwise-irrelevant dependency
edges. However, because the require directive continues to specify a minimum
version for the required dependency, a previous version of the go command will
never select a lower version of any dependency.
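The "never lower" property follows from version selection taking the maximum over all requirements that are visited: visiting more edges can only raise a maximum. A minimal sketch under stated assumptions: versions are modeled as plain integers rather than semantic versions, and the ModVer type, the buildList function, and its follow callback are hypothetical names, not cmd/go APIs.

```go
package main

import "fmt"

// ModVer identifies a module version; Minor stands in for a full semantic
// version so that comparison is a plain integer comparison.
type ModVer struct {
	Path  string
	Minor int
}

// buildList selects, for each module path, the highest version reachable from
// roots. The follow callback decides whether a module's own requirements are
// read; a callback that declines some modules models pruning.
func buildList(roots []ModVer, reqs map[ModVer][]ModVer, follow func(ModVer) bool) map[string]int {
	selected := map[string]int{}
	seen := map[ModVer]bool{}
	var visit func(m ModVer)
	visit = func(m ModVer) {
		if seen[m] {
			return
		}
		seen[m] = true
		if cur, ok := selected[m.Path]; !ok || m.Minor > cur {
			selected[m.Path] = m.Minor
		}
		if !follow(m) {
			return
		}
		for _, d := range reqs[m] {
			visit(d)
		}
	}
	for _, r := range roots {
		visit(r)
	}
	return selected
}

// exampleGraph: the main module requires a; a requires b and c at minor
// version 1; b, which is otherwise irrelevant, requires c at minor version 2.
func exampleGraph() (roots []ModVer, reqs map[ModVer][]ModVer) {
	a, b := ModVer{"example.com/a", 1}, ModVer{"example.com/b", 1}
	c1, c2 := ModVer{"example.com/c", 1}, ModVer{"example.com/c", 2}
	return []ModVer{a}, map[ModVer][]ModVer{a: {b, c1}, b: {c2}}
}

func main() {
	roots, reqs := exampleGraph()
	// Pruned: read only the go.mod files of the main module's explicit requirements.
	pruned := buildList(roots, reqs, func(m ModVer) bool { return m == roots[0] })
	// Unpruned: follow every requirement edge, as earlier go versions do.
	full := buildList(roots, reqs, func(ModVer) bool { return true })
	fmt.Println("pruned c:", pruned["example.com/c"], "unpruned c:", full["example.com/c"])
}
```

In this graph the unpruned traversal follows b's requirement and selects the higher version of c, while the pruned traversal stays at the version that a requires. The unpruned result is never lower for any path, because it takes a maximum over a superset of the same requirements.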
Moreover, any strategy that prunes out a dependency as interpreted by a previous
go version will continue to prune out that dependency as interpreted under
this proposal: module maintainers will not be forced to break users on new go
versions in order to support users on older versions (or vice-versa).
Versions of the go command before 1.14 do not preserve the proposed invariants
for the main module: if a go command from before 1.14 is run in a go 1.15
module, it may automatically remove requirements that are now needed. However,
as a result of CL 204878, go version 1.14 does preserve those invariants in
all subcommands except for go mod tidy: Go 1.14 users will be able to work (in
a limited fashion) within a Go 1.15 main module without disrupting its
invariants.
bcmills is working on a prototype of this design for cmd/go in Go 1.15.
At this time, we do not believe that any other tooling changes will be needed.
Because go mod tidy will now preserve seemingly-redundant requirements, we may
find that we want to expand or update the // indirect comments that it
currently manages. For example, we may want to indicate “indirect dependencies
at implied versions” separately from “upgraded or potentially-unused indirect
dependencies”, and we may want to indicate “direct or indirect dependencies of
tests” separately from “direct or indirect dependencies of non-tests”.
Since these comments do not have a semantic effect, we can fine-tune them after implementation (based on user feedback) without breaking existing modules.
The following examples illustrate the proposed behavior using the cmd/go
script test format. For local testing and exploration, the test files can be
extracted using the txtar tool.
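For readers unfamiliar with the layout of these examples: each one is a script followed by files introduced by marker lines of the form "-- name --". The splitter below is a simplified, hypothetical stand-in for illustration, using only the standard library; it is not the txtar tool itself, and the File type and parseArchive function are assumed names.

```go
package main

import (
	"fmt"
	"strings"
)

// File is one named file extracted from an archive.
type File struct {
	Name string
	Data string
}

// markerName reports whether line is a file marker of the form "-- name --"
// and, if so, returns the file name.
func markerName(line string) (string, bool) {
	line = strings.TrimSuffix(line, "\n")
	if len(line) < len("-- x --") || !strings.HasPrefix(line, "-- ") || !strings.HasSuffix(line, " --") {
		return "", false
	}
	name := strings.TrimSpace(line[3 : len(line)-3])
	return name, name != ""
}

// parseArchive splits an archive into the leading script and the files that
// follow it. Everything before the first marker line is the script.
func parseArchive(archive string) (script string, files []File) {
	var buf strings.Builder
	name, inFile := "", false
	flush := func() {
		if inFile {
			files = append(files, File{Name: name, Data: buf.String()})
		} else {
			script = buf.String()
		}
		buf.Reset()
	}
	for _, line := range strings.SplitAfter(archive, "\n") {
		if n, ok := markerName(line); ok {
			flush()
			name, inFile = n, true
			continue
		}
		buf.WriteString(line)
	}
	flush()
	return script, files
}

func main() {
	archive := "go list all\ncmp go.mod go.mod.old\n" +
		"-- go.mod --\nmodule example.com/lazy\ngo 1.15\n" +
		"-- lazy.go --\npackage lazy\n"
	script, files := parseArchive(archive)
	fmt.Printf("script has %d lines\n", strings.Count(script, "\n"))
	for _, f := range files {
		fmt.Printf("%s (%d bytes)\n", f.Name, len(f.Data))
	}
}
```

A real extraction would also write each file to disk under a chosen directory; the sketch stops at parsing so that it stays self-contained.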
cp go.mod go.mod.old
go mod tidy
cmp go.mod go.mod.old
# Before adding a new import, the go.mod file should
# enumerate modules for all packages already imported.
go list all
cmp go.mod go.mod.old
# When a new import is found, we should perform a deepening scan of the existing
# dependencies and add a requirement on the version required by those
# dependencies — not re-resolve 'latest'.
cp lazy.go.new lazy.go
go list all
cmp go.mod go.mod.new
-- go.mod --
module example.com/lazy
go 1.15
require (
example.com/a v0.1.0
example.com/b v0.1.0 // indirect
)
replace (
example.com/a v0.1.0 => ./a
example.com/b v0.1.0 => ./b
example.com/c v0.1.0 => ./c1
example.com/c v0.2.0 => ./c2
)
-- lazy.go --
package lazy
import (
_ "example.com/a/x"
)
-- lazy.go.new --
package lazy
import (
_ "example.com/a/x"
_ "example.com/a/y"
)
-- go.mod.new --
module example.com/lazy
go 1.15
require (
example.com/a v0.1.0
example.com/b v0.1.0 // indirect
example.com/c v0.1.0 // indirect
)
replace (
example.com/a v0.1.0 => ./a
example.com/b v0.1.0 => ./b
example.com/c v0.1.0 => ./c1
example.com/c v0.2.0 => ./c2
)
-- a/go.mod --
module example.com/a
go 1.15
require (
example.com/b v0.1.0
example.com/c v0.1.0
)
-- a/x/x.go --
package x
import _ "example.com/b"
-- a/y/y.go --
package y
import _ "example.com/c"
-- b/go.mod --
module example.com/b
go 1.15
-- b/b.go --
package b
-- c1/go.mod --
module example.com/c
go 1.15
-- c1/c.go --
package c
-- c2/go.mod --
module example.com/c
go 1.15
-- c2/c.go --
package c
cp go.mod go.mod.old
go mod tidy
cmp go.mod go.mod.old
# 'go list -m all' should include modules that cover the test dependencies of
# the packages imported by the main module, found via a deepening scan.
go list -m all
stdout 'example.com/b v0.1.0'
! stdout example.com/c
cmp go.mod go.mod.old
# 'go test' of any package in 'all' should use its existing dependencies without
# updating the go.mod file.
go list all
stdout example.com/a/x
go test example.com/a/x
cmp go.mod go.mod.old
-- go.mod --
module example.com/lazy
go 1.15
require example.com/a v0.1.0
replace (
example.com/a v0.1.0 => ./a
example.com/b v0.1.0 => ./b1
example.com/b v0.2.0 => ./b2
example.com/c v0.1.0 => ./c
)
-- lazy.go --
package lazy
import (
_ "example.com/a/x"
)
-- a/go.mod --
module example.com/a
go 1.15
require example.com/b v0.1.0
-- a/x/x.go --
package x
-- a/x/x_test.go --
package x
import (
"testing"
_ "example.com/b"
)
func TestUsingB(t *testing.T) {
// …
}
-- b1/go.mod --
module example.com/b
go 1.15
require example.com/c v0.1.0
-- b1/b.go --
package b
-- b1/b_test.go --
package b
import _ "example.com/c"
-- b2/go.mod --
module example.com/b
go 1.15
require example.com/c v0.1.0
-- b2/b.go --
package b
-- b2/b_test.go --
package b
import _ "example.com/c"
-- c/go.mod --
module example.com/c
go 1.15
-- c/c.go --
package c
cp go.mod go.mod.old
go mod tidy
cmp go.mod go.mod.old
# 'go list -m all' should include modules that cover the test dependencies of
# the packages imported by the main module, found via a deepening scan.
go list -m all
stdout 'example.com/b v0.1.0'
cmp go.mod go.mod.old
# 'go test all' should use those existing dependencies without updating the
# go.mod file.
go test all
cmp go.mod go.mod.old
-- go.mod --
module example.com/lazy
go 1.15
require (
example.com/a v0.1.0
)
replace (
example.com/a v0.1.0 => ./a
example.com/b v0.1.0 => ./b1
example.com/b v0.2.0 => ./b2
example.com/c v0.1.0 => ./c
)
-- lazy.go --
package lazy
import (
_ "example.com/a/x"
)
-- a/go.mod --
module example.com/a
go 1.15
require (
example.com/b v0.1.0
)
-- a/x/x.go --
package x
-- a/x/x_test.go --
package x
import (
"testing"
_ "example.com/b"
)
func TestUsingB(t *testing.T) {
// …
}
-- b1/go.mod --
module example.com/b
go 1.15
-- b1/b.go --
package b
-- b1/b_test.go --
package b
import _ "example.com/c"
-- b2/go.mod --
module example.com/b
go 1.15
require (
example.com/c v0.1.0
)
-- b2/b.go --
package b
-- b2/b_test.go --
package b
import _ "example.com/c"
-- c/go.mod --
module example.com/c
go 1.15
-- c/c.go --
package c
go 1.14 dependency
cp go.mod go.mod.old
go mod tidy
cmp go.mod go.mod.old
# 'go list -m all' should include modules that cover the test dependencies of
# the packages imported by the main module, found via a deepening scan.
go list -m all
stdout 'example.com/b v0.1.0'
stdout 'example.com/c v0.1.0'
cmp go.mod go.mod.old
# 'go test' of any package in 'all' should use its existing dependencies without
# updating the go.mod file.
#
# In order to satisfy reproducibility for the loaded packages, the deepening
# scan must follow the transitive module dependencies of 'go 1.14' modules.
go list all
stdout example.com/a/x
go test example.com/a/x
cmp go.mod go.mod.old
-- go.mod --
module example.com/lazy
go 1.15
require example.com/a v0.1.0
replace (
example.com/a v0.1.0 => ./a
example.com/b v0.1.0 => ./b
example.com/c v0.1.0 => ./c1
example.com/c v0.2.0 => ./c2
)
-- lazy.go --
package lazy
import (
_ "example.com/a/x"
)
-- a/go.mod --
module example.com/a
go 1.14
require example.com/b v0.1.0
-- a/x/x.go --
package x
-- a/x/x_test.go --
package x
import (
"testing"
_ "example.com/b"
)
func TestUsingB(t *testing.T) {
// …
}
-- b/go.mod --
module example.com/b
go 1.14
require example.com/c v0.1.0
-- b/b.go --
package b
import _ "example.com/c"
-- c1/go.mod --
module example.com/c
go 1.14
-- c1/c.go --
package c
-- c2/go.mod --
module example.com/c
go 1.14
-- c2/c.go --
package c