On many systems pkgsrc supports, gcc is the standard compiler. Different versions of each OS generally ship different gcc versions, and some packages require a newer gcc, either to support newer language standards (e.g. c++11, written in the style of USE_LANGUAGES) or, infrequently, because older versions don't work.
This page discusses issues related to gcc version selection. It is intended as a design document for how pkgsrc should address this problem, and will be converted into historical design rationale once implemented. It freely takes content from extensive mailing-list discussions, and attempts to follow the rough consensus that has emerged.
Base system gcc vs pkgsrc gcc
Systems whose base system uses gcc (e.g. NetBSD) provide a compiler at /usr/bin/gcc, which pkgsrc can use without any bootstrapping. One can also build gcc versions (typically newer ones) from pkgsrc, resulting in a compiler within ${PREFIX}, e.g. /usr/pkg/gcc6/bin/gcc. This compiler can then be used to compile other packages.
The usual problem with the base system gcc is that it is too old, such as gcc 4.5 with NetBSD 6, which cannot compile c++11. Another example is gcc 4.8 with NetBSD 7: while it can compile most c++11 programs, it cannot be used for firefox or glibmm (and therefore not for any package that links against glibmm).
Issues when using pkgsrc gcc are:
- on some platforms, pkgsrc gcc does not build or does not work
- it must be bootstrapped, which requires compiling a number of packages with the system compiler
- C++ packages that are linked together should be built with the same compiler, because the standard library ABI is not necessarily the same for each compiler version
- while C packages can be built with mixed versions, the final binary should be linked with the highest version involved, because the support library is backward compatible but not forward compatible
Specific constraints and requirements
This section attempts to gather all the requirements.
By default, pkgsrc should be able to build working packages, even for packages that need a newer compiler than that provided in the base system.
The set of packages that are needed when building a bootstrap compiler should be minimized.
All packages that use C should have their final link done with the highest gcc version used for any included library.
All packages that use C++ should be built with the same compiler version. Because such packages may in general also include C, the version used for C++ must be at least as new as the version used for any C package they use.
pkgsrc should avoid building gcc unless doing so is actually needed to build packages. (As an example, if the base system gcc can build c99 but not c++11, building a c99-only program should not trigger building a gcc version adequate for c++11.)
The compiler selection logic should work on NetBSD 6 and newer, and on other systems currently supported by pkgsrc, including in-use LTS GNU/Linux systems. On systems that default to clang, it should work, when configured to use GCC, at least as well as the current scheme. It is desirable for this logic to also work on NetBSD 5.
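The two linking requirements can be expressed as a small check. This is a minimal sketch, assuming each piece of a program (the program itself and every library linked into it) is described by a (language, gcc version) pair; the function names are hypothetical and this is not pkgsrc code.

```python
# Illustrative model of the linking requirements above; not pkgsrc code.


def parse_version(text):
    """Turn "4.8" or "5" into a comparable (major, minor) tuple."""
    parts = [int(p) for p in text.split(".")][:2]
    parts += [0] * (2 - len(parts))
    return tuple(parts)


def final_link_version(units):
    """Return the gcc version that must do the final link.

    units lists (language, gcc_version) pairs for the program itself and
    for every library linked into it; language is "c" or "c++".
    """
    cxx_versions = {ver for lang, ver in units if lang == "c++"}
    if len(cxx_versions) > 1:
        # C++ standard library ABI: everything linked together needs one version.
        raise ValueError(f"C++ built with mixed gcc versions: {sorted(cxx_versions)}")
    highest = max((ver for _, ver in units), key=parse_version)
    if cxx_versions:
        cxx = next(iter(cxx_versions))
        if parse_version(cxx) < parse_version(highest):
            # The C++ version must be at least as new as every C version.
            raise ValueError(f"C++ version {cxx} is older than C version {highest}")
    # Support libraries are backward but not forward compatible, so link
    # with the newest version involved.
    return highest
```

For example, a C program combining libraries built with 4.5 and 4.8 is linked with 4.8, while a mix of C built with 5 and C++ built with 4.8 is rejected.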
All systems should work at least as well as they do before implementation of new compiler selection logic.
The compiler selection logic should be understandable and not brittle.
Design
The above requirements could in theory be satisfied in many ways, but most of them are too complicated. We present a design that aims to be sound while minimizing complexity.
Packages declare what languages they need, with c++, c++11, and c++14 being expressed differently. (This is exactly current practice and just noted for completeness.)
The package-settable variable GCC_REQD will be used only when a compiler that can generally compile the declared language version is insufficient. These cases are expected to be relatively rare; an example is firefox, which is in c++ (but not c++11) and needs gcc 4.9.
A user-settable variable PKGSRC_GCC_VERSION will declare the version of gcc to be used for C programs, with an OS-, version- and architecture-specific default.
A user-settable variable PKGSRC_GXX_VERSION will declare the version of gcc to be used for all C++ programs, again with an OS-, version- and architecture-specific default. It must be at least as new as PKGSRC_GCC_VERSION.
If PKGSRC_GCC_VERSION and PKGSRC_GXX_VERSION are not set, the system will behave much as before, with one possible exception: builds may still fail if the required version is newer than the base system version. So far, the only known reason to avoid setting these variables is when pkgsrc gcc cannot be built.
Each of c99, c++, c++11, and c++14 will be associated with a minimum gcc version, such that almost all programs declaring that language can be built with that version. (This sidesteps strict compliance with c++11, which requires a far newer gcc than the version needed to compile almost all actual c++11 programs.)
The minimum version inferred from the language tags will be combined with any GCC_REQD declarations to find a minimum version for a specific package. If that minimum is newer than PKGSRC_GCC_VERSION (for programs using only C) or PKGSRC_GXX_VERSION (for programs using C++), the package build will fail. We call the applicable PKGSRC_GCC_VERSION or PKGSRC_GXX_VERSION the chosen version.
When building a program using C or C++, if the chosen version is neither provided by the base system nor installed via pkgsrc, then it (and its dependencies) will be built from pkgsrc in a special bootstrap mode. In bootstrap mode, the version selection logic is ignored and the base system compiler is used. Consistency and reproducible builds require that a package built into the normal prefix be the same whether it was built for compiler bootstrapping or for normal use.
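The rule for finding and checking the chosen version can be sketched as follows. This is a minimal model, not pkgsrc code: the function and table names are hypothetical, the c++11 (4.8) and c++14 (5) minimums follow the figures on this page, and the c99 and c++ minimums are placeholder assumptions.

```python
# Illustrative model of the proposed selection rule; not pkgsrc code.
# c++11 -> 4.8 and c++14 -> 5 follow this page; the c99 and c++ values
# are placeholders assumed for the example.
LANGUAGE_MIN_GCC = {
    "c99": "4.5",
    "c++": "4.5",
    "c++11": "4.8",
    "c++14": "5",
}


def parse_version(text):
    """Turn "4.8" or "5" into a comparable (major, minor) tuple."""
    parts = [int(p) for p in text.split(".")][:2]
    parts += [0] * (2 - len(parts))
    return tuple(parts)


def chosen_version(use_languages, gcc_reqd, pkgsrc_gcc_version, pkgsrc_gxx_version):
    """Return the chosen version for a package, or fail if it is too old.

    use_languages mirrors USE_LANGUAGES and gcc_reqd mirrors GCC_REQD.
    Tags missing from LANGUAGE_MIN_GCC (e.g. plain "c") add no minimum here.
    """
    candidates = [LANGUAGE_MIN_GCC[lang] for lang in use_languages
                  if lang in LANGUAGE_MIN_GCC] + list(gcc_reqd)
    needed = max(candidates, key=parse_version, default="0")
    uses_cxx = any(lang.startswith("c++") for lang in use_languages)
    chosen = pkgsrc_gxx_version if uses_cxx else pkgsrc_gcc_version
    if parse_version(chosen) < parse_version(needed):
        raise RuntimeError(f"package needs gcc >= {needed}, chosen version is {chosen}")
    return chosen
```

With the NetBSD 7 settings proposed below (4.8 for C, 5 for C++), a c99 program gets 4.8, and firefox (c++ plus GCC_REQD 4.9) gets 5 without any special case.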
There are thus two choices for dealing with bootstrapping. One is to use a distinct prefix, which ensures that no package that is part of the compiler bootstrap is linked into normal pkgsrc programs. This implies that any dependencies of gcc may exist twice, once built in bootstrap mode and once built normally. A gcc version itself will be built twice if it is also desired for regular use. This double building and the complexity of a second prefix are the drawbacks of this approach.
The other choice is to mark gcc and all of its dependencies as used for compiler bootstrapping, and to always build those with the base compiler. We use the package-settable variable PKGSRC_GCC_BOOTSTRAP=yes to denote this. The drawback of this approach is possible inconsistency: gcc's dependencies are built with the base compiler and later used by normal packages.
As an alternative to marking each package, we could store lists of bootstrap packages in a variable, because the set will vary with OS and OS version, and with PREFER_PKGSRC settings.
As a third alternative, we could pass a GCC_BOOTSTRAPPING variable recursively. This is easier, but less consistent, because whether a dependency is built with the base compiler then depends on whether it was first built during a compiler bootstrap.
We hope that the chosen version can be built with the base system compiler, and hope to avoid multi-stage bootstrapping.
We expect that any program containing C++ will undergo final linking with a C++ compiler. This is not a change from the current situation.
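The bootstrap decision described in this section (use the base compiler, use an already installed pkgsrc gcc, or first build the chosen gcc in bootstrap mode) can be sketched as follows. This is a hypothetical model under the assumption that installed pkgsrc compilers are known as a set of version strings; it is not pkgsrc code.

```python
# Illustrative model of how the chosen compiler is obtained; not pkgsrc code.


def parse_version(text):
    """Turn "4.8" or "5" into a comparable (major, minor) tuple."""
    parts = [int(p) for p in text.split(".")][:2]
    parts += [0] * (2 - len(parts))
    return tuple(parts)


def compiler_plan(chosen, base_version, installed, bootstrapping=False):
    """Decide where the compiler for one package build comes from.

    Returns ("base", version), ("pkgsrc", version) or ("bootstrap", version);
    the last means "build gcc <version> from pkgsrc in bootstrap mode first".
    bootstrapping=True models a build that is itself part of a compiler
    bootstrap, where the selection logic is ignored.
    """
    if bootstrapping or parse_version(chosen) == parse_version(base_version):
        return ("base", base_version)
    if any(parse_version(v) == parse_version(chosen) for v in installed):
        return ("pkgsrc", chosen)
    return ("bootstrap", chosen)
```

On a NetBSD 7 system with no pkgsrc gcc installed, a C++ package with chosen version 5 would yield ("bootstrap", "5"), while the packages built during that bootstrap would themselves yield ("base", "4.8").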
Remaining issues
gcc dependencies introduction
Because gcc can have dependencies, some packages could be built with the system compiler and later used by packages built with the chosen version. For now, we defer worrying about these problems (judging that they will be less serious than the current situation, where all c++11 programs fail to build on NetBSD 6).
\todo: Perhaps change gcc 4.8 and 4.9 to enable gcc-inplace-math by default. Perhaps decide that if we want to build gcc, we want to build 5 or 6, and 4.9 is no longer of interest as a bootstrap target.
\todo: Analyze what build-time and install-time dependencies actually exist. Include old GNU/Linux in this analysis.
\todo: Consider whether dropping nls would help. (On NetBSD, it seems that base system libraries are used, so it would not help.)
\todo: Consider failing, when bootstrapping, if options that we want set one way are set the other way.
managing gcc dependencies
There are multiple paths forward.
\todo Choose one. The straw proposal is "Don't worry" plus a recursive variable for the initial implementation.
Separate prefix
Build compilers in a separate prefix, or a subprefix, so that the compiler and the packages needed to build it will not be used by any normal packages. This completely avoids the issue of building a package one way during bootstrap and another way otherwise, at the cost of two builds and of writing the separate-prefix code.
Don't worry
Don't worry that packages used to bootstrap the needed compiler are compiled with an older compiler. Don't worry that they might differ depending on build order. If we have an actual problem, deal with it. This requires choosing an approach for skipping the compiler selection logic when building the compiler:
Mark bootstrap packages
Mark packages used to build gcc as PKGSRC_GCC_BOOTSTRAP=yes. Conditionalize this on OPSYS if necessary. Don't force the compiler if this is set.
Alternatively, manage a per-OS list of packages in a central mk file.
Pass a recursive variable
As above, but set PKGSRC_GCC_BOOTSTRAP=yes in the environment of the call that builds the compiler, so that all dependencies inherit permission to skip the compiler selection logic. (Alternatively, use some other mechanism, such as passing a make variable explicitly.)
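The difference between marking bootstrap packages and passing a recursive variable can be illustrated by tracking which compiler each package ends up with. This is a toy sketch: the dependency graph and package names are hypothetical, 4.8 stands in for the base compiler and 5 for the chosen version, and it is not pkgsrc code.

```python
# Illustrative model of the two "Don't worry" variants; not pkgsrc code.
# The dependency graph below is hypothetical.
DEPENDS = {
    "gcc5": ["mpfr", "gmp"],
    "mpfr": ["gmp"],
    "app": ["gmp"],
}


def compilers_used(root, depends, chosen, base, marked=frozenset(),
                   bootstrap_env=False, recursive=False):
    """Map each package built for root to the gcc version used for it.

    marked:        packages whose Makefile sets PKGSRC_GCC_BOOTSTRAP=yes.
    bootstrap_env: whether root itself is built as part of a compiler bootstrap.
    recursive:     whether a bootstrap build passes the flag to its dependencies.
    """
    result = {}

    def visit(pkg, inherited):
        if pkg in result:  # each package is built only once
            return
        bootstrap = inherited or pkg in marked
        result[pkg] = base if bootstrap else chosen
        for dep in depends.get(pkg, []):
            visit(dep, bootstrap and recursive)

    visit(root, bootstrap_env)
    return result
```

With the recursive variable, gmp is built with 4.8 if it is first needed by the gcc5 bootstrap, but with 5 if "app" is built first; this is the build-order dependence that "Don't worry" accepts. With per-package marking of gcc5, mpfr and gmp, gmp is always built with 4.8.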
Differing GCC and GXX versions
Perhaps it is a mistake to allow the chosen GCC and GXX versions to differ. If we require them to be the same, then essentially all systems with a base system compiler older than gcc 5 will have to bootstrap the compiler. For now, we allow them to differ and will permit the defaults to differ.
gcc versions and number of buildable packages
A gcc version that is too old will not build a number of packages. Anything older than 4.8 fails for c++11. 4.8 fails on some c++11 packages, such as firefox and glibmm.
A version that is too new also fails to build some packages. Counts that Jason Bacon posted to tech-pkg indicate that 5 is close to 4.8 in the number of packages built, and that moving to 6 causes hundreds of additional failures. (Keep in mind that currently, a build with 4.8 also builds 4.9 for firefox; under this design it will not.)
Package counts (number of packages in each All directory):
- www/pkgsrc/packages/sharedapps/pkg-2017Q3/RHEL6-gcc48/All: 16461
- www/pkgsrc/packages/sharedapps/pkg-2017Q3/RHEL6-gcc6/All: 15849
- www/pkgsrc/packages/sharedapps/pkg-2017Q3/RHEL7-gcc48/All: 16414
- www/pkgsrc/packages/sharedapps/pkg-2017Q3/RHEL7-gcc5/All: 16338
Therefore, the current answer to "What is the best version to use?" is gcc 5.
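The size of the gap can be read directly from these counts. The sketch below only subtracts the numbers quoted on this page, and compares builds within the same RHEL release, since 5 and 6 were not built on the same release.

```python
# Differences between the package counts quoted above (same RHEL release,
# different gcc). No data beyond those four numbers is used.
counts = {
    ("RHEL6", "gcc48"): 16461,
    ("RHEL6", "gcc6"): 15849,
    ("RHEL7", "gcc48"): 16414,
    ("RHEL7", "gcc5"): 16338,
}


def fewer_packages(release, old, new):
    """How many fewer packages were built with `new` than with `old`."""
    return counts[(release, old)] - counts[(release, new)]


print(fewer_packages("RHEL6", "gcc48", "gcc6"))  # 612
print(fewer_packages("RHEL7", "gcc48", "gcc5"))  # 76
```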
Default versions for various systems
Note that if a newer gcc has to be built anyway for a particular system's set of installed packages (or for a bulk build), building it earlier does no harm.
When the base system compiler is old (e.g., gcc 4.5 in NetBSD 6, or 4.1 in NetBSD 5), a newer version clearly must be built. For these systems, PKGSRC_GXX_VERSION should default to a fairly recent gcc, but not one so new that it causes build failures. PKGSRC_GCC_VERSION should probably default to the system version if that can build all C99 programs, and otherwise match PKGSRC_GXX_VERSION. Perhaps gcc 4.5 would be used, but 4.1 would not. \todo Discuss.
When the base system compiler is almost new enough, the choice of default is more complicated. A key example is gcc 4.8, found in NetBSD 7. Firefox requires gcc 4.9, and all programs using c++14 also need a newer version. One option is to choose 4.8, in which case firefox and all c++14 programs fail. Another is to choose 4.9, but this makes little sense, because c++14 programs will still fail, and the general rule of moving to the most recent generally acceptable version applies, which currently leads to gcc 5. This is in effect a declaration that "almost new enough" does not count as new enough. Thus the plan for NetBSD 7 is to set PKGSRC_GCC_VERSION to 4.8 and PKGSRC_GXX_VERSION to 5.
When the base system compiler is new enough (e.g. gcc 5, 6 or 7), it should simply be used. By "new enough", we mean that almost no programs in pkgsrc fail to build with it because it is too old, which implies that it supports (almost all) C++14 programs. Our current definition of new enough is gcc 5.
Limited mixed versions
One approach would be to allow limited mixed versions, where individual programs could force a specific version to be bootstrapped and used, so that e.g. firefox could use 4.9 even though most programs use 4.8, which is what happens now on NetBSD 7. This relies on being able to link C++ with 4.9 while including some things built with 4.8 (which is done presently). However, this approach becomes unsound when the package in question is a library rather than an end program, because everything linking against that library would then mix C++ compiler versions. We reject this as too much complexity for avoiding building a newer compiler in limited situations.
Fortran
Fortran support is currently somewhat troubled. An obvious extension is a PKGSRC_GFORTRAN_VERSION that matches PKGSRC_GCC_VERSION or PKGSRC_GXX_VERSION; in any case, the Fortran situation is not worsened by the above design.
Building a gcc version also produces gfortran. Perhaps, because of Fortran, we should require a single version rather than separate C and C++ versions.
\todo Discuss.
C++ libraries used by C programs
The choice of one version for C++ and another for C (e.g. 5 and 4.8 respectively on netbsd-7) breaks down if a C program links against a library that is written in C++ but provides a C API, because the program still needs the C++ version's standard library.
\todo Define a variable for such packages to have in their buildlink3, which will not add c++ to USE_LANGUAGES but will force PKGSRC_GXX_VERSION to be used. Or decide that this is a good reason to really just have one compiler version.
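One way to picture the proposed buildlink3 variable is as a per-dependency flag that switches a C package to the C++ compiler version without touching USE_LANGUAGES. The page has not named this variable, so the flag and function below are hypothetical; this is a sketch of the idea, not pkgsrc code.

```python
# Illustrative model; the buildlink3 flag is hypothetical (not yet named on
# this page), and this is not pkgsrc code.


def link_compiler(use_languages, dependencies, pkgsrc_gcc_version, pkgsrc_gxx_version):
    """Pick the compiler version for a package's final link.

    dependencies maps each buildlink3 dependency to True if that library is
    implemented in C++ behind a C API, i.e. its buildlink3.mk would set the
    hypothetical flag. USE_LANGUAGES itself is left unchanged.
    """
    uses_cxx = any(lang.startswith("c++") for lang in use_languages)
    needs_cxx_runtime = any(dependencies.values())
    if uses_cxx or needs_cxx_runtime:
        return pkgsrc_gxx_version
    return pkgsrc_gcc_version
```

On netbsd-7 with the proposed defaults, a pure C program would be linked with 4.8 unless one of its dependencies sets the flag, in which case it would be linked with 5.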
Path forward
(This assumes per-package marking of bootstrap packages, but is reasonably obviously extended to the other schemes.)
Modify all gcc packages to have minimal dependencies, and to set PKGSRC_GCC_BOOTSTRAP=yes.
Modify the compiler selection logic to do nothing if PKGSRC_GCC_BOOTSTRAP is set.
Modify the compiler selection logic for USE_LANGUAGES to fail if PKGSRC_GCC_VERSION/PKGSRC_GXX_VERSION is not new enough.
Modify the compiler selection logic for GCC_REQD to fail if PKGSRC_GCC_VERSION/PKGSRC_GXX_VERSION is not new enough.
Decide on defaults. The straw proposal is that PKGSRC_GCC_VERSION is the base system version if >= 4.5 (or 4.4?), and otherwise 5, and that PKGSRC_GXX_VERSION is the base system version if >= 5, and otherwise 5. Implement these in platform.mk as they are tested.
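The straw-proposal defaults can be written as a small function of the base system gcc version. This is a minimal sketch, not pkgsrc code; the function name is hypothetical, and the c_floor parameter stands for the still-open choice between 4.5 and 4.4.

```python
# Illustrative model of the straw-proposal defaults; not pkgsrc code.


def parse_version(text):
    """Turn "4.8" or "5" into a comparable (major, minor) tuple."""
    parts = [int(p) for p in text.split(".")][:2]
    parts += [0] * (2 - len(parts))
    return tuple(parts)


def default_versions(base_version, c_floor="4.5"):
    """Return default (PKGSRC_GCC_VERSION, PKGSRC_GXX_VERSION).

    c_floor is the oldest base system gcc still used for C; the straw
    proposal leaves open whether it is 4.5 or 4.4.
    """
    base = parse_version(base_version)
    gcc = base_version if base >= parse_version(c_floor) else "5"
    gxx = base_version if base >= parse_version("5") else "5"
    return gcc, gxx
```

For NetBSD 7's gcc 4.8 this gives ("4.8", "5"), matching the plan above; NetBSD 5's gcc 4.1 gives ("5", "5"). By construction the C++ version is never older than the C version.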
Later steps
- Address Fortran. Probably add PKGSRC_GFORTRAN_VERSION, after determining how Fortran, C and C++ interact with respect to library ABI compatibility.
Data
This section has data points that are relevant to the discussion.
amd64/i386
It is believed that pkgsrc gcc generally builds on these systems. gcc6 builds on netbsd-5/i386.
macppc
On macppc, lang/gcc5 fails to build on netbsd-6 and netbsd-7, but succeeds on netbsd-8.