This question is related to that one: How can multiple definitions in static libraries be detected/prevented? and a sentence in the first answer.
The paragraph "One Definition Rule" of Cppreference (https://en.cppreference.com/w/cpp/language/definition) states:
"One and only one definition of every non-inline function or variable that is odr-used (see below) is required to appear in the entire program (including any standard and user-defined libraries). The compiler is not required to diagnose this violation, but the behavior of the program that violates it is undefined."
Does that mean that a program that has been created by linking to static libraries containing multiply defined symbols, has undefined behaviour? Because it violates the One Definition Rule?
Or is the behaviour defined? Because the linker will ensure that only one symbol is being used? And the behaviour of the linker is defined? Where is it defined how linkers have to create programs from static libraries?
asked Feb 6 at 9:52 by Benjamin Bihler, edited Feb 6 at 10:14 by Lundin
- try to link a program having multiple (not weak) function definitions and you will know the answer – 0___________ Commented Feb 6 at 10:03
- @0___________, I don't get what you mean. With g++ the linker will choose the first definition. This depends for example on the order in which I have stated the static libraries on the command line. But even though I know for one set of tools what the outcome will be, this doesn't answer, whether the program behaviour is defined to be like that, does it?! – Benjamin Bihler Commented Feb 6 at 10:29
- If the multiple definitions are identical, then the program has defined behavior. If the multiple definitions are not identical, then that is an ODR violation, and the C++ standard does not define the behavior. – Eljay Commented Feb 6 at 12:10
3 Answers
You are hoping for a C++ language lawyer's answer to a question about whether symbol resolution in the linkage of static libraries is well-defined. You won't get that, because the C++ Standard has nothing to say about the linker's behaviour. Although all language compilers vitally rely on the linker's behaviour, the linker is not a language compiler, is unaware of source languages, and is not governed by any language standard.
When I mention symbols here, references or definitions thereof, I mean regular strong references and definitions of global symbols. There may be local symbols in object files that by definition are not in the linkage namespace. And there are also such things as global weak symbol references and definitions in the ELF format but they are an irrelevant technicality for your question.
And when I say program I mean the executable image output by the static linker.
The symbols that are defined in a program are exactly those symbols for which definitions are statically linked into it. Even symbols that the linkage resolves to definitions that are provided by shared libraries and are (therefore) not statically linked into the program are not defined in the program, although they are resolved by the linker. They are undefined references, left for the dynamic linker to define at runtime.
The only way in which symbol definitions get into a program is from the linkage of object files. Object files may be input to the linkage either explicitly and unconditionally:
(gcc|g++) -o prog main.o ...
or implicitly and unconditionally:
(gcc|g++) -o prog main.(c|cpp) ...
which is implicitly:
(gcc|g++) -c main.(c|cpp) -o main.o ...
(gcc|g++) -o prog main.o ...
or they may get in conditionally:
(gcc|g++) -o prog ... libfoobar.a
where, let us say, the static library libfoobar.a archives the object files foo.o and bar.o, respectively providing definitions of global symbols foo and bar and nothing else.
In this case libfoobar.a(foo.o) is extracted from the archive and linked into the program on condition that, at the time when libfoobar.a is inspected, a reference to foo has already been linked into the program in an earlier object file and no definition of foo has yet been linked. Otherwise libfoobar.a(foo.o) is not linked. The same goes for libfoobar.a(bar.o).
If libfoobar.a does not furnish any definitions of symbols for which there are undefined references in the program at the time when libfoobar.a is inspected, then it contributes nothing to the program. It might as well not exist.
This behaviour - that the linker will link the first available archive member that defines a hitherto undefined symbol reference already in the program, and not thereafter look for other definitions - is the original and fundamental principle of linkage against static libraries, which amongst other things serves to ensure that such linkage honours the One Definition Rule in an absolutely well-defined way. The linkage fails with a multiple-definition error if the ODR cannot be upheld.
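The rule just described can be sketched as a toy model in C++. This is only an illustration under stated assumptions, not how any real linker is implemented: the types ObjectFile, Archive and Linkage are hypothetical names invented here, only strong global symbols are modelled, and the repeated scan inside one archive is a simplification of what actual linkers do with the archive's symbol index.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy model of an object file: the strong global symbols it
// defines and the global symbols it references.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> references;
};

// Hypothetical toy model of a static library: an ordered list of members.
struct Archive {
    std::string name;
    std::vector<ObjectFile> members;
};

// State of the program while the linkage proceeds.
struct Linkage {
    std::set<std::string> defined;    // symbols whose definition is linked in
    std::set<std::string> undefined;  // references still awaiting a definition
    std::vector<std::string> linked;  // object files linked so far, in order

    // Link an object file unconditionally (like main.o on the command line).
    // A second strong definition of the same symbol is a hard error.
    void add_object(const ObjectFile& obj) {
        assert(!obj.name.empty());
        for (const auto& sym : obj.defines) {
            if (!defined.insert(sym).second)
                throw std::runtime_error("multiple definition of " + sym);
            undefined.erase(sym);
        }
        for (const auto& sym : obj.references)
            if (!defined.count(sym)) undefined.insert(sym);
        linked.push_back(obj.name);
    }

    // Inspect an archive once: extract a member only if it defines a symbol
    // that is undefined right now. Rescan until no further member is needed,
    // because an extracted member may itself introduce new references.
    void add_archive(const Archive& lib) {
        std::set<std::string> taken;
        bool progress = true;
        while (progress) {
            progress = false;
            for (const auto& member : lib.members) {
                if (taken.count(member.name)) continue;
                bool needed = false;
                for (const auto& sym : member.defines)
                    if (undefined.count(sym)) { needed = true; break; }
                if (!needed) continue;
                add_object(ObjectFile{lib.name + "(" + member.name + ")",
                                      member.defines, member.references});
                taken.insert(member.name);
                progress = true;
            }
        }
    }
};
```

In this model, adding a main.o that references foo and bar, then libfoobar.a, then a second archive that also defines foo and bar, leaves linked holding main.o, libfoobar.a(foo.o) and libfoobar.a(bar.o) only: the second archive is inspected but nothing in it is needed. Passing two object files that both define foo directly to add_object throws, which corresponds to the multiple-definition error.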
Now consider libbarfoo.a, which provides other definitions of foo and bar (or identical ones, it doesn't matter), and nothing else, and consider a linkage:
(gcc|g++) -o prog ... libfoobar.a libbarfoo.a
in which undefined references to foo and/or bar have been linked into the program when libfoobar.a is inspected. In this linkage there are multiple definitions of foo and bar within the sequence libfoobar.a libbarfoo.a of static libraries. It doesn't matter. The references to foo and/or bar are defined by linking libfoobar.a(foo.o) and/or libfoobar.a(bar.o) into the program. No further definitions are needed or sought. libbarfoo.a contributes nothing to the program. All of the linkages:
(gcc|g++) -o prog ... libfoobar.a libbarfoo.a
(gcc|g++) -o prog ... libfoobar.a libfoobar.a
(gcc|g++) -o prog ... libfoobar.a
(gcc|g++) -o prog ... foo.o bar.o
are equivalent and emit the same prog. The fact that the program exists demonstrates that it does not harbour multiple definitions of any symbol.
It is the linker that guarantees you this, not the C++ Standard. The C++ Standard does not speak for the linker and can only declare that it does not define the behaviour when the behaviour in point falls to the linker, not the C++ compiler. The linker's behaviour is well-defined, and enforces the ODR, but not because any language standard says it must.
There is no overarching body issuing standards to which reputable linkers strive to conform. Each linker has a manual or reference documentation maintained by the maintainers of the linker. Since linkers have been fundamental tools for over 70 years, the weight of industry reliance on their well-defined and conservative behaviour is compelling. In the relevant case of the GNU Linker the manual is The GNU Linker. The item most relevant to your question is the option definition:
-l namespec
--library=namespec
Add the archive or object file specified by namespec to the list of files to link. This option may be used any number of times. If namespec is of the form ‘:filename’, ld will search the library path for a file called filename, otherwise it will search the library path for a file called ‘libnamespec.a’. On systems which support shared libraries, ld may also search for files other than ‘libnamespec.a’. Specifically, on ELF and SunOS systems, ld will search a directory for a library called ‘libnamespec.so’ before searching for one called ‘libnamespec.a’. (By convention, a .so extension indicates a shared library.) Note that this behavior does not apply to ‘:filename’, which always specifies a file called filename.
The linker will search an archive only once, at the location where it is specified on the command line. If the archive defines a symbol which was undefined in some object which appeared before the archive on the command line, the linker will include the appropriate file(s) from the archive. However, an undefined symbol in an object appearing later on the command line will not cause the linker to search the archive again.
...
The sentence emphasised by me means that at most one member object file will be linked from the static libraries in a linkage to obtain a definition of a symbol to which an undefined reference has accrued, and it will be the first such member object file found.
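The "searched only once, at the location where it is specified" part of that passage is what makes the order of inputs matter. A minimal sketch of that single left-to-right pass, again as a toy C++ model: Member, Input and unresolved are hypothetical names made up for this illustration, and the model ignores weak symbols, shared libraries and options that change the search behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy types, not part of any real linker API.
struct Member {
    std::set<std::string> defines;     // strong global symbols defined
    std::set<std::string> references;  // global symbols referenced
};

struct Input {
    bool is_archive;              // false: a plain object file, always linked
    std::vector<Member> members;  // one entry for an object, several for an archive
};

// Walk the command line left to right, visiting each input exactly once,
// and return the symbols that are still undefined at the end.
std::set<std::string> unresolved(const std::vector<Input>& command_line) {
    std::set<std::string> defined, undefined;
    auto link = [&](const Member& m) {
        for (const auto& s : m.defines) { defined.insert(s); undefined.erase(s); }
        for (const auto& s : m.references)
            if (!defined.count(s)) undefined.insert(s);
    };
    for (const auto& input : command_line) {
        std::vector<bool> taken(input.members.size(), false);
        bool progress = true;
        while (progress) {  // rescan this one input until nothing more is needed
            progress = false;
            for (std::size_t i = 0; i < input.members.size(); ++i) {
                if (taken[i]) continue;
                const Member& m = input.members[i];
                // Plain objects are always linked; archive members only if
                // they define a symbol that is undefined at this moment.
                bool needed = !input.is_archive;
                for (const auto& s : m.defines)
                    if (undefined.count(s)) needed = true;
                if (needed) { link(m); taken[i] = true; progress = true; }
            }
        }
    }
    return undefined;
}
```

With an object that references foo and an archive whose only member defines foo, the order object-then-archive leaves nothing unresolved, while archive-then-object leaves foo unresolved: when the archive was inspected, no reference to foo had been linked yet, and the archive is not revisited.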
There is confusion around the word "undefined" here because it mixes two different topics.
The linker's behavior is well defined: it is documented and expected. With the GNU tools, for example, it picks the first definition it finds.
Yet when cppreference says "undefined", it means that, from the compiler's perspective, the behavior of the program isn't defined. The compiler doesn't know anything about the linker (well, sort of).
So for the compiler, when it emits the binary code, as long as a declaration is provided and a definition is validated, the code is guaranteed to work as written.
Yet if you compile two different definitions for one declaration, the compiler doesn't know about it. Only the linker can sort this out and select a definition.
From a user's perspective, however, the behavior is not "expected". Depending on the linking order (which is often misunderstood), either one of the definitions may be used, which can lead to "ahem" moments.
I think this is what cppreference meant when it used the word "undefined".
The standard says nothing about how linkers or static libraries work. It only says that things should not appear in the program more than once, but what does "appear in the program" actually mean?
Most linkers do not include unused parts of static libraries in the resulting executable file. Does this mean that the linker determines what does and does not appear in the program? Or does the entire library conceptually appear in the program, with the omission of the unused parts being merely an optimisation and thus an implementation detail?
This is open to interpretation. I am inclined towards the first view. The program is linked together from object files that are either given directly to, or pulled from libraries by, the linker. The entirety of the program is the set of source files that correspond to just those object files, but not to object files left behind by the linker. By this theory, the behaviour is well-defined by the standard (unless there are conflicting definitions of entities that do not correspond to linker symbols, such as types or inline functions). If you subscribe to the opposite view, then the standard leaves the behaviour undefined, but the implementation defines the behaviour instead.