1: [[!meta title="Symbol Versions in NetBSD Libraries"]]
2:
NetBSD's `libc` implements various standard C and POSIX interfaces,
such as the [[!template id=man name="time" section="3"]] function,
which has a prototype like this:
6:
    time_t time(time_t *);
8:
9: However, between NetBSD 5 and NetBSD 6, the definition of the type
10: `time_t` in NetBSD changed on many architectures from 32-bit to 64-bit
11: to avoid the
12: [year 2038 problem](https://en.wikipedia.org/wiki/Year_2038_problem).
13: So programs compiled in NetBSD<=5 saw a declaration like
14:
    int time(int *);
16:
where `int` is 32-bit on most architectures, while programs compiled in
NetBSD>=6 see a declaration like
19:
    int64_t time(int64_t *);
21:
22: These declarations are not compatible -- consider a program with a
23: fragment like:
24:
    int before, after;

    time(&before);
    ...
    time(&after);
30:
31: This would work in NetBSD<=5, but in NetBSD>=6, the calls to
32: [[!template id=man name="time" section="3"]] might overwrite adjacent
33: positions on the stack, or crash altogether because the argument is
34: misaligned.
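
The width mismatch can be made concrete with a small sketch.
The helpers `store_legacy`, `store_modern`, and `bytes_modified` are hypothetical names, not NetBSD interfaces, and a 32-bit `int` is assumed for the legacy case.
The stores go through `memcpy` into an 8-byte region so that the example itself stays well-defined:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL 0xAA /* marker byte for memory the callee has not touched */

/* Hypothetical stand-in for the NetBSD<=5 time(): stores a 32-bit result. */
static void store_legacy(unsigned char *slot)
{
    int32_t now = 0x11111111;
    memcpy(slot, &now, sizeof now);
}

/* Hypothetical stand-in for the NetBSD>=6 time(): stores a 64-bit result. */
static void store_modern(unsigned char *slot)
{
    int64_t now = 0x1111111111111111;
    memcpy(slot, &now, sizeof now);
}

/* Count how many bytes of an 8-byte region the given callee overwrote. */
static size_t bytes_modified(void (*store)(unsigned char *))
{
    unsigned char region[8];
    size_t i, n = 0;

    memset(region, FILL, sizeof region);
    store(region);
    for (i = 0; i < sizeof region; i++)
        if (region[i] != FILL)
            n++;
    return n;
}
```

Here `bytes_modified(store_legacy)` is 4 and `bytes_modified(store_modern)` is 8: a caller that reserved only a 4-byte `int` has the 4 bytes next to it overwritten as well.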
35:
36: Programs written and compiled on older versions of NetBSD are supposed
37: to continue to work -- with suitable emulators/compatNN packages and
38: compatNN.kmod modules or COMPAT_NN kernel options -- on newer versions
39: of NetBSD.
40:
41: To make this work, NetBSD's `libc` provides _two_ symbols:
42:
- `time`, which still implements the legacy prototype as before; and
- `__time50` (no, this is not a typo for `__time60`), which implements
  the new 64-bit prototype.
46:
47: The declaration in newer NetBSD
48: [time.h](https://nxr.netbsd.org/xref/src/include/time.h) is actually:
49:
    time_t time(time_t *) __RENAME(__time50);
51:
52: where `__RENAME(__time50)` is a
53: [macro](https://nxr.netbsd.org/search?q=&project=src&defs=__RENAME&refs=&path=&hist=)
54: expanding to `__asm("__time50")`, which has the effect that the
55: compiler will use the symbol `__time50` for calls to the C function
56: this declares.
57: Thus, old programs with calls to the symbol `time` using the 32-bit
58: prototype will continue to work, and new programs will be compiled to
59: call the symbol `__time50` using the 64-bit prototype.
60: ([Details on how the symbols are implemented in `libc`.](https://nxr.netbsd.org/xref/src/lib/libc/README))
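
The renaming trick can be tried in miniature.
The following is only a sketch, assuming a compiler that accepts GCC-style `__asm("...")` labels on declarations; `demo_now`, `demo_now_legacy`, `demo_now_v2`, and `DEMO_RENAME` are invented names, not NetBSD interfaces:

```c
#include <stdint.h>

/* Same idea as __RENAME: bind a C identifier to a chosen symbol name. */
#define DEMO_RENAME(sym) __asm(#sym)

/* "Header": code that writes demo_now() is compiled to call demo_now_v2. */
int64_t demo_now(void) DEMO_RENAME(demo_now_v2);

/* "Library": 64-bit implementation, emitted under the symbol demo_now_v2. */
int64_t demo_now(void)
{
    return INT64_C(4102444800); /* 2100-01-01 UTC, beyond the 32-bit range */
}

/*
 * "Library": legacy implementation, kept under the original symbol
 * name demo_now so that previously built binaries still resolve it.
 */
int32_t demo_now_legacy(void) __asm("demo_now");
int32_t demo_now_legacy(void)
{
    return INT32_MAX; /* the largest value the legacy ABI can return */
}
```

Source code that calls `demo_now()` never mentions `demo_now_v2`, yet the object file refers to that symbol; the plain symbol `demo_now` still exists, but only with the legacy 32-bit behaviour.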
61:
62: # [[!template id=man name="dlsym" section="3"]] and symbol interposition
63:
64: **Programs that use
65: [[!template id=man name="dlsym" section="3"]],
66: such as C foreign function interfaces in dynamic languages like Python,
67: need to know that if they want the legacy 32-bit time() function, they
68: must use the symbol `time`, and if they want the modern 64-bit time()
69: function, they must use the symbol `__time50`.**
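
A minimal sketch of such a lookup follows.
The helper `lookup_current_time` and the macro `CURRENT_TIME_SYMBOL` are hypothetical; the sketch assumes that on systems other than NetBSD the plain symbol `time` already matches the current `time_t`, and that the program is dynamically linked:

```c
#include <dlfcn.h>
#include <stddef.h>
#include <time.h>

/*
 * Symbol that implements the current time() prototype.  On NetBSD this
 * is __time50; elsewhere this sketch assumes the plain name "time".
 */
#ifdef __NetBSD__
#define CURRENT_TIME_SYMBOL "__time50"
#else
#define CURRENT_TIME_SYMBOL "time"
#endif

typedef time_t (*time_fn)(time_t *);

/* Hypothetical helper: find the current time() in the running process. */
static time_fn lookup_current_time(void)
{
    void *self, *sym;

    /* A null path yields a handle for the main program and its libraries. */
    self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL)
        return NULL;
    sym = dlsym(self, CURRENT_TIME_SYMBOL);
    dlclose(self); /* libc stays loaded, so the address remains valid */
    return (time_fn)sym; /* NULL if the symbol was not found */
}
```

A foreign function interface would then call the returned pointer with the 64-bit prototype; looking up `"time"` on NetBSD instead would silently return the legacy 32-bit function.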
70:
71: **Similarly, programs that use `LD_PRELOAD` (see
[[!template id=man name="ld.elf_so" section="1"]])
73: to interpose their own definitions of symbols, such as
74: [[!template id=man name="rumphijack" section="3"]]
75: and
76: [torsocks](https://gitlab.torproject.org/legacy/trac/-/wikis/doc/torsocks),
77: must know to define `__time50` if they want to replace the new
78: semantics in new programs, or `time` if they want to replace the old
79: semantics in old programs.**
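
As an illustration, here is a sketch of an interposer that pins the clock to a fixed value; `FAKE_NOW` and its timestamp are hypothetical.
Because the file includes `<time.h>`, the definition is emitted under whatever symbol the header assigns to `time`, which on NetBSD>=6 is `__time50`, so it replaces the new semantics.
Replacing the legacy semantics would instead need a separate definition, with the 32-bit prototype, emitted under the plain symbol `time`.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical fixed timestamp: 2000-01-01 00:00:00 UTC. */
#define FAKE_NOW ((time_t)946684800)

/*
 * Interposed time().  The declaration in <time.h> decides the symbol
 * name this definition is emitted under: __time50 on NetBSD>=6, plain
 * time on systems whose header does not rename it.
 */
time_t
time(time_t *t)
{
    if (t != NULL)
        *t = FAKE_NOW;
    return FAKE_NOW;
}
```

Built as a shared object (for example with `cc -shared -fPIC`) and named in `LD_PRELOAD`, this would be found ahead of the `libc` definition by programs that call the current `time`.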
80:
The same applies to many other standard C functions, such as
[[!template id=man name="clock_gettime" section="2"]]
(`__clock_gettime50`) and
[[!template id=man name="socket" section="2"]]
(`__socket30`), which have all had their prototypes or semantics
revised at some point.
87:
Symbol interposition is difficult to get right, and programs that rely
on it are hard to make reliable.
On NetBSD, it should be reserved for certain standard library functions
like `malloc` and `free` (and `calloc` and everything else in that
family), and some system call stubs.
Except for the `__...50` pseudo-versioned renames of public functions,
you should not try to interpose your own definition of any symbol
beginning with ‘`_`’ (a single underscore); such names are reserved to
the implementation in C.
96:
97: # Appendix: ELF symbol versions
98:
99: The renaming scheme of `__time50` is informal -- any symbol can be
100: renamed the same way, and NetBSD uses it for some other purposes too,
101: such as exposing a slightly different
102: [[!template id=man name="rename" section="2"]]
103: function via the symbol `__posix_rename` in programs that define
104: `_POSIX_C_SOURCE` but not `_NETBSD_SOURCE`.
105:
The GNU ELF toolchain (gcc, ld, &c.) supports a formal concept of
‘symbol versions’ with sections called `.gnu.version` (associating
versions with symbols), `.gnu.version_d` (versions defined in an
object), and `.gnu.version_r` (versions needed in an object).
As of 2020, NetBSD does not use ELF symbol versions, although the
linker and loader support them for libraries developed outside NetBSD.
112:
The semantics are:
114:
- When creating a library, a version map may be specified like so:

        NetBSD_BASE {
        global:
            __time50;
            free;
            malloc;
            time;
        local:
            *;
        };

        NetBSD_6 {
        global:
            time;
        };

  The library can specify what versioned symbol each definition in the
  library is exposed with:

        __asm(".symver time_legacy,time@NetBSD_BASE");
        int time_legacy(int *t) { ... }

        __asm(".symver time64,time@@NetBSD_6");  /* default version */
        int64_t time64(int64_t *t) { ... }

        __asm(".symver __time50,__time50@NetBSD_BASE");
        __typeof(time) __time50 __attribute__((__alias__("time64")));

  Versions marked with `@@` are _default_ versions; versions marked
  with `@` are non-default.
146:
- When running a program that was linked _without_ ELF symbol versions,
  because it was built before the library had them (as with `libc`
  today), the first version in the map is used to resolve symbols:

  - Old programs calling the legacy `time` symbol will get
    `time@NetBSD_BASE`, which is defined via `time_legacy` above.

  - Programs calling `__time50` will get `__time50@NetBSD_BASE`, which
    is defined via `time64` above.
156:
157: - When linking a program against a library with symbol versions, the
158: linker will record what the default version was; when later running
159: the program, the stored symbol version will be used.
160: If there is no default version, and the program did not request a
161: specific version with `.symver`, then the linker refuses to link, so
162: obsolete symbols can be ‘removed’ by giving them only non-default
163: versions -- thus old programs continue to work but new programs can't
164: be made that use the obsolete symbols.
165:
166: For example, if [[!template id=man name="time" section="3"]] is
167: declared in a header file as simply
168:
        typedef int64_t time_t;
        time_t time(time_t *);
171:
172: then new programs will be linked against `time@NetBSD_6`, which is
173: the default version for the symbol name `time`.
174: If NetBSD ever changed the prototype of
175: [[!template id=man name="time" section="3"]]
176: again, and defined a `time@NetBSD_11` as the new default version,
177: existing programs compiled with `time@NetBSD_6` would continue to get
178: the semantics they were built against.
179:
180: - When a program uses
181: [[!template id=man name="dlsym" section="3"]],
182: it always gets the default version, if any.
183: Programs can request specific versions with
184: [[!template id=man name="dlvsym" section="3"]].
185:
186: ## ELF symbol versions versus `__...50` pseudo-versions
187:
188: ELF symbol versions and NetBSD's `__time50` pseudo-version renaming
189: scheme both try to address the same problem: making sure old programs
190: that were built under the assumption of the old semantics continue to
191: run unmodified with new libraries.
192:
193: Both of them run into problems with
194: [[!template id=man name="dlsym" section="3"]]
195: and symbol interposition:
196:
197: - A program written _today_ that expects to find the function time() in
198: `libc`, such as a C foreign function interface for a dynamic language
199: like Python, needs to know to call `dlsym("__time50")`; otherwise it
200: will get an obsolete definition that does not match the semantics of
201: the current definition of `time_t`, possibly leading to data
202: corruption, crashes, or worse.
203:
- If `libc` used ELF symbol versions, then `dlsym("time")` would
  return the modern symbol.
206:
207: But any _old_ programs that used `dlsym("time")` assuming it
208: returned the legacy definition (which was the ‘modern’ definition at
209: the time the programs were written and built) will break if it
210: instead returns the 64-bit definition.
211:
212: And if we ever modified
213: [[!template id=man name="time" section="3"]]
214: again (hypothetically, to extend it to 128-bit galactic-scale times),
215: programs written assuming that `dlsym("time")` returns the 64-bit
216: definition will break if it begins to return the 128-bit definition.
  Programs could future-proof themselves by using `dlvsym("time",
  "NetBSD_6")` explicitly, but this is no better than writing
  `dlsym("__time50")` explicitly.
220:
221: **Thus, switching from the pseudo-versions we use to ELF symbol
222: versions doesn't improve the
223: [[!template id=man name="dlsym" section="3"]]
224: situation -- in fact, it makes the situation _worse_, by breaking old
225: programs and providing no way for new programs to bind to the name of
226: the current version.**
227:
228: Perhaps we could create a compiler builtin `__builtin_asm_name` which
229: would expand to the `__asm("...")` name by which a C identifier has
230: been declared -- then programs could instead do:
231:
    __typeof(time) *timep = dlsym(dso, __builtin_asm_name(time));
233:
234: This way the text of the program is the same no matter how
235: [[!template id=man name="time" section="3"]]
236: is declared in the header file, but it will continue to work across
237: changes to the signature of the
238: [[!template id=man name="time" section="3"]]
239: function in newer releases of NetBSD.
240:
241: # References
242:
243: - Jörg Sonnenberger,
244: [How to break long-term compatibility in NetBSD](https://www.NetBSD.org/gallery/presentations/joerg/asiabsdcon2016/asiabsdcon2016.pdf),
245: AsiaBSDcon 2016.
246:
247: - Ulrich Drepper,
248: [How To Write Shared Libraries](https://akkadia.org/drepper/dsohowto.pdf),
249: 2011-12-10.
250:
251: - Ulrich Drepper,
252: [ELF Symbol Versioning](https://akkadia.org/drepper/symbol-versioning)