docs: add a clang_fortify_anatomy doc
This is mostly copy-pasted from go/clang-fortify-anatomy. Since it
offers extensive documentation on how FORTIFY works in Bionic, having it
also live within Bionic seems quite helpful.
Bug: 235150687
Fixes: 235150687
Test: None
Change-Id: I20145a5ba3155b1c7b3977f9b688320a7fda4ea2
diff --git a/docs/clang_fortify_anatomy.md b/docs/clang_fortify_anatomy.md
new file mode 100644
index 0000000..4b95fdc
--- /dev/null
+++ b/docs/clang_fortify_anatomy.md
@@ -0,0 +1,841 @@
+*This document was originally written for a broad audience, and it was*
+*determined that it'd be good to hold in Bionic's docs, too. Due to the*
+*ever-changing nature of code, it tries to link to a stable tag of*
+*Bionic's libc, rather than the live code in Bionic. Same for Clang.*
+*Reader beware. :)*
+
+# The Anatomy of Clang FORTIFY
+
+## Objective
+
+The intent of this document is to run through the minutiae of how Clang FORTIFY
+actually works in Bionic at the time of writing. Other FORTIFY implementations
+that target Clang should use very similar mechanics. This document exists in part
+because many Clang-specific features serve multiple purposes simultaneously, so
+getting up-to-speed on how things function can be quite difficult.
+
+## Background
+
+FORTIFY is a broad suite of extensions to libc aimed at catching misuses of
+common library functions. Textually, these extensions exist purely in libc, but
+all implementations of FORTIFY rely heavily on C language extensions in order
+to function at all.
+
+Broadly, FORTIFY implementations try to guard against many misuses of C
+standard(-ish) libraries:
+- Buffer overruns in functions where pointers+sizes are passed (e.g., `memcpy`,
+ `poll`), or where sizes exist implicitly (e.g., `strcpy`).
+- Arguments with incorrect values passed to libc functions (e.g.,
+ out-of-bounds bits in `umask`).
+- Missing arguments to functions (e.g., `open()` with `O_CREAT`, but no mode
+ bits).
+
+FORTIFY is traditionally enabled by passing `-D_FORTIFY_SOURCE=N` to your
+compiler. `N==0` disables FORTIFY, whereas `N==1`, `N==2`, and `N==3` enable
+increasingly strict versions of it. In general, FORTIFY doesn't require user
+code changes; that said, some code patterns
+are [incompatible with stricter versions of FORTIFY checking]. This is largely
+because FORTIFY has significant flexibility in what it considers to be an
+"out-of-bounds" access.
+
+FORTIFY implementations use a mix of compiler diagnostics and runtime checks to
+flag and/or mitigate the impacts of the misuses mentioned above.
+
+Further, given FORTIFY's design, the effectiveness of FORTIFY is a function of
+-- among other things -- the optimization level you're compiling your code at.
+Many FORTIFY implementations are implicitly disabled when building with `-O0`,
+since FORTIFY's design for both Clang and GCC relies on optimizations in order
+to provide useful run-time checks. For the purpose of this document, all
+analysis of FORTIFY functions and commentary on builtins assume that code is
+being built with some optimization level > `-O0`.
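+
+As a rough illustration of why `-O0` builds can end up unprotected, a header
+might gate its FORTIFY wrappers on both `_FORTIFY_SOURCE` and `__OPTIMIZE__`
+(a macro that Clang and GCC predefine when optimizations are enabled). This is
+a sketch only, not Bionic's actual gating logic; `EXAMPLE_FORTIFY_ENABLED` and
+`example_fortify_enabled` are hypothetical names.
+
+```c
+/* __OPTIMIZE__ is predefined by Clang and GCC whenever optimizations are on. */
+#if defined(_FORTIFY_SOURCE) && _FORTIFY_SOURCE > 0 && defined(__OPTIMIZE__)
+#  define EXAMPLE_FORTIFY_ENABLED 1
+#else
+#  define EXAMPLE_FORTIFY_ENABLED 0
+#endif
+
+/* Hypothetical helper: reports whether the wrappers would be compiled in. */
+static int example_fortify_enabled(void) {
+    return EXAMPLE_FORTIFY_ENABLED;
+}
+```
+
+Building this with `-O0` yields 0 regardless of `_FORTIFY_SOURCE`, which is the
+sense in which such an implementation is "implicitly disabled".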
+
+### A note on GCC
+
+This document talks specifically about Bionic's FORTIFY implementation targeted
+at Clang. While GCC also provides a set of language extensions necessary to
+implement FORTIFY, these tools are different from what Clang offers. This
+divergence is an artifact of Clang and GCC's differing architecture as
+compilers.
+
+Textually, quite a bit can be shared between a FORTIFY implementation for GCC
+and one for Clang (e.g., see [ChromeOS' Glibc patch]), but this kind of sharing
+requires things like macros that expand to unbalanced braces depending on your
+compiler:
+
+```c
+/*
+ * Highly simplified; if you're interested in FORTIFY's actual implementation,
+ * please see the patch linked above.
+ */
+#ifdef __clang__
+# define FORTIFY_PRECONDITIONS
+# define FORTIFY_FUNCTION_END
+#else
+# define FORTIFY_PRECONDITIONS {
+# define FORTIFY_FUNCTION_END }
+#endif
+
+/*
+ * FORTIFY_WARNING_ONLY_IF_SIZE_OF_BUF_LESS_THAN is not defined, due to its
+ * complexity and irrelevance. It turns into a compile-time warning if the
+ * compiler can determine `*buf` has fewer than `size` bytes available.
+ */
+
+char *getcwd(char *buf, size_t size)
+FORTIFY_PRECONDITIONS
+ FORTIFY_WARNING_ONLY_IF_SIZE_OF_BUF_LESS_THAN(buf, size, "`buf` is too smol.")
+{
+ // Actual shared function implementation goes here.
+}
+FORTIFY_FUNCTION_END
+```
+
+All talk of GCC-focused implementations and how to merge Clang and GCC
+implementations is out-of-scope for this doc, however.
+
+## The Life of a Clang FORTIFY Function
+
+As referenced in the Background section, FORTIFY performs many different checks
+for many functions. This section intends to go through real-world examples of
+FORTIFY functions in Bionic, breaking down how each part of these functions
+works, and how the pieces fit together to provide FORTIFY-like functionality.
+
+While FORTIFY implementations may differ between stdlibs, they broadly follow
+the same patterns when implementing their checks for Clang, and they try to
+make similar promises with respect to FORTIFY compiling to be zero-overhead in
+some cases, etc. Moreover, while this document specifically examines Bionic,
+many stdlibs will operate _very similarly_ to Bionic in their Clang FORTIFY
+implementations.
+
+**In general, when reading the below, be prepared for exceptions, subtlety, and
+corner cases. The individual function breakdowns below try to not offer
+redundant information. Each one focuses on different aspects of FORTIFY.**
+
+### Terminology
+
+Because FORTIFY should be mostly transparent to developers, there are inherent
+naming collisions here: `memcpy(x, y, z)` turns into fundamentally different
+generated code depending on the value of `_FORTIFY_SOURCE`. Further, said
+`memcpy` call with `_FORTIFY_SOURCE` enabled needs to be able to refer to the
+`memcpy` that would have been called, had `_FORTIFY_SOURCE` been disabled.
+Hence, the following convention is followed in the subsections below for all
+prose (namely, multiline code blocks are exempted from this):
+
+- Standard library function names preceded by `__builtin_` refer to the use of
+ the function with `_FORTIFY_SOURCE` disabled.
+- Standard library function names without a prefix refer to the use of the
+ function with `_FORTIFY_SOURCE` enabled.
+
+This convention also applies in `clang`. `__builtin_memcpy` will always call
+`memcpy` as though `_FORTIFY_SOURCE` were disabled.
+
+## Breakdown of `mempcpy`
+
+The [FORTIFY'ed version of `mempcpy`] is a full, featureful example of a
+FORTIFY'ed function from Bionic. From the user's perspective, it supports a few
+things:
+- Producing a compile-time error if the number of bytes to copy trivially
+ exceeds the number of bytes available at the destination pointer.
+- If the `mempcpy` has the potential to write to more bytes than what is
+ available at the destination, a run-time check is inserted to crash the
+ program if more bytes are written than what is allowed.
+- Compiling away to be zero overhead when none of the buffer sizes can be
+ determined at compile-time[^1].
+
+The declaration in Bionic's headers for `__builtin_mempcpy` is:
+```c
+void* mempcpy(void* __dst, const void* __src, size_t __n) __INTRODUCED_IN(23);
+```
+
+Which is annotated with nothing special, except for Bionic's versioner, which
+is Android-specific (and orthogonal to FORTIFY anyway), so it will be ignored.
+
+The [source for `mempcpy`] in Bionic's headers is:
+```c
+__BIONIC_FORTIFY_INLINE
+void* mempcpy(void* const dst __pass_object_size0, const void* src, size_t copy_amount)
+ __overloadable
+ __clang_error_if(__bos_unevaluated_lt(__bos0(dst), copy_amount),
+ "'mempcpy' called with size bigger than buffer") {
+#if __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
+ size_t bos_dst = __bos0(dst);
+ if (!__bos_trivially_ge(bos_dst, copy_amount)) {
+ return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
+ }
+#endif
+ return __builtin_mempcpy(dst, src, copy_amount);
+}
+```
+
+Expanding some of the important macros here, this function expands to roughly:
+```c
+static
+__inline__
+__attribute__((no_stack_protector))
+__attribute__((always_inline))
+void* mempcpy(
+ void* const dst __attribute__((pass_object_size(0))),
+ const void* src,
+ size_t copy_amount)
+ __attribute__((overloadable))
+ __attribute__((diagnose_if(
+ __builtin_object_size(dst, 0) != -1 && __builtin_object_size(dst, 0) < copy_amount,
+ "'mempcpy' called with size bigger than buffer",
+ "error"))) {
+#if __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
+ size_t bos_dst = __builtin_object_size(dst, 0);
+ if (!(__bos_trivially_ge(bos_dst, copy_amount))) {
+ return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
+ }
+#endif
+ return __builtin_mempcpy(dst, src, copy_amount);
+}
+```
+
+So let's walk through this step by step, to see how FORTIFY does what it says on
+the tin here.
+
+[^1]: "Zero overhead" in a way [similar to C++11's `std::unique_ptr`]: this will
+turn into a direct call to `__builtin_mempcpy` (or an optimized form thereof)
+with no other surrounding checks at runtime. However, the additional complexity
+may hinder optimizations that are performed before the optimizer can prove that
+the `if (...) { ... }` can be optimized out. Depending on how late this happens,
+the additional complexity may skew inlining costs, hide opportunities for,
+e.g., `memcpy` coalescing, etc.
+
+### How does Clang select `mempcpy`?
+
+First, it's critical to notice that `mempcpy` is marked `overloadable`. This
+function is a `static inline __attribute__((always_inline))` overload of
+`__builtin_mempcpy`:
+- `__attribute__((overloadable))` allows us to perform overloading in C.
+- `__attribute__((overloadable))` mangles all calls to functions marked with
+ `__attribute__((overloadable))`.
+- `__attribute__((overloadable))` allows exactly one function signature with a
+ given name to not be marked with `__attribute__((overloadable))`. Calls to
+ this overload will not be mangled.
+
+Second, one might note that this `mempcpy` implementation has the same C-level
+signature as `__builtin_mempcpy`. `pass_object_size` is a Clang attribute that
+is generally needed by FORTIFY, but it carries the side-effect that functions
+may be overloaded simply on the presence or absence of `pass_object_size`
+attributes. Given two overloads of a function that differ only in the presence
+of `pass_object_size` attributes, the candidate with `pass_object_size`
+attributes is preferred.
+
+Finally, the prior paragraph gets thrown out if one tries to take the address of
+`mempcpy`. It is impossible to take the address of a function with one or more
+parameters that are annotated with `pass_object_size`. Hence,
+`&__builtin_mempcpy == &mempcpy`. Further, because this is an issue of overload
+resolution, `(&mempcpy)(x, y, z);` is functionally identical to
+`__builtin_mempcpy(x, y, z);`.
+
+All of this accomplishes the following:
+- Direct calls to `mempcpy` should call the FORTIFY-protected `mempcpy`.
+- Indirect calls to `&mempcpy` should call `__builtin_mempcpy`.
+
+### How does Clang offer compile-time diagnostics?
+
+Once one is convinced that the FORTIFY-enabled overload of `mempcpy` will be
+selected for direct calls, Clang's `diagnose_if` and `__builtin_object_size` do
+all of the work from there.
+
+Subtleties here primarily fall out of the discussion in the above section about
+`&__builtin_mempcpy == &mempcpy`:
+```c
+#define _FORTIFY_SOURCE 2
+#include <string.h>
+void example_code() {
+ char buf[4]; // ...Assume sizeof(char) == 1.
+ const char input_buf[] = "Hello, World";
+ mempcpy(buf, input_buf, 4); // Valid, no diagnostic issued.
+
+ mempcpy(buf, input_buf, 5); // Emits a compile-time error since sizeof(buf) < 5.
+ __builtin_mempcpy(buf, input_buf, 5); // No compile-time error.
+ (&mempcpy)(buf, input_buf, 5); // No compile-time error, since __builtin_mempcpy is selected.
+}
+```
+
+Otherwise, the rest of this subsection is dedicated to preliminary discussion
+about `__builtin_object_size`.
+
+Clang's frontend can do one of two things with `__builtin_object_size(p, n)`:
+- Evaluate it as a constant.
+ - This can either mean declaring that the number of bytes at `p` is definitely
+ impossible to know, so the default value is used, or the number of bytes at
+ `p` can be known without optimizations.
+- Declare that the expression cannot form a constant, and lower it to
+ `@llvm.objectsize`, which is discussed in depth later.
+
+In the examples above, since `diagnose_if` is evaluated with context from the
+caller, Clang should be able to trivially determine that `buf` refers to a
+`char` array with 4 elements.
+
+The primary consequence of the above is that diagnostics can only be emitted if
+no optimizations are required to detect a broken code pattern. To be specific,
+Clang's constexpr evaluator must be able to determine the logical object that
+any given pointer points to in order to fold `__builtin_object_size` to a
+constant, non-default answer:
+
+```c
+#define _FORTIFY_SOURCE 2
+#include <string.h>
+void example_code() {
+ char buf[4]; // ...Assume sizeof(char) == 1.
+ const char input_buf[] = "Hello, World";
+ mempcpy(buf, input_buf, 4); // Valid, no diagnostic issued.
+ mempcpy(buf, input_buf, 5); // Emits a compile-time error since sizeof(buf) < 5.
+ char *buf_ptr = buf;
+ mempcpy(buf_ptr, input_buf, 5); // No compile-time error; `buf_ptr`'s target can't be determined.
+}
+```
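+
+The folding behaviour itself can be observed without any FORTIFY headers. The
+following is a minimal sketch; `size_of_local_array` and
+`size_of_unknown_pointer` are hypothetical names, and `noinline` is used only
+to keep the optimizer from seeing through the pointer parameter.
+
+```c
+#include <stddef.h>
+
+/* The array is named directly, so the size is known without optimizations. */
+static size_t size_of_local_array(void) {
+    char buf[4];
+    return __builtin_object_size(buf, 0);
+}
+
+/*
+ * `p` is an opaque parameter here. `noinline` keeps the optimizer from seeing
+ * any caller's buffer, so the "unknown" default of (size_t)-1 is returned.
+ */
+static __attribute__((noinline)) size_t size_of_unknown_pointer(const char *p) {
+    return __builtin_object_size(p, 0);
+}
+```
+
+The first function is the situation in which a compile-time diagnostic is
+possible; the second is the situation in which the frontend has to give up.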
+
+### How does Clang insert run-time checks?
+
+This section expands on the following statement: FORTIFY has zero runtime cost
+in instances where there is no chance of catching a bug at run-time. Otherwise,
+it introduces a tiny additional run-time cost to ensure that functions aren't
+misused.
+
+In prior sections, the following was established:
+- `overloadable` and `pass_object_size` prompt Clang to always select this
+ overload of `mempcpy` over `__builtin_mempcpy` for direct calls.
+- If a call to `mempcpy` was trivially broken, Clang would produce a
+ compile-time error, rather than producing a binary.
+
+Hence, the case we're interested in here is one where Clang's frontend selected
+a FORTIFY'ed function's implementation for a function call, but was unable to
+find anything seriously wrong with said function call. Since the frontend is
+powerless to detect bugs at this point, our focus shifts to the mechanisms that
+LLVM uses to support FORTIFY.
+
+Going back to Bionic's `mempcpy` implementation, we have the following (ignoring
+`diagnose_if` and assuming run-time checks are enabled):
+```c
+static
+__inline__
+__attribute__((no_stack_protector))
+__attribute__((always_inline))
+void* mempcpy(
+ void* const dst __attribute__((pass_object_size(0))),
+ const void* src,
+ size_t copy_amount)
+ __attribute__((overloadable)) {
+ size_t bos_dst = __builtin_object_size(dst, 0);
+ if (bos_dst != -1 &&
+ !(__builtin_constant_p(copy_amount) && bos_dst >= copy_amount)) {
+ return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
+ }
+ return __builtin_mempcpy(dst, src, copy_amount);
+}
+```
+
+In other words, we have a `static`, `always_inline` function which:
+- If `__builtin_object_size(dst, 0)` cannot be determined (in which case, it
+ returns -1), calls `__builtin_mempcpy`.
+- Otherwise, if `copy_amount` can be folded to a constant, and if
+ `__builtin_object_size(dst, 0) >= copy_amount`, calls `__builtin_mempcpy`.
+- Otherwise, calls `__builtin___mempcpy_chk`.
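+
+The three cases above can be modelled as an ordinary function. This is a
+hypothetical model for illustration, not code from Bionic: `select_copy_path`,
+`copy_path`, and `UNKNOWN_OBJECT_SIZE` are invented names, and
+`amount_is_constant` stands in for `__builtin_constant_p(copy_amount)`.
+
+```c
+#include <stdbool.h>
+#include <stddef.h>
+
+#define UNKNOWN_OBJECT_SIZE ((size_t)-1)
+
+enum copy_path { COPY_PLAIN, COPY_CHECKED };
+
+/* Hypothetical model of the branch in the FORTIFY'ed `mempcpy`. */
+static enum copy_path select_copy_path(size_t bos_dst, bool amount_is_constant,
+                                       size_t copy_amount) {
+    if (bos_dst == UNKNOWN_OBJECT_SIZE) {
+        return COPY_PLAIN;   /* Nothing to check against. */
+    }
+    if (amount_is_constant && bos_dst >= copy_amount) {
+        return COPY_PLAIN;   /* Provably in bounds. */
+    }
+    return COPY_CHECKED;     /* Defer to the `_chk` variant at run time. */
+}
+```
+
+Note that a known destination size paired with a non-constant `copy_amount`
+always takes the checked path, even if the copy would turn out to be in bounds.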
+
+
+How can this be "zero overhead"? Let's focus on the following part of the
+function:
+
+```c
+ size_t bos_dst = __builtin_object_size(dst, 0);
+ if (bos_dst != -1 &&
+ !(__builtin_constant_p(copy_amount) && bos_dst >= copy_amount)) {
+```
+
+If Clang's frontend cannot determine a value for `__builtin_object_size`, Clang
+lowers it to LLVM's `@llvm.objectsize` intrinsic. The `@llvm.objectsize`
+invocation corresponding to `__builtin_object_size(p, 0)` is guaranteed to
+always fold to a constant value by the time LLVM emits machine code.
+
+Hence, `bos_dst` is guaranteed to be a constant; if it's -1, the above branch
+can be eliminated entirely, since it folds to `if (false && ...)`. Further, the
+RHS of the `&&` in this branch has us call `__builtin_mempcpy` if `copy_amount`
+is a known value no greater than `bos_dst` (yet another constant value).
+Therefore, the entire condition is always knowable when LLVM is done with LLVM
+IR-level optimizations, so no condition is ever emitted to machine code in
+practice.
+
+#### Why is "zero overhead" in quotes? Why is `unique_ptr` relevant?
+
+`__builtin_object_size` and `__builtin_constant_p` are forced to be constants
+after most optimizations take place. Until LLVM replaces both of these with
+constants and optimizes them out, we have additional branches and function calls
+in our IR. This can have negative effects, such as distorting inlining costs and
+inhibiting optimizations that are conservative around branches in control-flow.
+
+So FORTIFY is free in these cases _in isolation of any of the code around it_.
+Due to its implementation, it may impact the optimizations that occur on code
+around the literal call to the FORTIFY-hardened libc function.
+
+`unique_ptr` was just the first thing that came to the author's mind for "the
+type should be zero cost with any level of optimization enabled, but edge-cases
+might make it only-mostly-free to use."
+
+### How is checking actually performed?
+
+In cases where checking can be performed (e.g., where we call
+`__builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);`), Bionic provides [an
+implementation for `__mempcpy_chk`]. This is:
+
+```c
+extern "C" void* __mempcpy_chk(void* dst, const void* src, size_t count, size_t dst_len) {
+ __check_count("mempcpy", "count", count);
+ __check_buffer_access("mempcpy", "write into", count, dst_len);
+ return mempcpy(dst, src, count);
+}
+```
+This function itself boils down to a few small branches which abort the program
+if they fail, and a direct call to `__builtin_mempcpy`.
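+
+A self-contained sketch of that shape is below. It is not Bionic's
+implementation: `example_write_fits` and `example_mempcpy_chk` are hypothetical
+names, the `SIZE_MAX / 2` bound is an assumed stand-in for the count sanity
+check, and plain `memcpy` is used so the sketch does not depend on `mempcpy`
+being declared.
+
+```c
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* Assumed checks: the count is sanely sized and fits in the destination. */
+static bool example_write_fits(size_t count, size_t dst_len) {
+    return count <= SIZE_MAX / 2 && count <= dst_len;
+}
+
+/* Hypothetical stand-in for a `_chk` function. */
+static void *example_mempcpy_chk(void *dst, const void *src, size_t count,
+                                 size_t dst_len) {
+    if (!example_write_fits(count, dst_len)) {
+        abort();  /* A failed check ends the program instead of overflowing. */
+    }
+    memcpy(dst, src, count);
+    return (char *)dst + count;  /* `mempcpy` returns one past the last byte. */
+}
+```
+
+The only extra work on the safe path is the pair of comparisons in
+`example_write_fits`.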
+
+### Wrapping up
+
+In the above breakdown, it was shown how Clang and Bionic work together to:
+- represent FORTIFY-hardened overloads of functions,
+- report misuses of stdlib functions at compile-time, and
+- insert run-time checks for uses of functions that might be incorrect, but only
+ if we have the potential of proving the incorrectness of these.
+
+## Breakdown of `open`
+
+In Bionic, the [FORTIFY'ed implementation of `open`] is quite large. Much like
+`mempcpy`, the `__builtin_open` declaration is simple:
+
+```c
+int open(const char* __path, int __flags, ...);
+```
+
+With some macros expanded, the FORTIFY-hardened header implementation is:
+```c
+int __open_2(const char*, int);
+int __open_real(const char*, int, ...) __asm__("open");
+
+#define __open_modes_useful(flags) (((flags) & O_CREAT) || ((flags) & O_TMPFILE) == O_TMPFILE)
+
+static
+int open(const char* pathname, int flags, mode_t modes, ...) __overloadable
+ __attribute__((diagnose_if(1, "too many arguments", "error")));
+
+static
+__inline__
+__attribute__((no_stack_protector))
+__attribute__((always_inline))
+int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags)
+ __attribute__((overloadable))
+ __attribute__((diagnose_if(
+ __open_modes_useful(flags),
+ "'open' called with O_CREAT or O_TMPFILE, but missing mode",
+ "error"))) {
+#if __ANDROID_API__ >= 17 && __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
+ return __open_2(pathname, flags);
+#else
+ return __open_real(pathname, flags);
+#endif
+}
+static
+__inline__
+__attribute__((no_stack_protector))
+__attribute__((always_inline))
+int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags, mode_t modes)
+ __attribute__((overloadable))
+ __clang_warning_if(!__open_modes_useful(flags) && modes,
+ "'open' has superfluous mode bits; missing O_CREAT?") {
+ return __open_real(pathname, flags, modes);
+}
+```
+
+Which may be a lot to take in.
+
+Before diving too deeply, please note that the remainder of these subsections
+assume that the programmer didn't make any egregious typos. Moreover, there's no
+real way that Bionic tries to prevent calls to `open` like
+`open("foo", 0, "how do you convert a const char[N] to mode_t?");`. The only
+real C-compatible solution the author can think of is "stamp out many overloads
+to catch sort-of-common instances of this very uncommon typo." This isn't great.
+
+More directly, no effort is made below to recognize calls that, due to
+incompatible argument types, cannot go to any `open` implementation other than
+`__builtin_open`, since it's recognized right here. :)
+
+### Implementation breakdown
+
+This `open` implementation does a few things:
+- Turns calls to `open` with too many arguments into a compile-time error.
+- Diagnoses calls to `open` with missing modes at compile-time and run-time
+ (both cases turn into errors).
+- Emits warnings on calls to `open` with useless mode bits, unless the mode bits
+ are all 0.
+
+One common bit of code not explained below is the `__open_real` declaration above:
+```c
+int __open_real(const char*, int, ...) __asm__("open");
+```
+
+This exists as a way for us to call `__builtin_open` without needing Clang to
+have a pre-defined `__builtin_open` function.
+
+#### Compile-time error on too many arguments
+
+```c
+static
+int open(const char* pathname, int flags, mode_t modes, ...) __overloadable
+ __attribute__((diagnose_if(1, "too many arguments", "error")));
+```
+
+This overload matches most calls to `open` that supply too many arguments,
+since `int(const char *, int, ...)` matches less strongly than
+`int(const char *, int, mode_t, ...)` for calls where the 3rd arg can be
+converted to `mode_t` without too much effort. Because of the `diagnose_if`
+attribute, all of these calls turn into compile-time errors.
+
+#### Compile-time or run-time error on missing arguments
+The following overload handles all two-argument calls to `open`.
+```c
+static
+__inline__
+__attribute__((no_stack_protector))
+__attribute__((always_inline))
+int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags)
+ __attribute__((overloadable))
+ __attribute__((diagnose_if(
+ __open_modes_useful(flags),
+ "'open' called with O_CREAT or O_TMPFILE, but missing mode",
+ "error"))) {
+#if __ANDROID_API__ >= 17 && __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
+ return __open_2(pathname, flags);
+#else
+ return __open_real(pathname, flags);
+#endif
+}
+```
+
+Like `mempcpy`, `diagnose_if` handles emitting a compile-time error if the call
+to `open` is broken in a way that's visible to Clang's frontend. This
+essentially boils down to "`open` is being called with a `flags` value that
+requires mode bits to be set."
+
+If that fails to catch a bug, we [unconditionally call `__open_2`], which
+performs a run-time check:
+```c
+int __open_2(const char* pathname, int flags) {
+ if (needs_mode(flags)) __fortify_fatal("open: called with O_CREAT/O_TMPFILE but no mode");
+ return FDTRACK_CREATE_NAME("open", __openat(AT_FDCWD, pathname, force_O_LARGEFILE(flags), 0));
+}
+```
+
+#### Compile-time warning if modes are pointless
+
+Finally, we have the following `open` overload:
+```c
+static
+__inline__
+__attribute__((no_stack_protector))
+__attribute__((always_inline))
+int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags, mode_t modes)
+ __attribute__((overloadable))
+ __clang_warning_if(!__open_modes_useful(flags) && modes,
+ "'open' has superfluous mode bits; missing O_CREAT?") {
+ return __open_real(pathname, flags, modes);
+}
+```
+
+This simply issues a warning if Clang's frontend can determine that `modes`
+isn't necessary. Due to conventions in existing code, a `modes` value of `0` is
+not diagnosed.
+
+#### What about `&open`?
+One yet-unaddressed aspect of the above is how `&open` works. This is thankfully
+a short answer:
+- It happens that `open` takes a parameter of type `const char*`.
+- It happens that `pass_object_size` -- an attribute only applicable to
+ parameters of type `T*` -- makes it impossible to take the address of a
+ function.
+
+Since Clang doesn't support a "this function should never have its address
+taken" attribute, Bionic uses the next best thing: `pass_object_size`. :)
+
+## Breakdown of `poll`
+
+(Preemptively: at the time of writing, Clang has no literal `__builtin_poll`
+builtin. `__builtin_poll` is referenced below to remain consistent with the
+convention established in the Terminology section.)
+
+Bionic's `poll` implementation is closest to `mempcpy` above, though it has a
+few interesting aspects worth examining.
+
+The [full header implementation of `poll`] is, with some macros expanded:
+```c
+#define __bos_fd_count_trivially_safe(bos_val, fds, fd_count) \
+ (((bos_val) == -1) || \
+ (__builtin_constant_p(fd_count) && \
+ (bos_val) >= sizeof(*(fds)) * (fd_count)))
+
+static
+__inline__
+__attribute__((no_stack_protector))
+__attribute__((always_inline))
+int poll(struct pollfd* const fds __attribute__((pass_object_size(1))), nfds_t fd_count, int timeout)
+ __attribute__((overloadable))
+ __attribute__((diagnose_if(
+ __builtin_object_size(fds, 1) != -1 && __builtin_object_size(fds, 1) < sizeof(*fds) * fd_count,
+ "in call to 'poll', fd_count is larger than the given buffer",
+ "error"))) {
+ size_t bos_fds = __builtin_object_size(fds, 1);
+ if (!__bos_fd_count_trivially_safe(bos_fds, fds, fd_count)) {
+ return __poll_chk(fds, fd_count, timeout, bos_fds);
+ }
+ return (&poll)(fds, fd_count, timeout);
+}
+```
+
+To get the commonality with `mempcpy` and `open` out of the way:
+- This function is an overload with `__builtin_poll`.
+- The signature is the same, modulo the presence of a `pass_object_size`
+ attribute. Hence, for direct calls, overload resolution will always prefer it
+ over `__builtin_poll`. Taking the address of `poll` is forbidden, so all
+ references to `&poll` actually reference `__builtin_poll`.
+- When `fds` is too small to hold `fd_count` `pollfd`s, Clang will emit a
+ compile-time error if possible using `diagnose_if`.
+- If this can't be observed until run-time, `__poll_chk` verifies this.
+- When `fd_count` is a constant according to `__builtin_constant_p`, this always
+ compiles into `__poll_chk` for always-broken calls to `poll`, or
+ `__builtin_poll` for always-safe calls to `poll`.
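+
+The size relationship being enforced can be sketched as a small predicate. This
+is an illustration under assumptions rather than what `__poll_chk` literally
+does: `example_pollfd_array_fits` is a hypothetical name, and the sketch
+divides rather than multiplies so that a huge `fd_count` cannot overflow.
+
+```c
+#include <poll.h>
+#include <stdbool.h>
+#include <stddef.h>
+
+/*
+ * Hypothetical helper: does a buffer of `fds_size` bytes hold `fd_count`
+ * entries? Dividing instead of multiplying avoids overflow for huge counts.
+ */
+static bool example_pollfd_array_fits(size_t fds_size, nfds_t fd_count) {
+    if (fds_size == (size_t)-1) {
+        return true;  /* Unknown object size: nothing to check against. */
+    }
+    return fd_count <= fds_size / sizeof(struct pollfd);
+}
+```
+
+With a two-element `struct pollfd` array, an `fd_count` of 2 passes and an
+`fd_count` of 3 does not.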
+
+The critical bits to highlight here are on this line:
+```c
+int poll(struct pollfd* const fds __attribute__((pass_object_size(1))), nfds_t fd_count, int timeout)
+```
+
+And this line:
+```c
+ return (&poll)(fds, fd_count, timeout);
+```
+
+Starting with the simplest, we call `__builtin_poll` with `(&poll)(...);`. As
+referenced above, taking the address of an overloaded function where all but one
+overload has a `pass_object_size` attribute on one or more parameters always
+resolves to the function without any `pass_object_size` attributes.
+
+The other line deserves a section. The subtlety of it is almost entirely in the
+use of `pass_object_size(1)` instead of `pass_object_size(0)` on the `fds`
+parameter, and the corresponding use of `__builtin_object_size(fds, 1);` in the
+body of `poll`.
+
+### Subtleties of `__builtin_object_size(p, N)`
+
+Earlier in this document, it was said that a full description of each
+attribute/builtin necessary to power FORTIFY was out of scope. This is... only
+somewhat the case when we talk about `__builtin_object_size` and
+`pass_object_size`, especially when their second argument is `1`.
+
+#### tl;dr
+`__builtin_object_size(p, N)` and `pass_object_size(N)`, where `(N & 1) == 1`,
+can only be accurately determined by Clang. LLVM's `@llvm.objectsize` intrinsic
+ignores the value of `N & 1`, since handling `(N & 1) == 1` accurately requires
+data that's currently entirely inaccessible to LLVM, and that is difficult to
+preserve through LLVM's optimization passes.
+
+`pass_object_size`'s "lifting" of the evaluation of
+`__builtin_object_size(p, N)` to the caller is critical, since it allows Clang
+full visibility into the expression passed to e.g., `poll(&foo->bar, baz, qux)`.
+It's not a perfect solution, but it allows `N == 1` to be fully accurate in at
+least some cases.
+
+#### Background
+Clang's implementation of `__builtin_object_size` aims to be compatible with
+GCC's, which has [a decent bit of documentation]. Put simply,
+`__builtin_object_size(p, N)` is intended to evaluate at compile-time how many
+bytes can be accessed after `p` in a well-defined way. Straightforward examples
+of this are:
+```c
+char buf[8];
+assert(__builtin_object_size(buf, N) == 8);
+assert(__builtin_object_size(buf + 1, N) == 7);
+```
+
+This should hold for all values of N that are valid to pass to
+`__builtin_object_size`. The `N` value of `__builtin_object_size` is a mask of
+settings.
+
+##### (N & 2) == ?
+
+This is mostly for completeness' sake; in Bionic's FORTIFY implementation, N is
+always either 0 or 1.
+
+If there are multiple possible values of `p` in a call to
+`__builtin_object_size(p, N)`, the second bit in `N` determines the behavior of
+the compiler. If `(N & 2) == 0`, `__builtin_object_size` should return the
+greatest possible size for each possible value of `p`. Otherwise, it should
+return the least possible value. For example:
+
+```c
+char smol_buf[7];
+char buf[8];
+char *p = rand() ? smol_buf : buf;
+assert(__builtin_object_size(p, 0) == 8);
+assert(__builtin_object_size(p, 2) == 7);
+```
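+
+The same bit also selects the value reported when nothing is known about `p`:
+GCC's documentation specifies `(size_t)-1` for the upper-bound modes and `0`
+for the lower-bound modes. A minimal sketch, where `unknown_max` and
+`unknown_min` are hypothetical names and `noinline` only hides the caller's
+buffer from the optimizer:
+
+```c
+#include <stddef.h>
+
+/* Upper-bound modes, (N & 2) == 0, report (size_t)-1 for an unknown size. */
+static __attribute__((noinline)) size_t unknown_max(const void *p) {
+    return __builtin_object_size(p, 0);
+}
+
+/* Lower-bound modes, (N & 2) != 0, report 0 for an unknown size. */
+static __attribute__((noinline)) size_t unknown_min(const void *p) {
+    return __builtin_object_size(p, 2);
+}
+```
+
+Either default is the conservative answer for its mode: "at most everything"
+versus "at least nothing".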
+
+##### (N & 1) == 0
+
+`__builtin_object_size(p, 0)` is more or less as simple as the example in the
+Background section directly above. When Clang attempts to evaluate
+`__builtin_object_size(p, 0);` and when LLVM tries to determine the result of a
+corresponding `@llvm.objectsize` call, they search for the storage underlying
+the pointer in question. If that can be determined, Clang or LLVM can provide an
+answer; otherwise, they cannot.
+
+##### (N & 1) == 1, and the true magic of pass_object_size
+
+`__builtin_object_size(p, 1)` has a less uniform implementation between LLVM and
+Clang. According to GCC's documentation, "If the least significant bit [of
+__builtin_object_size's second argument] is clear, objects are whole variables,
+if it is set, a closest surrounding subobject is considered the object a pointer
+points to."
+
+The "closest surrounding subobject," means that `(N & 1) == 1` depends on type
+information in order to operate in many cases. Consider the following examples:
+```c
+struct Foo {
+ int a;
+ int b;
+};
+
+struct Foo foo;
+assert(__builtin_object_size(&foo, 0) == sizeof(foo));
+assert(__builtin_object_size(&foo, 1) == sizeof(foo));
+assert(__builtin_object_size(&foo.a, 0) == sizeof(foo));
+assert(__builtin_object_size(&foo.a, 1) == sizeof(int));
+
+struct Foo foos[2];
+assert(__builtin_object_size(&foos[0], 0) == 2 * sizeof(foo));
+assert(__builtin_object_size(&foos[0], 1) == 2 * sizeof(foo));
+assert(__builtin_object_size(&foos[0].a, 0) == 2 * sizeof(foo));
+assert(__builtin_object_size(&foos[0].a, 1) == sizeof(int));
+```
+
+...And perhaps somewhat surprisingly:
+```c
+void example(struct Foo *foo) {
+ // (As a reminder, `-1` is "I don't know" when `(N & 2) == 0`.)
+ assert(__builtin_object_size(foo, 0) == -1);
+ assert(__builtin_object_size(foo, 1) == -1);
+ assert(__builtin_object_size(&foo->a, 0) == -1);
+ assert(__builtin_object_size(&foo->a, 1) == sizeof(int));
+}
+```
+
+In Clang, [this type-aware requirement poses problems for us]: Clang's frontend
+knows everything we could possibly want about the types of variables, but
+optimizations are only performed by LLVM. LLVM has no reliable source for C or
+C++ data types, so calls to `__builtin_object_size(p, N)` that cannot be
+resolved by Clang are lowered to the equivalent of
+`__builtin_object_size(p, N & ~1)` in LLVM IR.
+
+Moreover, Clang's frontend is the best-equipped part of the compiler to
+accurately determine the answer for `__builtin_object_size(p, N)`, given we know
+what `p` is. LLVM is the best-equipped part of the compiler to determine the
+value of `p`. This ordering issue is unfortunate.
+
+This is where `pass_object_size(N)` comes in. To summarize [the docs for
+`pass_object_size`], it evaluates `__builtin_object_size(p, N)` within the
+context of the caller of the function annotated with `pass_object_size`, and
+passes the value of that into the callee as an invisible parameter. All calls to
+`__builtin_object_size(parameter, N)` are substituted with references to this
+invisible parameter.
+
+Putting this plainly, Clang's frontend struggles to evaluate the following:
+```c
+int foo(void *p) {
+ return __builtin_object_size(p, 1);
+}
+
+void bar() {
+ struct { int i, j; } k;
+ // The frontend can't figure out this interprocedural object size, so the call
+ // is lowered to LLVM, which determines that the answer here is sizeof(k).
+ int baz = foo(&k.i);
+}
+```
+
+However, with the magic of `pass_object_size`, we get one level of inlining to
+look through:
+```c
+int foo(void *const __attribute__((pass_object_size(1))) p) {
+ return __builtin_object_size(p, 1);
+}
+
+void bar() {
+ struct { int i, j; } k;
+ // Due to pass_object_size, this is equivalent to:
+ // int baz = foo(&k.i, __builtin_object_size(&k.i, 1));
+ // ...and `int foo(void *)` is actually equivalent to:
+ // int foo(void *const, size_t size) {
+ // return size;
+ // }
+ int baz = foo(&k.i);
+}
+```
+
+So we can obtain an accurate result in this case.
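+
+As a rough illustration of why this matters for FORTIFY-style checks, the sketch
+below uses `pass_object_size(1)` to bound a copy into a single struct field. The
+names `hyp_record`, `hyp_checked_copy`, `hyp_demo_fits`, and the
+`HYP_PASS_OBJECT_SIZE` macro are hypothetical and are not part of Bionic. The
+macro expands to nothing on compilers without the attribute, in which case the
+size is typically reported as unknown and the check is skipped.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <string.h>
+
+/* Only use pass_object_size where the compiler supports it (Clang does). */
+#if defined(__has_attribute)
+#if __has_attribute(pass_object_size)
+#define HYP_PASS_OBJECT_SIZE(n) __attribute__((pass_object_size(n)))
+#endif
+#endif
+#ifndef HYP_PASS_OBJECT_SIZE
+#define HYP_PASS_OBJECT_SIZE(n) /* unsupported: size is usually unknown */
+#endif
+
+struct hyp_record {
+  char name[8];
+  int id;
+};
+
+/*
+ * Hypothetical checked copy. Returns 0 on success, or -1 if `src` (including
+ * its NUL terminator) is known not to fit in the subobject `dst` points into.
+ */
+int hyp_checked_copy(char *const HYP_PASS_OBJECT_SIZE(1) dst, const char *src) {
+  size_t cap = __builtin_object_size(dst, 1); /* (size_t)-1 means "unknown" */
+  size_t need = strlen(src) + 1;
+  if (cap != (size_t)-1 && need > cap) return -1;
+  memcpy(dst, src, need);
+  return 0;
+}
+
+/* Returns 1 if a string that fits in `name` is copied and `id` is untouched. */
+int hyp_demo_fits(void) {
+  struct hyp_record r = {{0}, 7};
+  int rc = hyp_checked_copy(r.name, "abc");
+  assert(rc == 0);
+  return rc == 0 && strcmp(r.name, "abc") == 0 && r.id == 7;
+}
+```
+
+With Clang, a call such as `hyp_checked_copy(r.name, "0123456789")` should be
+refused, since the caller evaluates `__builtin_object_size(r.name, 1)` as
+`sizeof(r.name)` rather than as the size of the remainder of `r`.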
+
+##### What about pass_object_size(0)?
+It's sort of tangential, but if you find yourself wondering about the utility of
+`pass_object_size(0)` ... it's a mixed bag. `pass_object_size(0)` in Bionic's
+FORTIFY exists mostly for visual consistency, simplicity, and as a useful way to
+have e.g., `&mempcpy` == `&__builtin_mempcpy`.
+
+Outside of these fringe benefits, all of the functions with
+`pass_object_size(0)` on parameters are marked with `always_inline`, so
+"lifting" the `__builtin_object_size` call isn't ultimately very helpful. In
+theory, users can always have something like:
+
+```c
+// In some_header.h
+// This function does cool and interesting things with the `__builtin_object_size` of its parameter,
+// and is able to work with that as though the function were defined inline.
+void out_of_line_function(void *__attribute__((pass_object_size(0))));
+```
+
+Though the author isn't aware of uses like this in practice, beyond a few folks
+on LLVM's mailing list seeming interested in trying it someday.
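+
+For the curious, a minimal sketch of what such an out-of-line function might
+look like follows. Everything named `hyp_*` or `HYP_*` is hypothetical. The
+attribute has to appear on both the declaration and the definition, and the
+macro expands to nothing on compilers that lack it, in which case the size is
+typically unknown inside the callee.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <string.h>
+
+#if defined(__has_attribute)
+#if __has_attribute(pass_object_size)
+#define HYP_PASS_OBJECT_SIZE(n) __attribute__((pass_object_size(n)))
+#endif
+#endif
+#ifndef HYP_PASS_OBJECT_SIZE
+#define HYP_PASS_OBJECT_SIZE(n) /* unsupported: size is usually unknown */
+#endif
+
+/* Declaration, as it might appear in some_header.h. */
+size_t hyp_clear_object(void *const HYP_PASS_OBJECT_SIZE(0) p);
+
+/*
+ * Out-of-line definition. Zeroes the whole object `p` points into when its
+ * size is known, and returns the number of bytes cleared (0 if unknown).
+ */
+__attribute__((noinline))
+size_t hyp_clear_object(void *const HYP_PASS_OBJECT_SIZE(0) p) {
+  size_t size = __builtin_object_size(p, 0);
+  if (size == (size_t)-1) return 0;
+  memset(p, 0, size);
+  return size;
+}
+
+/* Fills a buffer with 'x', clears it, and reports what happened. */
+size_t hyp_demo_clear(char *first_byte_out) {
+  char buf[16];
+  memset(buf, 'x', sizeof(buf));
+  size_t cleared = hyp_clear_object(buf);
+  assert(cleared == 0 || cleared == sizeof(buf));
+  *first_byte_out = buf[0];
+  return cleared;
+}
+```
+
+With Clang, the caller passes `sizeof(buf)` along invisibly, so the callee can
+act on the size even though it is never inlined.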
+
+#### Wrapping up
+In the (long) section above, two things were covered:
+- The use of `(&poll)(...);` is a convenient shorthand for calling
+ `__builtin_poll`.
+- `__builtin_object_size(p, N)` with `(N & 1) == 1` is not easy for Clang to
+ answer accurately, since it relies on type info only available in the
+ frontend, and it sometimes relies on optimizations only available in the
+ middle-end. `pass_object_size` helps mitigate this.
+
+## Miscellaneous Notes
+The above should be a roughly comprehensive view of how FORTIFY works in the
+real world. The main thing it fails to mention is the use of
+[the `diagnose_as_builtin` attribute] in Clang.
+
+As time has moved on, Clang has increasingly gained support for emitting
+warnings that were previously emitted by FORTIFY machinery.
+`diagnose_as_builtin` allows us to remove the `diagnose_if`s from some of the
+`static inline` overloads of stdlib functions above, so Clang may diagnose them
+instead.
+
+Clang's built-in diagnostics are often better than `diagnose_if` diagnostics,
+since Clang can format its diagnostics to include e.g., information about the
+sizes of buffers in a suspect call to a function. `diagnose_if` can only have
+the compiler output constant strings.
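+
+As a small sketch of the attribute's shape (not Bionic's actual usage), the
+hypothetical wrapper `hyp_copy` below asks Clang to check calls to it as though
+they were calls to `__builtin_memcpy`, with the wrapper's parameters 1, 2, and 3
+mapped to the builtin's parameters in order. The `HYP_DIAGNOSE_AS_MEMCPY` macro
+is an assumption made here so that the snippet still compiles where the
+attribute is unavailable.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <string.h>
+
+#if defined(__has_attribute)
+#if __has_attribute(diagnose_as_builtin)
+#define HYP_DIAGNOSE_AS_MEMCPY \
+  __attribute__((diagnose_as_builtin(__builtin_memcpy, 1, 2, 3)))
+#endif
+#endif
+#ifndef HYP_DIAGNOSE_AS_MEMCPY
+#define HYP_DIAGNOSE_AS_MEMCPY /* unsupported: no extra diagnostics */
+#endif
+
+/* Hypothetical wrapper whose parameter types match memcpy's exactly. */
+HYP_DIAGNOSE_AS_MEMCPY
+void *hyp_copy(void *dst, const void *src, size_t n) {
+  return memcpy(dst, src, n);
+}
+
+/* Returns 1 if a copy that fits the destination behaves like memcpy. */
+int hyp_demo_copy(void) {
+  char dst[4];
+  void *ret = hyp_copy(dst, "abc", sizeof(dst));
+  assert(ret == dst);
+  /* A call like hyp_copy(dst, "abcdefg", 8) is the kind of misuse that Clang
+   * may be able to flag at compile time, mentioning the sizes involved. */
+  return strcmp(dst, "abc") == 0;
+}
+```
+
+No `diagnose_if` annotation is needed on `hyp_copy` itself; any warning comes
+from Clang's own checks for the named builtin.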
+
+[ChromeOS' Glibc patch]: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/90fa9b27731db10a6010c7f7c25b24028145b091/sys-libs/glibc/files/local/glibc-2.33/0007-glibc-add-clang-style-FORTIFY.patch
+[FORTIFY'ed implementation of `open`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/bits/fortify/fcntl.h#41
+[FORTIFY'ed version of `mempcpy`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/bits/fortify/string.h#45
+[a decent bit of documentation]: https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html
+[an implementation for `__mempcpy_chk`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/bionic/fortify.cpp#501
+[full header implementation of `poll`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/bits/fortify/poll.h#43
+[incompatible with stricter versions of FORTIFY checking]: https://godbolt.org/z/fGfEYxfnf
+[similar to C++11's `std::unique_ptr`]: https://stackoverflow.com/questions/58339165/why-can-a-t-be-passed-in-register-but-a-unique-ptrt-cannot
+[source for `mempcpy`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/include/string.h#55
+[the `diagnose_as_builtin` attribute]: https://releases.llvm.org/14.0.0/tools/clang/docs/AttributeReference.html#diagnose-as-builtin
+[the docs for `pass_object_size`]: https://releases.llvm.org/14.0.0/tools/clang/docs/AttributeReference.html#pass-object-size-pass-dynamic-object-size
+[this type-aware requirement poses problems for us]: https://github.com/llvm/llvm-project/issues/55742
+[unconditionally call `__open_2`]: https://android.googlesource.com/platform/bionic/+/refs/heads/android12-release/libc/bionic/open.cpp#70