Merge "Hide overaligned global address from the compiler." into main
diff --git a/android-changes-for-ndk-developers.md b/android-changes-for-ndk-developers.md
index 6ac79cf..8d507d1 100644
--- a/android-changes-for-ndk-developers.md
+++ b/android-changes-for-ndk-developers.md
@@ -11,12 +11,9 @@
 for details about changes in stack unwinding (crash dumps) between
 different releases.
 
-Required tools: the NDK has an _arch_-linux-android-readelf binary
-(e.g. arm-linux-androideabi-readelf or i686-linux-android-readelf)
-for each architecture (under toolchains/), but you can use readelf for
-any architecture, as we will be doing basic inspection only. On Linux
-you need to have the “binutils” package installed for readelf,
-and “pax-utils” for scanelf.
+Required tools: the NDK has an `llvm-readelf` binary that understands the
+architecture-specific details of all of Android's supported architectures. Recent
+versions of Android also have toybox `readelf` on the device.
 
 
 ## How we manage incompatible changes
@@ -38,42 +35,44 @@
 check logcat for warnings until their app stops functioning, so the
 toasts help bring some visibility to the issues before it's too late.
 
+
 ## Changes to library dependency resolution
 
 Until it was [fixed](https://issuetracker.google.com/36950617) in
-JB-MR2, Android didn't include the application library directory
+API level 18, Android didn't include the application library directory
 on the dynamic linker's search path. This meant that apps
 had to call `dlopen` or `System.loadLibrary` on all transitive
 dependencies before loading their main library. Worse, until it was
-[fixed](https://issuetracker.google.com/36935779) in JB-MR2, the
+[fixed](https://issuetracker.google.com/36935779) in API level 18, the
 dynamic linker's caching code cached failures too, so it was necessary
 to topologically sort your libraries and load them in reverse order.
 
-If you need to support Android devices running OS
-versions older than JB-MR2, you might want to consider
+If you need to support Android devices running OS versions older than
+API level 23, you might want to consider
 [ReLinker](https://github.com/KeepSafe/ReLinker) which claims to solve
-these problems automatically.
+these and other problems automatically.
 
 Alternatively, if you don't have too many dependencies, it can be easiest to
 simply link all of your code into one big library and sidestep the details of
 library and symbol lookup changes on all past (and future) Android versions.
 
+
 ## Changes to library search order
 
 We have made various fixes to library search order when resolving symbols.
 
-With API 22, load order switched from depth-first to breadth-first to
+With API level 22, load order switched from depth-first to breadth-first to
 fix dlsym(3).
 
-Before API 23, the default search order was to try the main executable,
+Before API level 23, the default search order was to try the main executable,
 LD_PRELOAD libraries, the library itself, and its DT_NEEDED libraries
-in that order. For API 23 and later, for any given library, the dynamic
+in that order. For API level 23 and later, for any given library, the dynamic
 linker divides other libraries into the global group and the local
 group. The global group is shared by all libraries and contains the main
 executable, LD_PRELOAD libraries, and any library with the DF_1_GLOBAL
 flag set (by passing “-z global” to ld(1)). The local group is
 the breadth-first transitive closure of the library and its DT_NEEDED
-libraries. The M dynamic linker searches the global group followed by
+libraries. The API level 23 dynamic linker searches the global group followed by
 the local group. This allows ASAN, for example, to ensure that it can
 intercept any symbol.
 
@@ -89,7 +88,7 @@
 ## RTLD_LOCAL (Available in API level >= 23)
 
 The dlopen(3) RTLD_LOCAL flag used to be ignored but is implemented
-correctly in API 23 and later. Note that RTLD_LOCAL is the default,
+correctly in API level 23 and later. Note that RTLD_LOCAL is the default,
 so even calls to dlopen(3) that didn’t explicitly use RTLD_LOCAL will
 be affected (unless they explicitly used RTLD_GLOBAL). With RTLD_LOCAL,
 symbols will not be made available to libraries loaded by later calls
@@ -99,7 +98,7 @@
 ## GNU hashes (Availible in API level >= 23)
 
 The GNU hash style available with `--hash-style=gnu` allows faster
-symbol lookup and is supported by Android's dynamic linker in API 23 and
+symbol lookup and is supported by Android's dynamic linker in API level 23 and
 above. Use `--hash-style=both` if you want to build code that uses this
 feature in new enough releases but still works on older releases.
 If you're using the NDK, clang chooses the right option
@@ -157,34 +156,26 @@
 ## Private API (Enforced for API level >= 24)
 
 Native libraries must use only public API, and must not link against
-non-NDK platform libraries. Starting with API 24 this rule is enforced and
-applications are no longer able to load non-NDK platform libraries. The
-rule is enforced by the dynamic linker, so non-public libraries
+non-NDK platform libraries. On devices running API level 24 or later,
+this rule is enforced and applications are no longer able to load all
+non-NDK platform libraries. This was to prevent future issues similar
+to the disruption caused when Android switched from OpenSSL to BoringSSL
+at API level 23.
+
+The rule is enforced by the dynamic linker, so non-public libraries
 are not accessible regardless of the way code tries to load them:
-System.loadLibrary, DT_NEEDED entries, and direct calls to dlopen(3)
+System.loadLibrary(), DT_NEEDED entries, and direct calls to dlopen(3)
 will all work exactly the same.
 
-Users should have a consistent app experience across updates,
-and developers shouldn't have to make emergency app updates to
-handle platform changes. For that reason, we recommend against using
-private C/C++ symbols. Private symbols aren't tested as part of the
-Compatibility Test Suite (CTS) that all Android devices must pass. They
-may not exist, or they may behave differently. This makes apps that use
-them more likely to fail on specific devices, or on future releases ---
-as many developers found when Android 6.0 Marshmallow switched from
-OpenSSL to BoringSSL.
-
-In order to reduce the user impact of this transition, we've identified
-a set of libraries that see significant use from Google Play's
-most-installed apps, and that are feasible for us to support in the
+In order to reduce the user impact of this transition, we identified
+a set of libraries that saw significant use from Google Play's
+most-installed apps and were feasible for us to support in the
 short term (including libandroid_runtime.so, libcutils.so, libcrypto.so,
-and libssl.so). In order to give you more time to transition, we will
-temporarily support these libraries; so if you see a warning that means
-your code will not work in a future release -- please fix it now!
-
-Between O and R, this compatibility mode could be disabled by setting a
-system property (`debug.ld.greylist_disabled`). This property is ignored
-in S and later.
+and libssl.so). In order to give app developers more time to transition,
+we allowed access to these libraries for apps with a target API level < 24.
+On devices running API level 26 to API level 30, this compatibility mode could be
+disabled by setting a system property (`debug.ld.greylist_disabled`).
+This property is ignored on devices running API level 31 and later.
 
 ```
 $ readelf --dynamic libBroken.so | grep NEEDED
@@ -200,7 +191,7 @@
  0x00000001 (NEEDED)                     Shared library: [libc.so]
 ```
 
-*Potential problems*: starting from API 24 the dynamic linker will not
+*Potential problems*: starting from API level 24 the dynamic linker will not
 load private libraries, preventing the application from loading.
 
 *Resolution*: rewrite your native code to rely only on public API. As a
@@ -238,15 +229,16 @@
 *Resolution*: remove the extra steps from your build that strip section
 headers.
 
+
 ## Text Relocations (Enforced for API level >= 23)
 
-Starting with API 23, shared objects must not contain text
-relocations. That is, the code must be loaded as is and must not be
-modified. Such an approach reduces load time and improves security.
+Apps with a target API level >= 23 cannot load shared objects that contain text
+relocations. Such an approach reduces load time and improves security. This was
+only a change for 32-bit, because 64-bit never supported text relocations.
 
-The usual reason for text relocations is non-position independent
-hand-written assembler. This is not common. Use the scanelf tool as
-described in our documentation for further diagnostics:
+The usual reason for text relocations was non-position-independent
+hand-written assembler. This is not common. You can use the scanelf tool
+from the pax-utils Debian package for further diagnostics:
 
 ```
 $ scanelf -qT libTextRel.so
@@ -256,10 +248,10 @@
 ```
 
 If you have no scanelf tool available, it is possible to do a basic
-check with readelf instead, look for either a TEXTREL entry or the
+check with readelf instead. Look for either a TEXTREL entry or the
 TEXTREL flag. Either alone is sufficient. (The value corresponding to the
 TEXTREL entry is irrelevant and typically 0 --- simply the presence of
-the TEXTREL entry declares that the .so contains text relocations). This
+the TEXTREL entry declares that the .so contains text relocations.) This
 example has both indicators present:
 
 ```
@@ -276,9 +268,8 @@
 
 *Potential problems*: Relocations enforce code pages being writable, and
 wastefully increase the number of dirty pages in memory. The dynamic
-linker has issued warnings about text relocations since Android K
-(API 19), but on API 23 and above it refuses to load code with text
-relocations.
+linker issued warnings about text relocations from API level 19, but on API
+level 23 and above it refuses to load code with text relocations.
 
 *Resolution*: rewrite assembler to be position independent to ensure
 no text relocations are necessary. The
@@ -296,9 +287,9 @@
 leaving the business of finding the library at runtime to the dynamic
 linker.
 
-Before API 23, Android's dynamic linker ignored the full path, and
+Before API level 23, Android's dynamic linker ignored the full path, and
 used only the basename (the part after the last ‘/') when looking
-up the required libraries. Since API 23 the runtime linker will honor
+up the required libraries. Since API level 23 the runtime linker will honor
 the DT_NEEDED exactly and so it won't be able to load the library if
 it is not present in that exact location on the device.
 
@@ -315,8 +306,8 @@
 [C:\Users\build\Android\ci\jni\libBroken.so]
 ```
 
-*Potential problems*: before API 23 the DT_NEEDED entry's basename was
-used, but starting from API 23 the Android runtime will try to load the
+*Potential problems*: before API level 23 the DT_NEEDED entry's basename was
+used, but starting from API level 23 the Android runtime will try to load the
 library using the path specified, and that path won't exist on the
 device. There are broken third-party toolchains/build systems that use
 a path on a build host instead of the SONAME.
@@ -350,16 +341,18 @@
 configured your build system to generate incorrect SONAME entries (using
 the `-soname` linker option).
 
+
 ## `__register_atfork` (Available in API level >= 23)
 
 To allow `atfork` and `pthread_atfork` handlers to be unregistered on
-`dlclose`, the implementation changed in API level 23. Unfortunately this
-requires a new libc function `__register_atfork`. Code using these functions
-that is built with a target API level >= 23 therefore will not load on earlier
-versions of Android, with an error referencing `__register_atfork`.
+`dlclose`, API level 23 added a new libc function `__register_atfork`.
+This means that code using `atfork` or `pthread_atfork` functions that is
+built with a `minSdkVersion` >= 23 will not load on earlier versions of
+Android, with an error referencing `__register_atfork`.
 
-*Resolution*: build your code with an NDK target API level that matches your
-app's minimum API level, or avoid using `atfork`/`pthread_atfork`.
+*Resolution*: build your code with a `minSdkVersion` that matches the minimum
+API level you actually support, or avoid using `atfork`/`pthread_atfork`.
+
 
 ## DT_RUNPATH support (Available in API level >= 24)
 
@@ -389,6 +382,7 @@
 into your app. The middleware vendor is aware of the problem and has a fix
 available.
 
+
 ## Invalid ELF header/section headers (Enforced for API level >= 26)
 
 In API level 26 and above the dynamic linker checks more values in
@@ -403,9 +397,10 @@
 ELF files. Note that using them puts application under high risk of
 being incompatible with future versions of Android.
 
-## Enable logging of dlopen/dlsym and library loading errors for apps (Available in Android O)
 
-Starting with Android O it is possible to enable logging of dynamic
+## Enable logging of dlopen/dlsym and library loading errors for apps (Available for API level >= 26)
+
+On devices running API level 26 or later you can enable logging of dynamic
 linker activity for debuggable apps by setting a property corresponding
 to the fully-qualified name of the specific app:
 ```
@@ -429,12 +424,13 @@
 adb shell setprop debug.ld.all dlerror,dlopen
 ```
 
+
 ## dlclose interacts badly with thread local variables with non-trivial destructors
 
 Android allows `dlclose` to unload a library even if there are still
 thread-local variables with non-trivial destructors. This leads to
 crashes when a thread exits and attempts to call the destructor, the
-code for which has been unloaded (as in [issue 360], fixed in P).
+code for which has been unloaded (as in [issue 360], fixed in API level 28).
 
 [issue 360]: https://github.com/android-ndk/ndk/issues/360
 
@@ -442,18 +438,19 @@
 set (so that calls to `dlclose` don't actually unload the library)
 are possible workarounds.
 
-|                   | Pre-M                      | M+      | P+    |
+|                   | API level < 23             | >= 23   | >= 28 |
 | ----------------- | -------------------------- | ------- | ----- |
 | No workaround     | Works for static STL       | Broken  | Works |
 | `-Wl,-z,nodelete` | Works for static STL       | Works   | Works |
 | No `dlclose`      | Works                      | Works   | Works |
 
-## Use of IFUNC in libc (True for all API levels on devices running Q)
 
-Starting with Android Q (API level 29), libc uses
-[IFUNC](https://sourceware.org/glibc/wiki/GNU_IFUNC) functionality in
-the dynamic linker to choose optimized assembler routines at run time
-rather than at build time. This lets us use the same `libc.so` on all
+## Use of IFUNC in libc (True for all API levels on devices running Android 10 and later)
+
+On devices running API level 29 or later, libc uses
+[IFUNC](https://sourceware.org/glibc/wiki/GNU_IFUNC)
+functionality in the dynamic linker to choose optimized assembler routines at
+run time rather than at build time. This lets us use the same `libc.so` on all
 devices, and is similar to what other OSes already did. Because the zygote
 uses the C library, this decision is made long before we know what API
 level an app targets, so all code sees the new IFUNC-using C library.
@@ -462,6 +459,7 @@
 with IFUNC relocations. The affected functions are from `<string.h>`, but
 may expand to include more functions (and more libraries) in future.
 
+
 ## Relative relocations (RELR)
 
 Android added experimental support for RELR relative relocations
@@ -478,23 +476,36 @@
 OS private use constants for RELR, nor for ELF files using packed
 relocations.
 
+Prior to API level 35, there was a bug that caused RELR relocations to
+be applied after packed relocations. This meant that IFUNC resolvers
+referenced by `R_*_IRELATIVE` relocations in the packed relocation
+section would have been able to read globals with RELR relocations
+before they were relocated. The version of `lld` in the NDK has never
+produced binaries affected by this bug, but third-party toolchains
+should make sure not to store `R_*_IRELATIVE` relocations in packed
+relocation sections in order to maintain compatibility with API levels
+below 35.
+
 You can read more about relative relocations
 and their long and complicated history at
 https://maskray.me/blog/2021-10-31-relative-relocations-and-relr.
 
+
 ## No more sentinels in .preinit_array/.init_array/.fini_array sections of executables (in All API levels)
 
-In Android <= U and NDK <= 26, Android used sentinels in these sections of
-executables to locate the start and end of arrays. However, when building with
-LTO, the function pointers in the arrays can be reordered, making sentinels no
-longer work. This prevents constructors for global C++ variables from being
-called in static executables when using LTO.
+In Android <= API level 34 and NDK <= r26, Android used sentinels in the
+`.preinit_array`/`.init_array`/`.fini_array` sections of executables to locate
+the start and end of these arrays. When building with LTO, the function pointers
+in the arrays can be reordered, making sentinels no longer work. This prevents
+constructors for global C++ variables from being called in static executables
+when using LTO.
 
-To fix this, in Android >= V and NDK >= 27, we removed sentinels and switched
-to using symbols inserted by LLD (like `__init_array_start`,
-`__init_array_end`) to locate the arrays. This also avoids keeping a section
-when there are no corresponding functions.
+To fix this, in Android >= API level 35 and NDK >= r27, we removed sentinels
+and switched to using symbols inserted by LLD (like `__init_array_start`,
+`__init_array_end`) to locate the arrays. This also avoids the need for an
+empty section when there are no corresponding functions.
 
-For dynamic executables, we kept sentinel support in crtbegin_dynamic.o and
-libc.so. This ensures that executables built with newer crtbegin_dynamic.o
-(in NDK >= 27) work with older libc.so (in Android <= U), and vice versa.
+For dynamic executables, we kept sentinel support in `crtbegin_dynamic.o` and
+`libc.so`. This ensures that executables built with newer `crtbegin_dynamic.o`
+(in NDK >= r27) work with older `libc.so` (in Android <= API level 34), and
+vice versa.
diff --git a/docs/status.md b/docs/status.md
index a9afe55..2919471 100644
--- a/docs/status.md
+++ b/docs/status.md
@@ -56,7 +56,11 @@
 Current libc symbols: https://android.googlesource.com/platform/bionic/+/main/libc/libc.map.txt
 
 New libc functions in V (API level 35):
-  * `tcgetwinsize`, `tcsetwinsize` (POSIX Issue 8 additions).
+  * New `android_crash_detail_register`, `android_crash_detail_unregister`,
+    `android_crash_detail_replace_name`, and `android_crash_detail_replace_data`
+    functionality for adding arbitrary data to tombstones
+    (see `<android/crash_detail.h>` for full documentation).
+  * `tcgetwinsize`, `tcsetwinsize`, `_Fork` (POSIX Issue 8 additions).
   * `timespec_getres` (C23 addition).
   * `localtime_rz`, `mktime_z`, `tzalloc`, and `tzfree` (NetBSD
     extensions implemented in tzcode, and the "least non-standard"
@@ -321,17 +325,25 @@
 ## Target API level behavioral differences
 
 Most bionic bug fixes and improvements have been made without checks for
-the app's `targetSdkVersion`. As of O there were exactly two exceptions,
-but there are likely to be more in future because of Project Treble.
+the app's `targetSdkVersion`. There are a handful of exceptions. (If in
+doubt, search the source for `android_get_application_target_sdk_version()`.)
 
-### Invalid `pthread_t` handling (targetSdkVersion >= O)
+### Destroyed mutex checking (targetSdkVersion >= 28)
+
+If a destroyed `pthread_mutex_t` is passed to any of the mutex functions, apps
+targeting API level 28 or higher will see a
+"<function> called on a destroyed mutex" fortify failure. Apps targeting older
+API levels will just have the function fail with EBUSY (matching the likely
+behavior before we added the check).
+
+### Invalid `pthread_t` handling (targetSdkVersion >= 26)
 
 As part of a long-term goal to remove the global thread list,
 and in an attempt to flush out racy code, we changed how an invalid
 `pthread_t` is handled. For `pthread_detach`, `pthread_getcpuclockid`,
 `pthread_getschedparam`/`pthread_setschedparam`, `pthread_join`, and
 `pthread_kill`, instead of returning ESRCH when passed an invalid
-`pthread_t`, if you're targeting O or above, they'll abort with the
+`pthread_t`, if you're targeting API level 26 or above, they'll abort with the
 message "attempt to use invalid pthread\_t".
 
 Note that this doesn't change behavior as much as you might think: the
@@ -369,13 +381,13 @@
     the tid may have been reused, but your code is inherently unsafe without
     a redesign anyway.
 
-### Interruptable `sem_wait` (targetSdkVersion >= N)
+### Interruptable `sem_wait` (targetSdkVersion >= 24)
 
 POSIX says that `sem_wait` can be interrupted by delivery of a
 signal. This wasn't historically true in Android, and when we fixed this
 bug we found that existing code relied on the old behavior. To preserve
 compatibility, `sem_wait` can only return EINTR on Android if the app
-targets N or later.
+targets API level 24 or later.
 
 
 ## FORTIFY
diff --git a/libc/Android.bp b/libc/Android.bp
index 4020ede..84fa498 100644
--- a/libc/Android.bp
+++ b/libc/Android.bp
@@ -1075,6 +1075,8 @@
                 "arch-arm64/bionic/setjmp.S",
                 "arch-arm64/bionic/syscall.S",
                 "arch-arm64/bionic/vfork.S",
+                "arch-arm64/oryon/memcpy-nt.S",
+                "arch-arm64/oryon/memset-nt.S",
             ],
         },
 
diff --git a/libc/NOTICE b/libc/NOTICE
index 91cd335..1a84d3c 100644
--- a/libc/NOTICE
+++ b/libc/NOTICE
@@ -4024,6 +4024,33 @@
 
 -------------------------------------------------------------------
 
+Copyright (c) 2012, Linaro Limited
+   All rights reserved.
+   Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions are met:
+       * Redistributions of source code must retain the above copyright
+         notice, this list of conditions and the following disclaimer.
+       * Redistributions in binary form must reproduce the above copyright
+         notice, this list of conditions and the following disclaimer in the
+         documentation and/or other materials provided with the distribution.
+       * Neither the name of the Linaro nor the
+         names of its contributors may be used to endorse or promote products
+         derived from this software without specific prior written permission.
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+-------------------------------------------------------------------
+
 Copyright (c) 2012-2014 ARM Ltd
 All rights reserved.
 
@@ -4640,7 +4667,9 @@
 
 SPDX-License-Identifier: BSD-2-Clause
 
-Copyright (c)1999 Citrus Project,
+Copyright (c) 2017, 2018 Dell EMC
+Copyright (c) 2000, 2001, 2008, 2011, David E. O'Brien
+Copyright (c) 1998 John D. Polstra.
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
@@ -4666,11 +4695,9 @@
 
 -------------------------------------------------------------------
 
-SPDX-License-Identifier: BSD-2-Clause-FreeBSD
+SPDX-License-Identifier: BSD-2-Clause
 
-Copyright (c) 2017, 2018 Dell EMC
-Copyright (c) 2000, 2001, 2008, 2011, David E. O'Brien
-Copyright (c) 1998 John D. Polstra.
+Copyright (c)1999 Citrus Project,
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
@@ -5155,3 +5182,11 @@
 
 -------------------------------------------------------------------
 
+memcpy - copy memory area
+
+Copyright (c) 2012-2022, Arm Limited.
+Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
+
+-------------------------------------------------------------------
+
diff --git a/libc/arch-arm64/dynamic_function_dispatch.cpp b/libc/arch-arm64/dynamic_function_dispatch.cpp
index b9f657b..db002b8 100644
--- a/libc/arch-arm64/dynamic_function_dispatch.cpp
+++ b/libc/arch-arm64/dynamic_function_dispatch.cpp
@@ -30,6 +30,19 @@
 #include <stddef.h>
 #include <sys/auxv.h>
 
+#define MIDR_IMPL_ID_SHIFT 24u
+#define MIDR_IMPL_ID_MASK 0xFF
+#define CPU_VARIANT_SHIFT 20u
+#define CPU_VARIANT_MASK 0xF
+
+/* Macro to identify CPU implementer */
+#define QCOM_IMPL_ID 0x51
+
+/* Macro to identify Qualcomm CPU variants which support the
+ * __memcpy_aarch64_nt, __memmove_aarch64_nt and __memset_aarch64_nt routines
+ */
+#define QCOM_ORYON_CPU_VARIANTS 0x5
+
 extern "C" {
 
 typedef void* memchr_func(const void*, int, size_t);
@@ -49,20 +62,107 @@
 
 typedef void* memcpy_func(void*, const void*, size_t);
 DEFINE_IFUNC_FOR(memcpy) {
-    if (arg->_hwcap & HWCAP_ASIMD) {
-        RETURN_FUNC(memcpy_func, __memcpy_aarch64_simd);
-    } else {
+  unsigned long midr;
+  unsigned int impl_id, cpu_variant;
+
+  /* Check if hardware capability CPUID is available */
+  if (arg->_hwcap & HWCAP_CPUID) {
+    /* Read the MIDR register */
+    asm("mrs %0, MIDR_EL1 \n\t" : "=r"(midr));
+
+    /* Extract the CPU Implementer ID */
+    impl_id = (midr >> MIDR_IMPL_ID_SHIFT) & (MIDR_IMPL_ID_MASK);
+
+    /* Check for Qualcomm implementer ID */
+    if (impl_id == QCOM_IMPL_ID) {
+      cpu_variant = (midr >> CPU_VARIANT_SHIFT) & CPU_VARIANT_MASK;
+
+      /* Check for Qualcomm Oryon CPU variants (variant field <= 0x5) */
+      if (cpu_variant <= QCOM_ORYON_CPU_VARIANTS) {
+        RETURN_FUNC(memcpy_func, __memcpy_aarch64_nt);
+      } else {
         RETURN_FUNC(memcpy_func, __memcpy_aarch64);
+      }
     }
+  }
+  /* If CPUID is unavailable or the implementer is not Qualcomm, choose the
+   * implementation based on CPU architecture features.
+   */
+  if (arg->_hwcap & HWCAP_ASIMD) {
+    RETURN_FUNC(memcpy_func, __memcpy_aarch64_simd);
+  } else {
+    RETURN_FUNC(memcpy_func, __memcpy_aarch64);
+  }
 }
 
 typedef void* memmove_func(void*, const void*, size_t);
 DEFINE_IFUNC_FOR(memmove) {
-    if (arg->_hwcap & HWCAP_ASIMD) {
-        RETURN_FUNC(memmove_func, __memmove_aarch64_simd);
-    } else {
-        RETURN_FUNC(memmove_func, __memmove_aarch64);
+  unsigned long midr;
+  unsigned int impl_id, cpu_variant;
+
+  /* Check if hardware capability CPUID is available */
+  if (arg->_hwcap & HWCAP_CPUID) {
+    /* Read the MIDR register */
+    asm("mrs %0, MIDR_EL1 \n\t" : "=r"(midr));
+
+    /* Extract the CPU Implementer ID */
+    impl_id = (midr >> MIDR_IMPL_ID_SHIFT) & (MIDR_IMPL_ID_MASK);
+
+    /* Check for Qualcomm implementer ID */
+    if (impl_id == QCOM_IMPL_ID) {
+      cpu_variant = (midr >> CPU_VARIANT_SHIFT) & CPU_VARIANT_MASK;
+
+      /* Check for Qualcomm Oryon CPU variants (variant field <= 0x5) */
+      if (cpu_variant <= QCOM_ORYON_CPU_VARIANTS) {
+        RETURN_FUNC(memmove_func, __memmove_aarch64_nt);
+      } else {
+        RETURN_FUNC(memmove_func, __memmove_aarch64);
+      }
     }
+  }
+  /* If CPUID is unavailable or the implementer is not Qualcomm, choose the
+   * implementation based on CPU architecture features.
+   */
+  if (arg->_hwcap & HWCAP_ASIMD) {
+    RETURN_FUNC(memmove_func, __memmove_aarch64_simd);
+  } else {
+    RETURN_FUNC(memmove_func, __memmove_aarch64);
+  }
+}
+
+typedef void* memrchr_func(const void*, int, size_t);
+DEFINE_IFUNC_FOR(memrchr) {
+    RETURN_FUNC(memrchr_func, __memrchr_aarch64);
+}
+
+typedef void* memset_func(void*, int, size_t);
+DEFINE_IFUNC_FOR(memset) {
+  unsigned long midr;
+  unsigned int impl_id, cpu_variant;
+
+  if (arg->_hwcap & HWCAP_CPUID) {
+    /* Read the MIDR register */
+    asm("mrs %0, MIDR_EL1 \n\t" : "=r"(midr));
+
+    /* Extract the CPU Implementer ID */
+    impl_id = (midr >> MIDR_IMPL_ID_SHIFT) & (MIDR_IMPL_ID_MASK);
+
+    /* Check for Qualcomm implementer ID */
+    if (impl_id == QCOM_IMPL_ID) {
+      cpu_variant = (midr >> CPU_VARIANT_SHIFT) & CPU_VARIANT_MASK;
+
+      /* Check for Qualcomm Oryon CPU variants (variant field <= 0x5) */
+      if (cpu_variant <= QCOM_ORYON_CPU_VARIANTS) {
+        RETURN_FUNC(memset_func, __memset_aarch64_nt);
+      } else {
+        RETURN_FUNC(memset_func, __memset_aarch64);
+      }
+    } else {
+      RETURN_FUNC(memset_func, __memset_aarch64);
+    }
+  } else {
+    RETURN_FUNC(memset_func, __memset_aarch64);
+  }
 }
 
 typedef char* stpcpy_func(char*, const char*, size_t);
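For review purposes, the resolver logic above and the size dispatch at the entry of the memcpy-nt.S file that follows can be modeled on a host machine. The sketch below is hypothetical and not part of bionic. The names `choose_memcpy` and `classify_copy` are invented. The MIDR_EL1 value is passed in as a parameter rather than read with `mrs`. The HWCAP constants assume the Linux arm64 values, where `HWCAP_ASIMD` is bit 1 and `HWCAP_CPUID` is bit 11.

```cpp
// Hypothetical host-side model of the dispatch decisions in this change.
// Nothing here reads MIDR_EL1; the register value is a parameter instead.
#include <cstddef>
#include <cstdint>

// Assumed to mirror Linux's arm64 AT_HWCAP bits HWCAP_ASIMD and HWCAP_CPUID.
constexpr unsigned long kHwcapAsimd = 1UL << 1;
constexpr unsigned long kHwcapCpuid = 1UL << 11;

// MIDR_EL1 fields: Implementer is bits [31:24], Variant is bits [23:20].
constexpr unsigned midr_implementer(std::uint64_t midr) { return (midr >> 24) & 0xFF; }
constexpr unsigned midr_variant(std::uint64_t midr) { return (midr >> 20) & 0xF; }

enum class MemcpyImpl { kGeneric, kSimd, kNonTemporal };

// Same decision tree as the memcpy/memmove resolvers in the patch.
constexpr MemcpyImpl choose_memcpy(unsigned long hwcap, std::uint64_t midr) {
  if ((hwcap & kHwcapCpuid) && midr_implementer(midr) == 0x51) {
    // Qualcomm: variant <= 0x5 gets the _nt routine, any other variant the generic one.
    return midr_variant(midr) <= 0x5 ? MemcpyImpl::kNonTemporal : MemcpyImpl::kGeneric;
  }
  // No CPUID, or not Qualcomm: pick by architecture feature.
  return (hwcap & kHwcapAsimd) ? MemcpyImpl::kSimd : MemcpyImpl::kGeneric;
}

enum class CopyPath { kSmall, kMedium, kLong, kLongNonTemporal };

// Size classes taken by __memcpy_aarch64_nt, following its entry branches
// and the SMALL_BUFFER_SIZE (48 KiB) check at L(copy_long).
constexpr CopyPath classify_copy(std::size_t count) {
  if (count <= 32) return CopyPath::kSmall;        // 0..32 bytes
  if (count <= 128) return CopyPath::kMedium;      // L(copy32_128)
  if (count <= 48 * 1024) return CopyPath::kLong;  // L(copy_long) regular path
  return CopyPath::kLongNonTemporal;               // L(copy_long_nt)
}
```

For example, a MIDR_EL1 value of 0x511F0010 (implementer 0x51, variant 0x1) selects the non-temporal routine when HWCAP_CPUID is set. A 49153-byte copy is the smallest size that reaches L(copy_long_nt).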
diff --git a/libc/arch-arm64/oryon/memcpy-nt.S b/libc/arch-arm64/oryon/memcpy-nt.S
new file mode 100644
index 0000000..46f1541
--- /dev/null
+++ b/libc/arch-arm64/oryon/memcpy-nt.S
@@ -0,0 +1,351 @@
+/*
+ * memcpy - copy memory area
+ *
+ * Copyright (c) 2012-2022, Arm Limited.
+ * Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+ * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
+ */
+
+/* Assumptions:
+ *
+ * ARMv8-a, AArch64, unaligned accesses.
+ *
+ */
+
+#include <private/bionic_asm.h>
+
+#define dstin     x0
+#define src       x1
+#define count     x2
+#define dst       x3
+#define srcend    x4
+#define dstend    x5
+#define A_l       x6
+#define A_lw      w6
+#define A_h       x7
+#define B_l       x8
+#define B_lw      w8
+#define B_h       x9
+#define C_l       x10
+#define C_lw      w10
+#define C_h       x11
+#define D_l       x12
+#define D_h       x13
+#define E_l       x14
+#define E_h       x15
+#define F_l       x16
+#define F_h       x17
+#define G_l       count
+#define G_h       dst
+#define H_l       src
+#define H_h       srcend
+#define tmp1      x14
+#define tmp2      x16
+#define SMALL_BUFFER_SIZE    48    /* In KiB: larger copies branch to L(copy_long_nt).  */
+
+/* This implementation handles overlaps and supports both memcpy and memmove
+   from a single entry point.  It uses unaligned accesses and branchless
+   sequences to keep the code small, simple and improve performance.
+
+   Copies are split into 3 main cases: small copies of up to 32 bytes, medium
+   copies of up to 128 bytes, and large copies.  The overhead of the overlap
+   check is negligible since it is only required for large copies.
+
+   Large copies use a software pipelined loop processing 64 bytes per iteration.
+   The destination pointer is 16-byte aligned to minimize unaligned accesses.
+   The loop tail is handled by always copying 64 bytes from the end.
+*/
+
+ALIAS_SYMBOL (__memmove_aarch64_nt, __memcpy_aarch64_nt)
+ENTRY (__memcpy_aarch64_nt)
+
+    add    srcend, src, count
+    add    dstend, dstin, count
+    cmp    count, 128
+    b.hi    L(copy_long)
+    cmp    count, 32
+    b.hi    L(copy32_128)
+
+    /* Small copies: 0..32 bytes.  */
+    cmp    count, 16
+    b.lo    L(copy16)
+    ldp    A_l, A_h, [src]
+    ldp    D_l, D_h, [srcend, -16]
+    stp    A_l, A_h, [dstin]
+    stp    D_l, D_h, [dstend, -16]
+    ret
+
+    /* Copy 8-15 bytes.  */
+L(copy16):
+    tbz    count, 3, L(copy8)
+    ldr    A_l, [src]
+    ldr    A_h, [srcend, -8]
+    str    A_l, [dstin]
+    str    A_h, [dstend, -8]
+    ret
+
+    .p2align 3
+    /* Copy 4-7 bytes.  */
+L(copy8):
+    tbz    count, 2, L(copy4)
+    ldr    A_lw, [src]
+    ldr    B_lw, [srcend, -4]
+    str    A_lw, [dstin]
+    str    B_lw, [dstend, -4]
+    ret
+
+    /* Copy 0..3 bytes using a branchless sequence.  */
+L(copy4):
+    cbz    count, L(copy0)
+    lsr    tmp1, count, 1
+    ldrb    A_lw, [src]
+    ldrb    C_lw, [srcend, -1]
+    ldrb    B_lw, [src, tmp1]
+    strb    A_lw, [dstin]
+    strb    B_lw, [dstin, tmp1]
+    strb    C_lw, [dstend, -1]
+L(copy0):
+    ret
+
+    .p2align 4
+    /* Medium copies: 33..128 bytes.  */
+L(copy32_128):
+    ldp    A_l, A_h, [src]
+    ldp    B_l, B_h, [src, 16]
+    ldp    C_l, C_h, [srcend, -32]
+    ldp    D_l, D_h, [srcend, -16]
+    cmp    count, 64
+    b.hi    L(copy128)
+    stp    A_l, A_h, [dstin]
+    stp    B_l, B_h, [dstin, 16]
+    stp    C_l, C_h, [dstend, -32]
+    stp    D_l, D_h, [dstend, -16]
+    ret
+
+    .p2align 4
+    /* Copy 65..128 bytes.  */
+L(copy128):
+    ldp    E_l, E_h, [src, 32]
+    ldp    F_l, F_h, [src, 48]
+    cmp    count, 96
+    b.ls    L(copy96)
+    ldp    G_l, G_h, [srcend, -64]
+    ldp    H_l, H_h, [srcend, -48]
+    stp    G_l, G_h, [dstend, -64]
+    stp    H_l, H_h, [dstend, -48]
+L(copy96):
+    stp    A_l, A_h, [dstin]
+    stp    B_l, B_h, [dstin, 16]
+    stp    E_l, E_h, [dstin, 32]
+    stp    F_l, F_h, [dstin, 48]
+    stp    C_l, C_h, [dstend, -32]
+    stp    D_l, D_h, [dstend, -16]
+    ret
+
+    .p2align 4
+    /* Copy more than 128 bytes.  */
+L(copy_long):
+    mov    tmp2, #SMALL_BUFFER_SIZE
+    cmp    count, tmp2, lsl #10
+    b.hi    L(copy_long_nt)
+    /* Use backwards copy if there is an overlap.  */
+    sub    tmp1, dstin, src
+    cbz    tmp1, L(copy0)
+    cmp    tmp1, count
+    b.lo    L(copy_long_backwards)
+
+    /* Copy 16 bytes and then align dst to 16-byte alignment.  */
+
+    ldp    D_l, D_h, [src]
+    and    tmp1, dstin, 15
+    bic    dst, dstin, 15
+    sub    src, src, tmp1
+    add    count, count, tmp1    /* Count is now 16 too large.  */
+    ldp    A_l, A_h, [src, 16]
+    stp    D_l, D_h, [dstin]
+    ldp    B_l, B_h, [src, 32]
+    ldp    C_l, C_h, [src, 48]
+    ldp    D_l, D_h, [src, 64]!
+    subs    count, count, 128 + 16    /* Test and readjust count.  */
+    b.ls    L(copy64_from_end)
+
+L(loop64):
+    stp    A_l, A_h, [dst, 16]
+    ldp    A_l, A_h, [src, 16]
+    stp    B_l, B_h, [dst, 32]
+    ldp    B_l, B_h, [src, 32]
+    stp    C_l, C_h, [dst, 48]
+    ldp    C_l, C_h, [src, 48]
+    stp    D_l, D_h, [dst, 64]!
+    ldp    D_l, D_h, [src, 64]!
+    subs    count, count, 64
+    b.hi    L(loop64)
+
+    /* Write the last iteration and copy 64 bytes from the end.  */
+L(copy64_from_end):
+    ldp    E_l, E_h, [srcend, -64]
+    stp    A_l, A_h, [dst, 16]
+    ldp    A_l, A_h, [srcend, -48]
+    stp    B_l, B_h, [dst, 32]
+    ldp    B_l, B_h, [srcend, -32]
+    stp    C_l, C_h, [dst, 48]
+    ldp    C_l, C_h, [srcend, -16]
+    stp    D_l, D_h, [dst, 64]
+    stp    E_l, E_h, [dstend, -64]
+    stp    A_l, A_h, [dstend, -48]
+    stp    B_l, B_h, [dstend, -32]
+    stp    C_l, C_h, [dstend, -16]
+    ret
+
+    .p2align 4
+
+    /* Large backwards copy for overlapping copies.
+       Copy 16 bytes and then align dst to 16-byte alignment.  */
+L(copy_long_backwards):
+    ldp    D_l, D_h, [srcend, -16]
+    and    tmp1, dstend, 15
+    sub    srcend, srcend, tmp1
+    sub    count, count, tmp1
+    ldp    A_l, A_h, [srcend, -16]
+    stp    D_l, D_h, [dstend, -16]
+    ldp    B_l, B_h, [srcend, -32]
+    ldp    C_l, C_h, [srcend, -48]
+    ldp    D_l, D_h, [srcend, -64]!
+    sub    dstend, dstend, tmp1
+    subs    count, count, 128
+    b.ls    L(copy64_from_start)
+
+L(loop64_backwards):
+    stp    A_l, A_h, [dstend, -16]
+    ldp    A_l, A_h, [srcend, -16]
+    stp    B_l, B_h, [dstend, -32]
+    ldp    B_l, B_h, [srcend, -32]
+    stp    C_l, C_h, [dstend, -48]
+    ldp    C_l, C_h, [srcend, -48]
+    stp    D_l, D_h, [dstend, -64]!
+    ldp    D_l, D_h, [srcend, -64]!
+    subs    count, count, 64
+    b.hi    L(loop64_backwards)
+
+    /* Write the last iteration and copy 64 bytes from the start.  */
+L(copy64_from_start):
+    ldp    G_l, G_h, [src, 48]
+    stp    A_l, A_h, [dstend, -16]
+    ldp    A_l, A_h, [src, 32]
+    stp    B_l, B_h, [dstend, -32]
+    ldp    B_l, B_h, [src, 16]
+    stp    C_l, C_h, [dstend, -48]
+    ldp    C_l, C_h, [src]
+    stp    D_l, D_h, [dstend, -64]
+    stp    G_l, G_h, [dstin, 48]
+    stp    A_l, A_h, [dstin, 32]
+    stp    B_l, B_h, [dstin, 16]
+    stp    C_l, C_h, [dstin]
+    ret
+
+    .p2align 4
+    /* Copy more than 48 KB using ldnp+stnp (non-temporal) instructions.  */
+L(copy_long_nt):
+    /* Use backwards copy if there is an overlap.  */
+    sub    tmp1, dstin, src
+    cbz    tmp1, L(copy0)
+    cmp    tmp1, count
+    b.lo    L(copy_long_backwards_nt)
+
+    /* Copy 16 bytes and then align dst to 16-byte alignment.  */
+
+    ldnp    D_l, D_h, [src]
+    and    tmp1, dstin, 15
+    bic    dst, dstin, 15
+    sub    src, src, tmp1
+    add    count, count, tmp1    /* Count is now 16 too large.  */
+    ldnp    A_l, A_h, [src, 16]
+    stnp    D_l, D_h, [dstin]
+    ldnp    B_l, B_h, [src, 32]
+    ldnp    C_l, C_h, [src, 48]
+    ldnp    D_l, D_h, [src, 64]
+    add     src, src, #64
+    subs    count, count, 128 + 16    /* Test and readjust count.  */
+    b.ls    L(copy64_from_end_nt)
+
+L(loop64_nt):
+    stnp    A_l, A_h, [dst, 16]
+    ldnp    A_l, A_h, [src, 16]
+    stnp    B_l, B_h, [dst, 32]
+    ldnp    B_l, B_h, [src, 32]
+    stnp    C_l, C_h, [dst, 48]
+    ldnp    C_l, C_h, [src, 48]
+    stnp    D_l, D_h, [dst, 64]
+    add dst, dst, #64
+    ldnp    D_l, D_h, [src, 64]
+    add src, src, #64
+    subs    count, count, 64
+    b.hi    L(loop64_nt)
+
+    /* Write the last iteration and copy 64 bytes from the end.  */
+L(copy64_from_end_nt):
+    ldnp    E_l, E_h, [srcend, -64]
+    stnp    A_l, A_h, [dst, 16]
+    ldnp    A_l, A_h, [srcend, -48]
+    stnp    B_l, B_h, [dst, 32]
+    ldnp    B_l, B_h, [srcend, -32]
+    stnp    C_l, C_h, [dst, 48]
+    ldnp    C_l, C_h, [srcend, -16]
+    stnp    D_l, D_h, [dst, 64]
+    stnp    E_l, E_h, [dstend, -64]
+    stnp    A_l, A_h, [dstend, -48]
+    stnp    B_l, B_h, [dstend, -32]
+    stnp    C_l, C_h, [dstend, -16]
+    ret
+
+    .p2align 4
+
+    /* Large backwards copy for overlapping copies.
+       Copy 16 bytes and then align dst to 16-byte alignment.  */
+L(copy_long_backwards_nt):
+    ldnp    D_l, D_h, [srcend, -16]
+    and    tmp1, dstend, 15
+    sub    srcend, srcend, tmp1
+    sub    count, count, tmp1
+    ldnp    A_l, A_h, [srcend, -16]
+    stnp    D_l, D_h, [dstend, -16]
+    ldnp    B_l, B_h, [srcend, -32]
+    ldnp    C_l, C_h, [srcend, -48]
+    ldnp    D_l, D_h, [srcend, -64]
+    sub     srcend, srcend, #64
+    sub    dstend, dstend, tmp1
+    subs    count, count, 128
+    b.ls    L(copy64_from_start_nt)
+
+L(loop64_backwards_nt):
+    stnp    A_l, A_h, [dstend, -16]
+    ldnp    A_l, A_h, [srcend, -16]
+    stnp    B_l, B_h, [dstend, -32]
+    ldnp    B_l, B_h, [srcend, -32]
+    stnp    C_l, C_h, [dstend, -48]
+    ldnp    C_l, C_h, [srcend, -48]
+    stnp    D_l, D_h, [dstend, -64]
+    sub     dstend, dstend, #64
+    ldnp    D_l, D_h, [srcend, -64]
+    sub     srcend, srcend, #64
+    subs    count, count, 64
+    b.hi    L(loop64_backwards_nt)
+
+    /* Write the last iteration and copy 64 bytes from the start.  */
+L(copy64_from_start_nt):
+    ldnp    G_l, G_h, [src, 48]
+    stnp    A_l, A_h, [dstend, -16]
+    ldnp    A_l, A_h, [src, 32]
+    stnp    B_l, B_h, [dstend, -32]
+    ldnp    B_l, B_h, [src, 16]
+    stnp    C_l, C_h, [dstend, -48]
+    ldnp    C_l, C_h, [src]
+    stnp    D_l, D_h, [dstend, -64]
+    stnp    G_l, G_h, [dstin, 48]
+    stnp    A_l, A_h, [dstin, 32]
+    stnp    B_l, B_h, [dstin, 16]
+    stnp    C_l, C_h, [dstin]
+    ret
+
+END (__memcpy_aarch64_nt)
+
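The dispatch logic of `__memcpy_aarch64_nt` can be modeled in C++ to make the size classes and the overlap test explicit. This is a sketch for illustration only: `CopyPath`, `classify_copy`, and `copy_0_to_3` are hypothetical names that do not exist in Bionic. The 48 KiB threshold is taken from `SMALL_BUFFER_SIZE << 10` above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the branches at the top of __memcpy_aarch64_nt; the
// names below are illustrative and do not exist in Bionic.
enum class CopyPath {
  kSmall,                // 0..32 bytes: L(copy16) and friends
  kMedium,               // 33..128 bytes: L(copy32_128)
  kLongForward,          // L(copy_long), forward loop
  kLongBackward,         // L(copy_long_backwards)
  kNonTemporalForward,   // L(copy_long_nt)
  kNonTemporalBackward,  // L(copy_long_backwards_nt)
  kNothing,              // long copy with dst == src: L(copy0)
};

constexpr size_t kSmallBufferKiB = 48;  // SMALL_BUFFER_SIZE

CopyPath classify_copy(uintptr_t dst, uintptr_t src, size_t count) {
  // Small and medium copies load everything before storing, so overlap is safe.
  if (count <= 32) return CopyPath::kSmall;
  if (count <= 128) return CopyPath::kMedium;
  const bool non_temporal = count > (kSmallBufferKiB << 10);
  // "sub tmp1, dstin, src; cmp tmp1, count; b.lo": the unsigned difference is
  // below count exactly when dst lies inside (src, src + count).
  const uintptr_t diff = dst - src;
  if (diff == 0) return CopyPath::kNothing;
  if (diff < count) {
    return non_temporal ? CopyPath::kNonTemporalBackward : CopyPath::kLongBackward;
  }
  return non_temporal ? CopyPath::kNonTemporalForward : CopyPath::kLongForward;
}

// Model of L(copy4): 0..3 bytes with three byte loads and no loop. For counts
// 1, 2 and 3 the stores hit indices {0,0,0}, {0,1,1} and {0,1,2}.
void copy_0_to_3(unsigned char* dst, const unsigned char* src, size_t count) {
  assert(count <= 3);
  if (count == 0) return;
  const size_t mid = count >> 1;
  const unsigned char a = src[0], b = src[mid], c = src[count - 1];
  dst[0] = a;
  dst[mid] = b;
  dst[count - 1] = c;
}
```

`dst - src` is computed in unsigned arithmetic. As a result, a destination below the source wraps to a very large value and takes the forward path. The forward path is safe for that kind of overlap.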
diff --git a/libc/arch-arm64/oryon/memset-nt.S b/libc/arch-arm64/oryon/memset-nt.S
new file mode 100644
index 0000000..b91e7da
--- /dev/null
+++ b/libc/arch-arm64/oryon/memset-nt.S
@@ -0,0 +1,218 @@
+/* Copyright (c) 2012, Linaro Limited
+   All rights reserved.
+   Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions are met:
+       * Redistributions of source code must retain the above copyright
+         notice, this list of conditions and the following disclaimer.
+       * Redistributions in binary form must reproduce the above copyright
+         notice, this list of conditions and the following disclaimer in the
+         documentation and/or other materials provided with the distribution.
+       * Neither the name of the Linaro nor the
+         names of its contributors may be used to endorse or promote products
+         derived from this software without specific prior written permission.
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+*/
+/* Assumptions:
+ *
+ * ARMv8-a, AArch64
+ * Unaligned accesses
+ *
+ */
+#include <private/bionic_asm.h>
+
+#define dstin		x0
+#define val		    w1
+#define count		x2
+#define tmp1		x3
+#define tmp1w		w3
+#define tmp2		x4
+#define tmp2w		w4
+#define zva_len_x	x5
+#define zva_len		w5
+#define zva_bits_x	x6
+#define A_l		    x7
+#define A_lw		w7
+#define dst		    x8
+#define tmp3w		w9
+#define tmp4        x10
+#define SMALL_BUFFER_SIZE    96
+
+ENTRY(__memset_aarch64_nt)
+    mov	dst, dstin		/* Preserve return value.  */
+    ands	A_lw, val, #255
+    b.eq	.Lzero_mem	/* If val is 0, try to use DC ZVA.  */
+    orr	A_lw, A_lw, A_lw, lsl #8
+    orr	A_lw, A_lw, A_lw, lsl #16
+    orr	A_l, A_l, A_l, lsl #32
+.Ltail_maybe_long:
+    cmp	count, #64
+    b.ge	.Lnot_short
+.Ltail_maybe_tiny:
+    cmp	count, #15
+    b.le	.Ltail15tiny
+.Ltail63:
+    ands	tmp1, count, #0x30
+    b.eq	.Ltail15
+    add	dst, dst, tmp1
+    cmp	tmp1w, #0x20
+    b.eq	1f
+    b.lt	2f
+    stp	A_l, A_l, [dst, #-48]
+1:
+    stp	A_l, A_l, [dst, #-32]
+2:
+    stp	A_l, A_l, [dst, #-16]
+.Ltail15:
+    and	count, count, #15
+    add	dst, dst, count
+    stp	A_l, A_l, [dst, #-16]	/* Repeat some/all of last store. */
+    ret
+.Ltail15tiny:
+    /* Set up to 15 bytes.  Does not assume earlier memory
+       being set.  */
+    tbz	count, #3, 1f
+    str	A_l, [dst], #8
+1:
+    tbz	count, #2, 1f
+    str	A_lw, [dst], #4
+1:
+    tbz	count, #1, 1f
+    strh	A_lw, [dst], #2
+1:
+    tbz	count, #0, 1f
+    strb	A_lw, [dst]
+1:
+    ret
+    /* Critical loop.  Start at a new cache line boundary.  Assuming
+     * 64 bytes per line, this ensures the entire loop is in one line.  */
+    .p2align 6
+.Lnot_short:
+    mov	tmp4, #SMALL_BUFFER_SIZE
+    cmp	count, tmp4, lsl #10
+    /* Use non-temporal stores if count > SMALL_BUFFER_SIZE KiB.  */
+    b.gt	.Lnot_short_nt
+    neg	tmp2, dst
+    ands	tmp2, tmp2, #15
+    b.eq	2f
+    /* Bring DST to 128-bit (16-byte) alignment.  We know that there's
+     * more than that to set, so we simply store 16 bytes and advance by
+     * the amount required to reach alignment.  */
+    sub	count, count, tmp2
+    stp	A_l, A_l, [dst]
+    add	dst, dst, tmp2
+    /* There may be 63 bytes or fewer to go now.  */
+    cmp	count, #63
+    b.le	.Ltail63
+2:
+    sub	dst, dst, #16		/* Pre-bias.  */
+    sub	count, count, #64
+1:
+    stp	A_l, A_l, [dst, #16]
+    stp	A_l, A_l, [dst, #32]
+    stp	A_l, A_l, [dst, #48]
+    stp	A_l, A_l, [dst, #64]!
+    subs	count, count, #64
+    b.ge	1b
+    tst	count, #0x3f
+    add	dst, dst, #16
+    b.ne	.Ltail63
+    ret
+.Lnot_short_nt:
+    neg	tmp2, dst
+    ands	tmp2, tmp2, #15
+    b.eq	2f
+    /* Bring DST to 128-bit (16-byte) alignment.  We know that there's
+     * more than that to set, so we simply store 16 bytes and advance by
+     * the amount required to reach alignment.  */
+    sub	count, count, tmp2
+    stnp	A_l, A_l, [dst]
+    add	dst, dst, tmp2
+    /* There may be 63 bytes or fewer to go now.  */
+    cmp	count, #63
+    b.le	.Ltail63
+2:
+    sub	dst, dst, #16		/* Pre-bias.  */
+    sub	count, count, #64
+1:
+    stnp	A_l, A_l, [dst, #16]
+    stnp	A_l, A_l, [dst, #32]
+    stnp	A_l, A_l, [dst, #48]
+    stnp	A_l, A_l, [dst, #64]
+    add     dst, dst, #64
+    subs	count, count, #64
+    b.ge	1b
+    tst	count, #0x3f
+    add	dst, dst, #16
+    b.ne	.Ltail63
+    ret
+.Lzero_mem:
+    mov	A_l, #0
+    cmp	count, #63
+    b.le	.Ltail_maybe_tiny
+    neg	tmp2, dst
+    ands	tmp2, tmp2, #15
+    b.eq	1f
+    sub	count, count, tmp2
+    stp	A_l, A_l, [dst]
+    add	dst, dst, tmp2
+    cmp	count, #63
+    b.le	.Ltail63
+1:
+    /* For zeroing small amounts of memory, it's not worth setting up
+     * the line-clear code.  */
+    cmp	count, #128
+    b.lt	.Lnot_short
+    mrs	tmp1, dczid_el0
+    tbnz	tmp1, #4, .Lnot_short
+    mov	tmp3w, #4
+    and	zva_len, tmp1w, #15	/* Safety: other bits reserved.  */
+    lsl	zva_len, tmp3w, zva_len
+.Lzero_by_line:
+    /* Compute how far we need to go to become suitably aligned.  We're
+     * already at quad-word alignment.  */
+    cmp	count, zva_len_x
+    b.lt	.Lnot_short		/* Not enough to reach alignment.  */
+    sub	zva_bits_x, zva_len_x, #1
+    neg	tmp2, dst
+    ands	tmp2, tmp2, zva_bits_x
+    b.eq	1f			/* Already aligned.  */
+    /* Not aligned, check that there's enough to zero after alignment.  */
+    sub	tmp1, count, tmp2
+    cmp	tmp1, #64
+    ccmp	tmp1, zva_len_x, #8, ge	/* NZCV=0b1000 */
+    b.lt	.Lnot_short
+    /* We know that there are at least 64 bytes to zero and that it's safe
+     * to overrun by 64 bytes.  */
+    mov	count, tmp1
+2:
+    stp	A_l, A_l, [dst]
+    stp	A_l, A_l, [dst, #16]
+    stp	A_l, A_l, [dst, #32]
+    subs	tmp2, tmp2, #64
+    stp	A_l, A_l, [dst, #48]
+    add	dst, dst, #64
+    b.ge	2b
+    /* We've overrun a bit, so adjust dst downwards.  */
+    add	dst, dst, tmp2
+1:
+    sub	count, count, zva_len_x
+3:
+    dc	zva, dst
+    add	dst, dst, zva_len_x
+    subs	count, count, zva_len_x
+    b.ge	3b
+    ands	count, count, zva_bits_x
+    b.ne	.Ltail_maybe_long
+    ret
+END(__memset_aarch64_nt)
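The setup steps of `__memset_aarch64_nt` can also be sketched in C++. This is a hypothetical model, not Bionic code: `splat_byte`, `zva_block_bytes`, `SetPath`, and `classify_set` are illustrative names. The 96 KiB threshold is taken from `SMALL_BUFFER_SIZE << 10` above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of "ands A_lw, val, #255" plus the three ORR-with-shift steps: the
// low byte of val is copied into all eight bytes of a 64-bit register.
uint64_t splat_byte(int val) {
  uint64_t x = static_cast<uint8_t>(val);
  x |= x << 8;
  x |= x << 16;
  x |= x << 32;
  return x;
}

// DCZID_EL0: bit 4 (DZP) set means DC ZVA is prohibited; bits [3:0] hold log2
// of the block size in 4-byte words, hence "mov #4" followed by "lsl".
size_t zva_block_bytes(uint64_t dczid_el0) {
  assert((dczid_el0 & 0x10) == 0 && "DC ZVA prohibited");
  return size_t{4} << (dczid_el0 & 0xf);
}

// Hypothetical names for the first branch taken in __memset_aarch64_nt.
enum class SetPath { kTiny, kUpTo63, kTemporalLoop, kNonTemporalLoop, kZeroFill };

constexpr size_t kSmallBufferKiB = 96;  // SMALL_BUFFER_SIZE

SetPath classify_set(int val, size_t count) {
  if ((val & 0xff) == 0) return SetPath::kZeroFill;  // .Lzero_mem
  if (count < 64) return count <= 15 ? SetPath::kTiny : SetPath::kUpTo63;
  return count > (kSmallBufferKiB << 10) ? SetPath::kNonTemporalLoop
                                         : SetPath::kTemporalLoop;
}
```

Zero fills are routed to `.Lzero_mem`, which uses DC ZVA when the CPU permits it. A large zero fill reaches the non-temporal loop (through `.Lnot_short`) only when DCZID_EL0.DZP is set.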
diff --git a/libc/arch-arm64/static_function_dispatch.S b/libc/arch-arm64/static_function_dispatch.S
index c7557f8..18c3783 100644
--- a/libc/arch-arm64/static_function_dispatch.S
+++ b/libc/arch-arm64/static_function_dispatch.S
@@ -37,6 +37,8 @@
 FUNCTION_DELEGATE(memcmp, __memcmp_aarch64)
 FUNCTION_DELEGATE(memcpy, __memcpy_aarch64)
 FUNCTION_DELEGATE(memmove, __memmove_aarch64)
+FUNCTION_DELEGATE(memrchr, __memrchr_aarch64)
+FUNCTION_DELEGATE(memset, __memset_aarch64)
 FUNCTION_DELEGATE(stpcpy, __stpcpy_aarch64)
 FUNCTION_DELEGATE(strchr, __strchr_aarch64_mte)
 FUNCTION_DELEGATE(strchrnul, __strchrnul_aarch64_mte)
diff --git a/libc/bionic/bionic_elf_tls.cpp b/libc/bionic/bionic_elf_tls.cpp
index 077f310..a053c27 100644
--- a/libc/bionic/bionic_elf_tls.cpp
+++ b/libc/bionic/bionic_elf_tls.cpp
@@ -60,11 +60,18 @@
   for (size_t i = 0; i < phdr_count; ++i) {
     const ElfW(Phdr)& phdr = phdr_table[i];
     if (phdr.p_type == PT_TLS) {
-      *out = TlsSegment {
-        phdr.p_memsz,
-        phdr.p_align,
-        reinterpret_cast<void*>(load_bias + phdr.p_vaddr),
-        phdr.p_filesz,
+      *out = TlsSegment{
+          .aligned_size =
+              TlsAlignedSize{
+                  .size = phdr.p_memsz,
+                  .align =
+                      TlsAlign{
+                          .value = phdr.p_align ?: 1,  // 0 means "no alignment requirement"
+                          .skew = phdr.p_vaddr % MAX(1, phdr.p_align),
+                      },
+              },
+          .init_ptr = reinterpret_cast<void*>(load_bias + phdr.p_vaddr),
+          .init_size = phdr.p_filesz,
       };
       return true;
     }
@@ -72,114 +79,171 @@
   return false;
 }
 
-// Return true if the alignment of a TLS segment is a valid power-of-two. Also
-// cap the alignment if it's too high.
-bool __bionic_check_tls_alignment(size_t* alignment) {
-  // N.B. The size does not need to be a multiple of the alignment. With
-  // ld.bfd (or after using binutils' strip), the TLS segment's size isn't
-  // rounded up.
-  if (*alignment == 0 || !powerof2(*alignment)) {
-    return false;
-  }
-  // Bionic only respects TLS alignment up to one page.
-  *alignment = MIN(*alignment, page_size());
-  return true;
+// Return true if the alignment of a TLS segment is a valid power-of-two.
+bool __bionic_check_tls_align(size_t align) {
+  // Note: The size does not need to be a multiple of the alignment. With ld.bfd
+  // (or after using binutils' strip), the TLS segment's size isn't rounded up.
+  return powerof2(align);
+}
+
+static void static_tls_layout_overflow() {
+  async_safe_fatal("error: TLS segments in static TLS overflowed");
+}
+
+static size_t align_checked(size_t value, TlsAlign tls_align) {
+  const size_t align = tls_align.value;
+  const size_t skew = tls_align.skew;
+  CHECK(align != 0 && powerof2(align + 0) && skew < align);
+  const size_t result = ((value - skew + align - 1) & ~(align - 1)) + skew;
+  if (result < value) static_tls_layout_overflow();
+  return result;
 }
 
 size_t StaticTlsLayout::offset_thread_pointer() const {
   return offset_bionic_tcb_ + (-MIN_TLS_SLOT * sizeof(void*));
 }
 
-// Reserves space for the Bionic TCB and the executable's TLS segment. Returns
-// the offset of the executable's TLS segment.
-size_t StaticTlsLayout::reserve_exe_segment_and_tcb(const TlsSegment* exe_segment,
+// Allocates the Bionic TCB and the executable's TLS segment in the static TLS
+// layout, satisfying alignment requirements for both.
+//
+// For an executable's TLS accesses (using the LocalExec model), the static
+// linker bakes TLS offsets directly into the .text section, so the loader must
+// place the executable segment at the same offset relative to the TP.
+// Similarly, the Bionic TLS slots (bionic_tcb) must also be allocated at the
+// correct offset relative to the TP.
+//
+// Returns the offset of the executable's TLS segment.
+//
+// Note: This function has unit tests, but they are in bionic-unit-tests-static,
+// not bionic-unit-tests.
+size_t StaticTlsLayout::reserve_exe_segment_and_tcb(const TlsSegment* seg,
                                                     const char* progname __attribute__((unused))) {
   // Special case: if the executable has no TLS segment, then just allocate a
   // TCB and skip the minimum alignment check on ARM.
-  if (exe_segment == nullptr) {
+  if (seg == nullptr) {
     offset_bionic_tcb_ = reserve_type<bionic_tcb>();
     return 0;
   }
 
 #if defined(__arm__) || defined(__aarch64__)
+  // ARM uses a "variant 1" TLS layout. The ABI specifies that the TP points at
+  // a 2-word TCB, followed by the executable's segment. In practice, libc
+  // implementations actually allocate a larger TCB at negative offsets from the
+  // TP.
+  //
+  // Historically, Bionic allocated an 8-word TCB starting at TP+0. To keep the
+  // executable's TLS segment from overlapping the last 6 slots, Bionic requires
+  // executables to have an 8-word PT_TLS alignment, so that the TCB fits in the
+  // alignment padding; executables get this alignment from crtbegin.c. To
+  // avoid this problem for new TLS slots, Bionic allocates them at negative
+  // offsets from the TP.
 
-  // First reserve enough space for the TCB before the executable segment.
-  reserve(sizeof(bionic_tcb), 1);
+  static_assert(MIN_TLS_SLOT <= 0 && MAX_TLS_SLOT >= 1);
+  static_assert(sizeof(bionic_tcb) == (MAX_TLS_SLOT - MIN_TLS_SLOT + 1) * sizeof(void*));
+  static_assert(alignof(bionic_tcb) == sizeof(void*));
+  const size_t max_align = MAX(alignof(bionic_tcb), seg->aligned_size.align.value);
 
-  // Then reserve the segment itself.
-  const size_t result = reserve(exe_segment->size, exe_segment->alignment);
+  // Allocate the TCB first. Split it into negative and non-negative slots and
+  // ensure that TP (i.e. the first non-negative slot) is aligned to max_align.
+  const size_t tcb_size_pre = -MIN_TLS_SLOT * sizeof(void*);
+  const size_t tcb_size_post = (MAX_TLS_SLOT + 1) * sizeof(void*);
+  const auto pair =
+      reserve_tp_pair(TlsAlignedSize{.size = tcb_size_pre},
+                      TlsAlignedSize{.size = tcb_size_post, .align = TlsAlign{.value = max_align}});
+  offset_bionic_tcb_ = pair.before;
+  const size_t offset_tp = pair.tp;
 
-  // The variant 1 ABI that ARM linkers follow specifies a 2-word TCB between
-  // the thread pointer and the start of the executable's TLS segment, but both
-  // the thread pointer and the TLS segment are aligned appropriately for the
-  // TLS segment. Calculate the distance between the thread pointer and the
-  // EXE's segment.
-  const size_t exe_tpoff = __BIONIC_ALIGN(sizeof(void*) * 2, exe_segment->alignment);
+  // Allocate the segment.
+  offset_exe_ = reserve(seg->aligned_size);
 
-  const size_t min_bionic_alignment = BIONIC_ROUND_UP_POWER_OF_2(MAX_TLS_SLOT) * sizeof(void*);
-  if (exe_tpoff < min_bionic_alignment) {
-    async_safe_fatal("error: \"%s\": executable's TLS segment is underaligned: "
-                     "alignment is %zu, needs to be at least %zu for %s Bionic",
-                     progname, exe_segment->alignment, min_bionic_alignment,
-                     (sizeof(void*) == 4 ? "ARM" : "ARM64"));
+  // Verify that the ABI and Bionic tpoff values are equal, which is equivalent
+  // to checking whether the segment is sufficiently aligned.
+  const size_t abi_tpoff = align_checked(2 * sizeof(void*), seg->aligned_size.align);
+  const size_t actual_tpoff = align_checked(tcb_size_post, seg->aligned_size.align);
+  CHECK(actual_tpoff == offset_exe_ - offset_tp);
+
+  if (abi_tpoff != actual_tpoff) {
+    async_safe_fatal(
+        "error: \"%s\": executable's TLS segment is underaligned: "
+        "alignment is %zu (skew %zu), needs to be at least %zu for %s Bionic",
+        progname, seg->aligned_size.align.value, seg->aligned_size.align.skew, tcb_size_post,
+        (sizeof(void*) == 4 ? "ARM" : "ARM64"));
   }
 
-  offset_bionic_tcb_ = result - exe_tpoff - (-MIN_TLS_SLOT * sizeof(void*));
-  return result;
-
 #elif defined(__i386__) || defined(__x86_64__)
 
-  // x86 uses variant 2 TLS layout. The executable's segment is located just
-  // before the TCB.
-  static_assert(MIN_TLS_SLOT == 0, "First slot of bionic_tcb must be slot #0 on x86");
-  const size_t exe_size = round_up_with_overflow_check(exe_segment->size, exe_segment->alignment);
-  reserve(exe_size, 1);
-  const size_t max_align = MAX(alignof(bionic_tcb), exe_segment->alignment);
-  offset_bionic_tcb_ = reserve(sizeof(bionic_tcb), max_align);
-  return offset_bionic_tcb_ - exe_size;
+  auto pair = reserve_tp_pair(seg->aligned_size, TlsAlignedSize::of_type<bionic_tcb>());
+  offset_exe_ = pair.before;
+  offset_bionic_tcb_ = pair.after;
 
 #elif defined(__riscv)
+  static_assert(MAX_TLS_SLOT == -1, "Last slot of bionic_tcb must be slot #(-1) on riscv");
 
-  // First reserve enough space for the TCB before the executable segment.
-  offset_bionic_tcb_ = reserve(sizeof(bionic_tcb), 1);
-
-  // Then reserve the segment itself.
-  const size_t exe_size = round_up_with_overflow_check(exe_segment->size, exe_segment->alignment);
-  return reserve(exe_size, 1);
+  auto pair = reserve_tp_pair(TlsAlignedSize::of_type<bionic_tcb>(), seg->aligned_size);
+  offset_bionic_tcb_ = pair.before;
+  offset_exe_ = pair.after;
 
 #else
 #error "Unrecognized architecture"
 #endif
+
+  return offset_exe_;
 }
 
-void StaticTlsLayout::reserve_bionic_tls() {
+size_t StaticTlsLayout::reserve_bionic_tls() {
   offset_bionic_tls_ = reserve_type<bionic_tls>();
+  return offset_bionic_tls_;
 }
 
 void StaticTlsLayout::finish_layout() {
   // Round the offset up to the alignment.
-  offset_ = round_up_with_overflow_check(offset_, alignment_);
-
-  if (overflowed_) {
-    async_safe_fatal("error: TLS segments in static TLS overflowed");
-  }
+  cursor_ = align_checked(cursor_, TlsAlign{.value = align_});
 }
 
-// The size is not required to be a multiple of the alignment. The alignment
-// must be a positive power-of-two.
-size_t StaticTlsLayout::reserve(size_t size, size_t alignment) {
-  offset_ = round_up_with_overflow_check(offset_, alignment);
-  const size_t result = offset_;
-  if (__builtin_add_overflow(offset_, size, &offset_)) overflowed_ = true;
-  alignment_ = MAX(alignment_, alignment);
+size_t StaticTlsLayout::align_cursor(TlsAlign align) {
+  cursor_ = align_checked(cursor_, align);
+  align_ = MAX(align_, align.value);
+  return cursor_;
+}
+
+size_t StaticTlsLayout::align_cursor_unskewed(size_t align) {
+  return align_cursor(TlsAlign{.value = align});
+}
+
+// Reserve the requested number of bytes at the requested alignment. The
+// requested size is not required to be a multiple of the alignment, nor is the
+// cursor aligned after the allocation.
+size_t StaticTlsLayout::reserve(TlsAlignedSize aligned_size) {
+  align_cursor(aligned_size.align);
+  const size_t result = cursor_;
+  if (__builtin_add_overflow(cursor_, aligned_size.size, &cursor_)) static_tls_layout_overflow();
   return result;
 }
 
-size_t StaticTlsLayout::round_up_with_overflow_check(size_t value, size_t alignment) {
-  const size_t old_value = value;
-  value = __BIONIC_ALIGN(value, alignment);
-  if (value < old_value) overflowed_ = true;
-  return value;
+// Calculate the TP offset and allocate something before it and something after
+// it. The TP will be aligned to:
+//
+//     MAX(before.align.value, after.align.value)
+//
+// The `before` and `after` allocations are each allocated as closely as
+// possible to the TP.
+StaticTlsLayout::TpAllocations StaticTlsLayout::reserve_tp_pair(TlsAlignedSize before,
+                                                                TlsAlignedSize after) {
+  // Tentative `before` allocation.
+  const size_t tentative_before = reserve(before);
+  const size_t tentative_before_end = align_cursor_unskewed(before.align.value);
+
+  const size_t offset_tp = align_cursor_unskewed(MAX(before.align.value, after.align.value));
+
+  const size_t offset_after = reserve(after);
+
+  // If the `after` allocation has higher alignment than `before`, then there
+  // may be alignment padding to remove between `before` and the TP. Shift
+  // `before` forward to remove this padding.
+  CHECK(((offset_tp - tentative_before_end) & (before.align.value - 1)) == 0);
+  const size_t offset_before = tentative_before + (offset_tp - tentative_before_end);
+
+  return TpAllocations{offset_before, offset_tp, offset_after};
 }
 
 // Copy each TLS module's initialization image into a newly-allocated block of
@@ -309,7 +373,11 @@
   void* mod_ptr = dtv->modules[module_idx];
   if (mod_ptr == nullptr) {
     const TlsSegment& segment = modules.module_table[module_idx].segment;
-    mod_ptr = __libc_shared_globals()->tls_allocator.memalign(segment.alignment, segment.size);
+    // TODO: Currently the aligned_size.align.skew property is ignored.
+    // That is, for a dynamic TLS block at addr A, (A % p_align) will be 0, not
+    // (p_vaddr % p_align).
+    mod_ptr = __libc_shared_globals()->tls_allocator.memalign(segment.aligned_size.align.value,
+                                                              segment.aligned_size.size);
     if (segment.init_size > 0) {
       memcpy(mod_ptr, segment.init_ptr, segment.init_size);
     }
@@ -317,8 +385,8 @@
 
     // Reports the allocation to the listener, if any.
     if (modules.on_creation_cb != nullptr) {
-      modules.on_creation_cb(mod_ptr,
-                             static_cast<void*>(static_cast<char*>(mod_ptr) + segment.size));
+      modules.on_creation_cb(
+          mod_ptr, static_cast<void*>(static_cast<char*>(mod_ptr) + segment.aligned_size.size));
     }
   }
 
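The skewed rounding in `align_checked()` and the ARM underalignment check in `reserve_exe_segment_and_tcb()` can be tested in isolation. The sketch below is a hypothetical arm64 model with the following simplifications:

- `align_up_skewed` and `exe_segment_placement_ok` are illustrative names.
- The overflow check is omitted.
- `tcb_size_post` is assumed to be 8 words, matching the historical 8-word TCB at TP+0 described in the comments above.

```cpp
#include <cassert>
#include <cstdint>

// Rounds value up to the next number congruent to skew modulo align, using the
// same expression as align_checked() (overflow check omitted).
uint64_t align_up_skewed(uint64_t value, uint64_t align, uint64_t skew) {
  assert(align != 0 && (align & (align - 1)) == 0 && skew < align);
  return ((value - skew + align - 1) & ~(align - 1)) + skew;
}

// Hypothetical arm64 model of the underalignment check: the ABI places the
// segment after a 2-word TCB, while Bionic reserves tcb_size_post bytes at
// TP+0 (assumed to be 8 words here).
bool exe_segment_placement_ok(uint64_t align, uint64_t skew) {
  constexpr uint64_t kWord = 8;
  constexpr uint64_t kTcbSizePost = 8 * kWord;  // assumption
  const uint64_t abi_tpoff = align_up_skewed(2 * kWord, align, skew);
  const uint64_t actual_tpoff = align_up_skewed(kTcbSizePost, align, skew);
  return abi_tpoff == actual_tpoff;
}
```

Consider a 64-byte alignment with skew 32. The ABI would place the segment at TP+32, but the Bionic TCB occupies TP+0 through TP+63. The segment therefore lands at TP+96 and the check fails. With skew 8, both offsets round to TP+72 and the layout is accepted.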
diff --git a/libc/bionic/heap_tagging.cpp b/libc/bionic/heap_tagging.cpp
index c4347e8..4d1981c 100644
--- a/libc/bionic/heap_tagging.cpp
+++ b/libc/bionic/heap_tagging.cpp
@@ -65,7 +65,7 @@
     };
   });
 
-#if defined(USE_SCUDO)
+#if defined(USE_SCUDO) && !__has_feature(hwaddress_sanitizer)
   switch (heap_tagging_level) {
     case M_HEAP_TAGGING_LEVEL_TBI:
     case M_HEAP_TAGGING_LEVEL_NONE:
@@ -123,7 +123,7 @@
           return false;
         }
       }
-#if defined(USE_SCUDO)
+#if defined(USE_SCUDO) && !__has_feature(hwaddress_sanitizer)
       scudo_malloc_disable_memory_tagging();
 #endif
       break;
@@ -151,12 +151,12 @@
         if (!set_tcf_on_all_threads(PR_MTE_TCF_ASYNC | PR_MTE_TCF_SYNC)) {
           set_tcf_on_all_threads(PR_MTE_TCF_ASYNC);
         }
-#if defined(USE_SCUDO)
+#if defined(USE_SCUDO) && !__has_feature(hwaddress_sanitizer)
         scudo_malloc_set_track_allocation_stacks(0);
 #endif
       } else if (tag_level == M_HEAP_TAGGING_LEVEL_SYNC) {
         set_tcf_on_all_threads(PR_MTE_TCF_SYNC);
-#if defined(USE_SCUDO)
+#if defined(USE_SCUDO) && !__has_feature(hwaddress_sanitizer)
         scudo_malloc_set_track_allocation_stacks(1);
 #endif
       }
diff --git a/libc/bionic/libc_init_common.cpp b/libc/bionic/libc_init_common.cpp
index 944098f..c82c52e 100644
--- a/libc/bionic/libc_init_common.cpp
+++ b/libc/bionic/libc_init_common.cpp
@@ -96,7 +96,7 @@
   SetDefaultHeapTaggingLevel();
 
 // TODO(b/158870657) make this unconditional when all devices support SCUDO.
-#if defined(USE_SCUDO)
+#if defined(USE_SCUDO) && !__has_feature(hwaddress_sanitizer)
 #if defined(SCUDO_PATTERN_FILL_CONTENTS)
   scudo_malloc_set_pattern_fill_contents(1);
 #elif defined(SCUDO_ZERO_CONTENTS)
@@ -182,7 +182,7 @@
 extern "C" void scudo_malloc_set_add_large_allocation_slack(int add_slack);
 
 __BIONIC_WEAK_FOR_NATIVE_BRIDGE void __libc_set_target_sdk_version(int target __unused) {
-#if defined(USE_SCUDO)
+#if defined(USE_SCUDO) && !__has_feature(hwaddress_sanitizer)
   scudo_malloc_set_add_large_allocation_slack(target < __ANDROID_API_S__);
 #endif
 }
diff --git a/libc/bionic/libc_init_static.cpp b/libc/bionic/libc_init_static.cpp
index f091ff8..d86df30 100644
--- a/libc/bionic/libc_init_static.cpp
+++ b/libc/bionic/libc_init_static.cpp
@@ -138,9 +138,9 @@
   static TlsModule mod;
   TlsModules& modules = __libc_shared_globals()->tls_modules;
   if (__bionic_get_tls_segment(phdr_start, phdr_ct, 0, &mod.segment)) {
-    if (!__bionic_check_tls_alignment(&mod.segment.alignment)) {
+    if (!__bionic_check_tls_align(mod.segment.aligned_size.align.value)) {
       async_safe_fatal("error: TLS segment alignment in \"%s\" is not a power of 2: %zu\n",
-                       progname, mod.segment.alignment);
+                       progname, mod.segment.aligned_size.align.value);
     }
     mod.static_offset = layout.reserve_exe_segment_and_tcb(&mod.segment, progname);
     mod.first_generation = kTlsGenerationFirst;
diff --git a/libc/bionic/malloc_common_dynamic.cpp b/libc/bionic/malloc_common_dynamic.cpp
index 8858178..6db6251 100644
--- a/libc/bionic/malloc_common_dynamic.cpp
+++ b/libc/bionic/malloc_common_dynamic.cpp
@@ -381,7 +381,7 @@
 
   MaybeInitGwpAsanFromLibc(globals);
 
-#if defined(USE_SCUDO)
+#if defined(USE_SCUDO) && !__has_feature(hwaddress_sanitizer)
   __libc_shared_globals()->scudo_stack_depot = __scudo_get_stack_depot_addr();
   __libc_shared_globals()->scudo_region_info = __scudo_get_region_info_addr();
   __libc_shared_globals()->scudo_ring_buffer = __scudo_get_ring_buffer_addr();
diff --git a/libc/bionic/sysconf.cpp b/libc/bionic/sysconf.cpp
index 9ffb58e..571370c 100644
--- a/libc/bionic/sysconf.cpp
+++ b/libc/bionic/sysconf.cpp
@@ -240,7 +240,7 @@
     case _SC_AIO_LISTIO_MAX:    return _POSIX_AIO_LISTIO_MAX;     // Minimum requirement.
     case _SC_AIO_MAX:           return _POSIX_AIO_MAX;            // Minimum requirement.
     case _SC_AIO_PRIO_DELTA_MAX:return 0;                         // Minimum requirement.
-    case _SC_DELAYTIMER_MAX:    return INT_MAX;
+    case _SC_DELAYTIMER_MAX:    return _POSIX_DELAYTIMER_MAX;
     case _SC_MQ_OPEN_MAX:       return _POSIX_MQ_OPEN_MAX;        // Minimum requirement.
     case _SC_MQ_PRIO_MAX:       return _POSIX_MQ_PRIO_MAX;        // Minimum requirement.
     case _SC_RTSIG_MAX:         return RTSIG_MAX;
@@ -308,11 +308,11 @@
     case _SC_THREAD_ROBUST_PRIO_PROTECT:  return _POSIX_THREAD_ROBUST_PRIO_PROTECT;
     case _SC_THREAD_SPORADIC_SERVER:      return _POSIX_THREAD_SPORADIC_SERVER;
     case _SC_TIMEOUTS:          return _POSIX_TIMEOUTS;
-    case _SC_TRACE:             return -1;             // Obsolescent in POSIX.1-2008.
-    case _SC_TRACE_EVENT_FILTER:      return -1;       // Obsolescent in POSIX.1-2008.
+    case _SC_TRACE:             return -1;
+    case _SC_TRACE_EVENT_FILTER:      return -1;
     case _SC_TRACE_EVENT_NAME_MAX:    return -1;
-    case _SC_TRACE_INHERIT:     return -1;             // Obsolescent in POSIX.1-2008.
-    case _SC_TRACE_LOG:         return -1;             // Obsolescent in POSIX.1-2008.
+    case _SC_TRACE_INHERIT:     return -1;
+    case _SC_TRACE_LOG:         return -1;
     case _SC_TRACE_NAME_MAX:    return -1;
     case _SC_TRACE_SYS_MAX:     return -1;
     case _SC_TRACE_USER_EVENT_MAX:    return -1;
@@ -321,7 +321,7 @@
     case _SC_V7_ILP32_OFFBIG:   return _POSIX_V7_ILP32_OFFBIG;
     case _SC_V7_LP64_OFF64:     return _POSIX_V7_LP64_OFF64;
     case _SC_V7_LPBIG_OFFBIG:   return _POSIX_V7_LPBIG_OFFBIG;
-    case _SC_XOPEN_STREAMS:     return -1;            // Obsolescent in POSIX.1-2008.
+    case _SC_XOPEN_STREAMS:     return -1;
     case _SC_XOPEN_UUCP:        return -1;
 
     case _SC_LEVEL1_ICACHE_SIZE:      return __sysconf_caches()->l1_i.size;
diff --git a/libc/include/android/crash_detail.h b/libc/include/android/crash_detail.h
index 1889f9f..946a3ab 100644
--- a/libc/include/android/crash_detail.h
+++ b/libc/include/android/crash_detail.h
@@ -69,7 +69,7 @@
  * Introduced in API 35.
  *
  * \param name identifying name for this extra data.
- *             this should generally be a human-readable debug string, but we are treating
+ *             this should generally be a human-readable UTF-8 string, but we are treating
  *             it as arbitrary bytes because it could be corrupted by the crash.
  * \param name_size number of bytes of the buffer pointed to by name
  * \param data a buffer containing the extra detail bytes, if null the crash detail
diff --git a/libc/include/bits/elf_common.h b/libc/include/bits/elf_common.h
index 0856f45..13d4fbf 100644
--- a/libc/include/bits/elf_common.h
+++ b/libc/include/bits/elf_common.h
@@ -1,5 +1,5 @@
 /*-
- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
+ * SPDX-License-Identifier: BSD-2-Clause
  *
  * Copyright (c) 2017, 2018 Dell EMC
  * Copyright (c) 2000, 2001, 2008, 2011, David E. O'Brien
@@ -26,8 +26,6 @@
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
- *
- * $FreeBSD$
  */
 
 #ifndef _SYS_ELF_COMMON_H_
@@ -38,6 +36,26 @@
  */
 
 /*
+ * Note header.  The ".note" section contains an array of notes.  Each
+ * begins with this header, aligned to a word boundary.  Immediately
+ * following the note header is n_namesz bytes of name, padded to the
+ * next word boundary.  Then comes n_descsz bytes of descriptor, again
+ * padded to a word boundary.  The values of n_namesz and n_descsz do
+ * not include the padding.
+ */
+
+#if 0 // android-added
+#if !defined(LOCORE) && !defined(__ASSEMBLER__)
+typedef struct {
+	u_int32_t	n_namesz;	/* Length of name. */
+	u_int32_t	n_descsz;	/* Length of descriptor. */
+	u_int32_t	n_type;		/* Type of this note. */
+} Elf_Note;
+typedef Elf_Note Elf_Nhdr;
+#endif
+#endif // android-added
+
+/*
  * Option kinds.
  */
 #define	ODK_NULL	0	/* undefined */
@@ -92,6 +110,21 @@
 #define	OGP_GROUP	0x0000ffff	/* GP group number */
 #define	OGP_SELF	0x00010000	/* GP group is self-contained */
 
+/*
+ * The header for GNU-style hash sections.
+ */
+
+#if 0 // android-added
+#if !defined(LOCORE) && !defined(__ASSEMBLER__)
+typedef struct {
+	u_int32_t	gh_nbuckets;	/* Number of hash buckets. */
+	u_int32_t	gh_symndx;	/* First visible symbol in .dynsym. */
+	u_int32_t	gh_maskwords;	/* #maskwords used in bloom filter. */
+	u_int32_t	gh_shift2;	/* Bloom filter shift count. */
+} Elf_GNU_Hash_Header;
+#endif
+#endif // android-added
+
 /* Indexes into the e_ident array.  Keep synced with
    http://www.sco.com/developers/gabi/latest/ch4.eheader.html */
 #define	EI_MAG0		0	/* Magic number, byte 0. */
@@ -153,7 +186,9 @@
 #define	ELFOSABI_ARM		97	/* ARM */
 #define	ELFOSABI_STANDALONE	255	/* Standalone (embedded) application */
 
+#define	ELFOSABI_SYSV		ELFOSABI_NONE	/* symbol used in old spec */
 #define	ELFOSABI_MONTEREY	ELFOSABI_AIX	/* Monterey */
+#define	ELFOSABI_GNU		ELFOSABI_LINUX
 
 /* e_ident */
 #define	IS_ELF(ehdr)	((ehdr).e_ident[EI_MAG0] == ELFMAG0 && \
@@ -299,6 +334,7 @@
 #define	EF_ARM_EABI_VER3	0x03000000
 #define	EF_ARM_EABI_VER4	0x04000000
 #define	EF_ARM_EABI_VER5	0x05000000
+#define	EF_ARM_EABI_VERSION(x)	((x) & EF_ARM_EABIMASK)
 #define	EF_ARM_INTERWORK	0x00000004
 #define	EF_ARM_APCS_26		0x00000008
 #define	EF_ARM_APCS_FLOAT	0x00000010
@@ -418,12 +454,12 @@
 #define	SHT_HIOS		0x6fffffff	/* Last of OS specific semantics */
 #define	SHT_LOPROC		0x70000000	/* reserved range for processor */
 #define	SHT_X86_64_UNWIND	0x70000001	/* unwind information */
-#define	SHT_AMD64_UNWIND	SHT_X86_64_UNWIND
+#define	SHT_AMD64_UNWIND	SHT_X86_64_UNWIND 
 
 #define	SHT_ARM_EXIDX		0x70000001	/* Exception index table. */
-#define	SHT_ARM_PREEMPTMAP	0x70000002	/* BPABI DLL dynamic linking
+#define	SHT_ARM_PREEMPTMAP	0x70000002	/* BPABI DLL dynamic linking 
 						   pre-emption map. */
-#define	SHT_ARM_ATTRIBUTES	0x70000003	/* Object file compatibility
+#define	SHT_ARM_ATTRIBUTES	0x70000003	/* Object file compatibility 
 						   attributes. */
 #define	SHT_ARM_DEBUGOVERLAY	0x70000004	/* See DBGOVL for details. */
 #define	SHT_ARM_OVERLAYSECTION	0x70000005	/* See DBGOVL for details. */
@@ -499,6 +535,9 @@
 #define	PT_TLS		7	/* Thread local storage segment */
 #define	PT_LOOS		0x60000000	/* First OS-specific. */
 #define	PT_SUNW_UNWIND	0x6464e550	/* amd64 UNWIND program header */
+// android-removed: #define	PT_GNU_EH_FRAME	0x6474e550
+// android-removed: #define	PT_GNU_STACK	0x6474e551
+// android-removed: #define	PT_GNU_RELRO	0x6474e552
 #define	PT_DUMP_DELTA	0x6fb5d000	/* va->pa map for kernel dumps
 					   (currently arm). */
 #define	PT_LOSUNW	0x6ffffffa
@@ -648,11 +687,6 @@
 #define	DT_AARCH64_BTI_PLT		0x70000001
 #define	DT_AARCH64_PAC_PLT		0x70000003
 #define	DT_AARCH64_VARIANT_PCS		0x70000005
-#define DT_AARCH64_MEMTAG_MODE		0x70000009
-#define DT_AARCH64_MEMTAG_HEAP		0x7000000b
-#define DT_AARCH64_MEMTAG_STACK		0x7000000c
-#define DT_AARCH64_MEMTAG_GLOBALS	0x7000000d
-#define DT_AARCH64_MEMTAG_GLOBALSSZ	0x7000000f
 
 #define	DT_ARM_SYMTABSZ			0x70000001
 #define	DT_ARM_PREEMPTMAP		0x70000002
@@ -810,6 +844,7 @@
 
 #define	GNU_PROPERTY_AARCH64_FEATURE_1_AND	0xc0000000
 
+// android-removed: #define	GNU_PROPERTY_AARCH64_FEATURE_1_BTI	0x00000001
 #define	GNU_PROPERTY_AARCH64_FEATURE_1_PAC	0x00000002
 
 #define	GNU_PROPERTY_X86_FEATURE_1_AND		0xc0000002
@@ -918,6 +953,49 @@
 #define	ELFCOMPRESS_LOPROC	0x70000000	/* Processor-specific */
 #define	ELFCOMPRESS_HIPROC	0x7fffffff
 
+#if 0 // android-added
+/* Values for a_type. */
+#define	AT_NULL		0	/* Terminates the vector. */
+#define	AT_IGNORE	1	/* Ignored entry. */
+#define	AT_EXECFD	2	/* File descriptor of program to load. */
+#define	AT_PHDR		3	/* Program header of program already loaded. */
+#define	AT_PHENT	4	/* Size of each program header entry. */
+#define	AT_PHNUM	5	/* Number of program header entries. */
+#define	AT_PAGESZ	6	/* Page size in bytes. */
+#define	AT_BASE		7	/* Interpreter's base address. */
+#define	AT_FLAGS	8	/* Flags. */
+#define	AT_ENTRY	9	/* Where interpreter should transfer control. */
+#define	AT_NOTELF	10	/* Program is not ELF ?? */
+#define	AT_UID		11	/* Real uid. */
+#define	AT_EUID		12	/* Effective uid. */
+#define	AT_GID		13	/* Real gid. */
+#define	AT_EGID		14	/* Effective gid. */
+#define	AT_EXECPATH	15	/* Path to the executable. */
+#define	AT_CANARY	16	/* Canary for SSP. */
+#define	AT_CANARYLEN	17	/* Length of the canary. */
+#define	AT_OSRELDATE	18	/* OSRELDATE. */
+#define	AT_NCPUS	19	/* Number of CPUs. */
+#define	AT_PAGESIZES	20	/* Pagesizes. */
+#define	AT_PAGESIZESLEN	21	/* Number of pagesizes. */
+#define	AT_TIMEKEEP	22	/* Pointer to timehands. */
+#define	AT_STACKPROT	23	/* Initial stack protection. */
+#define	AT_EHDRFLAGS	24	/* e_flags field from elf hdr */
+#define	AT_HWCAP	25	/* CPU feature flags. */
+#define	AT_HWCAP2	26	/* CPU feature flags 2. */
+#define	AT_BSDFLAGS	27	/* ELF BSD Flags. */
+#define	AT_ARGC		28	/* Argument count */
+#define	AT_ARGV		29	/* Argument vector */
+#define	AT_ENVC		30	/* Environment count */
+#define	AT_ENVV		31	/* Environment vector */
+#define	AT_PS_STRINGS	32	/* struct ps_strings */
+#define	AT_FXRNG	33	/* Pointer to root RNG seed version. */
+#define	AT_KPRELOAD	34	/* Base of vdso, preloaded by rtld */
+#define	AT_USRSTACKBASE	35	/* Top of user stack */
+#define	AT_USRSTACKLIM	36	/* Grow limit of user stack */
+
+#define	AT_COUNT	37	/* Count of defined aux entry types. */
+#endif // android-added
+
 /*
  * Relocation types.
  *
@@ -1087,7 +1165,7 @@
 #define	R_IA_64_PCREL22		0x7a	/* immediate22	S + A - P */
 #define	R_IA_64_PCREL64I	0x7b	/* immediate64	S + A - P */
 #define	R_IA_64_IPLTMSB		0x80	/* function descriptor MSB special */
-#define	R_IA_64_IPLTLSB		0x81	/* function descriptor LSB speciaal */
+#define	R_IA_64_IPLTLSB		0x81	/* function descriptor LSB special */
 #define	R_IA_64_SUB		0x85	/* immediate64	A - S */
 #define	R_IA_64_LTOFF22X	0x86	/* immediate22	special */
 #define	R_IA_64_LDXMOV		0x87	/* immediate22	special */
@@ -1248,7 +1326,6 @@
 
 /*
  * RISC-V relocation types.
- * https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#relocations
  */
 
 /* Relocation types used by the dynamic linker. */
@@ -1264,7 +1341,6 @@
 #define	R_RISCV_TLS_DTPREL64	9
 #define	R_RISCV_TLS_TPREL32	10
 #define	R_RISCV_TLS_TPREL64	11
-#define	R_RISCV_TLSDESC    	12
 
 /* Relocation types not used by the dynamic linker. */
 #define	R_RISCV_BRANCH		16
@@ -1292,8 +1368,6 @@
 #define	R_RISCV_SUB16		38
 #define	R_RISCV_SUB32		39
 #define	R_RISCV_SUB64		40
-#define	R_RISCV_GNU_VTINHERIT	41
-#define	R_RISCV_GNU_VTENTRY	42
 #define	R_RISCV_ALIGN		43
 #define	R_RISCV_RVC_BRANCH	44
 #define	R_RISCV_RVC_JUMP	45
@@ -1306,13 +1380,6 @@
 #define	R_RISCV_SET32		56
 #define	R_RISCV_32_PCREL	57
 #define	R_RISCV_IRELATIVE	58
-#define	R_RISCV_PLT32		59
-#define	R_RISCV_SET_ULEB128	60
-#define	R_RISCV_SUB_ULEB128	61
-#define	R_RISCV_TLSDESC_HI20	62
-#define	R_RISCV_TLSDESC_LOAD_LO12 63
-#define	R_RISCV_TLSDESC_ADD_LO12 64
-#define	R_RISCV_TLSDESC_CALL	65
 
 #define	R_SPARC_NONE		0
 #define	R_SPARC_8		1
diff --git a/libc/include/bits/sysconf.h b/libc/include/bits/sysconf.h
index 8607adf..ecf26ba 100644
--- a/libc/include/bits/sysconf.h
+++ b/libc/include/bits/sysconf.h
@@ -26,172 +26,324 @@
  * SUCH DAMAGE.
  */
 
-#ifndef _BITS_SYSCONF_H_
-#define _BITS_SYSCONF_H_
+#pragma once
 
 #include <sys/cdefs.h>
 
-/* as listed by Posix sysconf() description */
-/* most of these will return -1 and ENOSYS  */
-
-#define _SC_ARG_MAX             0x0000
-#define _SC_BC_BASE_MAX         0x0001
-#define _SC_BC_DIM_MAX          0x0002
-#define _SC_BC_SCALE_MAX        0x0003
-#define _SC_BC_STRING_MAX       0x0004
-#define _SC_CHILD_MAX           0x0005
-#define _SC_CLK_TCK             0x0006
-#define _SC_COLL_WEIGHTS_MAX    0x0007
-#define _SC_EXPR_NEST_MAX       0x0008
-#define _SC_LINE_MAX            0x0009
-#define _SC_NGROUPS_MAX         0x000a
-#define _SC_OPEN_MAX            0x000b
-#define _SC_PASS_MAX            0x000c
-#define _SC_2_C_BIND            0x000d
-#define _SC_2_C_DEV             0x000e
-#define _SC_2_C_VERSION         0x000f  /* Obsolescent in POSIX.1-2008, TODO: remove it. */
-#define _SC_2_CHAR_TERM         0x0010
-#define _SC_2_FORT_DEV          0x0011
-#define _SC_2_FORT_RUN          0x0012
-#define _SC_2_LOCALEDEF         0x0013
-#define _SC_2_SW_DEV            0x0014
-#define _SC_2_UPE               0x0015
-#define _SC_2_VERSION           0x0016
-#define _SC_JOB_CONTROL         0x0017
-#define _SC_SAVED_IDS           0x0018
-#define _SC_VERSION             0x0019
-#define _SC_RE_DUP_MAX          0x001a
-#define _SC_STREAM_MAX          0x001b
-#define _SC_TZNAME_MAX          0x001c
-#define _SC_XOPEN_CRYPT         0x001d
-#define _SC_XOPEN_ENH_I18N      0x001e
-#define _SC_XOPEN_SHM           0x001f
-#define _SC_XOPEN_VERSION       0x0020
-#define _SC_XOPEN_XCU_VERSION   0x0021  /* Obsolescent in POSIX.1-2008, TODO: remove it. */
-#define _SC_XOPEN_REALTIME      0x0022
-#define _SC_XOPEN_REALTIME_THREADS  0x0023
-#define _SC_XOPEN_LEGACY        0x0024
-#define _SC_ATEXIT_MAX          0x0025
-#define _SC_IOV_MAX             0x0026
+/** sysconf() query for the maximum number of bytes of exec() arguments. */
+#define _SC_ARG_MAX 0x0000
+/** sysconf() query for bc(1) behavior equivalent to _POSIX2_BC_BASE_MAX. */
+#define _SC_BC_BASE_MAX 0x0001
+/** sysconf() query for bc(1) behavior equivalent to _POSIX2_BC_DIM_MAX. */
+#define _SC_BC_DIM_MAX 0x0002
+/** sysconf() query for bc(1) behavior equivalent to _POSIX2_BC_SCALE_MAX. */
+#define _SC_BC_SCALE_MAX 0x0003
+/** sysconf() query for bc(1) behavior equivalent to _POSIX2_BC_STRING_MAX. */
+#define _SC_BC_STRING_MAX 0x0004
+/** sysconf() query equivalent to RLIMIT_NPROC. */
+#define _SC_CHILD_MAX 0x0005
+/** sysconf() query equivalent to AT_CLKTCK. */
+#define _SC_CLK_TCK 0x0006
+/** sysconf() query for collation behavior equivalent to _POSIX2_COLL_WEIGHTS_MAX. */
+#define _SC_COLL_WEIGHTS_MAX 0x0007
+/** sysconf() query for expr(1) behavior equivalent to _POSIX2_EXPR_NEST_MAX. */
+#define _SC_EXPR_NEST_MAX 0x0008
+/** sysconf() query for command-line tool behavior equivalent to _POSIX2_LINE_MAX. */
+#define _SC_LINE_MAX 0x0009
+/** sysconf() query equivalent to NGROUPS_MAX. */
+#define _SC_NGROUPS_MAX 0x000a
+/** sysconf() query equivalent to RLIMIT_NOFILE. */
+#define _SC_OPEN_MAX 0x000b
+/** sysconf() query equivalent to PASS_MAX. */
+#define _SC_PASS_MAX 0x000c
+/** sysconf() query equivalent to _POSIX2_C_BIND. */
+#define _SC_2_C_BIND 0x000d
+/** sysconf() query equivalent to _POSIX2_C_DEV. */
+#define _SC_2_C_DEV 0x000e
+/** Obsolescent in POSIX.1-2008. */
+#define _SC_2_C_VERSION 0x000f
+/** sysconf() query equivalent to _POSIX2_CHAR_TERM. */
+#define _SC_2_CHAR_TERM 0x0010
+/** sysconf() query equivalent to _POSIX2_FORT_DEV. */
+#define _SC_2_FORT_DEV 0x0011
+/** sysconf() query equivalent to _POSIX2_FORT_RUN. */
+#define _SC_2_FORT_RUN 0x0012
+/** sysconf() query equivalent to _POSIX2_LOCALEDEF. */
+#define _SC_2_LOCALEDEF 0x0013
+/** sysconf() query equivalent to _POSIX2_SW_DEV. */
+#define _SC_2_SW_DEV 0x0014
+/** sysconf() query equivalent to _POSIX2_UPE. */
+#define _SC_2_UPE 0x0015
+/** sysconf() query equivalent to _POSIX2_VERSION. */
+#define _SC_2_VERSION 0x0016
+/** sysconf() query equivalent to _POSIX_JOB_CONTROL. */
+#define _SC_JOB_CONTROL 0x0017
+/** sysconf() query equivalent to _POSIX_SAVED_IDS. */
+#define _SC_SAVED_IDS 0x0018
+/** sysconf() query equivalent to _POSIX_VERSION. */
+#define _SC_VERSION 0x0019
+/** sysconf() query equivalent to _POSIX_RE_DUP_MAX. */
+#define _SC_RE_DUP_MAX 0x001a
+/** sysconf() query equivalent to FOPEN_MAX. */
+#define _SC_STREAM_MAX 0x001b
+/** sysconf() query equivalent to _POSIX_TZNAME_MAX. */
+#define _SC_TZNAME_MAX 0x001c
+/** sysconf() query equivalent to _XOPEN_CRYPT. */
+#define _SC_XOPEN_CRYPT 0x001d
+/** sysconf() query equivalent to _XOPEN_ENH_I18N. */
+#define _SC_XOPEN_ENH_I18N 0x001e
+/** sysconf() query equivalent to _XOPEN_SHM. */
+#define _SC_XOPEN_SHM 0x001f
+/** sysconf() query equivalent to _XOPEN_VERSION. */
+#define _SC_XOPEN_VERSION 0x0020
+/** Obsolescent in POSIX.1-2008. */
+#define _SC_XOPEN_XCU_VERSION 0x0021
+/** sysconf() query equivalent to _XOPEN_REALTIME. */
+#define _SC_XOPEN_REALTIME 0x0022
+/** sysconf() query equivalent to _XOPEN_REALTIME_THREADS. */
+#define _SC_XOPEN_REALTIME_THREADS 0x0023
+/** sysconf() query equivalent to _XOPEN_LEGACY. */
+#define _SC_XOPEN_LEGACY 0x0024
+/** sysconf() query for the maximum number of atexit() handlers. Unlimited on Android. */
+#define _SC_ATEXIT_MAX 0x0025
+/** sysconf() query equivalent to IOV_MAX. */
+#define _SC_IOV_MAX 0x0026
+/** Same as _SC_IOV_MAX. */
 #define _SC_UIO_MAXIOV _SC_IOV_MAX
-#define _SC_PAGESIZE            0x0027
-#define _SC_PAGE_SIZE           0x0028
-#define _SC_XOPEN_UNIX          0x0029
-#define _SC_XBS5_ILP32_OFF32    0x002a  /* Obsolescent in POSIX.1-2008, TODO: remove it. */
-#define _SC_XBS5_ILP32_OFFBIG   0x002b  /* Obsolescent in POSIX.1-2008, TODO: remove it. */
-#define _SC_XBS5_LP64_OFF64     0x002c  /* Obsolescent in POSIX.1-2008, TODO: remove it. */
-#define _SC_XBS5_LPBIG_OFFBIG   0x002d  /* Obsolescent in POSIX.1-2008, TODO: remove it. */
-#define _SC_AIO_LISTIO_MAX      0x002e
-#define _SC_AIO_MAX             0x002f
+/** Same as _SC_PAGE_SIZE. */
+#define _SC_PAGESIZE 0x0027
+/** sysconf() query equivalent to getpagesize(). */
+#define _SC_PAGE_SIZE 0x0028
+/** sysconf() query equivalent to _XOPEN_UNIX. */
+#define _SC_XOPEN_UNIX 0x0029
+/** Obsolescent in POSIX.1-2008. */
+#define _SC_XBS5_ILP32_OFF32 0x002a
+/** Obsolescent in POSIX.1-2008. */
+#define _SC_XBS5_ILP32_OFFBIG 0x002b
+/** Obsolescent in POSIX.1-2008. */
+#define _SC_XBS5_LP64_OFF64 0x002c
+/** Obsolescent in POSIX.1-2008. */
+#define _SC_XBS5_LPBIG_OFFBIG 0x002d
+/** sysconf() query equivalent to _POSIX_AIO_LISTIO_MAX. */
+#define _SC_AIO_LISTIO_MAX 0x002e
+/** sysconf() query equivalent to _POSIX_AIO_MAX. */
+#define _SC_AIO_MAX 0x002f
+/** Unimplemented on Android. */
 #define _SC_AIO_PRIO_DELTA_MAX  0x0030
-#define _SC_DELAYTIMER_MAX      0x0031
-#define _SC_MQ_OPEN_MAX         0x0032
-#define _SC_MQ_PRIO_MAX         0x0033
-#define _SC_RTSIG_MAX           0x0034
-#define _SC_SEM_NSEMS_MAX       0x0035
-#define _SC_SEM_VALUE_MAX       0x0036
-#define _SC_SIGQUEUE_MAX        0x0037
-#define _SC_TIMER_MAX           0x0038
-#define _SC_ASYNCHRONOUS_IO     0x0039
-#define _SC_FSYNC               0x003a
-#define _SC_MAPPED_FILES        0x003b
-#define _SC_MEMLOCK             0x003c
-#define _SC_MEMLOCK_RANGE       0x003d
-#define _SC_MEMORY_PROTECTION   0x003e
-#define _SC_MESSAGE_PASSING     0x003f
-#define _SC_PRIORITIZED_IO      0x0040
+/** sysconf() query equivalent to _POSIX_DELAYTIMER_MAX. */
+#define _SC_DELAYTIMER_MAX 0x0031
+/** sysconf() query equivalent to _POSIX_MQ_OPEN_MAX. */
+#define _SC_MQ_OPEN_MAX 0x0032
+/** sysconf() query equivalent to _POSIX_MQ_PRIO_MAX. */
+#define _SC_MQ_PRIO_MAX 0x0033
+/** sysconf() query equivalent to RTSIG_MAX. Constant on Android. */
+#define _SC_RTSIG_MAX 0x0034
+/** sysconf() query equivalent to _POSIX_SEM_NSEMS_MAX. Constant on Android. */
+#define _SC_SEM_NSEMS_MAX 0x0035
+/** sysconf() query equivalent to SEM_VALUE_MAX. Constant on Android. */
+#define _SC_SEM_VALUE_MAX 0x0036
+/** sysconf() query equivalent to _POSIX_SIGQUEUE_MAX. */
+#define _SC_SIGQUEUE_MAX 0x0037
+/** sysconf() query equivalent to _POSIX_TIMER_MAX. */
+#define _SC_TIMER_MAX 0x0038
+/** sysconf() query equivalent to _POSIX_ASYNCHRONOUS_IO. */
+#define _SC_ASYNCHRONOUS_IO 0x0039
+/** sysconf() query equivalent to _POSIX_FSYNC. */
+#define _SC_FSYNC 0x003a
+/** sysconf() query equivalent to _POSIX_MAPPED_FILES. */
+#define _SC_MAPPED_FILES 0x003b
+/** sysconf() query equivalent to _POSIX_MEMLOCK. */
+#define _SC_MEMLOCK 0x003c
+/** sysconf() query equivalent to _POSIX_MEMLOCK_RANGE. */
+#define _SC_MEMLOCK_RANGE 0x003d
+/** sysconf() query equivalent to _POSIX_MEMORY_PROTECTION. */
+#define _SC_MEMORY_PROTECTION 0x003e
+/** sysconf() query equivalent to _POSIX_MESSAGE_PASSING. */
+#define _SC_MESSAGE_PASSING 0x003f
+/** sysconf() query equivalent to _POSIX_PRIORITIZED_IO. */
+#define _SC_PRIORITIZED_IO 0x0040
+/** sysconf() query equivalent to _POSIX_PRIORITY_SCHEDULING. */
 #define _SC_PRIORITY_SCHEDULING 0x0041
-#define _SC_REALTIME_SIGNALS    0x0042
-#define _SC_SEMAPHORES          0x0043
-#define _SC_SHARED_MEMORY_OBJECTS  0x0044
-#define _SC_SYNCHRONIZED_IO     0x0045
-#define _SC_TIMERS              0x0046
-#define _SC_GETGR_R_SIZE_MAX    0x0047
-#define _SC_GETPW_R_SIZE_MAX    0x0048
-#define _SC_LOGIN_NAME_MAX      0x0049
-#define _SC_THREAD_DESTRUCTOR_ITERATIONS  0x004a
-#define _SC_THREAD_KEYS_MAX     0x004b
-#define _SC_THREAD_STACK_MIN    0x004c
-#define _SC_THREAD_THREADS_MAX  0x004d
-#define _SC_TTY_NAME_MAX        0x004e
-
-#define _SC_THREADS                     0x004f
-#define _SC_THREAD_ATTR_STACKADDR       0x0050
-#define _SC_THREAD_ATTR_STACKSIZE       0x0051
-#define _SC_THREAD_PRIORITY_SCHEDULING  0x0052
-#define _SC_THREAD_PRIO_INHERIT         0x0053
-#define _SC_THREAD_PRIO_PROTECT         0x0054
-#define _SC_THREAD_SAFE_FUNCTIONS       0x0055
-
-#define _SC_NPROCESSORS_CONF            0x0060
-#define _SC_NPROCESSORS_ONLN            0x0061
-#define _SC_PHYS_PAGES                  0x0062
-#define _SC_AVPHYS_PAGES                0x0063
-#define _SC_MONOTONIC_CLOCK             0x0064
-
-#define _SC_2_PBS               0x0065
-#define _SC_2_PBS_ACCOUNTING    0x0066
-#define _SC_2_PBS_CHECKPOINT    0x0067
-#define _SC_2_PBS_LOCATE        0x0068
-#define _SC_2_PBS_MESSAGE       0x0069
-#define _SC_2_PBS_TRACK         0x006a
-#define _SC_ADVISORY_INFO       0x006b
-#define _SC_BARRIERS            0x006c
-#define _SC_CLOCK_SELECTION     0x006d
-#define _SC_CPUTIME             0x006e
-#define _SC_HOST_NAME_MAX       0x006f
-#define _SC_IPV6                0x0070
-#define _SC_RAW_SOCKETS         0x0071
+/** sysconf() query equivalent to _POSIX_REALTIME_SIGNALS. */
+#define _SC_REALTIME_SIGNALS 0x0042
+/** sysconf() query equivalent to _POSIX_SEMAPHORES. */
+#define _SC_SEMAPHORES 0x0043
+/** sysconf() query equivalent to _POSIX_SHARED_MEMORY_OBJECTS. */
+#define _SC_SHARED_MEMORY_OBJECTS 0x0044
+/** sysconf() query equivalent to _POSIX_SYNCHRONIZED_IO. */
+#define _SC_SYNCHRONIZED_IO 0x0045
+/** sysconf() query equivalent to _POSIX_TIMERS. */
+#define _SC_TIMERS 0x0046
+/** sysconf() query for an initial size for getgrgid_r() and getgrnam_r() buffers. */
+#define _SC_GETGR_R_SIZE_MAX 0x0047
+/** sysconf() query for an initial size for getpwuid_r() and getpwnam_r() buffers. */
+#define _SC_GETPW_R_SIZE_MAX 0x0048
+/** sysconf() query equivalent to LOGIN_NAME_MAX. */
+#define _SC_LOGIN_NAME_MAX 0x0049
+/** sysconf() query equivalent to PTHREAD_DESTRUCTOR_ITERATIONS. */
+#define _SC_THREAD_DESTRUCTOR_ITERATIONS 0x004a
+/** sysconf() query equivalent to PTHREAD_KEYS_MAX. */
+#define _SC_THREAD_KEYS_MAX 0x004b
+/** sysconf() query equivalent to PTHREAD_STACK_MIN. */
+#define _SC_THREAD_STACK_MIN 0x004c
+/** sysconf() query for the maximum number of threads. Unlimited on Android. */
+#define _SC_THREAD_THREADS_MAX 0x004d
+/** sysconf() query equivalent to TTY_NAME_MAX. */
+#define _SC_TTY_NAME_MAX 0x004e
+/** sysconf() query equivalent to _POSIX_THREADS. */
+#define _SC_THREADS 0x004f
+/** sysconf() query equivalent to _POSIX_THREAD_ATTR_STACKADDR. */
+#define _SC_THREAD_ATTR_STACKADDR 0x0050
+/** sysconf() query equivalent to _POSIX_THREAD_ATTR_STACKSIZE. */
+#define _SC_THREAD_ATTR_STACKSIZE 0x0051
+/** sysconf() query equivalent to _POSIX_THREAD_PRIORITY_SCHEDULING. */
+#define _SC_THREAD_PRIORITY_SCHEDULING 0x0052
+/** sysconf() query equivalent to _POSIX_THREAD_PRIO_INHERIT. */
+#define _SC_THREAD_PRIO_INHERIT 0x0053
+/** sysconf() query equivalent to _POSIX_THREAD_PRIO_PROTECT. */
+#define _SC_THREAD_PRIO_PROTECT 0x0054
+/** sysconf() query equivalent to _POSIX_THREAD_SAFE_FUNCTIONS. */
+#define _SC_THREAD_SAFE_FUNCTIONS 0x0055
+/** sysconf() query equivalent to get_nprocs_conf(). */
+#define _SC_NPROCESSORS_CONF 0x0060
+/** sysconf() query equivalent to get_nprocs(). */
+#define _SC_NPROCESSORS_ONLN 0x0061
+/** sysconf() query equivalent to get_phys_pages(). */
+#define _SC_PHYS_PAGES 0x0062
+/** sysconf() query equivalent to get_avphys_pages(). */
+#define _SC_AVPHYS_PAGES 0x0063
+/** sysconf() query equivalent to _POSIX_MONOTONIC_CLOCK. */
+#define _SC_MONOTONIC_CLOCK 0x0064
+/** Obsolescent in POSIX.1-2008. */
+#define _SC_2_PBS 0x0065
+/** Obsolescent in POSIX.1-2008. */
+#define _SC_2_PBS_ACCOUNTING 0x0066
+/** Obsolescent in POSIX.1-2008. */
+#define _SC_2_PBS_CHECKPOINT 0x0067
+/** Obsolescent in POSIX.1-2008. */
+#define _SC_2_PBS_LOCATE 0x0068
+/** Obsolescent in POSIX.1-2008. */
+#define _SC_2_PBS_MESSAGE 0x0069
+/** Obsolescent in POSIX.1-2008. */
+#define _SC_2_PBS_TRACK 0x006a
+/** sysconf() query equivalent to _POSIX_ADVISORY_INFO. */
+#define _SC_ADVISORY_INFO 0x006b
+/** sysconf() query equivalent to _POSIX_BARRIERS. */
+#define _SC_BARRIERS 0x006c
+/** sysconf() query equivalent to _POSIX_CLOCK_SELECTION. */
+#define _SC_CLOCK_SELECTION 0x006d
+/** sysconf() query equivalent to _POSIX_CPUTIME. */
+#define _SC_CPUTIME 0x006e
+/** sysconf() query equivalent to _POSIX_HOST_NAME_MAX. */
+#define _SC_HOST_NAME_MAX 0x006f
+/** sysconf() query equivalent to _POSIX_IPV6. */
+#define _SC_IPV6 0x0070
+/** sysconf() query equivalent to _POSIX_RAW_SOCKETS. */
+#define _SC_RAW_SOCKETS 0x0071
+/** sysconf() query equivalent to _POSIX_READER_WRITER_LOCKS. */
 #define _SC_READER_WRITER_LOCKS 0x0072
-#define _SC_REGEXP              0x0073
-#define _SC_SHELL               0x0074
-#define _SC_SPAWN               0x0075
-#define _SC_SPIN_LOCKS          0x0076
-#define _SC_SPORADIC_SERVER     0x0077
-#define _SC_SS_REPL_MAX         0x0078
-#define _SC_SYMLOOP_MAX         0x0079
-#define _SC_THREAD_CPUTIME      0x007a
-#define _SC_THREAD_PROCESS_SHARED       0x007b
-#define _SC_THREAD_ROBUST_PRIO_INHERIT  0x007c
-#define _SC_THREAD_ROBUST_PRIO_PROTECT  0x007d
-#define _SC_THREAD_SPORADIC_SERVER      0x007e
-#define _SC_TIMEOUTS            0x007f
-#define _SC_TRACE               0x0080
-#define _SC_TRACE_EVENT_FILTER  0x0081
-#define _SC_TRACE_EVENT_NAME_MAX  0x0082
-#define _SC_TRACE_INHERIT       0x0083
-#define _SC_TRACE_LOG           0x0084
-#define _SC_TRACE_NAME_MAX      0x0085
-#define _SC_TRACE_SYS_MAX       0x0086
-#define _SC_TRACE_USER_EVENT_MAX  0x0087
-#define _SC_TYPED_MEMORY_OBJECTS  0x0088
-#define _SC_V7_ILP32_OFF32      0x0089
-#define _SC_V7_ILP32_OFFBIG     0x008a
-#define _SC_V7_LP64_OFF64       0x008b
-#define _SC_V7_LPBIG_OFFBIG     0x008c
-#define _SC_XOPEN_STREAMS       0x008d
-#define _SC_XOPEN_UUCP          0x008e
-
-#define _SC_LEVEL1_ICACHE_SIZE      0x008f
-#define _SC_LEVEL1_ICACHE_ASSOC     0x0090
-#define _SC_LEVEL1_ICACHE_LINESIZE  0x0091
-#define _SC_LEVEL1_DCACHE_SIZE      0x0092
-#define _SC_LEVEL1_DCACHE_ASSOC     0x0093
-#define _SC_LEVEL1_DCACHE_LINESIZE  0x0094
-#define _SC_LEVEL2_CACHE_SIZE       0x0095
-#define _SC_LEVEL2_CACHE_ASSOC      0x0096
-#define _SC_LEVEL2_CACHE_LINESIZE   0x0097
-#define _SC_LEVEL3_CACHE_SIZE       0x0098
-#define _SC_LEVEL3_CACHE_ASSOC      0x0099
-#define _SC_LEVEL3_CACHE_LINESIZE   0x009a
-#define _SC_LEVEL4_CACHE_SIZE       0x009b
-#define _SC_LEVEL4_CACHE_ASSOC      0x009c
-#define _SC_LEVEL4_CACHE_LINESIZE   0x009d
+/** sysconf() query equivalent to _POSIX_REGEXP. */
+#define _SC_REGEXP 0x0073
+/** sysconf() query equivalent to _POSIX_SHELL. */
+#define _SC_SHELL 0x0074
+/** sysconf() query equivalent to _POSIX_SPAWN. */
+#define _SC_SPAWN 0x0075
+/** sysconf() query equivalent to _POSIX_SPIN_LOCKS. */
+#define _SC_SPIN_LOCKS 0x0076
+/** sysconf() query equivalent to _POSIX_SPORADIC_SERVER. */
+#define _SC_SPORADIC_SERVER 0x0077
+/** sysconf() query equivalent to _POSIX_SS_REPL_MAX. */
+#define _SC_SS_REPL_MAX 0x0078
+/** sysconf() query equivalent to _POSIX_SYMLOOP_MAX. */
+#define _SC_SYMLOOP_MAX 0x0079
+/** sysconf() query equivalent to _POSIX_THREAD_CPUTIME. */
+#define _SC_THREAD_CPUTIME 0x007a
+/** sysconf() query equivalent to _POSIX_THREAD_PROCESS_SHARED. */
+#define _SC_THREAD_PROCESS_SHARED 0x007b
+/** sysconf() query equivalent to _POSIX_THREAD_ROBUST_PRIO_INHERIT. */
+#define _SC_THREAD_ROBUST_PRIO_INHERIT 0x007c
+/** sysconf() query equivalent to _POSIX_THREAD_ROBUST_PRIO_PROTECT. */
+#define _SC_THREAD_ROBUST_PRIO_PROTECT 0x007d
+/** sysconf() query equivalent to _POSIX_THREAD_SPORADIC_SERVER. */
+#define _SC_THREAD_SPORADIC_SERVER 0x007e
+/** sysconf() query equivalent to _POSIX_TIMEOUTS. */
+#define _SC_TIMEOUTS 0x007f
+/** Unimplemented. */
+#define _SC_TRACE 0x0080
+/** Unimplemented. */
+#define _SC_TRACE_EVENT_FILTER 0x0081
+/** Unimplemented. */
+#define _SC_TRACE_EVENT_NAME_MAX 0x0082
+/** Unimplemented. */
+#define _SC_TRACE_INHERIT 0x0083
+/** Unimplemented. */
+#define _SC_TRACE_LOG 0x0084
+/** Unimplemented. */
+#define _SC_TRACE_NAME_MAX 0x0085
+/** Unimplemented. */
+#define _SC_TRACE_SYS_MAX 0x0086
+/** Unimplemented. */
+#define _SC_TRACE_USER_EVENT_MAX 0x0087
+/** sysconf() query equivalent to _POSIX_TYPED_MEMORY_OBJECTS. */
+#define _SC_TYPED_MEMORY_OBJECTS 0x0088
+/** sysconf() query equivalent to _POSIX_V7_ILP32_OFF32. */
+#define _SC_V7_ILP32_OFF32 0x0089
+/** sysconf() query equivalent to _POSIX_V7_ILP32_OFFBIG. */
+#define _SC_V7_ILP32_OFFBIG 0x008a
+/** sysconf() query equivalent to _POSIX_V7_LP64_OFF64. */
+#define _SC_V7_LP64_OFF64 0x008b
+/** sysconf() query equivalent to _POSIX_V7_LPBIG_OFFBIG. */
+#define _SC_V7_LPBIG_OFFBIG 0x008c
+/** Unimplemented. */
+#define _SC_XOPEN_STREAMS 0x008d
+/** Meaningless on Android, unsupported in every other libc (but defined by POSIX). */
+#define _SC_XOPEN_UUCP 0x008e
+/** sysconf() query for the L1 instruction cache size. Not available on all architectures. */
+#define _SC_LEVEL1_ICACHE_SIZE 0x008f
+/** sysconf() query for the L1 instruction cache associativity. Not available on all architectures. */
+#define _SC_LEVEL1_ICACHE_ASSOC 0x0090
+/** sysconf() query for the L1 instruction cache line size. Not available on all architectures. */
+#define _SC_LEVEL1_ICACHE_LINESIZE 0x0091
+/** sysconf() query for the L1 data cache size. Not available on all architectures. */
+#define _SC_LEVEL1_DCACHE_SIZE 0x0092
+/** sysconf() query for the L1 data cache associativity. Not available on all architectures. */
+#define _SC_LEVEL1_DCACHE_ASSOC 0x0093
+/** sysconf() query for the L1 data cache line size. Not available on all architectures. */
+#define _SC_LEVEL1_DCACHE_LINESIZE 0x0094
+/** sysconf() query for the L2 cache size. Not available on all architectures. */
+#define _SC_LEVEL2_CACHE_SIZE 0x0095
+/** sysconf() query for the L2 cache associativity. Not available on all architectures. */
+#define _SC_LEVEL2_CACHE_ASSOC 0x0096
+/** sysconf() query for the L2 cache line size. Not available on all architectures. */
+#define _SC_LEVEL2_CACHE_LINESIZE 0x0097
+/** sysconf() query for the L3 cache size. Not available on all architectures. */
+#define _SC_LEVEL3_CACHE_SIZE 0x0098
+/** sysconf() query for the L3 cache associativity. Not available on all architectures. */
+#define _SC_LEVEL3_CACHE_ASSOC 0x0099
+/** sysconf() query for the L3 cache line size. Not available on all architectures. */
+#define _SC_LEVEL3_CACHE_LINESIZE 0x009a
+/** sysconf() query for the L4 cache size. Not available on all architectures. */
+#define _SC_LEVEL4_CACHE_SIZE 0x009b
+/** sysconf() query for the L4 cache associativity. Not available on all architectures. */
+#define _SC_LEVEL4_CACHE_ASSOC 0x009c
+/** sysconf() query for the L4 cache line size. Not available on all architectures. */
+#define _SC_LEVEL4_CACHE_LINESIZE 0x009d
 
 __BEGIN_DECLS
 
+/**
+ * [sysconf(3)](https://man7.org/linux/man-pages/man3/sysconf.3.html)
+ * gets system configuration at runtime, corresponding to the given
+ * `_SC_` constant. See the man page for details on how to interpret
+ * the results.
+ *
+ * For `_SC_` constants where an equivalent is given, it's cheaper on Android
+ * to call that function directly, because sysconf() is just a multiplexer.
+ * This may not be true on other systems, and other systems may not support the
+ * direct function, so sysconf() can be useful for portability. Despite
+ * POSIX's best efforts, though, the exact set of constants that return useful
+ * results will also vary by system.
+ */
 long sysconf(int __name);
 
 __END_DECLS
-
-#endif /* _SYS_SYSCONF_H_ */
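The new `_SC_LEVEL*_CACHE_*` constants are documented as "not available on all architectures", and sysconf() reports an unsupported name by returning -1. The following is a minimal sketch of defensive use. The helper names and the 64-byte fallback are assumptions for illustration and are not part of bionic. Treating 0 as "unknown" reflects what some Linux libcs may return for cache queries they can't answer.

```cpp
#include <unistd.h>

#include <cassert>

// Hypothetical helper: returns sysconf(name) if it's a usable positive value,
// otherwise `fallback`. sysconf() returns -1 for names the system doesn't
// support; cache queries may also report 0 when the value is unknown.
long sysconf_or(int name, long fallback) {
  long value = sysconf(name);
  return value > 0 ? value : fallback;
}

// L1 data cache line size in bytes. The 64-byte fallback is an assumption
// (a common line size), not something sysconf() guarantees.
long l1_dcache_line_size() {
#if defined(_SC_LEVEL1_DCACHE_LINESIZE)
  return sysconf_or(_SC_LEVEL1_DCACHE_LINESIZE, 64);
#else
  return 64;  // This libc doesn't define the constant at all.
#endif
}
```

Wrapping the query keeps the "is this value meaningful?" decision in one place, which matters because the set of constants that return useful results varies by system.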
diff --git a/libc/include/dlfcn.h b/libc/include/dlfcn.h
index a506dc1..a90c4f8 100644
--- a/libc/include/dlfcn.h
+++ b/libc/include/dlfcn.h
@@ -99,7 +99,8 @@
 /**
  * [dlsym(3)](http://man7.org/linux/man-pages/man3/dlsym.3.html)
  * returns a pointer to the symbol with the given name in the shared
- * library represented by the given handle.
+ * library represented by the given handle. The handle may be one returned
+ * by dlopen(), or one of the special values RTLD_DEFAULT and RTLD_NEXT.
  *
  * Returns the address of the symbol on success, and returns NULL on failure,
  * in which case dlerror() can be used to retrieve the specific error.
@@ -109,7 +110,8 @@
 /**
  * [dlvsym(3)](http://man7.org/linux/man-pages/man3/dlvsym.3.html)
  * returns a pointer to the symbol with the given name and version in the shared
- * library represented by the given handle.
+ * library represented by the given handle. The handle may be one returned
+ * by dlopen(), or one of the special values RTLD_DEFAULT and RTLD_NEXT.
  *
  * Returns the address of the symbol on success, and returns NULL on failure,
  * in which case dlerror() can be used to retrieve the specific error.
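With RTLD_DEFAULT, dlsym() looks the name up using the default library search order, so no dlopen() handle is needed. RTLD_NEXT instead finds the next definition after the calling library, which is the usual way to write an interposing wrapper. The sketch below covers the RTLD_DEFAULT case together with the dlerror() convention described above. `find_global_symbol` is a hypothetical helper. On glibc older than 2.34 you may also need to link with `-ldl`.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <cstddef>

// Hypothetical helper: looks `name` up in the global scope. On failure,
// returns nullptr and, if `error` is non-null, stores dlerror()'s message.
void* find_global_symbol(const char* name, const char** error) {
  dlerror();  // Clear any stale error from an earlier dl* call.
  void* sym = dlsym(RTLD_DEFAULT, name);
  if (sym == nullptr && error != nullptr) *error = dlerror();
  return sym;
}
```

For example, `reinterpret_cast<size_t (*)(const char*)>(find_global_symbol("strlen", nullptr))` yields a callable `strlen` without naming any particular library.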
diff --git a/libc/include/elf.h b/libc/include/elf.h
index 81a50db..1275f2e 100644
--- a/libc/include/elf.h
+++ b/libc/include/elf.h
@@ -202,17 +202,11 @@
 #define DF_1_SINGLETON  0x02000000
 #define DF_1_STUB       0x04000000
 
-/* http://www.sco.com/developers/gabi/latest/ch4.eheader.html */
-#define ELFOSABI_SYSV 0 /* Synonym for ELFOSABI_NONE used by valgrind. */
-#define ELFOSABI_GNU 3 /* Synonym for ELFOSABI_LINUX. */
-
 /* http://www.sco.com/developers/gabi/latest/ch4.reloc.html */
 #define ELF32_R_INFO(sym, type) ((((Elf32_Word)sym) << 8) | ((type) & 0xff))
 #define ELF64_R_INFO(sym, type) ((((Elf64_Xword)sym) << 32) | ((type) & 0xffffffff))
 
 /* http://www.sco.com/developers/gabi/latest/ch4.symtab.html */
-#undef ELF_ST_TYPE
-#define ELF_ST_TYPE(x) ((x) & 0xf)
 #define ELF_ST_INFO(b,t) (((b) << 4) + ((t) & 0xf))
 #define ELF32_ST_INFO(b,t) ELF_ST_INFO(b,t)
 #define ELF64_ST_INFO(b,t) ELF_ST_INFO(b,t)
@@ -260,6 +254,13 @@
 #define DT_ANDROID_RELA 0x60000011 // DT_LOOS + 4
 #define DT_ANDROID_RELASZ 0x60000012 // DT_LOOS + 5
 
+/* TODO: upstreamed to FreeBSD as https://github.com/freebsd/freebsd-src/pull/1141/. */
+#define DT_AARCH64_MEMTAG_MODE 0x70000009
+#define DT_AARCH64_MEMTAG_HEAP 0x7000000b
+#define DT_AARCH64_MEMTAG_STACK 0x7000000c
+#define DT_AARCH64_MEMTAG_GLOBALS 0x7000000d
+#define DT_AARCH64_MEMTAG_GLOBALSSZ 0x7000000f
+
 /* Linux traditionally doesn't have the trailing 64 that BSD has on these. */
 #define R_AARCH64_TLS_DTPREL R_AARCH64_TLS_DTPREL64
 #define R_AARCH64_TLS_DTPMOD R_AARCH64_TLS_DTPMOD64
@@ -269,5 +270,24 @@
 #define R_ARM_TLS_DESC 13
 #define R_ARM_IRELATIVE 160
 
-/* BSD spells this slightly differently to Linux. */
+/* FreeBSD is missing these, found in
+ * https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#relocations
+ * so I've sent https://github.com/freebsd/freebsd-src/pull/1141 upstream.
+ */
+#define R_RISCV_TLSDESC 12
+#define R_RISCV_PLT32 59
+#define R_RISCV_SET_ULEB128 60
+#define R_RISCV_SUB_ULEB128 61
+#define R_RISCV_TLSDESC_HI20 62
+#define R_RISCV_TLSDESC_LOAD_LO12 63
+#define R_RISCV_TLSDESC_ADD_LO12 64
+#define R_RISCV_TLSDESC_CALL 65
+
+/* FreeBSD spells this slightly differently to Linux. Linux is correct according to
+ * https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#file-header
+ * so I've sent https://github.com/freebsd/freebsd-src/pull/1148 upstream.
+ */
+#define EF_RISCV_FLOAT_ABI EF_RISCV_FLOAT_ABI_MASK
+
+/* FreeBSD spells this slightly differently to Linux. */
 #define R_X86_64_JUMP_SLOT R_X86_64_JMP_SLOT
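ELF32_R_INFO and ELF64_R_INFO pack a symbol index and a relocation type into one `r_info` word, and ELF_ST_INFO packs a symbol's binding and type into `st_info`. The sketch below gives constexpr equivalents with matching unpacking functions, so the bit layout can be checked at compile time. The function names are hypothetical. The numeric values (STB_GLOBAL = 1, STT_FUNC = 2, R_X86_64_JUMP_SLOT = 7) come from the gABI and the x86-64 psABI.

```cpp
#include <cassert>
#include <cstdint>

// ELF64: symbol index in the high 32 bits, relocation type in the low 32.
constexpr uint64_t elf64_r_info(uint64_t sym, uint64_t type) {
  return (sym << 32) | (type & 0xffffffff);
}
constexpr uint64_t elf64_r_sym(uint64_t info) { return info >> 32; }
constexpr uint64_t elf64_r_type(uint64_t info) { return info & 0xffffffff; }

// ELF32: symbol index in the high 24 bits, relocation type in the low 8.
constexpr uint32_t elf32_r_info(uint32_t sym, uint32_t type) {
  return (sym << 8) | (type & 0xff);
}
constexpr uint32_t elf32_r_sym(uint32_t info) { return info >> 8; }
constexpr uint32_t elf32_r_type(uint32_t info) { return info & 0xff; }

// st_info: binding in the high nibble, type in the low nibble.
constexpr unsigned char elf_st_info(unsigned bind, unsigned type) {
  return static_cast<unsigned char>((bind << 4) + (type & 0xf));
}
constexpr unsigned elf_st_bind(unsigned char info) { return info >> 4; }
constexpr unsigned elf_st_type(unsigned char info) { return info & 0xf; }

static_assert(elf_st_info(1, 2) == 0x12, "STB_GLOBAL | STT_FUNC");
static_assert(elf64_r_info(5, 7) == 0x0000000500000007ull,
              "symbol 5, R_X86_64_JUMP_SLOT");
```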
diff --git a/libc/include/unistd.h b/libc/include/unistd.h
index ee772a5..c69db61 100644
--- a/libc/include/unistd.h
+++ b/libc/include/unistd.h
@@ -336,6 +336,13 @@
 
 int acct(const char* _Nullable __path);
 
+/**
+ * [getpagesize(2)](https://man7.org/linux/man-pages/man2/getpagesize.2.html)
+ * returns the system's page size. This is slightly faster than going via
+ * sysconf().
+ *
+ * Returns the system's page size in bytes.
+ */
 int getpagesize(void) __attribute_const__;
 
 long syscall(long __number, ...);
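Because getpagesize() returns the page size in bytes, code that rounds lengths for mmap() or mprotect() should query it rather than hard-code 4096, since some Android devices use 16 KiB pages. The following is a minimal sketch that assumes only that the page size is a power of two (true on Linux). `round_up_to_page` is a hypothetical helper.

```cpp
#include <unistd.h>

#include <cassert>
#include <cstddef>

// Rounds `size` up to a whole number of pages, e.g. for an mmap() length.
// The page size is a power of two, so masking off the low bits works.
size_t round_up_to_page(size_t size) {
  const size_t page = static_cast<size_t>(getpagesize());
  return (size + page - 1) & ~(page - 1);
}
```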
diff --git a/libc/private/bionic_elf_tls.h b/libc/private/bionic_elf_tls.h
index 79ffcc4..3a7b381 100644
--- a/libc/private/bionic_elf_tls.h
+++ b/libc/private/bionic_elf_tls.h
@@ -36,9 +36,28 @@
 
 __LIBC_HIDDEN__ extern _Atomic(size_t) __libc_tls_generation_copy;
 
-struct TlsSegment {
+struct TlsAlign {
+  size_t value = 1;
+  size_t skew = 0;  // p_vaddr % p_align
+
+  template <typename T>
+  static constexpr TlsAlign of_type() {
+    return TlsAlign{.value = alignof(T)};
+  }
+};
+
+struct TlsAlignedSize {
   size_t size = 0;
-  size_t alignment = 1;
+  TlsAlign align;
+
+  template <typename T>
+  static constexpr TlsAlignedSize of_type() {
+    return TlsAlignedSize{.size = sizeof(T), .align = TlsAlign::of_type<T>()};
+  }
+};
+
+struct TlsSegment {
+  TlsAlignedSize aligned_size;
   const void* init_ptr = "";    // Field is non-null even when init_size is 0.
   size_t init_size = 0;
 };
@@ -46,44 +65,50 @@
 __LIBC_HIDDEN__ bool __bionic_get_tls_segment(const ElfW(Phdr)* phdr_table, size_t phdr_count,
                                               ElfW(Addr) load_bias, TlsSegment* out);
 
-__LIBC_HIDDEN__ bool __bionic_check_tls_alignment(size_t* alignment);
+__LIBC_HIDDEN__ bool __bionic_check_tls_align(size_t align);
 
 struct StaticTlsLayout {
   constexpr StaticTlsLayout() {}
 
-private:
-  size_t offset_ = 0;
-  size_t alignment_ = 1;
-  bool overflowed_ = false;
-
-  // Offsets to various Bionic TLS structs from the beginning of static TLS.
-  size_t offset_bionic_tcb_ = SIZE_MAX;
-  size_t offset_bionic_tls_ = SIZE_MAX;
-
 public:
   size_t offset_bionic_tcb() const { return offset_bionic_tcb_; }
   size_t offset_bionic_tls() const { return offset_bionic_tls_; }
   size_t offset_thread_pointer() const;
+  size_t offset_exe() const { return offset_exe_; }
 
-  size_t size() const { return offset_; }
-  size_t alignment() const { return alignment_; }
-  bool overflowed() const { return overflowed_; }
+  size_t size() const { return cursor_; }
 
   size_t reserve_exe_segment_and_tcb(const TlsSegment* exe_segment, const char* progname);
-  void reserve_bionic_tls();
-  size_t reserve_solib_segment(const TlsSegment& segment) {
-    return reserve(segment.size, segment.alignment);
-  }
+  size_t reserve_bionic_tls();
+  size_t reserve_solib_segment(const TlsSegment& segment) { return reserve(segment.aligned_size); }
   void finish_layout();
 
-private:
-  size_t reserve(size_t size, size_t alignment);
+#if !defined(STATIC_TLS_LAYOUT_TEST)
+ private:
+#endif
+  size_t cursor_ = 0;
+  size_t align_ = 1;
+
+  // Offsets to various Bionic TLS structs from the beginning of static TLS.
+  size_t offset_bionic_tcb_ = SIZE_MAX;
+  size_t offset_bionic_tls_ = SIZE_MAX;
+
+  size_t offset_exe_ = SIZE_MAX;
+
+  struct TpAllocations {
+    size_t before;
+    size_t tp;
+    size_t after;
+  };
+
+  size_t align_cursor(TlsAlign align);
+  size_t align_cursor_unskewed(size_t align);
+  size_t reserve(TlsAlignedSize aligned_size);
+  TpAllocations reserve_tp_pair(TlsAlignedSize before, TlsAlignedSize after);
 
   template <typename T> size_t reserve_type() {
-    return reserve(sizeof(T), alignof(T));
+    return reserve(TlsAlignedSize::of_type<T>());
   }
-
-  size_t round_up_with_overflow_check(size_t value, size_t alignment);
 };
 
 static constexpr size_t kTlsGenerationNone = 0;
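TlsAlign's `skew` field records `p_vaddr % p_align`. A segment must therefore be placed at an offset congruent to the skew modulo the alignment, not merely at a multiple of the alignment. The sketch below shows that rounding and one way to lay out a `reserve_tp_pair()` allocation around the thread pointer. The types and functions are hypothetical stand-ins, written so that they reproduce the offsets expected by tests/static_tls_layout_test.cpp. bionic's real StaticTlsLayout may compute these offsets differently, and it also checks for overflow, which this sketch omits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for TlsAlign and TlsAlignedSize.
struct Align {
  size_t value = 1;  // Must be a power of two.
  size_t skew = 0;   // Required offset % value (p_vaddr % p_align); < value.
};
struct AlignedSize {
  size_t size = 0;
  Align align;
};

// Smallest offset >= cursor with offset % align.value == align.skew.
// Unsigned wraparound keeps this correct even when cursor < skew.
size_t align_up_skewed(size_t cursor, Align align) {
  assert(align.value != 0 && (align.value & (align.value - 1)) == 0);
  const size_t mask = align.value - 1;
  return ((cursor - align.skew + mask) & ~mask) + align.skew;
}

// Largest offset <= limit with offset % align.value == align.skew
// (requires limit >= align.skew).
size_t align_down_skewed(size_t limit, Align align) {
  const size_t mask = align.value - 1;
  return ((limit - align.skew) & ~mask) + align.skew;
}

struct TpAllocations {
  size_t before, tp, after;
};

struct LayoutSketch {
  size_t cursor = 0;  // Bytes reserved so far.
  size_t align = 1;   // Largest alignment reserved so far.

  // Places `before` so it ends at or below the thread pointer and `after` so
  // it starts at or above it. The TP is aligned for both allocations, so
  // each allocation's (skewed) alignment also holds relative to the TP.
  TpAllocations reserve_tp_pair(AlignedSize before, AlignedSize after) {
    const size_t tp_align = std::max(before.align.value, after.align.value);
    size_t b = align_up_skewed(cursor, before.align);  // First fit.
    const size_t tp = align_up_skewed(b + before.size, Align{tp_align, 0});
    b = align_down_skewed(tp - before.size, before.align);  // Slide up to the TP.
    const size_t a = align_up_skewed(tp, after.align);
    cursor = a + after.size;
    align = std::max(align, tp_align);
    return TpAllocations{b, tp, a};
  }
};
```

Sliding `before` toward the TP after choosing the TP keeps the gap on the low side of the pair. This matches the "shifted forward to the TP" expectations in the unit test.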
diff --git a/linker/Android.bp b/linker/Android.bp
index f87a92e..e1a5a91 100644
--- a/linker/Android.bp
+++ b/linker/Android.bp
@@ -334,6 +334,13 @@
         "-Wl,-Bsymbolic",
         "-Wl,--exclude-libs,ALL",
         "-Wl,-soname,ld-android.so",
+        // When the linker applies its own IRELATIVE relocations, it only reads DT_REL[A] and
+        // DT_JMPREL, not DT_ANDROID_REL[A], which could theoretically also contain IRELATIVE
+        // relocations. lld has been taught not to store them there as a bug workaround (see
+        // https://llvm.org/pr86751), but that workaround could be removed in the future. So we
+        // explicitly prevent lld from doing so by disabling DT_ANDROID_REL[A] when linking the
+        // linker (DT_RELR cannot encode IRELATIVE relocations).
+        "-Wl,--pack-dyn-relocs=relr",
     ],
 
     // we are going to link libc++_static manually because
diff --git a/linker/linker.cpp b/linker/linker.cpp
index 81869b3..8b467a3 100644
--- a/linker/linker.cpp
+++ b/linker/linker.cpp
@@ -2788,7 +2788,7 @@
   return true;
 }
 
-void soinfo::apply_relr_reloc(ElfW(Addr) offset) {
+static void apply_relr_reloc(ElfW(Addr) offset, ElfW(Addr) load_bias) {
   ElfW(Addr) address = offset + load_bias;
   *reinterpret_cast<ElfW(Addr)*>(address) += load_bias;
 }
@@ -2796,20 +2796,18 @@
 // Process relocations in SHT_RELR section (experimental).
 // Details of the encoding are described in this post:
 //   https://groups.google.com/d/msg/generic-abi/bX460iggiKg/Pi9aSwwABgAJ
-bool soinfo::relocate_relr() {
-  ElfW(Relr)* begin = relr_;
-  ElfW(Relr)* end = relr_ + relr_count_;
+bool relocate_relr(const ElfW(Relr)* begin, const ElfW(Relr)* end, ElfW(Addr) load_bias) {
   constexpr size_t wordsize = sizeof(ElfW(Addr));
 
   ElfW(Addr) base = 0;
-  for (ElfW(Relr)* current = begin; current < end; ++current) {
+  for (const ElfW(Relr)* current = begin; current < end; ++current) {
     ElfW(Relr) entry = *current;
     ElfW(Addr) offset;
 
     if ((entry&1) == 0) {
       // Even entry: encodes the offset for next relocation.
       offset = static_cast<ElfW(Addr)>(entry);
-      apply_relr_reloc(offset);
+      apply_relr_reloc(offset, load_bias);
       // Set base offset for subsequent bitmap entries.
       base = offset + wordsize;
       continue;
@@ -2820,7 +2818,7 @@
     while (entry != 0) {
       entry >>= 1;
       if ((entry&1) != 0) {
-        apply_relr_reloc(offset);
+        apply_relr_reloc(offset, load_bias);
       }
       offset += wordsize;
     }
@@ -2869,9 +2867,9 @@
     // The loader does not (currently) support ELF TLS, so it shouldn't have
     // a TLS segment.
     CHECK(!relocating_linker && "TLS not supported in loader");
-    if (!__bionic_check_tls_alignment(&tls_segment.alignment)) {
+    if (!__bionic_check_tls_align(tls_segment.aligned_size.align.value)) {
       DL_ERR("TLS segment alignment in \"%s\" is not a power of 2: %zu", get_realpath(),
-             tls_segment.alignment);
+             tls_segment.aligned_size.align.value);
       return false;
     }
     tls_ = std::make_unique<soinfo_tls>();
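In the RELR encoding, an even entry is the offset of one relative relocation. An odd entry is a bitmap covering the words that follow the last even entry, with one bit per word above the low marker bit. To make the encoding concrete, here is a sketch that performs the same walk as relocate_relr() but returns the offsets instead of patching memory. `decode_relr` is a hypothetical name, and entries are widened to `uint64_t` for simplicity.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns every offset that relocate_relr() would adjust by load_bias.
// `wordsize` is 8 for ELF64 and 4 for ELF32.
std::vector<uint64_t> decode_relr(const std::vector<uint64_t>& entries, uint64_t wordsize) {
  std::vector<uint64_t> offsets;
  uint64_t base = 0;
  for (uint64_t entry : entries) {
    if ((entry & 1) == 0) {
      // Even entry: the offset of a single relocation.
      offsets.push_back(entry);
      // Following bitmap entries start at the next word.
      base = entry + wordsize;
      continue;
    }
    // Odd entry: bit i (i >= 1) covers base + (i - 1) * wordsize.
    uint64_t offset = base;
    while (entry != 0) {
      entry >>= 1;
      if ((entry & 1) != 0) offsets.push_back(offset);
      offset += wordsize;
    }
    // One bitmap entry describes (8 * wordsize - 1) consecutive words.
    base += (8 * wordsize - 1) * wordsize;
  }
  return offsets;
}
```

For example, on ELF64 the pair `{0x1000, 0xb}` decodes to offsets 0x1000, 0x1008 and 0x1018. A run of three adjacent pointers costs two table words instead of three full relocation records.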
diff --git a/linker/linker.h b/linker/linker.h
index 275182f..ac2222d 100644
--- a/linker/linker.h
+++ b/linker/linker.h
@@ -179,6 +179,7 @@
 int get_application_target_sdk_version();
 ElfW(Versym) find_verdef_version_index(const soinfo* si, const version_info* vi);
 bool validate_verdef_section(const soinfo* si);
+bool relocate_relr(const ElfW(Relr)* begin, const ElfW(Relr)* end, ElfW(Addr) load_bias);
 
 struct platform_properties {
 #if defined(__aarch64__)
diff --git a/linker/linker_main.cpp b/linker/linker_main.cpp
index c9dcfa3..77769f5 100644
--- a/linker/linker_main.cpp
+++ b/linker/linker_main.cpp
@@ -635,9 +635,10 @@
   }
 }
 
-static void call_ifunc_resolvers() {
-  // Find the IRELATIVE relocations using the DT_JMPREL and DT_PLTRELSZ, or DT_RELA? and DT_RELA?SZ
-  // dynamic tags.
+static void relocate_linker() {
+  // The linker should only have relative relocations (in RELR) and IRELATIVE
+  // relocations. Find the IRELATIVE relocations using the DT_JMPREL and
+  // DT_PLTRELSZ, or DT_RELA/DT_RELASZ (DT_REL/DT_RELSZ on ILP32).
   auto ehdr = reinterpret_cast<ElfW(Addr)>(&__ehdr_start);
   auto* phdr = reinterpret_cast<ElfW(Phdr)*>(ehdr + __ehdr_start.e_phoff);
   for (size_t i = 0; i != __ehdr_start.e_phnum; ++i) {
@@ -645,18 +646,33 @@
       continue;
     }
     auto *dyn = reinterpret_cast<ElfW(Dyn)*>(ehdr + phdr[i].p_vaddr);
-    ElfW(Addr) pltrel = 0, pltrelsz = 0, rel = 0, relsz = 0;
+    ElfW(Addr) relr = 0, relrsz = 0, pltrel = 0, pltrelsz = 0, rel = 0, relsz = 0;
     for (size_t j = 0, size = phdr[i].p_filesz / sizeof(ElfW(Dyn)); j != size; ++j) {
-      if (dyn[j].d_tag == DT_JMPREL) {
-        pltrel = dyn[j].d_un.d_ptr;
-      } else if (dyn[j].d_tag == DT_PLTRELSZ) {
-        pltrelsz = dyn[j].d_un.d_ptr;
-      } else if (dyn[j].d_tag == kRelTag) {
-        rel = dyn[j].d_un.d_ptr;
-      } else if (dyn[j].d_tag == kRelSzTag) {
-        relsz = dyn[j].d_un.d_ptr;
+      const auto tag = dyn[j].d_tag;
+      const auto val = dyn[j].d_un.d_ptr;
+      // We don't currently handle IRELATIVE relocations in DT_ANDROID_REL[A].
+      // We disabled DT_ANDROID_REL[A] at build time; verify that it was actually disabled.
+      CHECK(tag != DT_ANDROID_REL && tag != DT_ANDROID_RELA);
+      if (tag == DT_RELR || tag == DT_ANDROID_RELR) {
+        relr = val;
+      } else if (tag == DT_RELRSZ || tag == DT_ANDROID_RELRSZ) {
+        relrsz = val;
+      } else if (tag == DT_JMPREL) {
+        pltrel = val;
+      } else if (tag == DT_PLTRELSZ) {
+        pltrelsz = val;
+      } else if (tag == kRelTag) {
+        rel = val;
+      } else if (tag == kRelSzTag) {
+        relsz = val;
       }
     }
+    // Apply RELR relocations first so that the GOT is initialized for ifunc
+    // resolvers.
+    if (relr && relrsz) {
+      relocate_relr(reinterpret_cast<const ElfW(Relr)*>(ehdr + relr),
+                    reinterpret_cast<const ElfW(Relr)*>(ehdr + relr + relrsz), ehdr);
+    }
     if (pltrel && pltrelsz) {
       call_ifunc_resolvers_for_section(reinterpret_cast<RelType*>(ehdr + pltrel),
                                        reinterpret_cast<RelType*>(ehdr + pltrel + pltrelsz));
@@ -734,8 +750,12 @@
   ElfW(Ehdr)* elf_hdr = reinterpret_cast<ElfW(Ehdr)*>(linker_addr);
   ElfW(Phdr)* phdr = reinterpret_cast<ElfW(Phdr)*>(linker_addr + elf_hdr->e_phoff);
 
-  // string.h functions must not be used prior to calling the linker's ifunc resolvers.
-  call_ifunc_resolvers();
+  // Relocate the linker. This step will initialize the GOT, which is needed for
+  // accessing non-hidden global variables. (On some targets, the stack
+  // protector uses GOT accesses rather than TLS.) Relocating the linker will
+  // also call the linker's ifunc resolvers so that string.h functions can be
+  // used.
+  relocate_linker();
 
   soinfo tmp_linker_so(nullptr, nullptr, nullptr, 0, 0);
 
@@ -747,7 +767,6 @@
   tmp_linker_so.phnum = elf_hdr->e_phnum;
   tmp_linker_so.set_linker_flag();
 
-  // Prelink the linker so we can access linker globals.
   if (!tmp_linker_so.prelink_image()) __linker_cannot_link(args.argv[0]);
   if (!tmp_linker_so.link_image(SymbolLookupList(&tmp_linker_so), &tmp_linker_so, nullptr, nullptr)) __linker_cannot_link(args.argv[0]);
 
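relocate_linker() walks the linker's own PT_DYNAMIC array and records where the RELR, JMPREL and REL[A] tables are. It CHECK-fails on DT_ANDROID_REL[A], and it applies RELR before calling any ifunc resolver so that GOT entries are valid when the resolvers run. Below is a simplified sketch of the scanning half, assuming a 64-bit RELA target. `Dyn64`, `LinkerRelocTables` and `scan_dynamic` are hypothetical, and DT_ANDROID_RELR[SZ] is ignored. Unlike the original loop, which trusts p_filesz, this one also stops at DT_NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of Elf64_Dyn.
struct Dyn64 {
  int64_t d_tag;
  uint64_t d_val;
};

// gABI tag values, plus DT_ANDROID_REL[A] (DT_LOOS + 2 and DT_LOOS + 4).
constexpr int64_t kDtNull = 0, kDtPltRelSz = 2, kDtRela = 7, kDtRelaSz = 8,
                  kDtJmpRel = 23, kDtRelrSz = 35, kDtRelr = 36;
constexpr int64_t kDtAndroidRel = 0x6000000f, kDtAndroidRela = 0x60000011;

struct LinkerRelocTables {
  uint64_t relr = 0, relrsz = 0, pltrel = 0, pltrelsz = 0, rela = 0, relasz = 0;
  bool has_android_rel = false;  // relocate_linker() would CHECK-fail here.
};

LinkerRelocTables scan_dynamic(const Dyn64* dyn, size_t count) {
  LinkerRelocTables t;
  for (size_t i = 0; i < count && dyn[i].d_tag != kDtNull; ++i) {
    const uint64_t val = dyn[i].d_val;
    switch (dyn[i].d_tag) {
      case kDtRelr: t.relr = val; break;
      case kDtRelrSz: t.relrsz = val; break;
      case kDtJmpRel: t.pltrel = val; break;
      case kDtPltRelSz: t.pltrelsz = val; break;
      case kDtRela: t.rela = val; break;
      case kDtRelaSz: t.relasz = val; break;
      case kDtAndroidRel:
      case kDtAndroidRela: t.has_android_rel = true; break;
      default: break;  // Other tags don't matter for self-relocation.
    }
  }
  return t;
}
```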
diff --git a/linker/linker_phdr.cpp b/linker/linker_phdr.cpp
index 074012d..ef7671c 100644
--- a/linker/linker_phdr.cpp
+++ b/linker/linker_phdr.cpp
@@ -724,6 +724,16 @@
       continue;
     }
 
+    // If the PT_NOTE extends beyond the end of the file, the ELF is doing
+    // something strange: obfuscation, embedding hidden loaders, ...
+    //
+    // This PT_NOTE won't contain the pad_segment note, so skip it to avoid a
+    // SIGBUS from reading past the end of the file.
+    off64_t note_end_off = file_offset_ + phdr->p_offset + phdr->p_filesz;
+    if (note_end_off > file_size_) {
+      continue;
+    }
+
     // note_fragment is scoped to within the loop so that there is
     // at most 1 PT_NOTE mapped at anytime during this search.
     MappedFileFragment note_fragment;
@@ -1270,11 +1280,6 @@
 
 
 #if defined(__arm__)
-
-#  ifndef PT_ARM_EXIDX
-#    define PT_ARM_EXIDX    0x70000001      /* .ARM.exidx segment */
-#  endif
-
 /* Return the address and size of the .ARM.exidx section in memory,
  * if present.
  *
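The new PT_NOTE check compares `file_offset_ + p_offset + p_filesz` with the file size. Both header fields come straight from the file, so a sum of very large values could in principle wrap around. One way to write the same test without forming the sum is to subtract from the file size step by step. The following is a minimal sketch using plain unsigned 64-bit values. `segment_within_file` is a hypothetical helper, not a bionic function.

```cpp
#include <cassert>
#include <cstdint>

// True if [base + offset, base + offset + size) lies within a file of
// `file_size` bytes. No intermediate sum is computed, so nothing can wrap.
bool segment_within_file(uint64_t base, uint64_t offset, uint64_t size,
                         uint64_t file_size) {
  if (base > file_size) return false;
  uint64_t remaining = file_size - base;  // Bytes available after `base`.
  if (offset > remaining) return false;
  remaining -= offset;                    // Bytes available after the start.
  return size <= remaining;
}
```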
diff --git a/linker/linker_relocate.cpp b/linker/linker_relocate.cpp
index 080570d..85f7b3a 100644
--- a/linker/linker_relocate.cpp
+++ b/linker/linker_relocate.cpp
@@ -609,6 +609,17 @@
   relocator.tlsdesc_args = &tlsdesc_args_;
   relocator.tls_tp_base = __libc_shared_globals()->static_tls_layout.offset_thread_pointer();
 
+  // The linker already applied its RELR relocations in an earlier pass, so
+  // skip the RELR relocations for the linker.
+  if (relr_ != nullptr && !is_linker()) {
+    DEBUG("[ relocating %s relr ]", get_realpath());
+    const ElfW(Relr)* begin = relr_;
+    const ElfW(Relr)* end = relr_ + relr_count_;
+    if (!relocate_relr(begin, end, load_bias)) {
+      return false;
+    }
+  }
+
   if (android_relocs_ != nullptr) {
     // check signature
     if (android_relocs_size_ > 3 &&
@@ -630,13 +641,6 @@
     }
   }
 
-  if (relr_ != nullptr) {
-    DEBUG("[ relocating %s relr ]", get_realpath());
-    if (!relocate_relr()) {
-      return false;
-    }
-  }
-
 #if defined(USE_RELA)
   if (rela_ != nullptr) {
     DEBUG("[ relocating %s rela ]", get_realpath());
diff --git a/linker/linker_soinfo.h b/linker/linker_soinfo.h
index a5d31d5..9a13af2 100644
--- a/linker/linker_soinfo.h
+++ b/linker/linker_soinfo.h
@@ -384,8 +384,6 @@
 
  private:
   bool relocate(const SymbolLookupList& lookup_list);
-  bool relocate_relr();
-  void apply_relr_reloc(ElfW(Addr) offset);
 
   // This part of the structure is only available
   // when FLAG_NEW_SOINFO is set in this->flags.
diff --git a/tests/Android.bp b/tests/Android.bp
index 89d2267..528ccb8 100644
--- a/tests/Android.bp
+++ b/tests/Android.bp
@@ -578,6 +578,9 @@
     include_dirs: [
         "bionic/libc",
     ],
+    static_libs: [
+        "libbase",
+    ],
     shared: {
         enabled: false,
     },
@@ -834,8 +837,10 @@
     data_bins: [
         "cfi_test_helper",
         "cfi_test_helper2",
+        "elftls_align_test_helper",
         "elftls_dlopen_ie_error_helper",
         "elftls_dtv_resize_helper",
+        "elftls_skew_align_test_helper",
         "exec_linker_helper",
         "exec_linker_helper_lib",
         "heap_tagging_async_helper",
@@ -1189,9 +1194,9 @@
         "gtest_globals.cpp",
         "gtest_main.cpp",
 
-        // The Bionic allocator has its own C++ API. It isn't packaged into its
-        // own library, so it can only be tested when it's part of libc.a.
+        // Test internal parts of Bionic that aren't exposed via libc.so.
         "bionic_allocator_test.cpp",
+        "static_tls_layout_test.cpp",
     ],
     include_dirs: [
         "bionic/libc",
@@ -1221,6 +1226,8 @@
         never: true,
     },
     data_bins: [
+        "elftls_align_test_helper",
+        "elftls_skew_align_test_helper",
         "heap_tagging_async_helper",
         "heap_tagging_disabled_helper",
         "heap_tagging_static_async_helper",
diff --git a/tests/elftls_test.cpp b/tests/elftls_test.cpp
index 7c072b6..b3f511e 100644
--- a/tests/elftls_test.cpp
+++ b/tests/elftls_test.cpp
@@ -30,6 +30,9 @@
 
 #include <thread>
 
+#include "gtest_globals.h"
+#include "utils.h"
+
 // Specify the LE access model explicitly. This file is compiled into the
 // bionic-unit-tests executable, but the compiler sees an -fpic object file
 // output into a static library, so it defaults to dynamic TLS accesses.
@@ -87,3 +90,17 @@
     ASSERT_EQ(31, ++tlsvar_general);
   }).join();
 }
+
+TEST(elftls, align_test) {
+  std::string helper = GetTestLibRoot() + "/elftls_align_test_helper";
+  ExecTestHelper eth;
+  eth.SetArgs({helper.c_str(), nullptr});
+  eth.Run([&]() { execve(helper.c_str(), eth.GetArgs(), eth.GetEnv()); }, 0, nullptr);
+}
+
+TEST(elftls, skew_align_test) {
+  std::string helper = GetTestLibRoot() + "/elftls_skew_align_test_helper";
+  ExecTestHelper eth;
+  eth.SetArgs({helper.c_str(), nullptr});
+  eth.Run([&]() { execve(helper.c_str(), eth.GetArgs(), eth.GetEnv()); }, 0, nullptr);
+}
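Both new tests exec a helper binary and expect exit status 0. The helpers only exit with 0 if every CHECK passes. bionic's ExecTestHelper (from the tests' utils.h) does more bookkeeping than this, but the core fork/execve/waitpid pattern looks roughly like the sketch below. `run_helper` is a hypothetical name.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>

extern char** environ;

// Runs `path` with the nullptr-terminated `argv` in a child process.
// Returns the child's exit status (127 if execve() failed), or -1 if
// fork()/waitpid() failed or the child was killed by a signal.
int run_helper(const char* path, char* const argv[]) {
  pid_t pid = fork();
  if (pid == -1) return -1;
  if (pid == 0) {
    // Child: replace this process image with the helper.
    execve(path, argv, environ);
    _exit(127);  // Only reached if execve() failed.
  }
  int status = 0;
  while (waitpid(pid, &status, 0) == -1) {
    if (errno != EINTR) return -1;  // Retry only if interrupted.
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```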
diff --git a/tests/libs/Android.bp b/tests/libs/Android.bp
index f640552..fc7fd40 100644
--- a/tests/libs/Android.bp
+++ b/tests/libs/Android.bp
@@ -156,6 +156,20 @@
     ],
 }
 
+cc_test {
+    name: "elftls_align_test_helper",
+    defaults: ["bionic_testlib_defaults"],
+    srcs: ["elftls_align_test_helper.cpp"],
+    stl: "none", // avoid including extra TLS variables in the executable
+}
+
+cc_test {
+    name: "elftls_skew_align_test_helper",
+    defaults: ["bionic_testlib_defaults"],
+    srcs: ["elftls_skew_align_test_helper.cpp"],
+    stl: "none", // avoid including extra TLS variables in the executable
+}
+
 // -----------------------------------------------------------------------------
 // Library to test gnu-styled hash
 // -----------------------------------------------------------------------------
diff --git a/tests/libs/elftls_align_test_helper.cpp b/tests/libs/elftls_align_test_helper.cpp
new file mode 100644
index 0000000..72e81da
--- /dev/null
+++ b/tests/libs/elftls_align_test_helper.cpp
@@ -0,0 +1,63 @@
+/*
+ * Copyright (C) 2024 The Android Open Source Project
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *  * Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  * Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+ * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
+ * OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
+ * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include "CHECK.h"
+
+struct AlignedVar {
+  int field;
+  char buffer[0x1000 - sizeof(int)];
+} __attribute__((aligned(0x400)));
+
+struct SmallVar {
+  int field;
+  char buffer[0xeee - sizeof(int)];
+};
+
+// The single .tdata section should have a size that isn't a multiple of its
+// alignment.
+__thread struct AlignedVar var1 = {13};
+__thread struct AlignedVar var2 = {17};
+__thread struct SmallVar var3 = {19};
+
+static uintptr_t var_addr(void* value) {
+  // Hide the pointer from the optimizer, which might otherwise assume the
+  // variable has its declared alignment and fold away the alignment checks.
+  asm volatile("" : "+r,m"(value) : : "memory");
+  return reinterpret_cast<uintptr_t>(value);
+}
+
+int main() {
+  CHECK((var_addr(&var1) & 0x3ff) == 0);
+  CHECK((var_addr(&var2) & 0x3ff) == 0);
+  CHECK(var1.field == 13);
+  CHECK(var2.field == 17);
+  CHECK(var3.field == 19);
+  return 0;
+}
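The helper's checks come down to two points. The address of an over-aligned thread-local must be a multiple of its alignment, and the compiler must not be allowed to prove that statically. The following minimal sketch shows the same pattern with alignas/thread_local instead of `__attribute__((aligned))`/`__thread`. The names are hypothetical, and the empty asm statement is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstdint>

// Over-aligned type, like AlignedVar in the helper.
struct alignas(1024) OverAligned {
  int field;
};

// One instance per thread; the loader must honor the 1024-byte alignment.
thread_local OverAligned tls_var = {13};

// The empty asm makes `p` opaque, so the compiler can't use the declared
// alignment to fold the mask test to a constant.
uintptr_t opaque_addr(const void* p) {
  asm volatile("" : "+r"(p));
  return reinterpret_cast<uintptr_t>(p);
}

// True if `p` is a multiple of `align`, which must be a power of two.
bool is_aligned(const void* p, uintptr_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (opaque_addr(p) & (align - 1)) == 0;
}
```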
diff --git a/tests/libs/elftls_skew_align_test_helper.cpp b/tests/libs/elftls_skew_align_test_helper.cpp
new file mode 100644
index 0000000..f7f082d
--- /dev/null
+++ b/tests/libs/elftls_skew_align_test_helper.cpp
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2024 The Android Open Source Project
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *  * Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  * Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+ * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
+ * OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
+ * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+// LLD tries not to generate a PT_TLS segment where (p_vaddr % p_align) is
+// non-zero, but it can still do so when p_align is larger than a page.
+
+#include <stdint.h>
+#include <unistd.h>
+
+#include "CHECK.h"
+
+struct SmallVar {
+  int field;
+  char buffer[0x100 - sizeof(int)];
+};
+
+struct AlignedVar {
+  int field;
+  char buffer[0x20000 - sizeof(int)];
+} __attribute__((aligned(0x20000)));
+
+__thread struct SmallVar var1 = {13};
+__thread struct SmallVar var2 = {17};
+__thread struct AlignedVar var3;
+__thread struct AlignedVar var4;
+
+static uintptr_t var_addr(void* value) {
+  // Hide the pointer from the optimizer, which might otherwise assume the
+  // variable has its declared alignment and fold away the alignment checks.
+  asm volatile("" : "+r,m"(value) : : "memory");
+  return reinterpret_cast<uintptr_t>(value);
+}
+
+int main() {
+  // Bionic only allocates ELF TLS blocks with up to page alignment.
+  CHECK((var_addr(&var3) & (getpagesize() - 1)) == 0);
+  CHECK((var_addr(&var4) & (getpagesize() - 1)) == 0);
+
+  // TODO: These TLS accesses are broken with the current version of LLD. See
+  // https://github.com/llvm/llvm-project/issues/84743.
+#if !defined(__riscv)
+  CHECK(var1.field == 13);
+  CHECK(var2.field == 17);
+#endif
+
+  CHECK(var3.field == 0);
+  CHECK(var4.field == 0);
+  return 0;
+}
diff --git a/tests/static_tls_layout_test.cpp b/tests/static_tls_layout_test.cpp
new file mode 100644
index 0000000..bf508e8
--- /dev/null
+++ b/tests/static_tls_layout_test.cpp
@@ -0,0 +1,213 @@
+/*
+ * Copyright (C) 2024 The Android Open Source Project
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *  * Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  * Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+ * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
+ * OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
+ * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#define STATIC_TLS_LAYOUT_TEST
+
+#include "private/bionic_elf_tls.h"
+
+#include <string>
+#include <tuple>
+
+#include <gtest/gtest.h>
+
+#include "private/bionic_tls.h"
+
+using namespace std::string_literals;
+
+struct AlignedSizeFlat {
+  size_t size = 0;
+  size_t align = 1;
+  size_t skew = 0;
+};
+
+static TlsAlignedSize unflatten_size(AlignedSizeFlat flat) {
+  return TlsAlignedSize{.size = flat.size,
+                        .align = TlsAlign{
+                            .value = flat.align,
+                            .skew = flat.skew,
+                        }};
+}
+
+TEST(static_tls_layout, reserve_tp_pair) {
+  auto reserve_tp = [](const AlignedSizeFlat& before, const AlignedSizeFlat& after,
+                       StaticTlsLayout layout = {}) {
+    auto allocs = layout.reserve_tp_pair(unflatten_size(before), unflatten_size(after));
+    return std::make_tuple(layout, allocs);
+  };
+
+  StaticTlsLayout layout;
+  StaticTlsLayout::TpAllocations allocs;
+
+  // Simple case.
+  std::tie(layout, allocs) = reserve_tp({.size = 8, .align = 2}, {.size = 16, .align = 2});
+  EXPECT_EQ(0u, allocs.before);
+  EXPECT_EQ(8u, allocs.tp);
+  EXPECT_EQ(8u, allocs.after);
+  EXPECT_EQ(24u, layout.size());
+  EXPECT_EQ(2u, layout.align_);
+
+  // Zero-sized `before`
+  std::tie(layout, allocs) = reserve_tp({.size = 0}, {.size = 64, .align = 8});
+  EXPECT_EQ(0u, allocs.before);
+  EXPECT_EQ(0u, allocs.tp);
+  EXPECT_EQ(0u, allocs.after);
+
+  // Zero-sized `after`
+  std::tie(layout, allocs) = reserve_tp({.size = 64, .align = 8}, {.size = 0});
+  EXPECT_EQ(0u, allocs.before);
+  EXPECT_EQ(64u, allocs.tp);
+  EXPECT_EQ(64u, allocs.after);
+
+  // The `before` allocation is shifted forward to the TP.
+  std::tie(layout, allocs) = reserve_tp({.size = 1}, {.size = 64, .align = 8});
+  EXPECT_EQ(7u, allocs.before);
+  EXPECT_EQ(8u, allocs.tp);
+  EXPECT_EQ(8u, allocs.after);
+
+  // Alignment gap between `before` and TP.
+  std::tie(layout, allocs) = reserve_tp({.size = 9, .align = 4}, {.size = 1});
+  EXPECT_EQ(0u, allocs.before);
+  EXPECT_EQ(12u, allocs.tp);
+  EXPECT_EQ(12u, allocs.after);
+  EXPECT_EQ(13u, layout.size());
+  EXPECT_EQ(4u, layout.align_);
+
+  // Alignment gap between `before` and TP.
+  std::tie(layout, allocs) = reserve_tp({.size = 9, .align = 4}, {.size = 128, .align = 64});
+  EXPECT_EQ(52u, allocs.before);
+  EXPECT_EQ(64u, allocs.tp);
+  EXPECT_EQ(64u, allocs.after);
+  EXPECT_EQ(192u, layout.size());
+  EXPECT_EQ(64u, layout.align_);
+
+  // Skew-aligned `before` with low alignment.
+  std::tie(layout, allocs) =
+      reserve_tp({.size = 1, .align = 4, .skew = 1}, {.size = 64, .align = 8});
+  EXPECT_EQ(5u, allocs.before);
+  EXPECT_EQ(8u, allocs.tp);
+
+  // Skew-aligned `before` with high alignment.
+  std::tie(layout, allocs) = reserve_tp({.size = 48, .align = 64, .skew = 17}, {.size = 1});
+  EXPECT_EQ(17u, allocs.before);
+  EXPECT_EQ(128u, allocs.tp);
+
+  // An unrelated byte precedes the pair in the layout. Make sure `before` is
+  // still aligned.
+  layout = {};
+  layout.reserve_type<char>();
+  std::tie(layout, allocs) = reserve_tp({.size = 12, .align = 16}, {.size = 1}, layout);
+  EXPECT_EQ(16u, allocs.before);
+  EXPECT_EQ(32u, allocs.tp);
+
+  // Skew-aligned `after`.
+  std::tie(layout, allocs) =
+      reserve_tp({.size = 32, .align = 8}, {.size = 16, .align = 4, .skew = 3});
+  EXPECT_EQ(0u, allocs.before);
+  EXPECT_EQ(32u, allocs.tp);
+  EXPECT_EQ(35u, allocs.after);
+  EXPECT_EQ(51u, layout.size());
+}
+
+// A "NUM_words" literal is the size in bytes of NUM words of memory.
+static size_t operator""_words(unsigned long long i) {
+  return i * sizeof(void*);
+}
+
+TEST(static_tls_layout, arm) {
+#if !defined(__arm__) && !defined(__aarch64__)
+  GTEST_SKIP() << "test only applies to arm32/arm64 targets";
+#endif
+
+  auto reserve_exe = [](const AlignedSizeFlat& config) {
+    StaticTlsLayout layout;
+    TlsSegment seg = {.aligned_size = unflatten_size(config)};
+    layout.reserve_exe_segment_and_tcb(&seg, "prog");
+    return layout;
+  };
+
+  auto underalign_error = [](size_t align, size_t offset) {
+    return R"(error: "prog": executable's TLS segment is underaligned: )"s
+           R"(alignment is )"s +
+           std::to_string(align) + R"( \(skew )" + std::to_string(offset) +
+           R"(\), needs to be at least (32 for ARM|64 for ARM64) Bionic)"s;
+  };
+
+  // Bytes needed below the thread pointer for the negative TLS slots, rounded
+  // up to a segment p_align of 8 or 16 words.
+  const size_t base8 = __BIONIC_ALIGN(-MIN_TLS_SLOT, 8) * sizeof(void*);
+  const size_t base16 = __BIONIC_ALIGN(-MIN_TLS_SLOT, 16) * sizeof(void*);
+
+  StaticTlsLayout layout;
+
+  // An executable with a single word.
+  layout = reserve_exe({.size = 1_words, .align = 8_words});
+  EXPECT_EQ(base8 + MIN_TLS_SLOT * sizeof(void*), layout.offset_bionic_tcb());
+  EXPECT_EQ(base8, layout.offset_thread_pointer());
+  EXPECT_EQ(base8 + 8_words, layout.offset_exe());
+  EXPECT_EQ(base8 + 9_words, layout.size());
+  EXPECT_EQ(8_words, layout.align_);
+
+  // Simple underalignment case.
+  EXPECT_DEATH(reserve_exe({.size = 1_words, .align = 1_words}), underalign_error(1_words, 0));
+
+  // Skewed by 1 word is OK.
+  layout = reserve_exe({.size = 1_words, .align = 8_words, .skew = 1_words});
+  EXPECT_EQ(base8, layout.offset_thread_pointer());
+  EXPECT_EQ(base8 + 9_words, layout.offset_exe());
+  EXPECT_EQ(base8 + 10_words, layout.size());
+  EXPECT_EQ(8_words, layout.align_);
+
+  // Skewed by 2 words would overlap Bionic slots, regardless of the p_align
+  // value.
+  EXPECT_DEATH(reserve_exe({.size = 1_words, .align = 8_words, .skew = 2_words}),
+               underalign_error(8_words, 2_words));
+  EXPECT_DEATH(reserve_exe({.size = 1_words, .align = 0x1000, .skew = 2_words}),
+               underalign_error(0x1000, 2_words));
+
+  // Skewed by 8 words is OK again.
+  layout = reserve_exe({.size = 1_words, .align = 16_words, .skew = 8_words});
+  EXPECT_EQ(base16, layout.offset_thread_pointer());
+  EXPECT_EQ(base16 + 8_words, layout.offset_exe());
+  EXPECT_EQ(base16 + 9_words, layout.size());
+  EXPECT_EQ(16_words, layout.align_);
+
+  // Skewed by 9 words is also OK. (The amount of skew doesn't need to be a
+  // multiple of anything.)
+  layout = reserve_exe({.size = 1_words, .align = 16_words, .skew = 9_words});
+  EXPECT_EQ(base16, layout.offset_thread_pointer());
+  EXPECT_EQ(base16 + 9_words, layout.offset_exe());
+  EXPECT_EQ(base16 + 10_words, layout.size());
+  EXPECT_EQ(16_words, layout.align_);
+
+  // Skew with large alignment.
+  layout = reserve_exe({.size = 1_words, .align = 256_words, .skew = 8_words});
+  EXPECT_EQ(256_words, layout.offset_thread_pointer());
+  EXPECT_EQ(264_words, layout.offset_exe());
+  EXPECT_EQ(265_words, layout.size());
+  EXPECT_EQ(256_words, layout.align_);
+}