Merge "Remove the i386-specific use of ipc(2)." into main
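
The ELF TLS note that this change adds to android-changes-for-ndk-developers.md compares compiler-managed thread locals with "roll your own" thread locals built on `pthread_key_create()`. The following is a minimal sketch of that comparison, for illustration only and not part of the change. The names `bump_thread_local`, `bump_pthread_key`, and `run_on_new_thread` are hypothetical helpers invented here; only `thread_local`, the `pthread_*` calls, and `std::thread` are real APIs.

```cpp
#include <pthread.h>

#include <cstdint>
#include <thread>

// Compiler-managed thread-local storage. Whether this is lowered to ELF TLS
// or to emutls is decided by the toolchain; the source is the same either way.
static thread_local int tls_counter = 0;

// Increments this thread's counter and returns the new value.
int bump_thread_local() { return ++tls_counter; }

// Hand-rolled equivalent: one process-wide key, one value slot per thread.
static pthread_key_t g_key;
static pthread_once_t g_key_once = PTHREAD_ONCE_INIT;

static void make_key() { pthread_key_create(&g_key, nullptr); }

// Increments this thread's counter, which is stored directly in the key's
// pointer-sized slot (a fresh thread reads back a null pointer, i.e. 0).
int bump_pthread_key() {
  pthread_once(&g_key_once, make_key);
  intptr_t value = reinterpret_cast<intptr_t>(pthread_getspecific(g_key)) + 1;
  pthread_setspecific(g_key, reinterpret_cast<void*>(value));
  return static_cast<int>(value);
}

// Calls `bump` three times on a new thread and returns the last result.
int run_on_new_thread(int (*bump)()) {
  int result = 0;
  std::thread t([&] {
    for (int i = 0; i < 3; ++i) result = bump();
  });
  t.join();
  return result;
}
```

Both helpers keep an independent counter per thread, so calls made on a new thread do not disturb the caller's count. The `thread_local` version needs no key management, which is the convenience the note refers to.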
diff --git a/android-changes-for-ndk-developers.md b/android-changes-for-ndk-developers.md
index 8d507d1..e9cfbac 100644
--- a/android-changes-for-ndk-developers.md
+++ b/android-changes-for-ndk-developers.md
@@ -445,6 +445,18 @@
 | No `dlclose`      | Works                      | Works   | Works |
 
 
+## ELF TLS (Available for API level >= 29)
+
+Android supports [ELF TLS](docs/elf-tls.md) starting at API level 29. Since
+NDK r26, clang will automatically enable ELF TLS for `minSdkVersion 29` or
+higher. Otherwise, the existing emutls implementation (which uses
+`pthread_key_create()` behind the scenes) will continue to be used. This
+means that convenient C/C++ thread-local syntax is available at any API level;
+at worst it will perform similarly to "roll your own" thread locals using
+`pthread_key_create()` but at best you'll get the performance benefit of
+ELF TLS, and the NDK will take care of the details.
+
+
 ## Use of IFUNC in libc (True for all API levels on devices running Android 10)
 
 On devices running API level 29, libc uses
diff --git a/benchmarks/bionic_benchmarks.cpp b/benchmarks/bionic_benchmarks.cpp
index 81f1842..b88c6e5 100644
--- a/benchmarks/bionic_benchmarks.cpp
+++ b/benchmarks/bionic_benchmarks.cpp
@@ -372,7 +372,7 @@
 
 void RegisterGoogleBenchmarks(bench_opts_t primary_opts, bench_opts_t secondary_opts,
                               const std::string& fn_name, args_vector_t* run_args) {
-  if (g_str_to_func.find(fn_name) == g_str_to_func.end()) {
+  if (!g_str_to_func.contains(fn_name)) {
     errx(1, "ERROR: No benchmark for function %s", fn_name.c_str());
   }
   long iterations_to_use = primary_opts.num_iterations ? primary_opts.num_iterations :
diff --git a/docs/elf-tls.md b/docs/elf-tls.md
index d408b3f..450f362 100644
--- a/docs/elf-tls.md
+++ b/docs/elf-tls.md
@@ -1,9 +1,10 @@
-# Android ELF TLS (Draft)
+# Android ELF TLS
 
-Internal links:
- * [go/android-elf-tls](http://go/android-elf-tls)
- * [One-pager](https://docs.google.com/document/d/1leyPTnwSs24P2LGiqnU6HetnN5YnDlZkihigi6qdf_M)
- * Tracking bugs: http://b/110100012, http://b/78026329
+App developers probably just want to read the
+[quick ELF TLS status summary](../android-changes-for-ndk-developers.md#elf-tls-available-for-api-level-29)
+instead.
+
+This document covers the detailed design and implementation choices.
 
 [TOC]
 
@@ -215,7 +216,7 @@
  * https://bugzilla.redhat.com/show_bug.cgi?id=1124987
  * web search: [`"dlopen: cannot load any more object with static TLS"`][glibc-static-tls-error]
 
-Neither musl nor the Bionic TLS prototype currently allocate any surplus TLS memory.
+Neither bionic nor musl currently allocates any surplus TLS memory.
 
 In general, supporting surplus TLS memory probably requires maintaining a thread list so that
 `dlopen` can initialize the new static TLS memory in all existing threads. A thread list could be
@@ -489,19 +490,6 @@
 [quietly ignored]: https://android.googlesource.com/platform/bionic/+/android-8.1.0_r48/linker/linker.cpp#2784
 [added compatibility checks]: https://android-review.googlesource.com/c/platform/bionic/+/648760
 
-# Bionic Prototype Notes
-
-There is an [ELF TLS prototype] uploaded on Gerrit. It implements:
- * Static TLS Block allocation for static and dynamic executables
- * TLS for dynamically loaded and unloaded modules (`__tls_get_addr`)
- * TLSDESC for arm64 only
-
-Missing:
- * `dlsym` of a TLS variable
- * debugger support
-
-[ELF TLS prototype]: https://android-review.googlesource.com/q/topic:%22elf-tls-prototype%22+(status:open%20OR%20status:merged)
-
 ## Loader/libc Communication
 
 The loader exposes a list of TLS modules ([`struct TlsModules`][TlsModules]) to `libc.so` using the
@@ -515,13 +503,14 @@
 
 ## TLS Allocator
 
-The prototype currently allocates a `pthread_internal_t` object and static TLS in a single mmap'ed
+bionic currently allocates a `pthread_internal_t` object and static TLS in a single mmap'ed
 region, along with a thread's stack if it needs one allocated. It doesn't place TLS memory on a
 preallocated stack (either the main thread's stack or one provided with `pthread_attr_setstack`).
 
 The DTV and blocks for dlopen'ed modules are instead allocated using the Bionic loader's
-`LinkerMemoryAllocator`, adapted to avoid the STL and to provide `memalign`. The prototype tries to
-achieve async-signal safety by blocking signals and acquiring a lock.
+`LinkerMemoryAllocator`, adapted to avoid the STL and to provide `memalign`.
+The implementation tries to achieve async-signal safety by blocking signals and
+acquiring a lock.
 
 There are three "entry points" to dynamically locate a TLS variable's address:
  * libc.so: `__tls_get_addr`
@@ -529,10 +518,10 @@
  * loader: dlsym
 
 The loader's entry points need to call `__tls_get_addr`, which needs to allocate memory. Currently,
-the prototype uses a [special function pointer] to call libc.so's `__tls_get_addr` from the loader.
+the implementation uses a [special function pointer] to call libc.so's `__tls_get_addr` from the loader.
 (This should probably be removed.)
 
-The prototype currently allows for arbitrarily-large TLS variable alignment. IIRC, different
+The implementation currently allows for arbitrarily-large TLS variable alignment. IIRC, different
 implementations (glibc, musl, FreeBSD) vary in their level of respect for TLS alignment. It looks
 like the Bionic loader ignores segments' alignment and aligns loaded libraries to 256 KiB. See
 `ReserveAligned`.
@@ -541,7 +530,7 @@
 
 ## Async-Signal Safety
 
-The prototype's `__tls_get_addr` might be async-signal safe. Making it AS-safe is a good idea if
+The implementation's `__tls_get_addr` might be async-signal safe. Making it AS-safe is a good idea if
 it's feasible. musl's function is AS-safe, but glibc's isn't (or wasn't). Google had a patch to make
 glibc AS-safe back in 2012-2013. See:
  * https://sourceware.org/glibc/wiki/TLSandSignals
@@ -550,7 +539,7 @@
 
 ## Out-of-Memory Handling (abort)
 
-The prototype lazily allocates TLS memory for dlopen'ed modules (see `__tls_get_addr`), and an
+The implementation lazily allocates TLS memory for dlopen'ed modules (see `__tls_get_addr`), and an
 out-of-memory error on a TLS access aborts the process. musl, on the other hand, preallocates TLS
 memory on `pthread_create` and `dlopen`, so either function can return out-of-memory. Both functions
 probably need to acquire the same lock.
@@ -572,7 +561,7 @@
 
 FWIW: emutls also aborts on out-of-memory.
 
-## ELF TLS Not Usable in libc
+## ELF TLS Not Usable in libc Itself
 
 The dynamic loader currently can't use ELF TLS, so any part of libc linked into the loader (i.e.
 most of it) also can't use ELF TLS. It might be possible to lift this restriction, perhaps with
@@ -649,7 +638,7 @@
 It seems easy to fix the incompatibility for variant 2 (x86 and x86_64) by splitting out the Bionic
 slots into a new data structure. Variant 1 is a harder problem.
 
-The TLS prototype currently uses a patched LLD that uses a variant 1 TLS layout with a 16-word TCB
+The TLS prototype used a patched LLD that uses a variant 1 TLS layout with a 16-word TCB
 on all architectures.
 
 Aside: gcc's arm64ilp32 target uses a 32-bit unsigned offset for a TLS IE access
@@ -821,8 +810,8 @@
 
 ### Workaround for Go: place pthread keys after the executable's TLS
 
-Most Android executables do not use any `thread_local` variables. In the current prototype, with the
-AOSP hikey960 build, only `/system/bin/netd` has a TLS segment, and it's only 32 bytes. As long as
+Most Android executables do not use any `thread_local` variables. In the prototype, with the
+AOSP hikey960 build, only `/system/bin/netd` had a TLS segment, and it was only 32 bytes. As long as
 `/system/bin/app_process{32,64}` limits its use of TLS memory, then the pthread keys could be
 allocated after `app_process`' TLS segment, and Go will still find them.
 
@@ -847,6 +836,12 @@
  * It looks like glibc's ld.so re-relocates itself after loading a program, so a program's symbols
    can interpose call in the loader: https://sourceware.org/ml/libc-alpha/2014-01/msg00501.html
 
+## TODO: Other
+
+Missing:
+ * `dlsym` of a TLS variable
+ * debugger support
+
 # References
 
 General (and x86/x86-64)
diff --git a/docs/fdsan.md b/docs/fdsan.md
index f5d1ab5..5aeb7de 100644
--- a/docs/fdsan.md
+++ b/docs/fdsan.md
@@ -62,7 +62,9 @@
  - fatal (`ANDROID_FDSAN_ERROR_LEVEL_FATAL`)
    - Abort upon detecting an error.
 
-In Android Q, fdsan has a global default of warn-once. fdsan can be made more or less strict at runtime via the `android_fdsan_set_error_level` function in [`<android/fdsan.h>`](https://android.googlesource.com/platform/bionic/+/main/libc/include/android/fdsan.h).
+In API level 29, fdsan had a global default of warn-once.
+In API level 30 and higher, fdsan has a global default of fatal.
+fdsan can be made more or less strict at runtime via the `android_fdsan_set_error_level` function in [`<android/fdsan.h>`](https://android.googlesource.com/platform/bionic/+/main/libc/include/android/fdsan.h).
 
 The likelihood of fdsan catching a file descriptor error is proportional to the percentage of file descriptors in your process that are tagged with an owner.
 
@@ -344,7 +346,8 @@
 
     // These functions are marked with __attribute__((weak)), so that their
     // availability can be determined at runtime. These wrappers will use them
-    // if available, and fall back to no-ops or regular close on pre-Q devices.
+    // if available, and fall back to no-ops or regular close on devices older
+    // than API level 29.
     static void exchange_tag(int fd, uint64_t old_tag, uint64_t new_tag) {
         if (android_fdsan_exchange_owner_tag) {
             android_fdsan_exchange_owner_tag(fd, old_tag, new_tag);
diff --git a/docs/status.md b/docs/status.md
index bc8ab6a..e0364a8 100644
--- a/docs/status.md
+++ b/docs/status.md
@@ -125,6 +125,7 @@
   * `getloadavg` (BSD/GNU extension in <stdlib.h>)
 
 New libc behavior in Q (API level 29):
+  * Support for [ELF TLS](elf-tls.md).
   * Whole printf family now supports the GNU `%m` extension, rather than a
     special-case hack in `syslog`.
   * `popen` now always uses `O_CLOEXEC`, not just with the `e` extension.
diff --git a/libc/Android.bp b/libc/Android.bp
index 40215a7..5063364 100644
--- a/libc/Android.bp
+++ b/libc/Android.bp
@@ -120,15 +120,15 @@
 // ========================================================
 cc_defaults {
     name: "bug_24465209_workaround",
-    arch: {
-        arm: {
+    target: {
+        android_arm: {
             pack_relocations: false,
             ldflags: ["-Wl,--hash-style=both"],
         },
-        x86: {
+        android_x86: {
             pack_relocations: false,
             ldflags: ["-Wl,--hash-style=both"],
-        }
+        },
     },
 }
 
@@ -1585,14 +1585,6 @@
     // Do not pack libc.so relocations; see http://b/20645321 for details.
     pack_relocations: false,
 
-    // WARNING: The only libraries libc.so should depend on are libdl.so and ld-android.so!
-    // If you add other libraries, make sure to add -Wl,--exclude-libs=libgcc.a to the
-    // LOCAL_LDFLAGS for those libraries.  This ensures that symbols that are pulled into
-    // those new libraries from libgcc.a are not declared external; if that were the case,
-    // then libc would not pull those symbols from libgcc.a as it should, instead relying
-    // on the external symbols from the dependent libraries.  That would create a "cloaked"
-    // dependency on libgcc.a in libc though the libraries, which is not what you wanted!
-
     shared_libs: [
         "ld-android",
         "libdl",
diff --git a/libc/bionic/NetdClientDispatch.cpp b/libc/bionic/NetdClientDispatch.cpp
index e6f4a97..be5fb11 100644
--- a/libc/bionic/NetdClientDispatch.cpp
+++ b/libc/bionic/NetdClientDispatch.cpp
@@ -20,18 +20,12 @@
 
 #include "private/bionic_fdtrack.h"
 
-#ifdef __i386__
-#define __socketcall __attribute__((__cdecl__))
-#else
-#define __socketcall
-#endif
-
-extern "C" __socketcall int __accept4(int, sockaddr*, socklen_t*, int);
-extern "C" __socketcall int __connect(int, const sockaddr*, socklen_t);
-extern "C" __socketcall int __sendmmsg(int, const mmsghdr*, unsigned int, int);
-extern "C" __socketcall ssize_t __sendmsg(int, const msghdr*, unsigned int);
-extern "C" __socketcall int __sendto(int, const void*, size_t, int, const sockaddr*, socklen_t);
-extern "C" __socketcall int __socket(int, int, int);
+extern "C" int __accept4(int, sockaddr*, socklen_t*, int);
+extern "C" int __connect(int, const sockaddr*, socklen_t);
+extern "C" int __sendmmsg(int, const mmsghdr*, unsigned int, int);
+extern "C" ssize_t __sendmsg(int, const msghdr*, unsigned int);
+extern "C" int __sendto(int, const void*, size_t, int, const sockaddr*, socklen_t);
+extern "C" int __socket(int, int, int);
 
 static unsigned fallBackNetIdForResolv(unsigned netId) {
     return netId;
diff --git a/libc/bionic/heap_tagging.cpp b/libc/bionic/heap_tagging.cpp
index cadab3c..6741be3 100644
--- a/libc/bionic/heap_tagging.cpp
+++ b/libc/bionic/heap_tagging.cpp
@@ -53,6 +53,8 @@
   heap_tagging_level = __libc_shared_globals()->initial_heap_tagging_level;
 #endif
 
+  __libc_memtag_stack_abi = __libc_shared_globals()->initial_memtag_stack_abi;
+
   __libc_globals.mutate([](libc_globals* globals) {
     switch (heap_tagging_level) {
       case M_HEAP_TAGGING_LEVEL_TBI:
@@ -184,6 +186,9 @@
 
 #ifdef __aarch64__
 static inline __attribute__((no_sanitize("memtag"))) void untag_memory(void* from, void* to) {
+  if (from == to) {
+    return;
+  }
   __asm__ __volatile__(
       ".arch_extension mte\n"
       "1:\n"
diff --git a/libc/bionic/libc_init_common.cpp b/libc/bionic/libc_init_common.cpp
index c82c52e..939e4e1 100644
--- a/libc/bionic/libc_init_common.cpp
+++ b/libc/bionic/libc_init_common.cpp
@@ -58,6 +58,7 @@
 
 __LIBC_HIDDEN__ constinit WriteProtected<libc_globals> __libc_globals;
 __LIBC_HIDDEN__ constinit _Atomic(bool) __libc_memtag_stack;
+__LIBC_HIDDEN__ constinit bool __libc_memtag_stack_abi;
 
 // Not public, but well-known in the BSDs.
 __BIONIC_WEAK_VARIABLE_FOR_NATIVE_BRIDGE
diff --git a/libc/bionic/libc_init_dynamic.cpp b/libc/bionic/libc_init_dynamic.cpp
index 2dde2f1..541e71c 100644
--- a/libc/bionic/libc_init_dynamic.cpp
+++ b/libc/bionic/libc_init_dynamic.cpp
@@ -61,8 +61,9 @@
 };
 
 void memtag_stack_dlopen_callback() {
-  async_safe_format_log(ANDROID_LOG_DEBUG, "libc", "remapping stacks as PROT_MTE");
-  __pthread_internal_remap_stack_with_mte();
+  if (__pthread_internal_remap_stack_with_mte()) {
+    async_safe_format_log(ANDROID_LOG_DEBUG, "libc", "remapped stacks as PROT_MTE");
+  }
 }
 
 // Use an initializer so __libc_sysinfo will have a fallback implementation
diff --git a/libc/bionic/libc_init_static.cpp b/libc/bionic/libc_init_static.cpp
index 3da0a92..ac97376 100644
--- a/libc/bionic/libc_init_static.cpp
+++ b/libc/bionic/libc_init_static.cpp
@@ -289,11 +289,7 @@
 
   // We can't short-circuit the environment override, as `stack` is still inherited from the
   // binary's settings.
-  if (get_environment_memtag_setting(&level)) {
-    if (level == M_HEAP_TAGGING_LEVEL_NONE || level == M_HEAP_TAGGING_LEVEL_TBI) {
-      *stack = false;
-    }
-  }
+  get_environment_memtag_setting(&level);
   return level;
 }
 
@@ -329,13 +325,14 @@
   bool memtag_stack = false;
   HeapTaggingLevel level =
       __get_tagging_level(memtag_dynamic_entries, phdr_start, phdr_ct, load_bias, &memtag_stack);
-  // This is used by the linker (in linker.cpp) to communicate than any library linked by this
-  // executable enables memtag-stack.
-  if (__libc_shared_globals()->initial_memtag_stack) {
-    if (!memtag_stack) {
-      async_safe_format_log(ANDROID_LOG_INFO, "libc", "enabling PROT_MTE as requested by linker");
-    }
+  // initial_memtag_stack is used by the linker (in linker.cpp) to communicate that any library
+  // linked by this executable enables memtag-stack.
+  // memtag_stack is also set for static executables if they request memtag stack via the note,
+  // in which case it will differ from initial_memtag_stack.
+  if (__libc_shared_globals()->initial_memtag_stack || memtag_stack) {
     memtag_stack = true;
+    __libc_shared_globals()->initial_memtag_stack_abi = true;
+    __get_bionic_tcb()->tls_slot(TLS_SLOT_STACK_MTE) = __allocate_stack_mte_ringbuffer(0, nullptr);
   }
   if (int64_t timed_upgrade = __get_memtag_upgrade_secs()) {
     if (level == M_HEAP_TAGGING_LEVEL_ASYNC) {
diff --git a/libc/bionic/locale.cpp b/libc/bionic/locale.cpp
index 2f4d206..a1d6909 100644
--- a/libc/bionic/locale.cpp
+++ b/libc/bionic/locale.cpp
@@ -35,17 +35,8 @@
 #include <time.h>
 #include <wchar.h>
 
-#include "platform/bionic/macros.h"
-
-#if defined(__BIONIC_BUILD_FOR_ANDROID_SUPPORT)
-#define USE_TLS_SLOT 0
-#else
-#define USE_TLS_SLOT 1
-#endif
-
-#if USE_TLS_SLOT
 #include "bionic/pthread_internal.h"
-#endif
+#include "platform/bionic/macros.h"
 
 // We only support two locales, the "C" locale (also known as "POSIX"),
 // and the "C.UTF-8" locale (also known as "en_US.UTF-8").
@@ -82,10 +73,6 @@
   return get_locale_mb_cur_max(uselocale(nullptr));
 }
 
-#if !USE_TLS_SLOT
-static thread_local locale_t g_current_locale;
-#endif
-
 static pthread_once_t g_locale_once = PTHREAD_ONCE_INIT;
 static lconv g_locale;
 
@@ -180,11 +167,7 @@
 }
 
 static locale_t* get_current_locale_ptr() {
-#if USE_TLS_SLOT
   return &__get_bionic_tls().locale;
-#else
-  return &g_current_locale;
-#endif
 }
 
 locale_t uselocale(locale_t new_locale) {
diff --git a/libc/bionic/pthread_create.cpp b/libc/bionic/pthread_create.cpp
index 5bd4f16..a8d09eb 100644
--- a/libc/bionic/pthread_create.cpp
+++ b/libc/bionic/pthread_create.cpp
@@ -65,6 +65,7 @@
 }
 
 void __init_bionic_tls_ptrs(bionic_tcb* tcb, bionic_tls* tls) {
+  tcb->thread()->bionic_tcb = tcb;
   tcb->thread()->bionic_tls = tls;
   tcb->tls_slot(TLS_SLOT_BIONIC_TLS) = tls;
 }
@@ -443,6 +444,14 @@
 
   ScopedReadLock locker(&g_thread_creation_lock);
 
+// This has to be done under g_thread_creation_lock or g_thread_list_lock to avoid racing with
+// __pthread_internal_remap_stack_with_mte.
+#ifdef __aarch64__
+  if (__libc_memtag_stack_abi) {
+    tcb->tls_slot(TLS_SLOT_STACK_MTE) = __allocate_stack_mte_ringbuffer(0, thread);
+  }
+#endif
+
   sigset64_t block_all_mask;
   sigfillset64(&block_all_mask);
   __rt_sigprocmask(SIG_SETMASK, &block_all_mask, &thread->start_mask, sizeof(thread->start_mask));
diff --git a/libc/bionic/pthread_internal.cpp b/libc/bionic/pthread_internal.cpp
index 2342aff..14cc7da 100644
--- a/libc/bionic/pthread_internal.cpp
+++ b/libc/bionic/pthread_internal.cpp
@@ -33,9 +33,12 @@
 #include <stdlib.h>
 #include <string.h>
 #include <sys/mman.h>
+#include <sys/prctl.h>
 
 #include <async_safe/log.h>
+#include <bionic/mte.h>
 #include <bionic/reserved_signals.h>
+#include <bionic/tls_defines.h>
 
 #include "private/ErrnoRestorer.h"
 #include "private/ScopedRWLock.h"
@@ -73,6 +76,15 @@
 }
 
 static void __pthread_internal_free(pthread_internal_t* thread) {
+#ifdef __aarch64__
+  if (void* stack_mte_tls = thread->bionic_tcb->tls_slot(TLS_SLOT_STACK_MTE)) {
+    size_t size =
+        stack_mte_ringbuffer_size_from_pointer(reinterpret_cast<uintptr_t>(stack_mte_tls));
+    void* ptr = reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(stack_mte_tls) &
+                                        ((1ULL << 56ULL) - 1ULL));
+    munmap(ptr, size);
+  }
+#endif
   if (thread->mmap_size != 0) {
     // Free mapped space, including thread stack and pthread_internal_t.
     munmap(thread->mmap_base, thread->mmap_size);
@@ -176,12 +188,40 @@
   async_safe_fatal("stack not found in /proc/self/maps");
 }
 
-void __pthread_internal_remap_stack_with_mte() {
 #if defined(__aarch64__)
-  // If process doesn't have MTE enabled, we don't need to do anything.
-  if (!atomic_load(&__libc_globals->memtag)) return;
-  bool prev = atomic_exchange(&__libc_memtag_stack, true);
-  if (prev) return;
+__LIBC_HIDDEN__ void* __allocate_stack_mte_ringbuffer(size_t n, pthread_internal_t* thread) {
+  const char* name;
+  if (thread == nullptr) {
+    name = "stack_mte_ring:main";
+  } else {
+    // The kernel doesn't copy the name string, but this variable will last at least as long as the
+    // mapped area. We unmap the ring buffer before unmapping the rest of the thread storage.
+    auto& name_buffer = thread->stack_mte_ringbuffer_vma_name_buffer;
+    static_assert(arraysize(name_buffer) >= arraysize("stack_mte_ring:") + 11 + 1);
+    async_safe_format_buffer(name_buffer, arraysize(name_buffer), "stack_mte_ring:%d", thread->tid);
+    name = name_buffer;
+  }
+  void* ret = stack_mte_ringbuffer_allocate(n, name);
+  if (!ret) async_safe_fatal("error: failed to allocate stack mte ring buffer");
+  return ret;
+}
+#endif
+
+bool __pthread_internal_remap_stack_with_mte() {
+#if defined(__aarch64__)
+  ScopedWriteLock creation_locker(&g_thread_creation_lock);
+  ScopedReadLock list_locker(&g_thread_list_lock);
+  // If process already uses memtag-stack ABI, we don't need to do anything.
+  if (__libc_memtag_stack_abi) return false;
+  __libc_memtag_stack_abi = true;
+
+  for (pthread_internal_t* t = g_thread_list; t != nullptr; t = t->next) {
+    if (t->terminating) continue;
+    t->bionic_tcb->tls_slot(TLS_SLOT_STACK_MTE) =
+        __allocate_stack_mte_ringbuffer(0, t->is_main() ? nullptr : t);
+  }
+  if (!atomic_load(&__libc_globals->memtag)) return false;
+  if (atomic_exchange(&__libc_memtag_stack, true)) return false;
   uintptr_t lo, hi;
   __find_main_stack_limits(&lo, &hi);
 
@@ -189,8 +229,6 @@
                PROT_READ | PROT_WRITE | PROT_MTE | PROT_GROWSDOWN)) {
     async_safe_fatal("error: failed to set PROT_MTE on main thread");
   }
-  ScopedWriteLock creation_locker(&g_thread_creation_lock);
-  ScopedReadLock list_locker(&g_thread_list_lock);
   for (pthread_internal_t* t = g_thread_list; t != nullptr; t = t->next) {
     if (t->terminating || t->is_main()) continue;
     if (mprotect(t->mmap_base_unguarded, t->mmap_size_unguarded,
@@ -198,7 +236,10 @@
       async_safe_fatal("error: failed to set PROT_MTE on thread: %d", t->tid);
     }
   }
-#endif
+  return true;
+#else
+  return false;
+#endif  // defined(__aarch64__)
 }
 
 bool android_run_on_all_threads(bool (*func)(void*), void* arg) {
diff --git a/libc/bionic/pthread_internal.h b/libc/bionic/pthread_internal.h
index c2abdea..5db42ab 100644
--- a/libc/bionic/pthread_internal.h
+++ b/libc/bionic/pthread_internal.h
@@ -178,6 +178,10 @@
   bionic_tls* bionic_tls;
 
   int errno_value;
+
+  bionic_tcb* bionic_tcb;
+  char stack_mte_ringbuffer_vma_name_buffer[32];
+
   bool is_main() { return start_routine == nullptr; }
 };
 
@@ -209,6 +213,9 @@
 __LIBC_HIDDEN__ void __pthread_internal_remove(pthread_internal_t* thread);
 __LIBC_HIDDEN__ void __pthread_internal_remove_and_free(pthread_internal_t* thread);
 __LIBC_HIDDEN__ void __find_main_stack_limits(uintptr_t* low, uintptr_t* high);
+#if defined(__aarch64__)
+__LIBC_HIDDEN__ void* __allocate_stack_mte_ringbuffer(size_t n, pthread_internal_t* thread);
+#endif
 
 static inline __always_inline bionic_tcb* __get_bionic_tcb() {
   return reinterpret_cast<bionic_tcb*>(&__get_tls()[MIN_TLS_SLOT]);
@@ -268,8 +275,9 @@
 __LIBC_HIDDEN__ extern void __bionic_atfork_run_child();
 __LIBC_HIDDEN__ extern void __bionic_atfork_run_parent();
 
-// Re-map all threads and successively launched threads with PROT_MTE.
-__LIBC_HIDDEN__ void __pthread_internal_remap_stack_with_mte();
+// Re-map all threads and subsequently launched threads with PROT_MTE. Returns 'true' if remapping
+// took place, 'false' on error or if the stacks were already remapped in the past.
+__LIBC_HIDDEN__ bool __pthread_internal_remap_stack_with_mte();
 
 extern "C" bool android_run_on_all_threads(bool (*func)(void*), void* arg);
 
diff --git a/libc/dns/include/resolv_private.h b/libc/dns/include/resolv_private.h
index 3054555..1593aca 100644
--- a/libc/dns/include/resolv_private.h
+++ b/libc/dns/include/resolv_private.h
@@ -504,15 +504,7 @@
 // ...but NetBSD calls it res_randomid.
 #define res_randomid __res_randomid
 
-#ifdef __i386__
-# define __socketcall extern __attribute__((__cdecl__))
-#else
-# define __socketcall extern
-#endif
-
-__socketcall int __connect(int, const struct sockaddr*, socklen_t);
-
-#undef __socketcall
+int __connect(int, const struct sockaddr*, socklen_t);
 
 // Symbols that are supposed to be in resolv.h, but that we aren't exporting.
 int ns_parserr2(ns_msg*, ns_sect, int, ns_rr2*);
diff --git a/libc/include/math.h b/libc/include/math.h
index fc6c228..343ab98 100644
--- a/libc/include/math.h
+++ b/libc/include/math.h
@@ -68,10 +68,7 @@
 
 #define isnormal(x) __builtin_isnormal(x)
 
-#define signbit(x) \
-    ((sizeof(x) == sizeof(float)) ? __builtin_signbitf(x) \
-    : (sizeof(x) == sizeof(double)) ? __builtin_signbit(x) \
-    : __builtin_signbitl(x))
+#define signbit(x) __builtin_signbit(x)
 
 double acos(double __x);
 float acosf(float __x);
@@ -308,20 +305,6 @@
 #define islessgreater(x, y) __builtin_islessgreater((x), (y))
 #define isunordered(x, y) __builtin_isunordered((x), (y))
 
-/*
- * https://code.google.com/p/android/issues/detail?id=271629
- * To be fully compliant with C++, we need to not define these (C doesn't
- * specify them either). Exposing these means that isinf and isnan will have a
- * return type of int in C++ rather than bool like they're supposed to be.
- *
- * GNU libstdc++ 4.9 isn't able to handle a standard compliant C library. Its
- * <cmath> will `#undef isnan` from math.h and only adds the function overloads
- * to the std namespace, making it impossible to use both <cmath> (which gets
- * included by a lot of other standard headers) and ::isnan.
- */
-int (isinf)(double __x) __attribute_const__;
-int (isnan)(double __x) __attribute_const__;
-
 /* POSIX extensions. */
 
 extern int signgam;
@@ -362,6 +345,7 @@
 double scalb(double __x, double __exponent);
 double drem(double __x, double __y);
 int finite(double __x) __attribute_const__;
+int isinff(float __x) __attribute_const__;
 int isnanf(float __x) __attribute_const__;
 double gamma_r(double __x, int* _Nonnull __sign);
 double lgamma_r(double __x, int* _Nonnull __sign);
@@ -402,6 +386,8 @@
 #define M_2_SQRTPIl     1.128379167095512573896158903121545172L /* 2/sqrt(pi) */
 #define M_SQRT2l        1.414213562373095048801688724209698079L /* sqrt(2) */
 #define M_SQRT1_2l      0.707106781186547524400844362104849039L /* 1/sqrt(2) */
+int isinfl(long double __x) __attribute_const__;
+int isnanl(long double __x) __attribute_const__;
 #endif
 
 __END_DECLS
diff --git a/libc/include/sys/socket.h b/libc/include/sys/socket.h
index 9402e70..47ddce0 100644
--- a/libc/include/sys/socket.h
+++ b/libc/include/sys/socket.h
@@ -277,41 +277,33 @@
 
 #define IPX_TYPE 1
 
-#ifdef __i386__
-# define __socketcall extern __attribute__((__cdecl__))
-#else
-# define __socketcall extern
-#endif
-
-__socketcall int accept(int __fd, struct sockaddr* _Nullable __addr, socklen_t* _Nullable __addr_length);
-__socketcall int accept4(int __fd, struct sockaddr* _Nullable __addr, socklen_t* _Nullable __addr_length, int __flags);
-__socketcall int bind(int __fd, const struct sockaddr* _Nonnull __addr, socklen_t __addr_length);
-__socketcall int connect(int __fd, const struct sockaddr* _Nonnull __addr, socklen_t __addr_length);
-__socketcall int getpeername(int __fd, struct sockaddr* _Nonnull __addr, socklen_t* _Nonnull __addr_length);
-__socketcall int getsockname(int __fd, struct sockaddr* _Nonnull __addr, socklen_t* _Nonnull __addr_length);
-__socketcall int getsockopt(int __fd, int __level, int __option, void* _Nullable __value, socklen_t* _Nonnull __value_length);
-__socketcall int listen(int __fd, int __backlog);
-__socketcall int recvmmsg(int __fd, struct mmsghdr* _Nonnull __msgs, unsigned int __msg_count, int __flags, const struct timespec* _Nullable __timeout);
-__socketcall ssize_t recvmsg(int __fd, struct msghdr* _Nonnull __msg, int __flags);
-__socketcall int sendmmsg(int __fd, const struct mmsghdr* _Nonnull __msgs, unsigned int __msg_count, int __flags);
-__socketcall ssize_t sendmsg(int __fd, const struct msghdr* _Nonnull __msg, int __flags);
-__socketcall int setsockopt(int __fd, int __level, int __option, const void* _Nullable __value, socklen_t __value_length);
-__socketcall int shutdown(int __fd, int __how);
-__socketcall int socket(int __af, int __type, int __protocol);
-__socketcall int socketpair(int __af, int __type, int __protocol, int __fds[_Nonnull 2]);
+int accept(int __fd, struct sockaddr* _Nullable __addr, socklen_t* _Nullable __addr_length);
+int accept4(int __fd, struct sockaddr* _Nullable __addr, socklen_t* _Nullable __addr_length, int __flags);
+int bind(int __fd, const struct sockaddr* _Nonnull __addr, socklen_t __addr_length);
+int connect(int __fd, const struct sockaddr* _Nonnull __addr, socklen_t __addr_length);
+int getpeername(int __fd, struct sockaddr* _Nonnull __addr, socklen_t* _Nonnull __addr_length);
+int getsockname(int __fd, struct sockaddr* _Nonnull __addr, socklen_t* _Nonnull __addr_length);
+int getsockopt(int __fd, int __level, int __option, void* _Nullable __value, socklen_t* _Nonnull __value_length);
+int listen(int __fd, int __backlog);
+int recvmmsg(int __fd, struct mmsghdr* _Nonnull __msgs, unsigned int __msg_count, int __flags, const struct timespec* _Nullable __timeout);
+ssize_t recvmsg(int __fd, struct msghdr* _Nonnull __msg, int __flags);
+int sendmmsg(int __fd, const struct mmsghdr* _Nonnull __msgs, unsigned int __msg_count, int __flags);
+ssize_t sendmsg(int __fd, const struct msghdr* _Nonnull __msg, int __flags);
+int setsockopt(int __fd, int __level, int __option, const void* _Nullable __value, socklen_t __value_length);
+int shutdown(int __fd, int __how);
+int socket(int __af, int __type, int __protocol);
+int socketpair(int __af, int __type, int __protocol, int __fds[_Nonnull 2]);
 
 ssize_t recv(int __fd, void* _Nullable __buf, size_t __n, int __flags);
 ssize_t send(int __fd, const void* _Nonnull __buf, size_t __n, int __flags);
 
-__socketcall ssize_t sendto(int __fd, const void* _Nonnull __buf, size_t __n, int __flags, const struct sockaddr* _Nullable __dst_addr, socklen_t __dst_addr_length);
-__socketcall ssize_t recvfrom(int __fd, void* _Nullable __buf, size_t __n, int __flags, struct sockaddr* _Nullable __src_addr, socklen_t* _Nullable __src_addr_length);
+ssize_t sendto(int __fd, const void* _Nonnull __buf, size_t __n, int __flags, const struct sockaddr* _Nullable __dst_addr, socklen_t __dst_addr_length);
+ssize_t recvfrom(int __fd, void* _Nullable __buf, size_t __n, int __flags, struct sockaddr* _Nullable __src_addr, socklen_t* _Nullable __src_addr_length);
 
 #if defined(__BIONIC_INCLUDE_FORTIFY_HEADERS)
 #include <bits/fortify/socket.h>
 #endif
 
-#undef __socketcall
-
 __END_DECLS
 
 #endif
diff --git a/libc/malloc_debug/Config.cpp b/libc/malloc_debug/Config.cpp
index 89a7ce7..0d442b4 100644
--- a/libc/malloc_debug/Config.cpp
+++ b/libc/malloc_debug/Config.cpp
@@ -187,6 +187,10 @@
         "record_allocs_file",
         {0, &Config::SetRecordAllocsFile},
     },
+    {
+        "record_allocs_on_exit",
+        {0, &Config::SetRecordAllocsOnExit},
+    },
 
     {
         "verify_pointers",
@@ -401,6 +405,14 @@
   return true;
 }
 
+bool Config::SetRecordAllocsOnExit(const std::string& option, const std::string& value) {
+  if (Config::VerifyValueEmpty(option, value)) {
+    record_allocs_on_exit_ = true;
+    return true;
+  }
+  return false;
+}
+
 bool Config::VerifyValueEmpty(const std::string& option, const std::string& value) {
   if (!value.empty()) {
     // This is not valid.
diff --git a/libc/malloc_debug/Config.h b/libc/malloc_debug/Config.h
index 754970f..8551712 100644
--- a/libc/malloc_debug/Config.h
+++ b/libc/malloc_debug/Config.h
@@ -98,6 +98,7 @@
   int record_allocs_signal() const { return record_allocs_signal_; }
   size_t record_allocs_num_entries() const { return record_allocs_num_entries_; }
   const std::string& record_allocs_file() const { return record_allocs_file_; }
+  bool record_allocs_on_exit() const { return record_allocs_on_exit_; }
 
   int check_unreachable_signal() const { return check_unreachable_signal_; }
 
@@ -139,6 +140,7 @@
 
   bool SetRecordAllocs(const std::string& option, const std::string& value);
   bool SetRecordAllocsFile(const std::string& option, const std::string& value);
+  bool SetRecordAllocsOnExit(const std::string& option, const std::string& value);
 
   bool VerifyValueEmpty(const std::string& option, const std::string& value);
 
@@ -170,6 +172,7 @@
   int record_allocs_signal_ = 0;
   size_t record_allocs_num_entries_ = 0;
   std::string record_allocs_file_;
+  bool record_allocs_on_exit_ = false;
 
   uint64_t options_ = 0;
   uint8_t fill_alloc_value_;
diff --git a/libc/malloc_debug/README.md b/libc/malloc_debug/README.md
index 4e39bed..750a469 100644
--- a/libc/malloc_debug/README.md
+++ b/libc/malloc_debug/README.md
@@ -456,6 +456,19 @@
 
 **NOTE**: This option is not available until the O release of Android.
 
+### record\_allocs\_on\_exit
+This option only has meaning if record\_allocs is set. It indicates that
+when the process terminates, the record file should be created
+automatically.
+
+The only caveat to this option is that when the process terminates,
+the file that will contain the records will be the normal file name
+with **.PID** appended, where PID is the pid of the process that has
+terminated. This is to avoid cases where a number of processes exit
+at the same time and attempt to write to the same file.
+
+**NOTE**: This option is not available until the V release of Android.
+
 ### verify\_pointers
 Track all live allocations to determine if a pointer is used that does not
 exist. This option is a lightweight way to verify that all
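Note on the README hunk above: the per-process file name it describes is the configured record file name with `.` and the pid appended, matching what `RecordData::WriteEntriesOnExit()` does in this change. A minimal sketch of that naming rule follows. `ExitRecordFileName` is a hypothetical helper for illustration only, not part of malloc_debug, and the path in the usage note is just an example value.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>

// Hypothetical helper (not in malloc_debug): derives the name of the file
// written at process exit from the configured record file name.
std::string ExitRecordFileName(const std::string& base, pid_t pid) {
  assert(!base.empty());  // Assumption: a record file name is always configured.
  // Appending the pid keeps simultaneously exiting processes from
  // writing to the same file.
  return base + "." + std::to_string(pid);
}
```

For example, `ExitRecordFileName("/data/local/tmp/record_allocs.txt", getpid())` yields the path a test or script would need to read (and later delete) after the process exits.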
diff --git a/libc/malloc_debug/RecordData.cpp b/libc/malloc_debug/RecordData.cpp
index 8a77170..79e051b 100644
--- a/libc/malloc_debug/RecordData.cpp
+++ b/libc/malloc_debug/RecordData.cpp
@@ -131,17 +131,30 @@
   record_obj_->WriteEntries();
 }
 
+void RecordData::WriteEntriesOnExit() {
+  if (record_obj_ == nullptr) return;
+
+  // Append the current pid to the file name to avoid multiple processes
+  // writing to the same file.
+  std::string file(record_obj_->file());
+  file += "." + std::to_string(getpid());
+  record_obj_->WriteEntries(file);
+}
+
 void RecordData::WriteEntries() {
+  WriteEntries(file_);
+}
+
+void RecordData::WriteEntries(const std::string& file) {
   std::lock_guard<std::mutex> entries_lock(entries_lock_);
   if (cur_index_ == 0) {
     info_log("No alloc entries to write.");
     return;
   }
 
-  int dump_fd =
-      open(dump_file_.c_str(), O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC | O_NOFOLLOW, 0755);
+  int dump_fd = open(file.c_str(), O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC | O_NOFOLLOW, 0755);
   if (dump_fd == -1) {
-    error_log("Cannot create record alloc file %s: %s", dump_file_.c_str(), strerror(errno));
+    error_log("Cannot create record alloc file %s: %s", file.c_str(), strerror(errno));
     return;
   }
 
@@ -179,7 +192,7 @@
 
   entries_.resize(config.record_allocs_num_entries());
   cur_index_ = 0U;
-  dump_file_ = config.record_allocs_file();
+  file_ = config.record_allocs_file();
 
   return true;
 }
diff --git a/libc/malloc_debug/RecordData.h b/libc/malloc_debug/RecordData.h
index a02c956..7efa1f7 100644
--- a/libc/malloc_debug/RecordData.h
+++ b/libc/malloc_debug/RecordData.h
@@ -162,19 +162,23 @@
   void AddEntry(const RecordEntry* entry);
   void AddEntryOnly(const RecordEntry* entry);
 
+  const std::string& file() { return file_; }
   pthread_key_t key() { return key_; }
 
+  static void WriteEntriesOnExit();
+
  private:
   static void WriteData(int, siginfo_t*, void*);
   static RecordData* record_obj_;
 
   void WriteEntries();
+  void WriteEntries(const std::string& file);
 
   std::mutex entries_lock_;
   pthread_key_t key_;
   std::vector<std::unique_ptr<const RecordEntry>> entries_;
   size_t cur_index_;
-  std::string dump_file_;
+  std::string file_;
 
   BIONIC_DISALLOW_COPY_AND_ASSIGN(RecordData);
 };
diff --git a/libc/malloc_debug/malloc_debug.cpp b/libc/malloc_debug/malloc_debug.cpp
index b66b8e2..6d88092 100644
--- a/libc/malloc_debug/malloc_debug.cpp
+++ b/libc/malloc_debug/malloc_debug.cpp
@@ -451,6 +451,10 @@
     PointerData::LogLeaks();
   }
 
+  if ((g_debug->config().options() & RECORD_ALLOCS) && g_debug->config().record_allocs_on_exit()) {
+    RecordData::WriteEntriesOnExit();
+  }
+
   if ((g_debug->config().options() & BACKTRACE) && g_debug->config().backtrace_dump_on_exit()) {
     debug_dump_heap(android::base::StringPrintf("%s.%d.exit.txt",
                                                 g_debug->config().backtrace_dump_prefix().c_str(),
diff --git a/libc/malloc_debug/tests/malloc_debug_config_tests.cpp b/libc/malloc_debug/tests/malloc_debug_config_tests.cpp
index 84c9145..c79d052 100644
--- a/libc/malloc_debug/tests/malloc_debug_config_tests.cpp
+++ b/libc/malloc_debug/tests/malloc_debug_config_tests.cpp
@@ -571,6 +571,25 @@
   ASSERT_STREQ("", getFakeLogPrint().c_str());
 }
 
+TEST_F(MallocDebugConfigTest, record_allocs_on_exit) {
+  ASSERT_TRUE(InitConfig("record_allocs_on_exit")) << getFakeLogPrint();
+  ASSERT_EQ(0U, config->options());
+  ASSERT_TRUE(config->record_allocs_on_exit());
+
+  ASSERT_STREQ("", getFakeLogBuf().c_str());
+  ASSERT_STREQ("", getFakeLogPrint().c_str());
+}
+
+TEST_F(MallocDebugConfigTest, record_allocs_on_exit_error) {
+  ASSERT_FALSE(InitConfig("record_allocs_on_exit=something")) << getFakeLogPrint();
+
+  ASSERT_STREQ("", getFakeLogBuf().c_str());
+  std::string log_msg(
+      "6 malloc_debug malloc_testing: value set for option 'record_allocs_on_exit' "
+      "which does not take a value\n");
+  ASSERT_STREQ((log_msg + usage_string).c_str(), getFakeLogPrint().c_str());
+}
+
 TEST_F(MallocDebugConfigTest, guard_min_error) {
   ASSERT_FALSE(InitConfig("guard=0"));
 
diff --git a/libc/malloc_debug/tests/malloc_debug_unit_tests.cpp b/libc/malloc_debug/tests/malloc_debug_unit_tests.cpp
index 334dada..ef8d235 100644
--- a/libc/malloc_debug/tests/malloc_debug_unit_tests.cpp
+++ b/libc/malloc_debug/tests/malloc_debug_unit_tests.cpp
@@ -185,6 +185,7 @@
 }
 
 static void VerifyRecords(std::vector<std::string>& expected, std::string& actual) {
+  ASSERT_TRUE(expected.size() != 0);
   size_t offset = 0;
   for (std::string& str : expected) {
     ASSERT_STREQ(str.c_str(), actual.substr(offset, str.size()).c_str());
@@ -1512,7 +1513,7 @@
 
     // Call the exit function manually.
     debug_finalize();
-    exit(0);
+    _exit(0);
   }
   ASSERT_NE(-1, pid);
   ASSERT_EQ(pid, TEMP_FAILURE_RETRY(waitpid(pid, nullptr, 0)));
@@ -1561,7 +1562,7 @@
 
     // Call the exit function manually.
     debug_finalize();
-    exit(0);
+    _exit(0);
   }
   ASSERT_NE(-1, pid);
   ASSERT_EQ(pid, TEMP_FAILURE_RETRY(waitpid(pid, nullptr, 0)));
@@ -1619,7 +1620,7 @@
 
     // Call the exit function manually.
     debug_finalize();
-    exit(0);
+    _exit(0);
   }
   ASSERT_NE(-1, pid);
   ASSERT_EQ(pid, TEMP_FAILURE_RETRY(waitpid(pid, nullptr, 0)));
@@ -2429,6 +2430,33 @@
   ASSERT_STREQ("", getFakeLogPrint().c_str());
 }
 
+TEST_F(MallocDebugTest, record_allocs_on_exit) {
+  InitRecordAllocs("record_allocs record_allocs_on_exit");
+
+  // The filename created on exit always appends the pid.
+  // Modify the variable so the file is deleted at the end of the test.
+  record_filename += '.' + std::to_string(getpid());
+
+  std::vector<std::string> expected;
+  void* ptr = debug_malloc(100);
+  expected.push_back(android::base::StringPrintf("%d: malloc %p 100", getpid(), ptr));
+  ptr = debug_malloc(200);
+  expected.push_back(android::base::StringPrintf("%d: malloc %p 200", getpid(), ptr));
+  ptr = debug_malloc(400);
+  expected.push_back(android::base::StringPrintf("%d: malloc %p 400", getpid(), ptr));
+
+  // Call the exit function manually.
+  debug_finalize();
+
+  // Read all of the contents.
+  std::string actual;
+  ASSERT_TRUE(android::base::ReadFileToString(record_filename, &actual));
+  VerifyRecords(expected, actual);
+
+  ASSERT_STREQ("", getFakeLogBuf().c_str());
+  ASSERT_STREQ("", getFakeLogPrint().c_str());
+}
+
 TEST_F(MallocDebugTest, verify_pointers) {
   Init("verify_pointers");
 
diff --git a/libc/platform/bionic/mte.h b/libc/platform/bionic/mte.h
index 73cd821..98b3d27 100644
--- a/libc/platform/bionic/mte.h
+++ b/libc/platform/bionic/mte.h
@@ -29,8 +29,11 @@
 #pragma once
 
 #include <sys/auxv.h>
+#include <sys/mman.h>
 #include <sys/prctl.h>
 
+#include "page.h"
+
 // Note: Most PR_MTE_* constants come from the upstream kernel. This tag mask
 // allows for the hardware to provision any nonzero tag. Zero tags are reserved
 // for scudo to use for the chunk headers in order to prevent linear heap
@@ -63,6 +66,65 @@
     }
   }
 };
+
+// N.B. this is NOT the page size; it is always 4096, which is hardcoded in the codegen.
+// See
+// https://github.com/search?q=repo%3Allvm/llvm-project%20AArch64StackTagging%3A%3AinsertBaseTaggedPointer&type=code
+constexpr size_t kStackMteRingbufferSizeMultiplier = 4096;
+
+inline size_t stack_mte_ringbuffer_size(uintptr_t size_cls) {
+  return kStackMteRingbufferSizeMultiplier * (1 << size_cls);
+}
+
+inline size_t stack_mte_ringbuffer_size_from_pointer(uintptr_t ptr) {
+  // The size in the top byte is not the size_cls, but the number of "pages" (not OS pages, but
+  // kStackMteRingbufferSizeMultiplier).
+  return kStackMteRingbufferSizeMultiplier * (ptr >> 56ULL);
+}
+
+inline uintptr_t stack_mte_ringbuffer_size_add_to_pointer(uintptr_t ptr, uintptr_t size_cls) {
+  return ptr | ((1ULL << size_cls) << 56ULL);
+}
+
+inline void* stack_mte_ringbuffer_allocate(size_t n, const char* name) {
+  if (n > 7) return nullptr;
+  // Allocation needs to be aligned to 2*size to make the fancy code-gen work.
+  // So we allocate 3*size - pagesz bytes, which will always contain size bytes
+  // aligned to 2*size, and unmap the unneeded part.
+  // See
+  // https://github.com/search?q=repo%3Allvm/llvm-project%20AArch64StackTagging%3A%3AinsertBaseTaggedPointer&type=code
+  //
+  // In the worst case, we get an allocation that is one page past the properly
+  // aligned address, in which case we have to unmap the previous
+  // 2*size - pagesz bytes. In that case, we still have size properly aligned
+  // bytes left.
+  size_t size = stack_mte_ringbuffer_size(n);
+  size_t pgsize = page_size();
+
+  size_t alloc_size = __BIONIC_ALIGN(3 * size - pgsize, pgsize);
+  void* allocation_ptr =
+      mmap(nullptr, alloc_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+  if (allocation_ptr == MAP_FAILED)
+    return nullptr;
+  uintptr_t allocation = reinterpret_cast<uintptr_t>(allocation_ptr);
+
+  size_t alignment = 2 * size;
+  uintptr_t aligned_allocation = __BIONIC_ALIGN(allocation, alignment);
+  if (allocation != aligned_allocation) {
+    munmap(reinterpret_cast<void*>(allocation), aligned_allocation - allocation);
+  }
+  if (aligned_allocation + size != allocation + alloc_size) {
+    munmap(reinterpret_cast<void*>(aligned_allocation + size),
+           (allocation + alloc_size) - (aligned_allocation + size));
+  }
+
+  if (name) {
+    prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, reinterpret_cast<void*>(aligned_allocation), size, name);
+  }
+
+  // We store the size in the top byte of the pointer (which is ignored)
+  return reinterpret_cast<void*>(stack_mte_ringbuffer_size_add_to_pointer(aligned_allocation, n));
+}
 #else
 struct ScopedDisableMTE {
   // Silence unused variable warnings in non-aarch64 builds.
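Note on the mte.h hunk above: `stack_mte_ringbuffer_allocate()` over-allocates `3*size - pagesz` bytes and then unmaps a head and a tail so that `size` bytes aligned to `2*size` remain. The arithmetic can be checked without calling `mmap` at all. The sketch below is only an illustration: `TrimPlan`, `AlignUp`, `AllocSize`, and `PlanTrim` are hypothetical names, and it assumes `size` is a power-of-two multiple of the page size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct TrimPlan {
  uintptr_t aligned_start;  // First byte of the region that is kept.
  size_t head_unmap;        // Bytes to unmap before the kept region.
  size_t tail_unmap;        // Bytes to unmap after the kept region.
};

// Rounds value up to a multiple of alignment (assumed to be a power of two).
constexpr uintptr_t AlignUp(uintptr_t value, uintptr_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Size of the initial over-allocation: 3*size - page, rounded up to a page.
constexpr size_t AllocSize(size_t size, size_t page) {
  return static_cast<size_t>(AlignUp(3 * size - page, page));
}

// Given the page-aligned start of a mapping of AllocSize(size, page) bytes,
// computes which parts to unmap so that size bytes aligned to 2*size remain.
TrimPlan PlanTrim(uintptr_t start, size_t size, size_t page) {
  assert(size >= page && size % page == 0);  // Assumption stated above.
  assert(start % page == 0);
  const size_t alloc_size = AllocSize(size, page);
  const uintptr_t aligned = AlignUp(start, 2 * size);
  // The kept region must fit inside the original mapping.
  assert(aligned + size <= start + alloc_size);
  return {aligned, static_cast<size_t>(aligned - start),
          static_cast<size_t>((start + alloc_size) - (aligned + size))};
}
```

In the worst case the mapping starts one page past a `2*size` boundary, for example `PlanTrim(0x11000, 4096, 4096)`. The head to unmap is then `2*size - page` bytes and the tail is empty, which is why `3*size - page` bytes are enough.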
diff --git a/libc/platform/bionic/tls_defines.h b/libc/platform/bionic/tls_defines.h
index 8fe8701..06c6617 100644
--- a/libc/platform/bionic/tls_defines.h
+++ b/libc/platform/bionic/tls_defines.h
@@ -85,7 +85,8 @@
 // [1] "Addenda to, and Errata in, the ABI for the ARM Architecture". Section 3.
 // http://infocenter.arm.com/help/topic/com.arm.doc.ihi0045e/IHI0045E_ABI_addenda.pdf
 
-#define MIN_TLS_SLOT (-2)  // update this value when reserving a slot
+#define MIN_TLS_SLOT (-3)  // update this value when reserving a slot
+#define TLS_SLOT_STACK_MTE (-3)
 #define TLS_SLOT_NATIVE_BRIDGE_GUEST_STATE (-2)
 #define TLS_SLOT_BIONIC_TLS     (-1)
 #define TLS_SLOT_DTV              0
diff --git a/libc/private/bionic_globals.h b/libc/private/bionic_globals.h
index 0949056..a1bebda 100644
--- a/libc/private/bionic_globals.h
+++ b/libc/private/bionic_globals.h
@@ -76,10 +76,23 @@
 };
 
 __LIBC_HIDDEN__ extern WriteProtected<libc_globals> __libc_globals;
-// This cannot be in __libc_globals, because we cannot access the
+// These cannot be in __libc_globals, because we cannot access the
 // WriteProtected in a thread-safe way.
 // See b/328256432.
+//
+// __libc_memtag_stack says whether stack MTE is enabled on the process, i.e.
+// whether the stack pages are mapped with PROT_MTE. This is always false if
+// MTE is disabled for the process (i.e. libc_globals.memtag is false).
 __LIBC_HIDDEN__ extern _Atomic(bool) __libc_memtag_stack;
+// __libc_memtag_stack_abi says whether the process contains any code that was
+// compiled with memtag-stack. This is true even if the process does not have
+// MTE enabled (e.g. because it was overridden using MEMTAG_OPTIONS, or because
+// MTE is disabled for the device).
+// Code compiled with memtag-stack needs a stack history buffer in
+// TLS_SLOT_STACK_MTE, because the codegen will emit an unconditional
+// (to keep the code branchless) write to it.
+// Protected by g_heap_creation_lock.
+__LIBC_HIDDEN__ extern bool __libc_memtag_stack_abi;
 
 struct abort_msg_t;
 struct crash_detail_page_t;
@@ -133,7 +146,9 @@
   size_t scudo_stack_depot_size = 0;
 
   HeapTaggingLevel initial_heap_tagging_level = M_HEAP_TAGGING_LEVEL_NONE;
+  // See comments for __libc_memtag_stack / __libc_memtag_stack_abi above.
   bool initial_memtag_stack = false;
+  bool initial_memtag_stack_abi = false;
   int64_t heap_tagging_upgrade_timer_sec = 0;
 
   void (*memtag_stack_dlopen_callback)() = nullptr;
diff --git a/libdl/Android.bp b/libdl/Android.bp
index 95b412b..1bbd902 100644
--- a/libdl/Android.bp
+++ b/libdl/Android.bp
@@ -60,33 +60,14 @@
     native_bridge_supported: true,
     static_ndk_lib: true,
 
-    defaults: ["linux_bionic_supported"],
-
-    // NOTE: --exclude-libs=libgcc.a makes sure that any symbols libdl.so pulls from
-    // libgcc.a are made static to libdl.so.  This in turn ensures that libraries that
-    // a) pull symbols from libgcc.a and b) depend on libdl.so will not rely on libdl.so
-    // to provide those symbols, but will instead pull them from libgcc.a.  Specifically,
-    // we use this property to make sure libc.so has its own copy of the code from
-    // libgcc.a it uses.
-    //
-    // DO NOT REMOVE --exclude-libs!
-
-    ldflags: [
-        "-Wl,--exclude-libs=libgcc.a",
-        "-Wl,--exclude-libs=libgcc_stripped.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-arm-android.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-aarch64-android.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-i686-android.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-riscv64-android.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-x86_64-android.a",
+    defaults: [
+        "linux_bionic_supported",
+        "bug_24465209_workaround",
     ],
 
-    // for x86, exclude libgcc_eh.a for the same reasons as above
     arch: {
         arm: {
             version_script: ":libdl.arm.map",
-            pack_relocations: false,
-            ldflags: ["-Wl,--hash-style=both"],
         },
         arm64: {
             version_script: ":libdl.arm64.map",
@@ -95,15 +76,9 @@
             version_script: ":libdl.riscv64.map",
         },
         x86: {
-            pack_relocations: false,
-            ldflags: [
-                "-Wl,--exclude-libs=libgcc_eh.a",
-                "-Wl,--hash-style=both",
-            ],
             version_script: ":libdl.x86.map",
         },
         x86_64: {
-            ldflags: ["-Wl,--exclude-libs=libgcc_eh.a"],
             version_script: ":libdl.x86_64.map",
         },
     },
@@ -162,37 +137,6 @@
     recovery_available: true,
     native_bridge_supported: true,
 
-    // NOTE: --exclude-libs=libgcc.a makes sure that any symbols libdl.so pulls from
-    // libgcc.a are made static to libdl.so.  This in turn ensures that libraries that
-    // a) pull symbols from libgcc.a and b) depend on libdl.so will not rely on libdl.so
-    // to provide those symbols, but will instead pull them from libgcc.a.  Specifically,
-    // we use this property to make sure libc.so has its own copy of the code from
-    // libgcc.a it uses.
-    //
-    // DO NOT REMOVE --exclude-libs!
-
-    ldflags: [
-        "-Wl,--exclude-libs=libgcc.a",
-        "-Wl,--exclude-libs=libgcc_stripped.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-arm-android.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-aarch64-android.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-i686-android.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-riscv64-android.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-x86_64-android.a",
-    ],
-
-    // for x86, exclude libgcc_eh.a for the same reasons as above
-    arch: {
-        x86: {
-            ldflags: [
-                "-Wl,--exclude-libs=libgcc_eh.a",
-            ],
-        },
-        x86_64: {
-            ldflags: ["-Wl,--exclude-libs=libgcc_eh.a"],
-        },
-    },
-
     srcs: ["libdl_android.cpp"],
     version_script: "libdl_android.map.txt",
 
diff --git a/libm/Android.bp b/libm/Android.bp
index 09d8535..00d90a0 100644
--- a/libm/Android.bp
+++ b/libm/Android.bp
@@ -199,11 +199,6 @@
         "upstream-netbsd/lib/libm/complex/ctanhl.c",
         "upstream-netbsd/lib/libm/complex/ctanl.c",
 
-        // TODO: this comes from from upstream's libc, not libm, but it's an
-        // implementation detail that should have hidden visibility, so it needs
-        // to be in whatever library the math code is in.
-        "digittoint.c",
-
         // Functionality not in the BSDs.
         "significandl.c",
         "fake_long_double.c",
diff --git a/libm/digittoint.c b/libm/digittoint.c
deleted file mode 100644
index 1824788..0000000
--- a/libm/digittoint.c
+++ /dev/null
@@ -1,46 +0,0 @@
-/*-
- * Copyright (c) 2007 David Schultz
- * All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- * 1. Redistributions of source code must retain the above copyright
- *    notice, this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright
- *    notice, this list of conditions and the following disclaimer in the
- *    documentation and/or other materials provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND
- * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
- * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
- * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- * SUCH DAMAGE.
- *
- * $FreeBSD$
- */
-
-#include <sys/cdefs.h>
-
-/* digittoint is in the FreeBSD C library, but implemented in terms of locale stuff. */
-__LIBC_HIDDEN__ int digittoint(char ch) {
-  int d = ch - '0';
-  if ((unsigned) d < 10) {
-    return d;
-  }
-  d = ch - 'a';
-  if ((unsigned) d < 6) {
-    return d + 10;
-  }
-  d = ch - 'A';
-  if ((unsigned) d < 6) {
-    return d + 10;
-  }
-  return -1;
-}
diff --git a/libm/freebsd-compat.h b/libm/freebsd-compat.h
index 7accc55..a3c7cd4 100644
--- a/libm/freebsd-compat.h
+++ b/libm/freebsd-compat.h
@@ -27,18 +27,12 @@
 #define __strong_reference(sym,aliassym) \
     extern __typeof (sym) aliassym __attribute__ ((__alias__ (#sym)))
 
-#define __warn_references(sym,msg) /* ignored */
-
-// digittoint is in BSD's <ctype.h>, but not ours, so we have a secret
-// implementation in libm. We reuse parts of libm in the NDK's
-// libandroid_support, where it's a static library, so we want all our
-// "hidden" functions start with a double underscore --- being HIDDEN
-// in the ELF sense is not sufficient.
-#define digittoint __libm_digittoint
-int digittoint(char ch);
-
-// Similarly rename _scan_nan.
-#define _scan_nan __libm_scan_nan
+// digittoint is in BSD's <ctype.h> but not ours.
+#include <ctype.h>
+static inline int digittoint(char ch) {
+  if (!isxdigit(ch)) return -1;
+  return isdigit(ch) ? (ch - '0') : (_tolower(ch) - 'a' + 10);
+}
 
 // FreeBSD exports these in <math.h> but we don't.
 double cospi(double);
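Note on the digittoint change above: the removed `libm/digittoint.c` mapped `'0'`-`'9'` to 0-9, `'a'`-`'f'` and `'A'`-`'F'` to 10-15, and everything else to -1. A standalone sketch of that mapping, useful as a reference when checking the inline replacement, might look like the following. `HexDigitValue` is a hypothetical name, and it uses the standard `<cctype>` functions rather than bionic's `_tolower`.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical reference implementation of the mapping described above.
int HexDigitValue(char ch) {
  // <cctype> functions require a value representable as unsigned char.
  const unsigned char c = static_cast<unsigned char>(ch);
  if (!std::isxdigit(c)) return -1;
  if (std::isdigit(c)) return c - '0';
  // Hex letters continue after the decimal digits, so 'a' maps to 10.
  return std::tolower(c) - 'a' + 10;
}
```

A quick sanity check is `assert(HexDigitValue('f') == 15)`. Any replacement for `digittoint` should agree with this function for every `char` value.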
diff --git a/linker/Android.bp b/linker/Android.bp
index 78109e8..143dbd5 100644
--- a/linker/Android.bp
+++ b/linker/Android.bp
@@ -1,14 +1,3 @@
-// ========================================================
-// linker_wrapper - Linux Bionic (on the host)
-// ========================================================
-
-// This is used for bionic on (host) Linux to bootstrap our linker embedded into
-// a binary.
-//
-// Host bionic binaries do not have a PT_INTERP section, instead this gets
-// embedded as the entry point, and the linker is embedded as ELF sections in
-// each binary. There's a linker script that sets all of that up (generated by
-// extract_linker), and defines the extern symbols used in this file.
 package {
     default_team: "trendy_team_native_tools_libraries",
     default_applicable_licenses: ["bionic_linker_license"],
@@ -25,6 +14,17 @@
     ],
 }
 
+// ========================================================
+// linker_wrapper - Linux Bionic (on the host)
+// ========================================================
+
+// This is used for bionic on (host) Linux to bootstrap our linker embedded into
+// a binary.
+//
+// Host bionic binaries do not have a PT_INTERP section, instead this gets
+// embedded as the entry point, and the linker is embedded as ELF sections in
+// each binary. There's a linker script that sets all of that up (generated by
+// extract_linker), and defines the extern symbols used in this file.
 cc_object {
     name: "linker_wrapper",
     host_supported: true,
@@ -327,8 +327,10 @@
         },
     },
 
+    static_executable: true,
+
     // -shared is used to overwrite the -Bstatic and -static flags triggered by enabling
-    // static_executable. This dynamic linker is actually a shared object linked with static
+    // static_executable. The dynamic linker is actually a shared object linked with static
     // libraries.
     ldflags: [
         "-shared",
@@ -344,50 +346,38 @@
         "-Wl,--pack-dyn-relocs=relr",
     ],
 
-    // we are going to link libc++_static manually because
-    // when stl is not set to "none" build system adds libdl
-    // to the list of static libraries which needs to be
-    // avoided in the case of building loader.
+    // We link libc++_static manually because otherwise the build system will
+    // automatically add libdl to the list of static libraries.
     stl: "none",
 
-    // we don't want crtbegin.o (because we have begin.o), so unset it
-    // just for this module
+    // We don't want crtbegin.o (because we have our own arch/*/begin.o),
+    // so unset it just for this module.
     nocrt: true,
 
-    static_executable: true,
-
     // Insert an extra objcopy step to add prefix to symbols. This is needed to prevent gdb
     // looking up symbols in the linker by mistake.
     prefix_symbols: "__dl_",
 
     sanitize: {
         hwaddress: false,
+        memtag_stack: false,
     },
 
     static_libs: [
         "liblinker_main",
         "liblinker_malloc",
 
-        // Use a version of libc++ built without exceptions, because accessing EH globals uses
-        // ELF TLS, which is not supported in the loader.
+        // We use a version of libc++ built without exceptions,
+        // because accessing EH globals uses ELF TLS,
+        // which is not supported in the loader.
         "libc++_static_noexcept",
+
         "libc_nomalloc",
         "libc_dynamic_dispatch",
         "libm",
         "libunwind",
     ],
 
-    // Ensure that if the linker needs __gnu_Unwind_Find_exidx, then the linker will have a
-    // definition of the symbol. The linker links against libgcc.a, whose arm32 unwinder has a weak
-    // reference to __gnu_Unwind_Find_exidx, which isn't sufficient to pull in the strong definition
-    // of __gnu_Unwind_Find_exidx from libc. An unresolved weak reference would create a
-    // non-relative dynamic relocation in the linker binary, which complicates linker startup.
-    //
-    // This line should be unnecessary because the linker's dependency on libunwind_llvm.a should
-    // override libgcc.a, but this line provides a simpler guarantee. It can be removed once the
-    // linker stops linking against libgcc.a's arm32 unwinder.
-    whole_static_libs: ["libc_unwind_static"],
-
     system_shared_libs: [],
 
     // Opt out of native_coverage when opting out of system_shared_libs
@@ -474,35 +464,6 @@
 }
 
 cc_library {
-    // NOTE: --exclude-libs=libgcc.a makes sure that any symbols ld-android.so pulls from
-    // libgcc.a are made static to ld-android.so.  This in turn ensures that libraries that
-    // a) pull symbols from libgcc.a and b) depend on ld-android.so will not rely on ld-android.so
-    // to provide those symbols, but will instead pull them from libgcc.a.  Specifically,
-    // we use this property to make sure libc.so has its own copy of the code from
-    // libgcc.a it uses.
-    //
-    // DO NOT REMOVE --exclude-libs!
-
-    ldflags: [
-        "-Wl,--exclude-libs=libgcc.a",
-        "-Wl,--exclude-libs=libgcc_stripped.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-arm-android.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-aarch64-android.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-riscv64-android.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-i686-android.a",
-        "-Wl,--exclude-libs=libclang_rt.builtins-x86_64-android.a",
-    ],
-
-    // for x86, exclude libgcc_eh.a for the same reasons as above
-    arch: {
-        x86: {
-            ldflags: ["-Wl,--exclude-libs=libgcc_eh.a"],
-        },
-        x86_64: {
-            ldflags: ["-Wl,--exclude-libs=libgcc_eh.a"],
-        },
-    },
-
     srcs: ["ld_android.cpp"],
     cflags: [
         "-Wall",
diff --git a/linker/linker.cpp b/linker/linker.cpp
index 8b467a3..e13d37d 100644
--- a/linker/linker.cpp
+++ b/linker/linker.cpp
@@ -3595,7 +3595,7 @@
   // 2. Initialize other namespaces
 
   for (auto& ns_config : namespace_configs) {
-    if (namespaces.find(ns_config->name()) != namespaces.end()) {
+    if (namespaces.contains(ns_config->name())) {
       continue;
     }
 
diff --git a/linker/linker_config.cpp b/linker/linker_config.cpp
index ad40c50..70430b8 100644
--- a/linker/linker_config.cpp
+++ b/linker/linker_config.cpp
@@ -304,7 +304,7 @@
     }
 
     if (result == ConfigParser::kPropertyAssign) {
-      if (properties->find(name) != properties->end()) {
+      if (properties->contains(name)) {
         DL_WARN("%s:%zd: warning: redefining property \"%s\" (overriding previous value)",
                 ld_config_file_path,
                 cp.lineno(),
@@ -313,7 +313,7 @@
 
       (*properties)[name] = PropertyValue(std::move(value), cp.lineno());
     } else if (result == ConfigParser::kPropertyAppend) {
-      if (properties->find(name) == properties->end()) {
+      if (!properties->contains(name)) {
         DL_WARN("%s:%zd: warning: appending to undefined property \"%s\" (treating as assignment)",
                 ld_config_file_path,
                 cp.lineno(),
@@ -526,7 +526,7 @@
         properties.get_strings(property_name_prefix + ".links", &lineno);
 
     for (const auto& linked_ns_name : linked_namespaces) {
-      if (namespace_configs.find(linked_ns_name) == namespace_configs.end()) {
+      if (!namespace_configs.contains(linked_ns_name)) {
         *error_msg = create_error_msg(ld_config_file_path,
                                       lineno,
                                       std::string("undefined namespace: ") + linked_ns_name);
diff --git a/linker/linker_namespaces.cpp b/linker/linker_namespaces.cpp
index 5182129..eb9dae9 100644
--- a/linker/linker_namespaces.cpp
+++ b/linker/linker_namespaces.cpp
@@ -100,7 +100,7 @@
     // be searched.
     if (allow_secondary) {
       const android_namespace_list_t& secondary_namespaces = si->get_secondary_namespaces();
-      if (secondary_namespaces.find(this) != secondary_namespaces.end()) {
+      if (secondary_namespaces.contains(this)) {
         return true;
       }
     }
diff --git a/linker/linker_soinfo.cpp b/linker/linker_soinfo.cpp
index 802c06a..d915503 100644
--- a/linker/linker_soinfo.cpp
+++ b/linker/linker_soinfo.cpp
@@ -887,7 +887,7 @@
     handle_ = handle_ | 1;
   } while (handle_ == reinterpret_cast<uintptr_t>(RTLD_DEFAULT) ||
            handle_ == reinterpret_cast<uintptr_t>(RTLD_NEXT) ||
-           g_soinfo_handles_map.find(handle_) != g_soinfo_handles_map.end());
+           g_soinfo_handles_map.contains(handle_));
 
   g_soinfo_handles_map[handle_] = this;
 }
diff --git a/tests/dirent_test.cpp b/tests/dirent_test.cpp
index 4d21246..cde2d11 100644
--- a/tests/dirent_test.cpp
+++ b/tests/dirent_test.cpp
@@ -33,11 +33,11 @@
 
 static void CheckProcSelf(std::set<std::string>& names) {
   // We have a good idea of what should be in /proc/self.
-  ASSERT_TRUE(names.find(".") != names.end());
-  ASSERT_TRUE(names.find("..") != names.end());
-  ASSERT_TRUE(names.find("cmdline") != names.end());
-  ASSERT_TRUE(names.find("fd") != names.end());
-  ASSERT_TRUE(names.find("stat") != names.end());
+  ASSERT_TRUE(names.contains("."));
+  ASSERT_TRUE(names.contains(".."));
+  ASSERT_TRUE(names.contains("cmdline"));
+  ASSERT_TRUE(names.contains("fd"));
+  ASSERT_TRUE(names.contains("stat"));
 }
 
 template <typename DirEntT>
diff --git a/tests/ifaddrs_test.cpp b/tests/ifaddrs_test.cpp
index b3ab94d..da64770 100644
--- a/tests/ifaddrs_test.cpp
+++ b/tests/ifaddrs_test.cpp
@@ -137,7 +137,7 @@
   sockaddr_in* sock = reinterpret_cast<sockaddr_in*>(&ifr.ifr_addr);
   in_addr_t addr = sock->sin_addr.s_addr;
 
-  EXPECT_TRUE(addrs.find(addr) != addrs.end()) << if_name << ' ' << std::hex << ntohl(addr);
+  EXPECT_TRUE(addrs.contains(addr)) << if_name << ' ' << std::hex << ntohl(addr);
 }
 
 TEST(ifaddrs, getifaddrs_INET) {
diff --git a/tests/libs/testbinary_is_stack_mte.cpp b/tests/libs/testbinary_is_stack_mte.cpp
index d8074d5..0cdc466 100644
--- a/tests/libs/testbinary_is_stack_mte.cpp
+++ b/tests/libs/testbinary_is_stack_mte.cpp
@@ -36,7 +36,9 @@
 #if defined(__BIONIC__) && defined(__aarch64__)
 
 extern "C" int main(int, char**) {
-  int ret = is_stack_mte_on() ? 0 : 1;
+  void* mte_tls_ptr = mte_tls();
+  *reinterpret_cast<uintptr_t*>(mte_tls_ptr) = 1;
+  int ret = (is_stack_mte_on() && mte_tls_ptr != nullptr) ? 0 : 1;
   printf("RAN\n");
   return ret;
 }
diff --git a/tests/libs/testbinary_is_stack_mte_after_dlopen.cpp b/tests/libs/testbinary_is_stack_mte_after_dlopen.cpp
index 937ac4c..35af8f4 100644
--- a/tests/libs/testbinary_is_stack_mte_after_dlopen.cpp
+++ b/tests/libs/testbinary_is_stack_mte_after_dlopen.cpp
@@ -96,6 +96,7 @@
   State state = kInit;
 
   bool is_early_thread_mte_on = false;
+  void* early_thread_mte_tls = nullptr;
   std::thread early_th([&] {
     {
       std::lock_guard lk(m);
@@ -107,6 +108,8 @@
       cv.wait(lk, [&] { return state == kStackRemapped; });
     }
     is_early_thread_mte_on = is_stack_mte_on();
+    early_thread_mte_tls = mte_tls();
+    *reinterpret_cast<uintptr_t*>(early_thread_mte_tls) = 1;
   });
   {
     std::unique_lock lk(m);
@@ -120,6 +123,7 @@
   cv.notify_one();
   CHECK(handle != nullptr);
   CHECK(is_stack_mte_on());
+  CHECK(mte_tls() != nullptr);
 
   bool new_stack_page_mte_on = false;
   uintptr_t low;
@@ -129,11 +133,18 @@
   CHECK(new_stack_page_mte_on);
 
   bool is_late_thread_mte_on = false;
-  std::thread late_th([&] { is_late_thread_mte_on = is_stack_mte_on(); });
+  void* late_thread_mte_tls = nullptr;
+  std::thread late_th([&] {
+    is_late_thread_mte_on = is_stack_mte_on();
+    late_thread_mte_tls = mte_tls();
+    *reinterpret_cast<uintptr_t*>(late_thread_mte_tls) = 1;
+  });
   late_th.join();
   early_th.join();
   CHECK(is_late_thread_mte_on);
   CHECK(is_early_thread_mte_on);
+  CHECK(late_thread_mte_tls != nullptr);
+  CHECK(early_thread_mte_tls != nullptr);
   printf("RAN\n");
   return 0;
 }
diff --git a/tests/link_test.cpp b/tests/link_test.cpp
index 127a3d9..ae3a1cd 100644
--- a/tests/link_test.cpp
+++ b/tests/link_test.cpp
@@ -195,7 +195,7 @@
     }
     void AddModule(dl_phdr_info* info, size_t s) {
       ASSERT_EQ(sizeof(dl_phdr_info), s);
-      ASSERT_TRUE(dl_iter_mods.find(info->dlpi_addr) == dl_iter_mods.end());
+      ASSERT_FALSE(dl_iter_mods.contains(info->dlpi_addr));
       ASSERT_TRUE(info->dlpi_name != nullptr);
       dl_iter_mods[info->dlpi_addr] = {
         .name = info->dlpi_name,
diff --git a/tests/math_test.cpp b/tests/math_test.cpp
index 493f3af..273ef97 100644
--- a/tests/math_test.cpp
+++ b/tests/math_test.cpp
@@ -17,36 +17,37 @@
 #define _GNU_SOURCE 1
 #include <math.h>
 
-// This include (and the associated definition of __test_capture_signbit)
-// must be placed before any files that include <cmath> (gtest.h in this case).
+// <math.h> is required to define type-generic macros: fpclassify, signbit,
+// isfinite, isinf, isnan, isnormal, isgreater, isgreaterequal, isless,
+// islessequal, islessgreater, and isunordered.
 //
-// <math.h> is required to define generic macros signbit, isfinite and
-// several other such functions.
+// <cmath> is required to #undef these macros and make equivalent sets of
+// _overloaded_ functions available in namespace std. So the isnan() macro,
+// for example, is replaced by std::isnan(float), std::isnan(double),
+// and std::isnan(long double).
 //
-// <cmath> is required to undef declarations of these macros in the global
-// namespace and make equivalent functions available in namespace std. Our
-// stlport implementation does this only for signbit, isfinite, isinf and
-// isnan.
-//
-// NOTE: We don't write our test using std::signbit because we want to be
-// sure that we're testing the bionic version of signbit. The C++ libraries
-// are free to reimplement signbit or delegate to compiler builtins if they
-// please.
+// We're trying to test the bionic macros rather than whatever libc++'s
+// implementation happens to be, so we #include <math.h> and "capture" the
+// macros in our own _template_ functions in the global namespace before
+// we #include any files that include <cmath>, such as <gtest/gtest.h>.
 
-namespace {
-template<typename T> inline int test_capture_signbit(const T in) {
-  return signbit(in);
-}
-template<typename T> inline int test_capture_isfinite(const T in) {
-  return isfinite(in);
-}
-template<typename T> inline int test_capture_isnan(const T in) {
-  return isnan(in);
-}
-template<typename T> inline int test_capture_isinf(const T in) {
-  return isinf(in);
-}
-}
+#define capture_generic_macro(capture_function_name, generic_macro_name) \
+  template <typename T> inline int capture_function_name(const T in) { \
+    return generic_macro_name(in); \
+  }
+
+capture_generic_macro(test_capture_fpclassify, fpclassify)
+capture_generic_macro(test_capture_signbit, signbit)
+capture_generic_macro(test_capture_isfinite, isfinite)
+capture_generic_macro(test_capture_isinf, isinf)
+capture_generic_macro(test_capture_isnan, isnan)
+capture_generic_macro(test_capture_isnormal, isnormal)
+capture_generic_macro(test_capture_isgreater, isgreater)
+capture_generic_macro(test_capture_isgreaterequal, isgreaterequal)
+capture_generic_macro(test_capture_isless, isless)
+capture_generic_macro(test_capture_islessequal, islessequal)
+capture_generic_macro(test_capture_islessgreater, islessgreater)
+capture_generic_macro(test_capture_isunordered, isunordered)
 
 #include "math_data_test.h"
 
@@ -60,6 +61,22 @@
 
 #include <android-base/scopeguard.h>
 
+// Now that we've included all the headers we need, we can redefine the generic
+// function-like macros to point to the bionic <math.h> versions we captured
+// earlier.
+#define fpclassify test_capture_fpclassify
+#define signbit test_capture_signbit
+#define isfinite test_capture_isfinite
+#define isinf test_capture_isinf
+#define isnan test_capture_isnan
+#define isnormal test_capture_isnormal
+#define isgreater test_capture_isgreater
+#define isgreaterequal test_capture_isgreaterequal
+#define isless test_capture_isless
+#define islessequal test_capture_islessequal
+#define islessgreater test_capture_islessgreater
+#define isunordered test_capture_isunordered
+
 static float float_subnormal() {
   union {
     float f;
@@ -124,36 +141,36 @@
 }
 
 TEST(math_h, isfinite) {
-  ASSERT_TRUE(test_capture_isfinite(123.0f));
-  ASSERT_TRUE(test_capture_isfinite(123.0));
-  ASSERT_TRUE(test_capture_isfinite(123.0L));
-  ASSERT_FALSE(test_capture_isfinite(HUGE_VALF));
-  ASSERT_FALSE(test_capture_isfinite(-HUGE_VALF));
-  ASSERT_FALSE(test_capture_isfinite(HUGE_VAL));
-  ASSERT_FALSE(test_capture_isfinite(-HUGE_VAL));
-  ASSERT_FALSE(test_capture_isfinite(HUGE_VALL));
-  ASSERT_FALSE(test_capture_isfinite(-HUGE_VALL));
+  ASSERT_TRUE(isfinite(123.0f));
+  ASSERT_TRUE(isfinite(123.0));
+  ASSERT_TRUE(isfinite(123.0L));
+  ASSERT_FALSE(isfinite(HUGE_VALF));
+  ASSERT_FALSE(isfinite(-HUGE_VALF));
+  ASSERT_FALSE(isfinite(HUGE_VAL));
+  ASSERT_FALSE(isfinite(-HUGE_VAL));
+  ASSERT_FALSE(isfinite(HUGE_VALL));
+  ASSERT_FALSE(isfinite(-HUGE_VALL));
 }
 
 TEST(math_h, isinf) {
-  ASSERT_FALSE(test_capture_isinf(123.0f));
-  ASSERT_FALSE(test_capture_isinf(123.0));
-  ASSERT_FALSE(test_capture_isinf(123.0L));
-  ASSERT_TRUE(test_capture_isinf(HUGE_VALF));
-  ASSERT_TRUE(test_capture_isinf(-HUGE_VALF));
-  ASSERT_TRUE(test_capture_isinf(HUGE_VAL));
-  ASSERT_TRUE(test_capture_isinf(-HUGE_VAL));
-  ASSERT_TRUE(test_capture_isinf(HUGE_VALL));
-  ASSERT_TRUE(test_capture_isinf(-HUGE_VALL));
+  ASSERT_FALSE(isinf(123.0f));
+  ASSERT_FALSE(isinf(123.0));
+  ASSERT_FALSE(isinf(123.0L));
+  ASSERT_TRUE(isinf(HUGE_VALF));
+  ASSERT_TRUE(isinf(-HUGE_VALF));
+  ASSERT_TRUE(isinf(HUGE_VAL));
+  ASSERT_TRUE(isinf(-HUGE_VAL));
+  ASSERT_TRUE(isinf(HUGE_VALL));
+  ASSERT_TRUE(isinf(-HUGE_VALL));
 }
 
 TEST(math_h, isnan) {
-  ASSERT_FALSE(test_capture_isnan(123.0f));
-  ASSERT_FALSE(test_capture_isnan(123.0));
-  ASSERT_FALSE(test_capture_isnan(123.0L));
-  ASSERT_TRUE(test_capture_isnan(nanf("")));
-  ASSERT_TRUE(test_capture_isnan(nan("")));
-  ASSERT_TRUE(test_capture_isnan(nanl("")));
+  ASSERT_FALSE(isnan(123.0f));
+  ASSERT_FALSE(isnan(123.0));
+  ASSERT_FALSE(isnan(123.0L));
+  ASSERT_TRUE(isnan(nanf("")));
+  ASSERT_TRUE(isnan(nan("")));
+  ASSERT_TRUE(isnan(nanl("")));
 }
 
 TEST(math_h, isnormal) {
@@ -167,17 +184,17 @@
 
 // TODO: isgreater, isgreaterequals, isless, islessequal, islessgreater, isunordered
 TEST(math_h, signbit) {
-  ASSERT_EQ(0, test_capture_signbit(0.0f));
-  ASSERT_EQ(0, test_capture_signbit(0.0));
-  ASSERT_EQ(0, test_capture_signbit(0.0L));
+  ASSERT_EQ(0, signbit(0.0f));
+  ASSERT_EQ(0, signbit(0.0));
+  ASSERT_EQ(0, signbit(0.0L));
 
-  ASSERT_EQ(0, test_capture_signbit(1.0f));
-  ASSERT_EQ(0, test_capture_signbit(1.0));
-  ASSERT_EQ(0, test_capture_signbit(1.0L));
+  ASSERT_EQ(0, signbit(1.0f));
+  ASSERT_EQ(0, signbit(1.0));
+  ASSERT_EQ(0, signbit(1.0L));
 
-  ASSERT_NE(0, test_capture_signbit(-1.0f));
-  ASSERT_NE(0, test_capture_signbit(-1.0));
-  ASSERT_NE(0, test_capture_signbit(-1.0L));
+  ASSERT_NE(0, signbit(-1.0f));
+  ASSERT_NE(0, signbit(-1.0));
+  ASSERT_NE(0, signbit(-1.0L));
 }
 
 // Historical BSD cruft that isn't exposed in <math.h> any more.
@@ -309,9 +326,7 @@
 // Historical BSD cruft that isn't exposed in <math.h> any more.
 extern "C" int __isinf(double);
 extern "C" int __isinff(float);
-extern "C" int isinff(float);
 extern "C" int __isinfl(long double);
-extern "C" int isinfl(long double);
 
 TEST(math_h, __isinf) {
 #if defined(ANDROID_HOST_MUSL)
@@ -367,9 +382,7 @@
 // Historical BSD cruft that isn't exposed in <math.h> any more.
 extern "C" int __isnan(double);
 extern "C" int __isnanf(float);
-extern "C" int isnanf(float);
 extern "C" int __isnanl(long double);
-extern "C" int isnanl(long double);
 
 TEST(math_h, __isnan) {
 #if defined(ANDROID_HOST_MUSL)
diff --git a/tests/mte_utils.h b/tests/mte_utils.h
index 6e8385c..020faec 100644
--- a/tests/mte_utils.h
+++ b/tests/mte_utils.h
@@ -40,4 +40,10 @@
   return p == p_cpy;
 }
 
+static void* mte_tls() {
+  void** dst;
+  __asm__("mrs %0, TPIDR_EL0" : "=r"(dst) :);
+  return dst[-3];
+}
+
 #endif
diff --git a/tests/stack_protector_test.cpp b/tests/stack_protector_test.cpp
index aea791c..5817a27 100644
--- a/tests/stack_protector_test.cpp
+++ b/tests/stack_protector_test.cpp
@@ -48,7 +48,7 @@
     printf("[thread %d] TLS stack guard = %p\n", tid, guard);
 
     // Duplicate tid. gettid(2) bug? Seeing this would be very upsetting.
-    ASSERT_TRUE(tids.find(tid) == tids.end());
+    ASSERT_FALSE(tids.contains(tid));
 
     // Uninitialized guard. Our bug. Note this is potentially flaky; we _could_
     // get four random zero bytes, but it should be vanishingly unlikely.
diff --git a/tests/struct_layout_test.cpp b/tests/struct_layout_test.cpp
index 0123ed9..1f04344 100644
--- a/tests/struct_layout_test.cpp
+++ b/tests/struct_layout_test.cpp
@@ -30,7 +30,7 @@
 #define CHECK_OFFSET(name, field, offset) \
     check_offset(#name, #field, offsetof(name, field), offset);
 #ifdef __LP64__
-  CHECK_SIZE(pthread_internal_t, 776);
+  CHECK_SIZE(pthread_internal_t, 816);
   CHECK_OFFSET(pthread_internal_t, next, 0);
   CHECK_OFFSET(pthread_internal_t, prev, 8);
   CHECK_OFFSET(pthread_internal_t, tid, 16);
@@ -55,6 +55,8 @@
   CHECK_OFFSET(pthread_internal_t, dlerror_buffer, 248);
   CHECK_OFFSET(pthread_internal_t, bionic_tls, 760);
   CHECK_OFFSET(pthread_internal_t, errno_value, 768);
+  CHECK_OFFSET(pthread_internal_t, bionic_tcb, 776);
+  CHECK_OFFSET(pthread_internal_t, stack_mte_ringbuffer_vma_name_buffer, 784);
   CHECK_SIZE(bionic_tls, 12200);
   CHECK_OFFSET(bionic_tls, key_data, 0);
   CHECK_OFFSET(bionic_tls, locale, 2080);
@@ -72,7 +74,7 @@
   CHECK_OFFSET(bionic_tls, bionic_systrace_disabled, 12193);
   CHECK_OFFSET(bionic_tls, padding, 12194);
 #else
-  CHECK_SIZE(pthread_internal_t, 668);
+  CHECK_SIZE(pthread_internal_t, 704);
   CHECK_OFFSET(pthread_internal_t, next, 0);
   CHECK_OFFSET(pthread_internal_t, prev, 4);
   CHECK_OFFSET(pthread_internal_t, tid, 8);
@@ -97,6 +99,8 @@
   CHECK_OFFSET(pthread_internal_t, dlerror_buffer, 148);
   CHECK_OFFSET(pthread_internal_t, bionic_tls, 660);
   CHECK_OFFSET(pthread_internal_t, errno_value, 664);
+  CHECK_OFFSET(pthread_internal_t, bionic_tcb, 668);
+  CHECK_OFFSET(pthread_internal_t, stack_mte_ringbuffer_vma_name_buffer, 672);
   CHECK_SIZE(bionic_tls, 11080);
   CHECK_OFFSET(bionic_tls, key_data, 0);
   CHECK_OFFSET(bionic_tls, locale, 1040);
diff --git a/tools/versioner/src/versioner.cpp b/tools/versioner/src/versioner.cpp
index 5afa00b..320c19c 100644
--- a/tools/versioner/src/versioner.cpp
+++ b/tools/versioner/src/versioner.cpp
@@ -142,7 +142,7 @@
 
   auto new_end = std::remove_if(headers.begin(), headers.end(), [&arch](llvm::StringRef header) {
     for (const auto& it : ignored_headers) {
-      if (it.second.find(arch) == it.second.end()) {
+      if (!it.second.contains(arch)) {
         continue;
       }