Implement dynamic TLS accesses and allocation

Initialize a thread's DTV to an empty zeroed DTV. Allocate the DTV and
any ELF module's TLS segment on-demand in __tls_get_addr. Use a generation
counter, incremented in the linker, to signal when threads should
update/reallocate their DTV objects.

A generation count of 0 always indicates the constant zero DTV.

Once a DTV is allocated, it isn't freed until the thread exits, because
a signal handler could interrupt the fast path of __tls_get_addr between
accessing the DTV slot and reading a field of the DTV. Bionic keeps a
linked list of DTV objects so it can free them at thread-exit.

Dynamic TLS memory is allocated using a BionicAllocator instance in
libc_shared_globals. For async-signal safety, access to the
linker/libc-shared state is protected by first blocking signals, then by
acquiring the reader-writer lock, TlsModules::rwlock. A write lock is
needed to allocate or free memory.

In pthread_exit, unconditionally block signals before freeing dynamic
TLS memory or freeing the shadow call stack.

ndk_cruft.cpp: Avoid including pthread_internal.h inside an extern "C".
(The header now includes a C++ template that doesn't compile inside
extern "C".)

Bug: http://b/78026329
Bug: http://b/123094171
Test: bionic unit tests
Change-Id: I3c9b12921c9e68b33dcc1d1dd276bff364eff5d7
diff --git a/libc/bionic/ndk_cruft.cpp b/libc/bionic/ndk_cruft.cpp
index dbacf18..2c3299f 100644
--- a/libc/bionic/ndk_cruft.cpp
+++ b/libc/bionic/ndk_cruft.cpp
@@ -355,9 +355,14 @@
   return malloc(size);
 }
 
+} // extern "C"
+
 #define __get_thread __real_get_thread
 #include "pthread_internal.h"
 #undef __get_thread
+
+extern "C" {
+
 // Various third-party apps contain a backport of our pthread_rwlock implementation that uses this.
 pthread_internal_t* __get_thread() {
   return __real_get_thread();