Use ifuncs in the linker

Using ifuncs lets the linker select faster versions of libc functions
such as strcmp, which makes linking faster.

The linker still initializes TLS first and then calls the ifunc
resolvers. A small amount of code in Bionic runs before the resolvers
have been called, so it must avoid functions selected using ifuncs
(generally string.h APIs). I've tried to compile those pieces with
-ffreestanding. It may be unnecessary, but it could help avoid
compiler-inserted memset calls, and it may be useful later on.

The ifunc resolvers are called in a special early pass over the range
delimited by the __rel[a]_iplt_start / __rel[a]_iplt_end symbols. The
linker later encounters the same ifuncs again as R_*_IRELATIVE dynamic
relocations, so it skips them on that second pass.
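
As a rough illustration of the early pass (not the actual linker code),
the loop can be sketched as below. FakeRela, resolve_fast, resolve_slow
and apply_irelative_range are hypothetical names invented for this
sketch; the real linker walks ElfW(Rel) / ElfW(Rela) entries between the
__rel[a]_iplt_start and __rel[a]_iplt_end symbols.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an IRELATIVE relocation entry. In a real
// ElfW(Rela) entry, r_offset names the slot to patch and r_addend holds
// the resolver address; here both are plain pointers for readability.
struct FakeRela {
  uintptr_t* target;        // slot that receives the resolved address
  uintptr_t (*resolver)();  // ifunc resolver to call
};

// Two candidate implementations and the resolvers that pick them.
static int fast_impl() { return 1; }
static int slow_impl() { return 2; }

static uintptr_t resolve_fast() {
  return reinterpret_cast<uintptr_t>(&fast_impl);
}
static uintptr_t resolve_slow() {
  return reinterpret_cast<uintptr_t>(&slow_impl);
}

// Early pass: walk [start, end), call each resolver, and store the
// address it returns into the target slot. Returns the number of
// entries processed.
static size_t apply_irelative_range(FakeRela* start, FakeRela* end) {
  size_t count = 0;
  for (FakeRela* r = start; r != end; ++r) {
    *r->target = r->resolver();
    ++count;
  }
  return count;
}
```

Because this pass has already resolved every entry in the range, a later
general relocation pass can ignore IRELATIVE entries rather than calling
the resolvers a second time.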

Break linker_main.cpp into its own liblinker_main library so it can be
compiled with -ffreestanding.

On walleye, this change fixes a recent 2.3% linker64 start-up time
regression (156.6ms -> 160.2ms), and it also improves the 32-bit time by
about 1.9% on the same benchmark. I measured the run-time using a
synthetic benchmark based on loading libandroid_servers.so.

Test: bionic unit tests, manual benchmarking
Bug: none
Change-Id: Ieb9446c2df13a66fc0d377596756becad0af6995
diff --git a/libc/bionic/bionic_call_ifunc_resolver.cpp b/libc/bionic/bionic_call_ifunc_resolver.cpp
index 8522835..437de78 100644
--- a/libc/bionic/bionic_call_ifunc_resolver.cpp
+++ b/libc/bionic/bionic_call_ifunc_resolver.cpp
@@ -30,14 +30,32 @@
 #include <sys/auxv.h>
 #include <sys/ifunc.h>
 
+#include "private/bionic_auxv.h"
+
+// This code is called in the linker before it has been relocated, so minimize calls into other
+// parts of Bionic. In particular, we won't ever have two ifunc resolvers called concurrently, so
+// initializing the ifunc resolver argument doesn't need to be thread-safe.
+
 ElfW(Addr) __bionic_call_ifunc_resolver(ElfW(Addr) resolver_addr) {
 #if defined(__aarch64__)
   typedef ElfW(Addr) (*ifunc_resolver_t)(uint64_t, __ifunc_arg_t*);
-  static __ifunc_arg_t arg = { sizeof(__ifunc_arg_t), getauxval(AT_HWCAP), getauxval(AT_HWCAP2) };
+  static __ifunc_arg_t arg;
+  static bool initialized = false;
+  if (!initialized) {
+    initialized = true;
+    arg._size = sizeof(__ifunc_arg_t);
+    arg._hwcap = getauxval(AT_HWCAP);
+    arg._hwcap2 = getauxval(AT_HWCAP2);
+  }
   return reinterpret_cast<ifunc_resolver_t>(resolver_addr)(arg._hwcap | _IFUNC_ARG_HWCAP, &arg);
 #elif defined(__arm__)
   typedef ElfW(Addr) (*ifunc_resolver_t)(unsigned long);
-  static unsigned long hwcap = getauxval(AT_HWCAP);
+  static unsigned long hwcap;
+  static bool initialized = false;
+  if (!initialized) {
+    initialized = true;
+    hwcap = getauxval(AT_HWCAP);
+  }
   return reinterpret_cast<ifunc_resolver_t>(resolver_addr)(hwcap);
 #else
   typedef ElfW(Addr) (*ifunc_resolver_t)(void);