Use double-precision routines from arm-optimized-routines
This patch uses exp, exp2, log, log2, and pow from
arm-optimized-routines. For pow on x86_64, although it is slightly
slower, using the same implementation simplifies the code required in
both bionic and arm-optimized-routines (there is no need to select
and export the symbol based on architecture).
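For illustration only, a sketch of the kind of per-architecture glue
that a shared pow implementation avoids; the helper names below are
hypothetical and do not correspond to actual bionic or
arm-optimized-routines symbols:

  /* Hypothetical sketch: per-architecture selection that exporting a
   * single shared pow implementation makes unnecessary. */
  #if defined(__aarch64__)
  double pow(double x, double y) { return __pow_arm_opt(x, y); }
  #else
  double pow(double x, double y) { return __pow_freebsd_msun(x, y); }
  #endif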
Performance-wise the improvements are:
x86_64  throughput  latency
exp     1.16x       1.16x
log     1.08x       0.95x
exp2    1.27x       1.55x
log2    1.40x       1.43x
pow     0.77x       0.89x *

* I tried to check whether AVX2/FMA would help, but without success.
aarch64  throughput  latency
exp      2.33x       2.16x
exp2     1.99x       1.50x
log      1.79x       1.43x
log2     2.15x       1.80x
pow      3.81x       3.07x
Test: ran bionic tests in static mode.
Change-Id: Ib16bf3280c5329fd257a3b3f0b6c4f2f3cb34deb
diff --git a/libm/fake_long_double.c b/libm/fake_long_double.c
index d81fa39..ef031cc 100644
--- a/libm/fake_long_double.c
+++ b/libm/fake_long_double.c
@@ -45,3 +45,12 @@
// FreeBSD doesn't have an ld128 implementation of tgammal, so both LP32 and LP64 need this.
long double tgammal(long double x) { return tgamma(x); }
+
+// external/arm-optimized-routines does not provide the long double
+// wrappers for the routines it implements.
+#if (LDBL_MANT_DIG == 53)
+long double expl(long double x) { return exp(x); }
+long double exp2l(long double x) { return exp2(x); }
+long double logl(long double x) { return log(x); }
+long double log2l(long double x) { return log2(x); }
+#endif
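
For illustration only (not part of this change), a minimal standalone
sanity check, assuming it is built on a platform where long double has
a 53-bit mantissa; it confirms that the new wrappers simply forward to
their double counterparts:

  #include <assert.h>
  #include <float.h>
  #include <math.h>

  int main(void) {
  #if (LDBL_MANT_DIG == 53)
    /* With a 53-bit long double, the wrappers must agree exactly with
     * the double routines they forward to. */
    assert(expl(1.0L) == (long double)exp(1.0));
    assert(exp2l(10.0L) == (long double)exp2(10.0));
    assert(logl(2.0L) == (long double)log(2.0));
    assert(log2l(8.0L) == (long double)log2(8.0));
  #endif
    return 0;
  }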