libc: ARM: Add 32-bit Kryo memcpy
* The memcpy routine is based on the Scorpion one because of
  Qualcomm's 128-bit cache line size optimizations.
* PLDOFFSET and PLDSIZE are taken from the ARM64 Kryo memcpy routine.
Below are the results of the benchmark, tested on a OnePlus 3 with MSM8996.
Before:
BM_string_memcpy/8       1000k      8   0.934 GiB/s
BM_string_memcpy/64      1000k     11   5.785 GiB/s
BM_string_memcpy/512     1000k     25  19.918 GiB/s
BM_string_memcpy/1024      50M     42  23.938 GiB/s
BM_string_memcpy/8Ki       10M    473  17.291 GiB/s
BM_string_memcpy/16Ki       5M    565  28.976 GiB/s
BM_string_memcpy/32Ki    1000k   1105  29.631 GiB/s
BM_string_memcpy/64Ki    1000k   2194  29.864 GiB/s
After:
BM_string_memcpy/8       1000k      6   1.145 GiB/s
BM_string_memcpy/64      1000k      7   8.560 GiB/s
BM_string_memcpy/512     1000k     18  27.370 GiB/s
BM_string_memcpy/1024      50M     33  30.340 GiB/s
BM_string_memcpy/8Ki       10M    266  30.770 GiB/s
BM_string_memcpy/16Ki       5M    553  29.599 GiB/s
BM_string_memcpy/32Ki    1000k   1121  29.219 GiB/s
BM_string_memcpy/64Ki    1000k   2208  29.678 GiB/s
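
For readers who want to sanity-check figures like these, the sketch
below shows one way a per-call time can be related to a GiB/s value.
It is not the bionic benchmark harness. The helper names gib_per_sec
and time_memcpy_ns are hypothetical, and it assumes the third column
above is nanoseconds per call. Because that column is rounded, values
recomputed from it will differ slightly from the reported throughput.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: convert bytes copied per call and the time per
 * call (in nanoseconds) into throughput in GiB/s. */
static double gib_per_sec(size_t bytes, double ns_per_call) {
    assert(ns_per_call > 0.0);
    double bytes_per_sec = ((double)bytes / ns_per_call) * 1e9;
    return bytes_per_sec / (1024.0 * 1024.0 * 1024.0);
}

/* Hypothetical helper: time `iters` copies of `size` bytes and return
 * the average CPU time per call in nanoseconds, or -1.0 on bad input
 * or allocation failure. Uses clock(), so resolution is coarse. */
static double time_memcpy_ns(size_t size, size_t iters) {
    if (size == 0 || iters == 0) {
        return -1.0;
    }
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, size);

    volatile unsigned char sink = 0;
    clock_t start = clock();
    for (size_t i = 0; i < iters; i++) {
        memcpy(dst, src, size);
        sink ^= dst[i % size]; /* read back so the copy stays observable */
    }
    clock_t end = clock();
    (void)sink;

    free(src);
    free(dst);
    return (double)(end - start) * 1e9 / (double)CLOCKS_PER_SEC / (double)iters;
}
```

For example, gib_per_sec(1024, time_memcpy_ns(1024, 50000000)) gives a
rough throughput for 1024-byte copies on the host, provided the
measured time is greater than zero.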
Test: make otapackage
Test: Ran bionic unit tests on Pixel device. Verified memcpy wins on
Test: Pixel device.
Change-Id: Id7a9c37ef75a306dd5cf8d374d79d0fe83f8a3ba
diff --git a/libc/Android.bp b/libc/Android.bp
index de270c2..e8175f4 100644
--- a/libc/Android.bp
+++ b/libc/Android.bp
@@ -1018,7 +1018,7 @@
},
kryo: {
srcs: [
- "arch-arm/krait/bionic/memcpy.S",
+ "arch-arm/kryo/bionic/memcpy.S",
"arch-arm/cortex-a7/bionic/memset.S",
"arch-arm/krait/bionic/strcmp.S",
"arch-arm/krait/bionic/__strcat_chk.S",