# Native Memory Allocator Verification
This document describes how to verify the native memory allocator on Android.
This procedure should be followed when upgrading or moving to a new allocator.
A minor upgrade might not need to run all of the benchmarks; however,
at least the
[SQL Allocation Trace Benchmark](#sql-allocation-trace-benchmark),
[Memory Replay Benchmarks](#memory-replay-benchmarks) and
[Performance Trace Benchmarks](#performance-trace-benchmarks) should be run.

It is important to note that there are two modes for a native allocator
to run in on Android. The first is the normal allocator, the second is
called the svelte config, which is designed to run on memory constrained
systems and be a bit slower, but take less RSS. To enable the svelte config,
add this line to the `BoardConfig.mk` for the given target:

    MALLOC_SVELTE := true

The `BoardConfig.mk` file is usually found in the directory
`device/<DEVICE_NAME>/` or in a sub directory.

When evaluating a native allocator, make sure that you benchmark both
versions.

## Android Extensions
Android supports a few non-standard functions and mallopt controls that
a native allocator needs to implement.

### Iterator Functions
These are functions that are used to implement a memory leak detector
called `libmemunreachable`.

#### malloc\_disable
This function, when called, should pause all threads that are making a
call to an allocation function (malloc/free/etc). When a call
is made to `malloc_enable`, the paused threads should start running again.

#### malloc\_enable
This function, when called, does nothing unless there was a previous call
to `malloc_disable`. This call will unpause any thread which is making
a call to an allocation function (malloc/free/etc) when `malloc_disable`
was called previously.

#### malloc\_iterate
This function enumerates all of the allocations currently live in the
system. It is meant to be called after a call to `malloc_disable` to
prevent further allocations while this call is being executed. To
see what is expected for this function, the best description is the
tests for this function in `bionic/tests/malloc_iterate_test.cpp`.

### Mallopt Extensions
These are mallopt options that Android requires for a native allocator
to work efficiently.

#### M\_DECAY\_TIME
When set to zero, `mallopt(M_DECAY_TIME, 0)`, it is expected that an
allocator will attempt to purge and release any unused memory back to the
kernel on free calls. This is important in Android to avoid consuming extra
RSS.

When set to non-zero, `mallopt(M_DECAY_TIME, 1)`, an allocator can delay the
purge and release action. The amount of delay is up to the allocator
implementation, but it should be a reasonable amount of time. The jemalloc
allocator was implemented to have a one second delay.

The drawback to this option is that most allocators do not have a separate
thread to handle the purge, so the decay is only handled when an
allocation operation occurs. For server processes, this can mean that
RSS is slightly higher when the server is waiting for the next connection
and no other allocation calls are made. The `M_PURGE` option is used to
force a purge in this case.

For all applications on Android, the call `mallopt(M_DECAY_TIME, 1)` is
made by default. The idea is that it allows application frees to run a
bit faster, while only increasing RSS a bit.

#### M\_PURGE
When called, `mallopt(M_PURGE, 0)`, an allocator should purge and release
any unused memory immediately. The argument for this call is ignored. If
possible, this call should clear thread cached memory if it exists. The
idea is that this can be called to purge memory that has not been
purged when `M_DECAY_TIME` is set to one. This is useful if you have a
server application that does a lot of native allocations and the
application wants to purge that memory before waiting for the next connection.

## Correctness Tests
These are the tests that should be run to verify an allocator is
working properly according to Android.

### Bionic Unit Tests
The bionic unit tests contain a small number of allocator tests. These
tests are primarily verifying Android extensions and non-standard behavior
of allocation routines such as what happens when a non-power of two alignment
is passed to memalign.

To run all of the compliance tests:

    adb shell /data/nativetest64/bionic-unit-tests/bionic-unit-tests --gtest_filter="malloc*"
    adb shell /data/nativetest/bionic-unit-tests/bionic-unit-tests --gtest_filter="malloc*"

The allocation tests are not meant to be complete, so it is expected
that a native allocator will have its own set of tests that can be run.

### Libmemunreachable Tests
The libmemunreachable tests verify that the iterator functions are working
properly.

To run all of the tests:

    adb shell /data/nativetest64/memunreachable_binder_test/memunreachable_binder_test
    adb shell /data/nativetest/memunreachable_binder_test/memunreachable_binder_test
    adb shell /data/nativetest64/memunreachable_test/memunreachable_test
    adb shell /data/nativetest/memunreachable_test/memunreachable_test
    adb shell /data/nativetest64/memunreachable_unit_test/memunreachable_unit_test
    adb shell /data/nativetest/memunreachable_unit_test/memunreachable_unit_test

### CTS Entropy Test
In addition to the bionic tests, there is also a CTS test that is designed
to verify that the addresses returned by malloc are sufficiently randomized
to help defeat potential security bugs.

Run this test thusly:

    atest AslrMallocTest

If there are multiple devices connected to the system, use `-s <SERIAL>`
to specify a device.

## Performance
There are multiple ways to evaluate the performance of a native
allocator on Android. One is allocation speed in various scenarios,
another is total RSS taken by the allocator.

The last is virtual address space consumed in 32 bit applications. There is
a limited amount of address space available in 32 bit apps, and there have
been allocator bugs that cause memory failures when too much virtual
address space is consumed. For 64 bit executables, this can be ignored.

### Bionic Benchmarks
These are the microbenchmarks that are part of the bionic benchmarks suite.
These benchmarks can be built using this command:

    mmma -j bionic/benchmarks

These benchmarks are only used to verify the speed of the allocator and
ignore anything related to RSS and virtual address space consumed.

For all of these benchmark runs, it can be useful to add these two options:

    --benchmark_repetitions=XX
    --benchmark_report_aggregates_only=true

This will run the benchmark XX times and then give a mean, median, and stddev,
which helps to get a number that can be compared to the new allocator.

In addition, there is another option:

    --bionic_cpu=XX

This locks the benchmark to run only on core XX, which avoids
any issue related to the code migrating from one core to another
with different characteristics. For example, on a big-little cpu, if the
benchmark moves from big to little or vice-versa, this can cause scores
to fluctuate in indeterminate ways.

For most runs, the best set of options to add is:

    --benchmark_repetitions=10 --benchmark_report_aggregates_only=true --bionic_cpu=3

On most phones with a big-little cpu, the third core is the little core.
Choosing to run on the little core can tend to highlight any performance
differences.
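As a rough illustration, the recommended options can be assembled by a small
host-side script. The sketch below is not part of the bionic tooling: the
helper name `benchmark_command` and its parameters are hypothetical, and only
the binary paths, the flags, and the example filter name are taken from this
document. It builds the command line without running it.

```python
import shlex


def benchmark_command(bits, benchmark_filter, repetitions=10, cpu=3):
    """Build the adb command line for one bionic-benchmarks run.

    bits is 64 or 32 and selects the benchmarktest64 or benchmarktest
    directory. This helper is hypothetical; it only assembles the
    argument list and does not execute anything.
    """
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    suffix = "64" if bits == 64 else ""
    binary = f"/data/benchmarktest{suffix}/bionic-benchmarks/bionic-benchmarks"
    return [
        "adb", "shell", binary,
        f"--benchmark_filter={benchmark_filter}",
        f"--benchmark_repetitions={repetitions}",
        "--benchmark_report_aggregates_only=true",
        f"--bionic_cpu={cpu}",
    ]


if __name__ == "__main__":
    # Print the commands for both bitnesses and both decay settings.
    for bits in (64, 32):
        for mode in ("default", "decay1"):
            cmd = benchmark_command(bits, f"stdlib_malloc_free_{mode}")
            print(shlex.join(cmd))
```

Returning a list rather than a string keeps the arguments separate, so the
result could be handed to a process launcher without additional quoting.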

#### Allocate/Free Benchmarks
These are the benchmarks to verify the allocation speed of a loop doing a
single allocation, touching every page in the allocation to make it resident
and then freeing the allocation.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_free_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_free_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_free_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_free_decay1

The last value in the output is the size of the allocation in bytes. It is
useful to look at these kinds of benchmarks to make sure that there are
no outliers, but these numbers should not be used to make a final decision.
If these numbers are slightly worse than the current allocator, the
single thread numbers from trace data are a better representation of
real world situations.

#### Multiple Allocations Retained Benchmarks
These are the benchmarks that examine how the allocator handles multiple
allocations of the same size at the same time.

The first set of these benchmarks does a set number of 8192 byte allocations
in one loop, and then frees all of the allocations at the end of the loop.
Only the time it takes to do the allocations is recorded, the frees are not
counted. The value of 8192 was chosen since the jemalloc native allocator
had issues with this size. It is possible other sizes might show different
results, but, as mentioned before, these microbenchmark numbers should
not be used as absolutes for determining if an allocator is worth using.

This benchmark is designed to verify that there is no performance issue
related to having multiple allocations alive at the same time.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_decay1

For these benchmarks, the last parameter is the total number of allocations to
do in each loop.

The other variation of this benchmark is to always do forty allocations in
each loop, but vary the size of the forty allocations. As with the other
benchmark, only the time it takes to do the allocations is tracked, the
frees are not counted. Forty allocations is an arbitrary number that could
be modified in the future. It was chosen because a version of the native
allocator, jemalloc, showed a problem at forty allocations.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_decay1

For these benchmarks, the last parameter in the output is the size of the
allocation in bytes.

As with the other microbenchmarks, an allocator with numbers in the same
proximity of the current values is usually sufficient to consider making
a switch. The trace benchmarks are more important than these benchmarks
since they simulate real world allocation profiles.
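One way to make "in the same proximity" concrete is to compare the medians of
repeated runs. The following sketch assumes the per-run times have already
been collected by hand into lists. The function names, the sample numbers, and
the 5% tolerance are illustrative assumptions, not values defined by the
benchmark suite.

```python
import statistics


def relative_slowdown(current_ns, candidate_ns):
    """Return how much slower the candidate median is, as a fraction.

    A positive result means the candidate allocator is slower than the
    current one; a negative result means it is faster.
    """
    current = statistics.median(current_ns)
    candidate = statistics.median(candidate_ns)
    return (candidate - current) / current


def within_tolerance(current_ns, candidate_ns, tolerance=0.05):
    """True if the candidate is at most `tolerance` slower (assumed 5%)."""
    return relative_slowdown(current_ns, candidate_ns) <= tolerance


if __name__ == "__main__":
    # Made-up timings in nanoseconds, for illustration only.
    current = [1000, 1010, 990, 1005, 995]
    candidate = [1020, 1030, 1015, 1025, 1040]
    print(f"slowdown: {relative_slowdown(current, candidate):.1%}")
    print("acceptable:", within_tolerance(current, candidate))
```

The median is used here because it is less sensitive than the mean to a single
outlier run.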

#### SQL Allocation Trace Benchmark
This benchmark is a trace of the allocations performed when running
the SQLite BenchMark app.

This benchmark is designed to verify that the allocator will be performant
in a real world allocation scenario. SQL operations were chosen as a
benchmark because these operations tend to do lots of malloc/realloc/free
calls, and they tend to be on the critical path of applications.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_decay1

These numbers should be as good as those of the current allocator.

#### mallinfo Benchmark
This benchmark only verifies that mallinfo is still close to the performance
of the current allocator.

To run the benchmark, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallinfo
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallinfo

Calls to mallinfo are used in ART so a new allocator is required to be
nearly as performant as the current allocator.

### Memory Trace Benchmarks
These benchmarks measure all three axes of a native allocator: RSS, virtual
address space consumed, and speed of allocation. They are designed to
run on a trace of the allocations from a real world application or system
process.

To build this benchmark:

    mmma -j system/extras/memory_replay

This will build two executables:

    /system/bin/memory_replay32
    /system/bin/memory_replay64

And these two benchmark executables:

    /data/benchmarktest64/trace_benchmark/trace_benchmark
    /data/benchmarktest/trace_benchmark/trace_benchmark

#### Memory Replay Benchmarks
These benchmarks display RSS, virtual memory consumed (VA space), and do a
bit of performance testing on actual traces taken from running applications.

The trace data includes what thread does each operation, so the replay
mechanism will simulate this by creating threads and replaying the operations
on a thread as if it was rerunning the real trace. The only issue is that
this is a worst case scenario for allocations happening at the same time
in all threads since it collapses all of the allocation operations to occur
one after another. This causes many threads to allocate at the same
time. The trace data does not include timestamps,
so it is not possible to create a completely accurate replay.

To generate these traces, see the [Malloc Debug documentation](https://android.googlesource.com/platform/bionic/+/master/libc/malloc_debug/README.md),
the option [record\_allocs](https://android.googlesource.com/platform/bionic/+/master/libc/malloc_debug/README.md#record_allocs_total_entries).

To run these benchmarks, first copy the trace files to the target using
this command:

    adb push system/extras/traces /data/local/tmp

Since all of the traces come from applications, the `memory_replay` program
will always call `mallopt(M_DECAY_TIME, 1)` before running the trace.

Run the benchmark thusly:

    adb shell memory_replay64 /data/local/tmp/traces/XXX.zip
    adb shell memory_replay32 /data/local/tmp/traces/XXX.zip

Where XXX.zip is the name of a zipped trace file. The `memory_replay`
program also can process text files, but all trace files are currently
checked in as zip files.
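To replay several traces with both executables, the commands can be generated
from a list of trace file names. This is a hypothetical host-side helper, not
something shipped with `memory_replay`. It assumes the trace files have already
been copied to `/data/local/tmp/traces` on the device, and the file names in
the example are placeholders.

```python
import posixpath

# Device directory the traces are assumed to have been copied to.
DEVICE_TRACE_DIR = "/data/local/tmp/traces"


def replay_commands(trace_names, device_dir=DEVICE_TRACE_DIR):
    """Yield one adb command per (trace, replay executable) pair."""
    for name in sorted(trace_names):
        device_path = posixpath.join(device_dir, name)
        for replay in ("memory_replay64", "memory_replay32"):
            yield ["adb", "shell", replay, device_path]


if __name__ == "__main__":
    # Placeholder trace names; substitute the zip files actually pushed.
    for cmd in replay_commands(["XXX.zip", "YYY.zip"]):
        print(" ".join(cmd))
```

`posixpath` is used instead of `os.path` so that device paths keep forward
slashes even when the script runs on a non-POSIX host.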

Every 100000 allocation operations, a dump of the RSS and VA space will be
performed. At the end, a final RSS and VA space number will be printed.
For the most part, the intermediate data can be ignored, but it is always
a good idea to look over the data to verify that no strange spikes are
occurring.

The performance number is a measure of the time it takes to perform all of
the allocation calls (malloc/memalign/posix_memalign/realloc/free/etc).
For any call that allocates a pointer, the time for the call and the time
it takes to make the pointer completely resident in memory is included.

The performance numbers for these runs tend to have a wide variability so
they should not be used as absolute values for comparison against the
current allocator. But, they should be in the same range as the current
values.

When evaluating an allocator, one of the most important traces is the
camera.txt trace. The camera application does very large allocations,
and some allocators might leave large virtual address maps around
rather than delete them. When that happens, it can lead to allocation
failures and would cause the camera app to abort/crash. It is
important to verify that when running this trace using the 32 bit replay
executable, the virtual address space consumed is not much larger than the
current allocator. A small increase (on the order of a few MBs) would be okay.
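The comparison described above amounts to a simple threshold check. The sketch
below assumes the final VA space numbers (in bytes) have been copied by hand
from two 32 bit replay runs. The function name and the 4 MiB allowance are
assumptions chosen as one reading of "a few MBs", not values defined by the
tools.

```python
MIB = 1024 * 1024


def va_increase_ok(current_va_bytes, candidate_va_bytes, allowed_mib=4):
    """Return True if the candidate uses at most allowed_mib more VA space.

    allowed_mib defaults to 4, an assumed reading of "a few MBs".
    A candidate that uses less VA space than the current allocator
    always passes.
    """
    increase = candidate_va_bytes - current_va_bytes
    return increase <= allowed_mib * MIB


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    current = 300 * MIB
    print(va_increase_ok(current, current + 2 * MIB))
    print(va_increase_ok(current, current + 50 * MIB))
```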

There is no specific benchmark for memory fragmentation; instead, the RSS
when running the memory traces acts as a proxy for this. An allocator that
is fragmenting badly will show an increase in RSS. The best trace for
tracking fragmentation is system\_server.txt which is an extremely long
trace (~13 million operations). The total number of live allocations goes
up and down a bit, but stays mostly the same so an allocator that fragments
badly would likely show an abnormal increase in RSS on this trace.

NOTE: When a native allocator calls mmap, it is expected that the allocator
will name the map using the call:

    prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, <PTR>, <SIZE>, "libc_malloc");

If the native allocator creates a different name, then it is necessary to
modify the file:

    system/extras/memory_replay/NativeInfo.cpp

The `GetNativeInfo` function needs to be modified to include the name
of the maps that this allocator includes.

In addition, in order for the frameworks code to keep track of the memory
of a process, any named maps must be added to the file:

    frameworks/base/core/jni/android_os_Debug.cpp

Modify the `load_maps` function and add a check of the new expected name.

#### Performance Trace Benchmarks
This is a benchmark that treats the trace data as if all allocations
occurred in a single thread. This is the scenario that could
happen if all of the allocations are spaced out in time so no thread
ever does an allocation at the same time as another thread.

Run these benchmarks thusly:

    adb shell /data/benchmarktest64/trace_benchmark/trace_benchmark
    adb shell /data/benchmarktest/trace_benchmark/trace_benchmark

When run without any arguments, the benchmark will run over all of the
traces and display data. It takes many minutes to complete these runs in
order to get as accurate a number as possible.