# Native Memory Allocator Verification
This document describes how to verify the native memory allocator on Android.
This procedure should be followed when upgrading or moving to a new allocator.
A minor upgrade might not need to run all of the benchmarks; however,
at least the
[SQL Allocation Trace Benchmark](#sql-allocation-trace-benchmark),
[Memory Replay Benchmarks](#memory-replay-benchmarks) and
[Performance Trace Benchmarks](#performance-trace-benchmarks) should be run.

It is important to note that there are two modes for a native allocator
to run in on Android. The first is the normal allocator. The second is
called the svelte config, which is designed to run on memory constrained
systems and be a bit slower, but use less PSS. To enable the svelte config,
add this line to the `BoardConfig.mk` for the given target:

    MALLOC_SVELTE := true

The `BoardConfig.mk` file is usually found in the directory
`device/<DEVICE_NAME>/` or in a subdirectory.

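When it is unclear whether a target already enables the svelte config, a
search of the device directory can help. The following is a minimal sketch,
not part of the build system: the function name `find_svelte_configs` is
hypothetical, and it assumes the flag is written on its own line in the form
shown above.

```shell
# Hypothetical helper: list every BoardConfig.mk under the given directory
# that sets MALLOC_SVELTE to true. Assumes the flag sits on its own line.
find_svelte_configs() {
  find "$1" -name BoardConfig.mk \
    -exec grep -l '^[[:space:]]*MALLOC_SVELTE[[:space:]]*:=[[:space:]]*true' {} +
}

# Example (run from the top of the source tree):
#   find_svelte_configs device/<DEVICE_NAME>
```
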
When evaluating a native allocator, make sure that you benchmark both
configurations.

## Android Extensions
Android supports a few non-standard functions and mallopt controls that
a native allocator needs to implement.

### Iterator Functions
These are functions that are used to implement a memory leak detector
called `libmemunreachable`.

#### malloc\_disable
This function, when called, should pause all threads that are making a
call to an allocation function (malloc/free/etc). When a call
is made to `malloc_enable`, the paused threads should start running again.

#### malloc\_enable
This function, when called, does nothing unless there was a previous call
to `malloc_disable`. This call will unpause any thread that was paused
while making a call to an allocation function (malloc/free/etc) after
`malloc_disable` was called.

#### malloc\_iterate
This function enumerates all of the allocations currently live in the
system. It is meant to be called after a call to `malloc_disable` to
prevent further allocations while this call is being executed. The best
description of what is expected from this function is its set of tests in
`bionic/tests/malloc_iterate_test.cpp`.

### Mallopt Extensions
These are mallopt options that Android requires for a native allocator
to work efficiently.

#### M\_DECAY\_TIME
When set to zero, `mallopt(M_DECAY_TIME, 0)`, it is expected that an
allocator will attempt to purge and release any unused memory back to the
kernel on free calls. This is important in Android to avoid consuming extra
PSS.

When set to non-zero, `mallopt(M_DECAY_TIME, 1)`, an allocator can delay the
purge and release action. The amount of delay is up to the allocator
implementation, but it should be a reasonable amount of time. The jemalloc
allocator was implemented to have a one second delay.

The drawback to this option is that most allocators do not have a separate
thread to handle the purge, so the decay is only handled when an
allocation operation occurs. For server processes, this can mean that
PSS is slightly higher when the server is waiting for the next connection
and no other allocation calls are made. The `M_PURGE` option is used to
force a purge in this case.

For all applications on Android, the call `mallopt(M_DECAY_TIME, 1)` is
made by default. The idea is that it allows application frees to run a
bit faster, while only increasing PSS a bit.

#### M\_PURGE
When called, `mallopt(M_PURGE, 0)`, an allocator should purge and release
any unused memory immediately. The argument for this call is ignored. If
possible, this call should clear thread cached memory if it exists. The
idea is that this can be called to purge memory that has not been
purged when `M_DECAY_TIME` is set to one. This is useful if you have a
server application that does a lot of native allocations and the
application wants to purge that memory before waiting for the next connection.

## Correctness Tests
These are the tests that should be run to verify an allocator is
working properly according to Android.

### Bionic Unit Tests
The bionic unit tests contain a small number of allocator tests. These
tests primarily verify Android extensions and non-standard behavior
of allocation routines, such as what happens when a non-power of two alignment
is passed to memalign.

To run all of the compliance tests:

    adb shell /data/nativetest64/bionic-unit-tests/bionic-unit-tests --gtest_filter="malloc*"
    adb shell /data/nativetest/bionic-unit-tests/bionic-unit-tests --gtest_filter="malloc*"

The allocation tests are not meant to be complete, so it is expected
that a native allocator will have its own set of tests that can be run.

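The two unit test commands differ only in the test directory, so they can be
wrapped in a small helper. This is a sketch rather than part of bionic:
`run_malloc_unit_tests`, `unit_test_dir` and the `ADB` override variable are
hypothetical names, while the paths and the filter are the ones used above.

```shell
# ADB can be overridden, for example ADB='adb -s <SERIAL>' to pick a device.
ADB="${ADB:-adb}"

# Run the bionic malloc unit tests for one bitness: "64" or "32".
run_malloc_unit_tests() {
  case "$1" in
    64) unit_test_dir=/data/nativetest64 ;;
    32) unit_test_dir=/data/nativetest ;;
    *) echo "usage: run_malloc_unit_tests 64|32" >&2; return 2 ;;
  esac
  $ADB shell "$unit_test_dir/bionic-unit-tests/bionic-unit-tests" \
    '--gtest_filter=malloc*'
}

# Example:
#   run_malloc_unit_tests 64 && run_malloc_unit_tests 32
```

Setting `ADB=echo` before calling the function prints the command instead of
running it, which is a cheap way to check what would be executed.
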
### CTS Entropy Test
In addition to the bionic tests, there is also a CTS test that is designed
to verify that the addresses returned by malloc are sufficiently randomized
to help defeat potential security bugs.

Run this test thusly:

    atest AslrMallocTest

If there are multiple devices connected to the system, use `-s <SERIAL>`
to specify a device.

## Performance
There are multiple ways to evaluate the performance of a native
allocator on Android. One is allocation speed in various scenarios,
another is total PSS taken by the allocator.

The last is virtual address space consumed in 32 bit applications. There is
a limited amount of address space available in 32 bit apps, and there have
been allocator bugs that cause memory failures when too much virtual
address space is consumed. For 64 bit executables, this can be ignored.

### Bionic Benchmarks
These are the microbenchmarks that are part of the bionic benchmarks suite.
These benchmarks can be built using this command:

    mmma -j bionic/benchmarks

These benchmarks are only used to verify the speed of the allocator and
ignore anything related to PSS and virtual address space consumed.

#### Allocate/Free Benchmarks
These are the benchmarks to verify the allocation speed of a loop doing a
single allocation, touching every page in the allocation to make it resident
and then freeing the allocation.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_free_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_free_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_free_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_free_decay1

The last value in the output is the size of the allocation in bytes. It is
useful to look at these kinds of benchmarks to make sure that there are
no outliers, but these numbers should not be used to make a final decision.
If these numbers are slightly worse than the current allocator, the
single thread numbers from trace data are more representative of
real world situations.

#### Multiple Allocations Retained Benchmarks
These are the benchmarks that examine how the allocator handles multiple
allocations of the same size at the same time.

The first set of these benchmarks does a set number of 8192 byte allocations
in one loop, and then frees all of the allocations at the end of the loop.
Only the time it takes to do the allocations is recorded; the frees are not
counted. The value of 8192 was chosen since the jemalloc native allocator
had issues with this size. It is possible other sizes might show different
results, but, as mentioned before, these microbenchmark numbers should
not be used as absolutes for determining if an allocator is worth using.

This benchmark is designed to verify that there is no performance issue
related to having multiple allocations alive at the same time.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_decay1

For these benchmarks, the last parameter is the total number of allocations to
do in each loop.

The other variation of this benchmark is to always do forty allocations in
each loop, but vary the size of the forty allocations. As with the other
benchmark, only the time it takes to do the allocations is tracked; the
frees are not counted. Forty allocations is an arbitrary number that could
be modified in the future. It was chosen because a version of the native
allocator, jemalloc, showed a problem at forty allocations.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_decay1

For these benchmarks, the last parameter in the output is the size of the
allocation in bytes.

As with the other microbenchmarks, an allocator with numbers close to
the current values is usually sufficient to consider making
a switch. The trace benchmarks are more important than these benchmarks
since they simulate real world allocation profiles.

#### SQL Allocation Trace Benchmark
This benchmark is a trace of the allocations performed when running
the SQLite BenchMark app.

This benchmark is designed to verify that the allocator will be performant
in a real world allocation scenario. SQL operations were chosen as a
benchmark because these operations tend to do lots of malloc/realloc/free
calls, and they tend to be on the critical path of applications.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_decay1

On this benchmark, the new allocator should perform at least as well as the
current allocator.

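The microbenchmark commands in this section all follow one pattern: a
benchmark directory, a benchmark name, and a `default` or `decay1` suffix. A
loop over those three can run the whole set in one go. The sketch below is an
illustration only: `run_allocator_benchmarks`, `ALLOC_BENCHMARKS` and the
`ADB` override are hypothetical names. It relies on `--benchmark_filter`
being a regular expression that may match part of a benchmark name, so the
shorter `malloc_free` form selects the allocate/free benchmark under either
of the names used above.

```shell
# ADB can be overridden, for example ADB='adb -s <SERIAL>' to pick a device.
ADB="${ADB:-adb}"

# Benchmark name stems taken from the commands in this section.
ALLOC_BENCHMARKS="malloc_free stdlib_malloc_multiple_8192_allocs stdlib_malloc_forty malloc_sql_trace"

# Run every allocator microbenchmark for both bitnesses and both
# M_DECAY_TIME settings, stopping at the first failing command.
run_allocator_benchmarks() {
  for bench_dir in /data/benchmarktest64 /data/benchmarktest; do
    for bench_name in $ALLOC_BENCHMARKS; do
      for decay_mode in default decay1; do
        $ADB shell "$bench_dir/bionic-benchmarks/bionic-benchmarks" \
          "--benchmark_filter=${bench_name}_${decay_mode}" || return 1
      done
    done
  done
}

# Example: capture one run per allocator build for later comparison.
#   run_allocator_benchmarks > new_allocator.txt
```
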
### Memory Trace Benchmarks
These benchmarks measure all three axes of a native allocator: PSS, virtual
address space consumed, and speed of allocation. They are designed to
run on a trace of the allocations from a real world application or system
process.

To build this benchmark:

    mmma -j system/extras/memory_replay

This will build two executables:

    /system/bin/memory_replay32
    /system/bin/memory_replay64

And these two benchmark executables:

    /data/benchmarktest64/trace_benchmark/trace_benchmark
    /data/benchmarktest/trace_benchmark/trace_benchmark

#### Memory Replay Benchmarks
These benchmarks display PSS, virtual memory consumed (VA space), and do a
bit of performance testing on actual traces taken from running applications.

The trace data includes what thread does each operation, so the replay
mechanism will simulate this by creating threads and replaying the operations
on a thread as if it were rerunning the real trace. The only issue is that
this is a worst case scenario for allocations happening at the same time
in all threads since it collapses all of the allocation operations to occur
one after another. This causes many threads to allocate at the same
time. The trace data does not include timestamps,
so it is not possible to create a completely accurate replay.

To generate these traces, see the [Malloc Debug documentation](https://android.googlesource.com/platform/bionic/+/master/libc/malloc_debug/README.md),
specifically the [record\_allocs](https://android.googlesource.com/platform/bionic/+/master/libc/malloc_debug/README.md#record_allocs_total_entries) option.

To run these benchmarks, first copy the trace files to the target and
unzip them using these commands:

    adb push system/extras/dumps /data/local/tmp
    adb shell 'cd /data/local/tmp/dumps && for name in *.zip; do unzip $name; done'

Since all of the traces come from applications, the `memory_replay` program
will always call `mallopt(M_DECAY_TIME, 1)` before running the trace.

Run the benchmark thusly:

    adb shell memory_replay64 /data/local/tmp/dumps/XXX.txt
    adb shell memory_replay32 /data/local/tmp/dumps/XXX.txt

Where XXX.txt is the name of a trace file.

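To replay every trace rather than one file at a time, the two commands can be
put in a loop. The following is a minimal sketch under a few assumptions: the
traces were unzipped into `/data/local/tmp/dumps` as described above, trace
file names contain no spaces, and `replay_all_traces`, `TRACE_DIR` and the
`ADB` override are hypothetical names rather than part of `memory_replay`.

```shell
# ADB can be overridden, for example ADB='adb -s <SERIAL>' to pick a device.
ADB="${ADB:-adb}"
TRACE_DIR="${TRACE_DIR:-/data/local/tmp/dumps}"

# Replay every unzipped trace with both the 64 bit and 32 bit executables.
replay_all_traces() {
  # The glob is quoted so that it expands on the device, not on the host.
  # tr drops the carriage returns some adb versions add to each line.
  trace_list=$($ADB shell "ls $TRACE_DIR/*.txt" | tr -d '\r')
  for trace in $trace_list; do
    for replay in memory_replay64 memory_replay32; do
      echo "== $replay $trace"
      $ADB shell "$replay" "$trace" || return 1
    done
  done
}

# Example: keep the full output so the PSS and VA space dumps can be reviewed.
#   replay_all_traces > replay_results.txt
```
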
Every 100000 allocation operations, a dump of the PSS and VA space will be
performed. At the end, a final PSS and VA space number will be printed.
For the most part, the intermediate data can be ignored, but it is always
a good idea to look over the data to verify that no strange spikes are
occurring.

The performance number is a measure of the time it takes to perform all of
the allocation calls (malloc/memalign/posix_memalign/realloc/free/etc).
For any call that allocates a pointer, the time for the call and the time
it takes to make the pointer completely resident in memory are included.

The performance numbers for these runs tend to have a wide variability so
they should not be used as absolute values for comparison against the
current allocator. But, they should be in the same range as the current
values.

When evaluating an allocator, one of the most important traces is the
camera.txt trace. The camera application does very large allocations,
and some allocators might leave large virtual address maps around
rather than delete them. When that happens, it can lead to allocation
failures and would cause the camera app to abort/crash. It is
important to verify that when running this trace using the 32 bit replay
executable, the virtual address space consumed is not much larger than with
the current allocator. A small increase (on the order of a few MBs) would be
okay.

NOTE: When a native allocator calls mmap, it is expected that the allocator
will name the map using the call:

    prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, <PTR>, <SIZE>, "libc_malloc");

If the native allocator creates a different name, then it is necessary to
modify the file:

    system/extras/memory_replay/NativeInfo.cpp

The `GetNativeInfo` function needs to be modified to include the name
of the maps that this allocator creates.

In addition, in order for the frameworks code to keep track of the memory
of a process, any named maps must be added to the file:

    frameworks/base/core/jni/android_os_Debug.cpp

Modify the `load_maps` function and add a check of the new expected name.

#### Performance Trace Benchmarks
This is a benchmark that treats the trace data as if all allocations
occurred in a single thread. This is the scenario that could
happen if all of the allocations are spaced out in time so no thread
ever does an allocation at the same time as another thread.

Run these benchmarks thusly:

    adb shell /data/benchmarktest64/trace_benchmark/trace_benchmark
    adb shell /data/benchmarktest/trace_benchmark/trace_benchmark

When run without any arguments, the benchmark will run over all of the
traces and display data. It takes many minutes to complete these runs in
order to get as accurate a number as possible.