# Native Memory Allocator Verification
This document describes how to verify the native memory allocator on Android.
This procedure should be followed when upgrading or moving to a new allocator.
A minor upgrade might not need to run all of the benchmarks; however, at
least the
[SQL Allocation Trace Benchmark](#sql-allocation-trace-benchmark),
[Memory Replay Benchmarks](#memory-replay-benchmarks) and
[Performance Trace Benchmarks](#performance-trace-benchmarks) should be run.

It is important to note that a native allocator can run in one of two modes
on Android. The first is the normal allocator; the second is called the
svelte config, which is designed for memory constrained systems. It is a bit
slower, but uses less RSS. To enable the svelte config, add this line to the
`BoardConfig.mk` for the given target:

    MALLOC_SVELTE := true

The `BoardConfig.mk` file is usually found in the directory
`device/<DEVICE_NAME>/` or in a subdirectory.

When evaluating a native allocator, make sure that you benchmark both
configurations.

## Android Extensions
Android supports a few non-standard functions and mallopt controls that
a native allocator needs to implement.

### Iterator Functions
These functions are used to implement a memory leak detector
called `libmemunreachable`.

#### malloc\_disable
This function, when called, should pause all threads that are making a
call to an allocation function (malloc/free/etc). When a call
is made to `malloc_enable`, the paused threads should start running again.

#### malloc\_enable
This function, when called, does nothing unless there was a previous call
to `malloc_disable`. This call unpauses any thread that was paused in an
allocation function (malloc/free/etc) because of the previous call to
`malloc_disable`.

#### malloc\_iterate
This function enumerates all of the allocations currently live in the
system. It is meant to be called after a call to `malloc_disable` to
prevent further allocations while this call is being executed. The best
description of what is expected from this function is the set of tests
in `bionic/tests/malloc_iterate_test.cpp`.
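
A minimal sketch of that contract, assuming a toy allocator that wraps the system `malloc` and records live allocations in a fixed-size table. The names `toy_malloc`, `toy_free`, `toy_disable`, `toy_enable`, `toy_iterate` and `toy_count_cb` are hypothetical. `toy_iterate` only mirrors the general shape of `malloc_iterate` (a base address, a size, a callback and an opaque argument); reporting the allocations that start inside `[base, base + size)` is an assumption of this sketch, and the bionic tests remain the reference for the real semantics.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_LIVE 1024

static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;
static void* toy_ptrs[TOY_MAX_LIVE];
static size_t toy_sizes[TOY_MAX_LIVE];

/* Allocate and record the pointer. Returns NULL if malloc fails or the
 * table of live allocations is full. */
void* toy_malloc(size_t size) {
  pthread_mutex_lock(&toy_lock); /* Blocks while toy_disable() is active. */
  void* ptr = NULL;
  for (size_t i = 0; i < TOY_MAX_LIVE; i++) {
    if (toy_ptrs[i] == NULL) {
      ptr = malloc(size);
      if (ptr != NULL) {
        toy_ptrs[i] = ptr;
        toy_sizes[i] = size;
      }
      break;
    }
  }
  pthread_mutex_unlock(&toy_lock);
  return ptr;
}

/* Free a pointer returned by toy_malloc. Unknown pointers are ignored. */
void toy_free(void* ptr) {
  if (ptr == NULL) return;
  pthread_mutex_lock(&toy_lock); /* Blocks while toy_disable() is active. */
  for (size_t i = 0; i < TOY_MAX_LIVE; i++) {
    if (toy_ptrs[i] == ptr) {
      toy_ptrs[i] = NULL;
      free(ptr);
      break;
    }
  }
  pthread_mutex_unlock(&toy_lock);
}

/* Holding the lock pauses every other thread that enters toy_malloc or
 * toy_free. The disabling thread itself must not allocate until
 * toy_enable(), or it would deadlock on this non-recursive mutex. */
void toy_disable(void) { pthread_mutex_lock(&toy_lock); }
void toy_enable(void) { pthread_mutex_unlock(&toy_lock); }

/* Call `callback` for each live allocation that starts in [base, base + size).
 * Only valid between toy_disable() and toy_enable(). Always returns 0. */
int toy_iterate(uintptr_t base, size_t size,
                void (*callback)(uintptr_t ptr, size_t size, void* arg),
                void* arg) {
  assert(callback != NULL);
  for (size_t i = 0; i < TOY_MAX_LIVE; i++) {
    uintptr_t p = (uintptr_t)toy_ptrs[i];
    if (p != 0 && p >= base && p - base < size) {
      callback(p, toy_sizes[i], arg);
    }
  }
  return 0;
}

struct toy_totals {
  size_t count;
  size_t bytes;
};

/* Example callback: count the allocations and sum their sizes. */
void toy_count_cb(uintptr_t ptr, size_t size, void* arg) {
  (void)ptr;
  struct toy_totals* totals = arg;
  totals->count++;
  totals->bytes += size;
}
```

A single mutex is the simplest way to get the "pause every allocating thread" behavior: while `toy_disable` holds the lock, any thread entering `toy_malloc` or `toy_free` blocks until `toy_enable` releases it, so the table cannot change under `toy_iterate`. An allocator with many locks (per arena or per size class) would likely need to hold all of them.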

### Mallopt Extensions
These are mallopt options that Android requires for a native allocator
to work efficiently.

#### M\_DECAY\_TIME
When set to zero, `mallopt(M_DECAY_TIME, 0)`, it is expected that an
allocator will attempt to purge and release any unused memory back to the
kernel on free calls. This is important in Android to avoid consuming extra
RSS.

When set to non-zero, `mallopt(M_DECAY_TIME, 1)`, an allocator can delay the
purge and release action. The amount of delay is up to the allocator
implementation, but it should be a reasonable amount of time. The jemalloc
allocator was implemented to have a one second delay.

The drawback to this option is that most allocators do not have a separate
thread to handle the purge, so the decay is only handled when an
allocation operation occurs. For server processes, this can mean that
RSS is slightly higher when the server is waiting for the next connection
and no other allocation calls are made. The `M_PURGE` option is used to
force a purge in this case.
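
The lazy purge described above can be illustrated with a small model. This is a hypothetical sketch, not how jemalloc or any other allocator tracks its dirty pages: freed bytes stay counted as resident until a later allocation operation notices that the decay deadline has passed, so an idle process keeps them until something forces a purge. Time is passed in explicitly to keep the model deterministic; all names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of lazily decaying freed memory. */
struct decay_model {
  long decay_ms;      /* 0 means purge immediately on free. */
  size_t dirty_bytes; /* Freed, but still resident (not yet purged). */
  long deadline_ms;   /* Earliest time the dirty bytes may be purged. */
};

static void decay_purge(struct decay_model* m) { m->dirty_bytes = 0; }

/* There is no background thread: the deadline is only checked here, and
 * this is only called from allocation operations. */
static void decay_tick(struct decay_model* m, long now_ms) {
  if (m->dirty_bytes > 0 && now_ms >= m->deadline_ms) decay_purge(m);
}

void decay_on_malloc(struct decay_model* m, long now_ms) {
  decay_tick(m, now_ms);
}

void decay_on_free(struct decay_model* m, size_t bytes, long now_ms) {
  decay_tick(m, now_ms);
  if (m->decay_ms == 0) return; /* Released right away: nothing stays dirty. */
  if (m->dirty_bytes == 0) m->deadline_ms = now_ms + m->decay_ms;
  m->dirty_bytes += bytes;
}

/* Forced purge, analogous to what M_PURGE requests. */
void decay_force_purge(struct decay_model* m) { decay_purge(m); }
```

With `decay_ms = 1000`, a free of 4096 bytes at t=0 leaves those bytes dirty no matter how much time passes, until either an allocation operation runs at or after t=1000 or `decay_force_purge` is called.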

For all applications on Android, the call `mallopt(M_DECAY_TIME, 1)` is
made by default. The idea is that it allows application frees to run a
bit faster, while only increasing RSS a bit.
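
A native process that wants the same trade-off as applications can make this call itself. A minimal sketch, assuming bionic's `<malloc.h>` defines `M_DECAY_TIME` as a macro; the `#if` guard only exists so the file also compiles against a C library without the extension. The helper name `request_decay_time` is hypothetical.

```c
#include <assert.h>
#include <malloc.h>

/* Returns mallopt's result (1 on success) or 0 when the C library does not
 * provide M_DECAY_TIME. */
int request_decay_time(int value) {
#if defined(M_DECAY_TIME)
  /* Bionic extension: 0 = purge on free, 1 = allow a delayed purge. */
  return mallopt(M_DECAY_TIME, value);
#else
  (void)value;
  return 0;
#endif
}
```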

#### M\_PURGE
When `mallopt(M_PURGE, 0)` is called, an allocator should purge and release
any unused memory immediately. The argument for this call is ignored. If
possible, this call should also clear thread cached memory if it exists. The
idea is that this can be called to purge memory that has not been
purged when `M_DECAY_TIME` is set to one. This is useful if you have a
server application that does a lot of native allocations and the
application wants to purge that memory before waiting for the next connection.
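
A sketch of the server pattern described above, under the same assumption that `M_PURGE` is a macro in bionic's `<malloc.h>`. `purge_unused_memory` and `handle_request_then_purge` are hypothetical names, and the memset loop is only a stand-in for real request handling.

```c
#include <assert.h>
#include <malloc.h>
#include <stdlib.h>
#include <string.h>

/* Returns mallopt's result, or 0 when the C library has no M_PURGE. */
int purge_unused_memory(void) {
#if defined(M_PURGE)
  return mallopt(M_PURGE, 0); /* The value argument is ignored. */
#else
  return 0;
#endif
}

/* Hypothetical request handler: a burst of native allocations, then a
 * purge before returning to wait for the next connection. Returns 0 on
 * success or -1 if an allocation failed. */
int handle_request_then_purge(size_t n_buffers, size_t buffer_size) {
  int status = 0;
  for (size_t i = 0; i < n_buffers; i++) {
    char* buf = malloc(buffer_size);
    if (buf == NULL) {
      status = -1;
      break;
    }
    memset(buf, 0xAB, buffer_size); /* Stand-in for real request work. */
    free(buf);
  }
  purge_unused_memory();
  return status;
}
```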

## Correctness Tests
These are the tests that should be run to verify that an allocator is
working properly according to Android.

### Bionic Unit Tests
The bionic unit tests contain a small number of allocator tests. These
tests primarily verify Android extensions and non-standard behavior
of allocation routines, such as what happens when a non-power-of-two
alignment is passed to memalign.

To run all of the allocator tests:

    adb shell /data/nativetest64/bionic-unit-tests/bionic-unit-tests --gtest_filter="malloc*"
    adb shell /data/nativetest/bionic-unit-tests/bionic-unit-tests --gtest_filter="malloc*"

The allocation tests are not meant to be complete, so it is expected
that a native allocator will have its own set of tests that can be run.

### Libmemunreachable Tests
The libmemunreachable tests verify that the iterator functions are working
properly.

To run all of the tests:

    adb shell /data/nativetest64/memunreachable_binder_test/memunreachable_binder_test
    adb shell /data/nativetest/memunreachable_binder_test/memunreachable_binder_test
    adb shell /data/nativetest64/memunreachable_test/memunreachable_test
    adb shell /data/nativetest/memunreachable_test/memunreachable_test
    adb shell /data/nativetest64/memunreachable_unit_test/memunreachable_unit_test
    adb shell /data/nativetest/memunreachable_unit_test/memunreachable_unit_test

### CTS Entropy Test
In addition to the bionic tests, there is also a CTS test that is designed
to verify that the addresses returned by malloc are sufficiently randomized
to help defeat potential security bugs.

Run this test thusly:

    atest AslrMallocTest

If there are multiple devices connected to the system, use `-s <SERIAL>`
to specify a device.

## Performance
There are multiple ways to evaluate the performance of a native
allocator on Android. One is allocation speed in various scenarios;
another is total RSS taken by the allocator.

A third is virtual address space consumed in 32 bit applications. There is
a limited amount of address space available in 32 bit apps, and there have
been allocator bugs that cause allocation failures when too much virtual
address space is consumed. For 64 bit executables, this can be ignored.

### Bionic Benchmarks
These are the allocator microbenchmarks that are part of the bionic
benchmarks suite. They can be built using this command:

    mmma -j bionic/benchmarks

These benchmarks are only used to verify the speed of the allocator and
ignore anything related to RSS and virtual address space consumed.

For all of these benchmark runs, it can be useful to add these two options:

    --benchmark_repetitions=XX
    --benchmark_report_aggregates_only=true

This will run the benchmark XX times and then report the mean, median, and
stddev, which helps to get a number that can be compared between the current
and the new allocator.

In addition, there is another option:

    --bionic_cpu=XX

This option locks the benchmark to run only on core XX. It also avoids
any issue related to the code migrating from one core to another
with different characteristics. For example, on a big-little cpu, if the
benchmark moves from big to little or vice-versa, this can cause scores
to fluctuate in indeterminate ways.

For most runs, the best set of options to add is:

    --benchmark_repetitions=10 --benchmark_report_aggregates_only=true --bionic_cpu=3

On most phones with a big-little cpu, core 3 (cores are numbered from zero)
is a little core. Choosing to run on a little core can tend to highlight any
performance differences.
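
The pinning that `--bionic_cpu` provides can also be reproduced in a stand-alone experiment with the Linux `sched_setaffinity` call. This is a sketch of the general technique, not necessarily how the bionic benchmark harness implements the option; `pin_to_cpu` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to a single CPU. Returns 0 on success, -1 on
 * failure (errno is set by sched_setaffinity). */
int pin_to_cpu(int cpu) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  /* A pid of 0 means "the calling thread". */
  return sched_setaffinity(0, sizeof(set), &set);
}
```

For example, `pin_to_cpu(3)` before a timing loop keeps the loop on core 3, provided that core is in the process's allowed CPU set.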

#### Allocate/Free Benchmarks
These are the benchmarks to verify the allocation speed of a loop doing a
single allocation, touching every page in the allocation to make it resident
and then freeing the allocation.
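
A stand-alone approximation of that loop might look like the following. It is a hypothetical sketch, not the bionic benchmark source: it writes one byte per page, plus the last byte, so every page of the allocation becomes resident, and it reports the average time per iteration. To mimic the `decay1` variants on bionic, `mallopt(M_DECAY_TIME, 1)` would be called before the loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Returns the average nanoseconds per allocate/touch/free iteration, or
 * -1.0 on bad arguments or allocation failure. */
double malloc_touch_free_ns(size_t size, int iterations) {
  long page_size = sysconf(_SC_PAGESIZE);
  if (page_size <= 0 || size == 0 || iterations <= 0) return -1.0;
  struct timespec start, end;
  clock_gettime(CLOCK_MONOTONIC, &start);
  for (int i = 0; i < iterations; i++) {
    /* volatile keeps the compiler from discarding the page writes. */
    volatile char* ptr = malloc(size);
    if (ptr == NULL) return -1.0;
    for (size_t off = 0; off < size; off += (size_t)page_size) ptr[off] = 1;
    ptr[size - 1] = 1; /* Covers the last page of an unaligned block. */
    free((void*)ptr);
  }
  clock_gettime(CLOCK_MONOTONIC, &end);
  double elapsed = (double)(end.tv_sec - start.tv_sec) * 1e9 +
                   (double)(end.tv_nsec - start.tv_nsec);
  return elapsed / iterations;
}
```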

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_free_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_free_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_free_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_free_decay1

The last value in the output is the size of the allocation in bytes. It is
useful to look at these kinds of benchmarks to make sure that there are
no outliers, but these numbers should not be used to make a final decision.
If these numbers are slightly worse than the current allocator, the
single thread numbers from trace data are more representative of
real world situations.

#### Multiple Allocations Retained Benchmarks
These are the benchmarks that examine how the allocator handles multiple
allocations of the same size at the same time.

The first set of these benchmarks does a set number of 8192 byte allocations
in one loop, and then frees all of the allocations at the end of the loop.
Only the time it takes to do the allocations is recorded; the frees are not
counted. The value of 8192 was chosen since the jemalloc native allocator
had issues with this size. It is possible other sizes might show different
results, but, as mentioned before, these microbenchmark numbers should
not be used as absolutes for determining if an allocator is worth using.

This benchmark is designed to verify that there is no performance issue
related to having multiple allocations alive at the same time.
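
The shape of this measurement, where only the allocations are timed and every block stays alive until the clock stops, can be sketched as follows. `time_retained_allocs` is a hypothetical helper, not the benchmark's code; with `size = 8192` it corresponds to the first set of benchmarks, and with `count = 40` it corresponds to the forty-allocation variation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Perform `count` allocations of `size` bytes that are all alive at once,
 * timing only the allocations; the frees happen after the clock stops.
 * Returns elapsed nanoseconds, or -1 on bad arguments or allocation failure. */
long long time_retained_allocs(size_t size, size_t count) {
  if (count == 0) return -1;
  void** ptrs = calloc(count, sizeof(void*)); /* Bookkeeping, not timed. */
  if (ptrs == NULL) return -1;
  struct timespec start, end;
  clock_gettime(CLOCK_MONOTONIC, &start);
  size_t done = 0;
  for (; done < count; done++) {
    ptrs[done] = malloc(size);
    if (ptrs[done] == NULL) break;
  }
  clock_gettime(CLOCK_MONOTONIC, &end);
  for (size_t i = 0; i < done; i++) free(ptrs[i]); /* Not timed. */
  free(ptrs);
  if (done != count) return -1;
  return (long long)(end.tv_sec - start.tv_sec) * 1000000000LL +
         (end.tv_nsec - start.tv_nsec);
}
```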

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_decay1

For these benchmarks, the last parameter is the total number of allocations to
do in each loop.

The other variation of this benchmark is to always do forty allocations in
each loop, but vary the size of the forty allocations. As with the other
benchmark, only the time it takes to do the allocations is tracked; the
frees are not counted. Forty allocations is an arbitrary number that could
be modified in the future. It was chosen because a version of the native
allocator, jemalloc, showed a problem at forty allocations.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_decay1

For these benchmarks, the last parameter in the output is the size of the
allocation in bytes.

As with the other microbenchmarks, an allocator whose numbers are close to
those of the current allocator is usually good enough to consider making
a switch. The trace benchmarks are more important than these benchmarks
since they simulate real world allocation profiles.

#### SQL Allocation Trace Benchmark
This benchmark is a trace of the allocations performed when running
the SQLite BenchMark app.

This benchmark is designed to verify that the allocator will be performant
in a real world allocation scenario. SQL operations were chosen as a
benchmark because these operations tend to do lots of malloc/realloc/free
calls, and they tend to be on the critical path of applications.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_decay1

These numbers should be at least as good as those of the current allocator.

#### mallinfo Benchmark
This benchmark only verifies that mallinfo is still close to the performance
of the current allocator.

To run the benchmark, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallinfo
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallinfo

ART calls mallinfo, so a new allocator is required to be
nearly as performant as the current allocator for this call.
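
A rough stand-alone timer for `mallinfo` is shown below. It is a hypothetical sketch, not `BM_mallinfo`; reading `uordblks` stands in for the kind of allocated-bytes query a runtime might make, and which fields ART actually reads is not assumed here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <malloc.h>
#include <time.h>

/* Average nanoseconds per mallinfo() call, or -1.0 for bad arguments.
 * Note: newer glibc versions mark mallinfo() as deprecated in favor of
 * mallinfo2(), so this may compile with a warning on a Linux host. */
double mallinfo_ns_per_call(int iterations) {
  if (iterations <= 0) return -1.0;
  volatile size_t sink = 0; /* Keeps the calls from being optimized out. */
  struct timespec start, end;
  clock_gettime(CLOCK_MONOTONIC, &start);
  for (int i = 0; i < iterations; i++) {
    struct mallinfo info = mallinfo();
    sink += (size_t)info.uordblks; /* Total allocated bytes. */
  }
  clock_gettime(CLOCK_MONOTONIC, &end);
  (void)sink;
  double elapsed = (double)(end.tv_sec - start.tv_sec) * 1e9 +
                   (double)(end.tv_nsec - start.tv_nsec);
  return elapsed / iterations;
}
```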

### Memory Trace Benchmarks
These benchmarks measure all three axes of a native allocator: RSS, virtual
address space consumed, and speed of allocation. They are designed to
run on a trace of the allocations from a real world application or system
process.

To build this benchmark:

    mmma -j system/extras/memory_replay

This will build two executables:

    /system/bin/memory_replay32
    /system/bin/memory_replay64

And these two benchmark executables:

    /data/benchmarktest64/trace_benchmark/trace_benchmark
    /data/benchmarktest/trace_benchmark/trace_benchmark

#### Memory Replay Benchmarks
These benchmarks display RSS, virtual memory consumed (VA space), and do a
bit of performance testing on actual traces taken from running applications.

The trace data includes which thread performs each operation, so the replay
mechanism simulates this by creating threads and replaying the operations
on each thread as if it were rerunning the real trace. The only issue is that
this is a worst case scenario for concurrent allocations: since the replay
collapses all of the allocation operations to occur one after another, a lot
of threads end up allocating at the same time. The trace data does not
include timestamps, so it is not possible to create a completely accurate
replay.

To generate these traces, see the [Malloc Debug documentation](https://android.googlesource.com/platform/bionic/+/master/libc/malloc_debug/README.md),
in particular the option [record\_allocs](https://android.googlesource.com/platform/bionic/+/master/libc/malloc_debug/README.md#record_allocs_total_entries).
				313
				314	To generate these traces, see the [Malloc Debug documentation](https://android.googlesource.com/platform/bionic/+/master/libc/malloc_debug/README.md),
				315	the option [record\_allocs](https://android.googlesource.com/platform/bionic/+/master/libc/malloc_debug/README.md#record_allocs_total_entries).
				316
Christopher Ferris	2f5fc33	2019-10-17 14:21:03 -0700	[diff] [blame]	317	To run these benchmarks, first copy the trace files to the target using
				318	these commands:
Christopher Ferris	4316d43	2019-06-27 00:08:23 -0700	[diff] [blame]	319
Christopher Ferris	aa22c0c	2019-08-14 15:17:26 -0700	[diff] [blame]	320	adb shell push system/extras/traces /data/local/tmp
Christopher Ferris	4316d43	2019-06-27 00:08:23 -0700	[diff] [blame]	321
				322	Since all of the traces come from applications, the `memory_replay` program
				323	will always call `mallopt(M_DECAY_TIME, 1)' before running the trace.
				324
				325	Run the benchmark thusly:
				326
Christopher Ferris	2f5fc33	2019-10-17 14:21:03 -0700	[diff] [blame]	327	adb shell memory_replay64 /data/local/tmp/traces/XXX.zip
				328	adb shell memory_replay32 /data/local/tmp/traces/XXX.zip
Christopher Ferris	4316d43	2019-06-27 00:08:23 -0700	[diff] [blame]	329
Christopher Ferris	2f5fc33	2019-10-17 14:21:03 -0700	[diff] [blame]	330	Where XXX.zip is the name of a zipped trace file. The `memory_replay`
				331	program also can process text files, but all trace files are currently
				332	checked in as zip files.
Christopher Ferris	4316d43	2019-06-27 00:08:23 -0700	[diff] [blame]	333
Christopher Ferris	05197f7	2019-08-07 14:27:52 -0700	[diff] [blame]	334	Every 100000 allocation operations, a dump of the RSS and VA space will be
				335	performed. At the end, a final RSS and VA space number will be printed.
Christopher Ferris	4316d43	2019-06-27 00:08:23 -0700	[diff] [blame]	336	For the most part, the intermediate data can be ignored, but it is always
				337	a good idea to look over the data to verify that no strange spikes are
				338	occurring.
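
When building a quick harness of your own, a coarse whole-process view of the same two numbers can be read from `/proc/self/statm`, whose first two fields are the total program size and the resident set size in pages. This is a minimal sketch with a hypothetical helper name; it counts every mapping in the process, not only the allocator's, so treat it only as a proxy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Read the process VA size and RSS in bytes. Returns 0 on success, -1 on
 * failure. */
int read_va_and_rss(unsigned long long* va_bytes, unsigned long long* rss_bytes) {
  FILE* fp = fopen("/proc/self/statm", "r");
  if (fp == NULL) return -1;
  unsigned long long size_pages = 0, resident_pages = 0;
  int fields = fscanf(fp, "%llu %llu", &size_pages, &resident_pages);
  fclose(fp);
  if (fields != 2) return -1;
  long page_size = sysconf(_SC_PAGESIZE);
  if (page_size <= 0) return -1;
  *va_bytes = size_pages * (unsigned long long)page_size;
  *rss_bytes = resident_pages * (unsigned long long)page_size;
  return 0;
}
```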

The performance number is a measure of the time it takes to perform all of
the allocation calls (malloc/memalign/posix_memalign/realloc/free/etc).
For any call that allocates a pointer, both the time for the call and the
time it takes to make the pointer completely resident in memory are included.

The performance numbers for these runs tend to have a wide variability, so
they should not be used as absolute values for comparison against the
current allocator. However, they should be in the same range as the current
values.

When evaluating an allocator, one of the most important traces is the
camera.txt trace. The camera application does very large allocations,
and some allocators might leave large virtual address maps around
rather than unmap them. When that happens, it can lead to allocation
failures and cause the camera app to abort or crash. It is
important to verify that when running this trace using the 32 bit replay
executable, the virtual address space consumed is not much larger than with
the current allocator. A small increase (on the order of a few MBs) would be
okay.

There is no specific benchmark for memory fragmentation; instead, the RSS
when running the memory traces acts as a proxy for this. An allocator that
is fragmenting badly will show an increase in RSS. The best trace for
tracking fragmentation is system\_server.txt, which is an extremely long
trace (~13 million operations). The total number of live allocations goes
up and down a bit, but stays mostly the same, so an allocator that fragments
badly would likely show an abnormal increase in RSS on this trace.

NOTE: When a native allocator calls mmap, it is expected that the allocator
will name the map using the call:

    prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, <PTR>, <SIZE>, "libc_malloc");
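
A minimal sketch of an allocator mapping memory and naming it as described. `map_named_region` is a hypothetical helper; the fallback `PR_SET_VMA` values are taken from the Linux UAPI `<linux/prctl.h>` for toolchains whose headers predate them, and a naming failure is deliberately ignored because kernels without anonymous VMA naming return an error.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

/* Map `size` bytes of anonymous memory and name the mapping "libc_malloc"
 * so it shows up as [anon:libc_malloc]. Returns NULL if mmap fails. */
void* map_named_region(size_t size) {
  void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (ptr == MAP_FAILED) return NULL;
  /* Naming is best effort: the mapping is still usable if this fails. */
  (void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (unsigned long)ptr,
              (unsigned long)size, (unsigned long)"libc_malloc");
  return ptr;
}
```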

If the native allocator creates a different name, then it is necessary to
modify the file:

    system/extras/memory_replay/NativeInfo.cpp

The `GetNativeInfo` function needs to be modified to include the names
of the maps that this allocator creates.
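
To check which maps a name-based accounting would pick up, the `Rss` of named mappings can be summed from `/proc/self/smaps`. This is a hypothetical sketch, not the code in `NativeInfo.cpp`; it assumes the standard Linux smaps layout of a mapping header line followed by `Field: value kB` lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sum the Rss of every mapping whose header line contains `map_name`
 * (for example "[anon:libc_malloc]"). Returns kB, or -1 on read failure. */
long long sum_named_rss_kb(const char* map_name) {
  FILE* fp = fopen("/proc/self/smaps", "r");
  if (fp == NULL) return -1;
  char line[8192];
  int in_match = 0;
  long long total_kb = 0;
  while (fgets(line, sizeof(line), fp) != NULL) {
    unsigned long start, end;
    unsigned long long kb;
    if (sscanf(line, "%lx-%lx ", &start, &end) == 2) {
      /* Mapping header line: remember whether this mapping matches. */
      in_match = strstr(line, map_name) != NULL;
    } else if (in_match && sscanf(line, "Rss: %llu kB", &kb) == 1) {
      total_kb += (long long)kb;
    }
  }
  fclose(fp);
  return total_kb;
}
```

For example, `sum_named_rss_kb("[anon:libc_malloc]")` returns 0 on a system whose allocator does not name its maps, which is a quick way to spot a missing `prctl` call.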

In addition, in order for the frameworks code to keep track of the memory
of a process, any named maps must be added to the file:

    frameworks/base/core/jni/android_os_Debug.cpp

Modify the `load_maps` function and add a check for the new expected name.

#### Performance Trace Benchmarks
This is a benchmark that treats the trace data as if all allocations
occurred in a single thread. This is the scenario that could
happen if all of the allocations are spaced out in time so no thread
ever does an allocation at the same time as another thread.

Run these benchmarks thusly:

    adb shell /data/benchmarktest64/trace_benchmark/trace_benchmark
    adb shell /data/benchmarktest/trace_benchmark/trace_benchmark

When run without any arguments, the benchmark will run over all of the
traces and display data. It takes many minutes to complete these runs in
order to get as accurate a number as possible.