# zygote-start is what officially starts netd (see //system/core/rootdir/init.rc)
# However, on some hardware it's started from post-fs-data as well, which is just
# a tad earlier. There's no benefit to that though, since on 4.9+ P+ devices netd
# will just block until bpfloader finishes and sets the bpf.progs_loaded property.
#
# It is important that we start bpfloader after:
#   - /sys/fs/bpf is already mounted,
#   - apex (incl. rollback) is initialized (so that in the future we can load bpf
#     programs shipped as part of apex mainline modules)
#   - logd is ready for us to log stuff
#
# At the same time we want to be as early as possible to reduce races and thus
# failures (before memory is fragmented, and cpu is busy running tons of other
# stuff) and we absolutely want to be before netd and the system boot slot is
# considered to have booted successfully.
#
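# For orientation only: a simplified, assumed sketch of how the late-init
# sequence in init.rc could order the relevant triggers. It is not copied from
# the real file; consult //system/core/rootdir/init.rc for the actual sequence.
#
#   on late-init
#       trigger post-fs-data       # assumed: /sys/fs/bpf, apex and logd are ready after this
#       trigger load_bpf_programs  # fires the action defined below
#       trigger zygote-start       # netd is officially started from here
#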
on load_bpf_programs
    exec_start bpfloader

# Note: This will actually execute /apex/com.android.tethering/bin/netbpfload
# by virtue of 'service bpfloader' being overridden by the apex shipped .rc
# Warning: most of the below settings are irrelevant unless the apex is missing.
service bpfloader /system/bin/false
    # netbpfload will do network bpf loading, then execute /system/bin/bpfloader
    #! capabilities CHOWN SYS_ADMIN NET_ADMIN
    # The following group memberships are a workaround for lack of DAC_OVERRIDE
    # and allow us to open (among other things) files that we created and are
    # no longer root owned (due to CHOWN) but still have group read access to
    # one of the following groups. This is not perfect, but a more correct
    # solution requires significantly more effort to implement.
    #! group root graphics network_stack net_admin net_bw_acct net_bw_stats net_raw system
    user root
    #
    # Set RLIMIT_MEMLOCK to 1GiB for bpfloader
    #
    # Actually only 8MiB would be needed if bpfloader ran as its own uid.
    #
    # However, while the rlimit is per-thread, the accounting is system wide.
    # So, for example, if the graphics stack has already allocated 10MiB of
    # memlock data before bpfloader even gets a chance to run, it would fail
    # if its memlock rlimit is only 8MiB - since there would be none left for it.
    #
    # bpfloader succeeding is critical to system health, since a failure will
    # cause netd to crashloop and thus system server to crashloop... and the only
    # recovery is a full kernel reboot.
    #
    # We've had issues where devices would sometimes (rarely) boot into
    # a crashloop because bpfloader would occasionally lose a boot time
    # race against the graphics stack's boot time locked memory allocation.
    #
    # Thus bpfloader's memlock has to be 8MiB higher than the locked memory
    # consumption of the root uid anywhere else in the system...
    # But we don't know what that is for all possible devices...
    #
    # Ideally, we'd simply grant bpfloader the IPC_LOCK capability and it
    # would simply ignore its memlock rlimit... but it turns out that this
    # capability is not even checked by the kernel's bpf system call.
    #
    # As such we simply use 1GiB as a reasonable approximation of infinity.
    #
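    # For reference: 1GiB = 1024 * 1024 * 1024 = 1073741824 bytes. The init
    # 'rlimit <resource> <cur> <max>' option takes the soft limit followed by
    # the hard limit, so giving the same value twice would pin both at 1GiB.
    #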
    #! rlimit memlock 1073741824 1073741824
    oneshot
    #
    # How to debug bootloops caused by 'bpfloader-failed'.
    #
    # 1. On some lower RAM devices (like wembley) you may need to first enable developer mode
    #    (from the Settings app UI), and change the developer option "Logger buffer sizes"
    #    from the default (wembley: 64kB) to the maximum (1M) per log buffer.
    #    Otherwise the buffer will overflow before you manage to dump it and you'll get useless logs.
    #
    # 2. comment out 'reboot_on_failure reboot,bpfloader-failed' below
    # 3. rebuild/reflash/reboot
    # 4. as the device is booting up capture bpfloader logs via:
    #    adb logcat -s 'bpfloader:*' 'LibBpfLoader:*' 'NetBpfLoad:*' 'NetBpfLoader:*'
    #
    #    something like:
    #    $ adb reboot; sleep 1; adb wait-for-device; adb root; sleep 1; adb wait-for-device; adb logcat -s 'bpfloader:*' 'LibBpfLoader:*' 'NetBpfLoad:*' 'NetBpfLoader:*'
    #    will take care of capturing logs as early as possible
    #
    # 5. look through the logs from the kernel's bpf verifier that bpfloader dumps out,
    #    it usually makes sense to search back from the end and find the particular
    #    bpf verifier failure that caused bpfloader to terminate early with an error code.
    #    This will probably be something along the lines of 'too many jumps' or
    #    'cannot prove return value is 0 or 1' or 'unsupported / unknown operation / helper',
    #    'invalid bpf_context access', etc.
    #
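    # A possible extra check (a sketch, not part of the procedure above), assuming
    # a rooted adb shell. bpf.progs_loaded is the property mentioned at the top of
    # this file and /sys/fs/bpf is the bpf filesystem mount point:
    #    $ adb shell getprop bpf.progs_loaded   # assumed to read 1 once loading has succeeded
    #    $ adb shell ls -l /sys/fs/bpf          # lists whatever programs and maps got pinned
    #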
    reboot_on_failure reboot,netbpfload-missing
    updatable