Maciej Żenczykowski | 7db65c6 | 2023-10-19 16:51:15 -0700

# zygote-start is what officially starts netd (see //system/core/rootdir/init.rc)
# However, on some hardware it's started from post-fs-data as well, which is just
# a tad earlier. There's no benefit to that though, since on 4.9+ P+ devices netd
# will just block until bpfloader finishes and sets the bpf.progs_loaded property.
#
# It is important that we start netbpfload after:
#   - /sys/fs/bpf is already mounted,
#   - apex (incl. rollback) is initialized (so that in the future we can load bpf
#     programs shipped as part of apex mainline modules)
#   - logd is ready for us to log stuff
#
# At the same time we want to be as early as possible to reduce races and thus
# failures (before memory is fragmented, and cpu is busy running tons of other
# stuff) and we absolutely want to run before netd starts and before the system
# boot slot is considered to have booted successfully.
#
on load_bpf_programs
    exec_start netbpfload

service netbpfload /system/bin/netbpfload
    capabilities CHOWN SYS_ADMIN NET_ADMIN
    # The following group memberships are a workaround for lack of DAC_OVERRIDE
    # and allow us to open (among other things) files that we created and are
    # no longer root owned (due to CHOWN) but still have group read access to
    # one of the following groups. This is not perfect, but a more correct
    # solution requires significantly more effort to implement.
    group root graphics network_stack net_admin net_bw_acct net_bw_stats net_raw system
    user root
    #
    # Set RLIMIT_MEMLOCK to 1GiB for netbpfload
    #
    # Actually only 8MiB would be needed if netbpfload ran as its own uid.
    #
    # However, while the rlimit is per-process, the accounting is system wide (per uid).
    # So, for example, if the graphics stack has already allocated 10MiB of
    # memlock data before netbpfload even gets a chance to run, it would fail
    # if its memlock rlimit is only 8MiB - since there would be none left for it.
    #
    # netbpfload succeeding is critical to system health, since a failure will
    # cause netd to crashloop and thus system server to crashloop... and the only
    # recovery is a full kernel reboot.
    #
    # We've had issues where devices would sometimes (rarely) boot into
    # a crashloop because netbpfload would occasionally lose a boot time
    # race against the graphics stack's boot time locked memory allocation.
    #
    # Thus netbpfload's memlock has to be 8MiB higher than the locked memory
    # consumption of the root uid anywhere else in the system...
    # But we don't know what that is for all possible devices...
    #
    # Ideally, we'd simply grant netbpfload the IPC_LOCK capability and it
    # would simply ignore its memlock rlimit... but it turns out that this
    # capability is not even checked by the kernel's bpf system call.
    #
    # As such we simply use 1GiB as a reasonable approximation of infinity.
    #
    rlimit memlock 1073741824 1073741824
    oneshot
    #
    # How to debug bootloops caused by 'netbpfload-failed'.
    #
    # 1. On some lower RAM devices (like wembley) you may need to first enable developer mode
    #    (from the Settings app UI), and change the developer option "Logger buffer sizes"
    #    from the default (wembley: 64kB) to the maximum (1M) per log buffer.
    #    Otherwise the buffer will overflow before you manage to dump it and you'll get
    #    useless logs.
    #
    # 2. comment out 'reboot_on_failure reboot,netbpfload-failed' below
    # 3. rebuild/reflash/reboot
    # 4. as the device is booting up capture netbpfload logs via:
    #      adb logcat -s 'NetBpfLoad:*' 'NetBpfLoader:*'
    #
    #    something like:
    #      $ adb reboot; sleep 1; adb wait-for-device; adb root; sleep 1; adb wait-for-device; adb logcat -s 'NetBpfLoad:*' 'NetBpfLoader:*'
    #    will take care of capturing logs as early as possible
    #
    # 5. look through the logs from the kernel's bpf verifier that netbpfload dumps out.
    #    It usually makes sense to search back from the end and find the particular
    #    bpf verifier failure that caused netbpfload to terminate early with an error code.
    #    This will probably be something along the lines of 'too many jumps',
    #    'cannot prove return value is 0 or 1', 'unsupported / unknown operation / helper',
    #    'invalid bpf_context access', etc.
    #
    reboot_on_failure reboot,netbpfload-failed
    # we're not really updatable, but want to be able to load bpf programs shipped in apexes
    updatable
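    #
    # A minimal sketch to go with debugging step 5 above. These commands are a
    # suggestion only, not part of the official procedure, and the line count
    # of 200 is an arbitrary assumption.
    #
    # Dump the buffered logs once (instead of streaming them) and keep only
    # the tail, since the fatal verifier error is usually near the end:
    #   $ adb logcat -d -s 'NetBpfLoad:*' 'NetBpfLoader:*' | tail -n 200
    #
    # To check whether loading finished, query the property that netd waits on
    # (see the top of this file); an empty result suggests it was never set:
    #   $ adb shell getprop bpf.progs_loaded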