init: add "shared_kallsyms" option for tracing daemons

The perfetto tracing daemons can currently symbolise kernel addresses
within ftrace & perf_event data on debuggable builds, but not on release
builds. This is due to concerns with side-effects of changing the
kptr_restrict sysctl at runtime, as it affects more than just
/proc/kallsyms. The sysctl also controls vsprintf behaviour in the
kernel (affecting printk and e.g. /proc/vmallocinfo), as well as other
procfs files such as /proc/modules. This patch adds a special case to
init to allow for kernel address symbolisation on release builds without
any additional kptr_restrict changes.

The key observations are:
* symbol visibility through an opened /proc/kallsyms fd is fixed for the
  lifetime of that fd as it's cached in the kernel seq_file structure.
* second_stage init is responsible for changing the kptr_restrict from
  the on-boot Linux default (0) to the Android's default of 2. init is
  therefore uniquely positioned to open /proc/kallsyms while addresses
  are still visible to itself (due to CAP_SYSLOG).

This patch makes second_stage init open /proc/kallsyms and save that fd
as a static, which is then duplicated and passed to services specifying
a new "shared_kallsyms" option. Permissions are enforced by selinux,
since the service domains needs to be allowed to read files with the
`proc_kallsyms` label.

See go/perfetto-kallsyms-user for more details.

Test: atest CtsPerfettoTestCases
Test: manual tracing on panther-trunk_staging-user{,debug}
Bug: 383513654
Change-Id: I7bc707521349b7f50a32283642927bc4982dd9a1
diff --git a/init/README.md b/init/README.md
index 560c528..653dadd 100644
--- a/init/README.md
+++ b/init/README.md
@@ -369,6 +369,17 @@
 `setenv <name> <value>`
 > Set the environment variable _name_ to _value_ in the launched process.
 
+`shared_kallsyms`
+> If set, init will behave as if the service specified "file /proc/kallsyms r",
+  except the service will receive a duplicate of a single fd that init saved
+  during early second\_stage. This fd retains address visibility even after the
+  systemwide kptr\_restrict sysctl is set to its steady state on Android. The
+  ability to read from this fd is still constrained by selinux permissions,
+  which need to be granted separately and are gated by a neverallow.
+  Because of performance gotchas of concurrent use of this shared fd, all uses
+  need to coordinate via provisional flock(LOCK\_EX) locks on separately opened
+  /proc/kallsyms fds (since locking requires distinct open file descriptions).
+
 `shutdown <shutdown_behavior>`
 > Set shutdown behavior of the service process. When this is not specified,
   the service is killed during shutdown process by using SIGTERM and SIGKILL.
diff --git a/init/init.cpp b/init/init.cpp
index 5b0b0dd..b6ba6a8 100644
--- a/init/init.cpp
+++ b/init/init.cpp
@@ -1055,6 +1055,14 @@
         }
     }
 
+    // This needs to happen before SetKptrRestrictAction, as we are trying to
+    // open /proc/kallsyms while still being allowed to see the full addresses
+    // (since init holds CAP_SYSLOG, and Linux boots with kptr_restrict=0). The
+    // address visibility through the saved fd (more specifically, the backing
+    // open file description) will then be remembered by the kernel for the rest
+    // of its lifetime, even after we raise the kptr_restrict.
+    Service::OpenAndSaveStaticKallsymsFd();
+
     am.QueueBuiltinAction(SetupCgroupsAction, "SetupCgroups");
     am.QueueBuiltinAction(SetKptrRestrictAction, "SetKptrRestrict");
     am.QueueBuiltinAction(TestPerfEventSelinuxAction, "TestPerfEventSelinux");
diff --git a/init/service.cpp b/init/service.cpp
index d76a5d5..5630020 100644
--- a/init/service.cpp
+++ b/init/service.cpp
@@ -34,6 +34,7 @@
 #include <android-base/scopeguard.h>
 #include <android-base/stringprintf.h>
 #include <android-base/strings.h>
+#include <cutils/android_get_control_file.h>
 #include <cutils/sockets.h>
 #include <processgroup/processgroup.h>
 #include <selinux/selinux.h>
@@ -672,6 +673,14 @@
         }
     }
 
+    if (shared_kallsyms_file_) {
+        if (auto result = CreateSharedKallsymsFd(); result.ok()) {
+            descriptors.emplace_back(std::move(*result));
+        } else {
+            LOG(INFO) << "Could not obtain a copy of /proc/kallsyms: " << result.error();
+        }
+    }
+
     pid_t pid = -1;
     if (namespaces_.flags) {
         pid = clone(nullptr, nullptr, namespaces_.flags | SIGCHLD, nullptr);
@@ -835,6 +844,35 @@
     return unique_fd(signalfd(-1, &mask, SFD_CLOEXEC));
 }
 
+void Service::OpenAndSaveStaticKallsymsFd() {
+    Result<Descriptor> result = CreateSharedKallsymsFd();
+    if (!result.ok()) {
+      LOG(ERROR) << result.error();
+    }
+}
+
+// This function is designed to be called in two situations:
+// 1) early during second_stage init, to open and save the shared fd as a
+//    static (see OpenAndSaveStaticKallsymsFd).
+// 2) whenever a service requesting a copy of the fd is being started, at which
+//    point it will get a duplicated copy of the static fd.
+Result<Descriptor> Service::CreateSharedKallsymsFd() {
+    static constexpr char kallsyms_path[] = "/proc/kallsyms";
+    static int static_fd = open(kallsyms_path, O_RDONLY | O_NONBLOCK | O_CLOEXEC);
+    if (static_fd < 0) {
+        return ErrnoError() << "failed to open " << kallsyms_path;
+    }
+
+    unique_fd fd{fcntl(static_fd, F_DUPFD_CLOEXEC, /*min_fd=*/3)};
+    if (fd < 0) {
+        return ErrnoError() << "failed fcntl(F_DUPFD_CLOEXEC)";
+    }
+
+    // Use the same environment variable as if the service specified
+    // "file /proc/kallsyms r".
+    return Descriptor(std::string(ANDROID_FILE_ENV_PREFIX) + kallsyms_path, std::move(fd));
+}
+
 void Service::SetStartedInFirstStage(pid_t pid) {
     LOG(INFO) << "adding first-stage service '" << name_ << "'...";
 
diff --git a/init/service.h b/init/service.h
index ae75553..7193d7e 100644
--- a/init/service.h
+++ b/init/service.h
@@ -158,6 +158,7 @@
         static int sigchld_fd = CreateSigchldFd().release();
         return sigchld_fd;
     }
+    static void OpenAndSaveStaticKallsymsFd();
 
   private:
     void NotifyStateChange(const std::string& new_state) const;
@@ -171,6 +172,7 @@
                     InterprocessFifo setsid_finished);
     void SetMountNamespace();
     static ::android::base::unique_fd CreateSigchldFd();
+    static Result<Descriptor> CreateSharedKallsymsFd();
 
     static unsigned long next_start_order_;
     static bool is_exec_service_running_;
@@ -188,6 +190,7 @@
     std::optional<std::string> fatal_reboot_target_;  // reboot target of fatal handler
     bool was_last_exit_ok_ =
             true;  // true if the service never exited, or exited with status code 0
+    bool shared_kallsyms_file_ = false; // pass the service a pre-opened fd to /proc/kallsyms
 
     std::optional<CapSet> capabilities_;
     ProcessAttributes proc_attr_;
diff --git a/init/service_parser.cpp b/init/service_parser.cpp
index ec3b176..4c31718 100644
--- a/init/service_parser.cpp
+++ b/init/service_parser.cpp
@@ -309,6 +309,11 @@
     return {};
 }
 
+Result<void> ServiceParser::ParseSharedKallsyms(std::vector<std::string>&& args) {
+    service_->shared_kallsyms_file_ = true;
+    return {};
+}
+
 Result<void> ServiceParser::ParseMemcgSwappiness(std::vector<std::string>&& args) {
     if (!ParseInt(args[1], &service_->swappiness_, 0)) {
         return Error() << "swappiness value must be equal or greater than 0";
@@ -603,6 +608,7 @@
         {"rlimit",                  {3,     3,    &ServiceParser::ParseProcessRlimit}},
         {"seclabel",                {1,     1,    &ServiceParser::ParseSeclabel}},
         {"setenv",                  {2,     2,    &ServiceParser::ParseSetenv}},
+        {"shared_kallsyms",         {0,     0,    &ServiceParser::ParseSharedKallsyms}},
         {"shutdown",                {1,     1,    &ServiceParser::ParseShutdown}},
         {"sigstop",                 {0,     0,    &ServiceParser::ParseSigstop}},
         {"socket",                  {3,     6,    &ServiceParser::ParseSocket}},
diff --git a/init/service_parser.h b/init/service_parser.h
index f06cfc4..e42b62b 100644
--- a/init/service_parser.h
+++ b/init/service_parser.h
@@ -67,6 +67,7 @@
     Result<void> ParseRestartPeriod(std::vector<std::string>&& args);
     Result<void> ParseSeclabel(std::vector<std::string>&& args);
     Result<void> ParseSetenv(std::vector<std::string>&& args);
+    Result<void> ParseSharedKallsyms(std::vector<std::string>&& args);
     Result<void> ParseShutdown(std::vector<std::string>&& args);
     Result<void> ParseSigstop(std::vector<std::string>&& args);
     Result<void> ParseSocket(std::vector<std::string>&& args);