init: add "shared_kallsyms" option for tracing daemons

The perfetto tracing daemons can currently symbolise kernel addresses
within ftrace & perf_event data on debuggable builds, but not on release
builds. This is due to concerns with side-effects of changing the
kptr_restrict sysctl at runtime, as it affects more than just
/proc/kallsyms. The sysctl also controls vsprintf behaviour in the
kernel (affecting printk and e.g. /proc/vmallocinfo), as well as other
procfs files such as /proc/modules. This patch adds a special case to
init to allow for kernel address symbolisation on release builds without
any additional kptr_restrict changes.

The key observations are:
* symbol visibility through an opened /proc/kallsyms fd is fixed for the
lifetime of that fd as it's cached in the kernel seq_file structure.
* second_stage init is responsible for changing kptr_restrict from the
Linux boot default (0) to Android's default of 2. init is
therefore uniquely positioned to open /proc/kallsyms while addresses
are still visible to itself (due to CAP_SYSLOG).
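
The first observation relies on state that belongs to the open file
description rather than to the fd number, and duplicated fds share that
description. Address visibility cannot be shown without CAP_SYSLOG, so the
sketch below uses the file offset, another per-description property, to
illustrate the sharing. The scratch file path and the function name are
hypothetical stand-ins and are not part of this patch.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <cstdlib>

// Returns true if an offset change made through one descriptor is visible
// through its duplicate, i.e. both refer to one open file description.
bool DuplicateSharesOffset() {
    char path[] = "/tmp/ofd_demo_XXXXXX";  // hypothetical scratch file
    int fd = mkstemp(path);
    if (fd < 0) return false;
    unlink(path);  // the file lives on until the last descriptor is closed

    bool shared = false;
    if (write(fd, "0123456789", 10) == 10) {
        // Same fcntl command the patch uses to hand out copies of the saved fd.
        int copy = fcntl(fd, F_DUPFD_CLOEXEC, 3);
        if (copy >= 0) {
            // Move the offset through the original, observe it through the copy.
            if (lseek(fd, 4, SEEK_SET) == 4) {
                shared = (lseek(copy, 0, SEEK_CUR) == 4);
            }
            close(copy);
        }
    }
    close(fd);
    return shared;
}
```

Because the offset is shared in the same way, a consumer of the duplicated
fd would presumably want to rewind it (lseek to 0) before each read pass.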
This patch makes second_stage init open /proc/kallsyms and save that fd
as a static, which is then duplicated and passed to services specifying
a new "shared_kallsyms" option. Permissions are enforced by selinux,
since the service domain needs to be allowed to read files with the
`proc_kallsyms` label.
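
On the receiving side, a service would normally locate the inherited fd
with android_get_control_file("/proc/kallsyms") from libcutils. As a rough,
dependency-free sketch of what that lookup amounts to, the code below
assumes that init publishes the fd number in an environment variable named
ANDROID_FILE_ plus the path, with every non-alphanumeric character replaced
by '_'. The helper names are hypothetical and the sketch skips the extra
validation a real implementation performs.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Assumed naming scheme: "ANDROID_FILE_" + path, non-alphanumerics -> '_'.
std::string ControlFileEnvName(const std::string& path) {
    std::string name = "ANDROID_FILE_";
    for (unsigned char c : path) {
        name += std::isalnum(c) ? static_cast<char>(c) : '_';
    }
    return name;
}

// Returns the inherited fd for `path`, or -1 if none was published or the
// published value does not name an open descriptor.
int InheritedFdFor(const std::string& path) {
    const char* value = getenv(ControlFileEnvName(path).c_str());
    if (value == nullptr || *value == '\0') return -1;

    errno = 0;
    char* end = nullptr;
    long fd = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || fd < 0 || fd > INT_MAX) return -1;

    // F_GETFD fails with EBADF if the number is not an open descriptor.
    if (fcntl(static_cast<int>(fd), F_GETFD) < 0) return -1;
    return static_cast<int>(fd);
}
```

A service started with the new option would call
InheritedFdFor("/proc/kallsyms") and fall back to unsymbolised output when
it returns -1, for example when the selinux policy does not grant access.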
See go/perfetto-kallsyms-user for more details.

Test: atest CtsPerfettoTestCases
Test: manual tracing on panther-trunk_staging-user{,debug}
Bug: 383513654
Change-Id: I7bc707521349b7f50a32283642927bc4982dd9a1
diff --git a/init/README.md b/init/README.md
index 560c528..653dadd 100644
--- a/init/README.md
+++ b/init/README.md
@@ -369,6 +369,17 @@
`setenv <name> <value>`
> Set the environment variable _name_ to _value_ in the launched process.
+`shared_kallsyms`
+> If set, init will behave as if the service specified "file /proc/kallsyms r",
+ except the service will receive a duplicate of a single fd that init saved
+ during early second\_stage. This fd retains address visibility even after the
+ systemwide kptr\_restrict sysctl is set to its steady state on Android. The
+ ability to read from this fd is still constrained by selinux permissions,
+ which need to be granted separately and are gated by a neverallow.
+ Because of performance gotchas of concurrent use of this shared fd, all uses
+ need to coordinate via advisory flock(LOCK\_EX) locks on separately opened
+ /proc/kallsyms fds (since locking requires distinct open file descriptions).
+
`shutdown <shutdown_behavior>`
> Set shutdown behavior of the service process. When this is not specified,
the service is killed during shutdown process by using SIGTERM and SIGKILL.
diff --git a/init/init.cpp b/init/init.cpp
index 5b0b0dd..b6ba6a8 100644
--- a/init/init.cpp
+++ b/init/init.cpp
@@ -1055,6 +1055,14 @@
}
}
+ // This needs to happen before SetKptrRestrictAction, as we are trying to
+ // open /proc/kallsyms while still being allowed to see the full addresses
+ // (since init holds CAP_SYSLOG, and Linux boots with kptr_restrict=0). The
+ // address visibility through the saved fd (more specifically, the backing
+ // open file description) will then be remembered by the kernel for the rest
+ // of its lifetime, even after we raise the kptr_restrict.
+ Service::OpenAndSaveStaticKallsymsFd();
+
am.QueueBuiltinAction(SetupCgroupsAction, "SetupCgroups");
am.QueueBuiltinAction(SetKptrRestrictAction, "SetKptrRestrict");
am.QueueBuiltinAction(TestPerfEventSelinuxAction, "TestPerfEventSelinux");
diff --git a/init/service.cpp b/init/service.cpp
index d76a5d5..5630020 100644
--- a/init/service.cpp
+++ b/init/service.cpp
@@ -34,6 +34,7 @@
#include <android-base/scopeguard.h>
#include <android-base/stringprintf.h>
#include <android-base/strings.h>
+#include <cutils/android_get_control_file.h>
#include <cutils/sockets.h>
#include <processgroup/processgroup.h>
#include <selinux/selinux.h>
@@ -672,6 +673,14 @@
}
}
+ if (shared_kallsyms_file_) {
+ if (auto result = CreateSharedKallsymsFd(); result.ok()) {
+ descriptors.emplace_back(std::move(*result));
+ } else {
+ LOG(INFO) << "Could not obtain a copy of /proc/kallsyms: " << result.error();
+ }
+ }
+
pid_t pid = -1;
if (namespaces_.flags) {
pid = clone(nullptr, nullptr, namespaces_.flags | SIGCHLD, nullptr);
@@ -835,6 +844,35 @@
return unique_fd(signalfd(-1, &mask, SFD_CLOEXEC));
}
+void Service::OpenAndSaveStaticKallsymsFd() {
+ Result<Descriptor> result = CreateSharedKallsymsFd();
+ if (!result.ok()) {
+ LOG(ERROR) << result.error();
+ }
+}
+
+// This function is designed to be called in two situations:
+// 1) early during second_stage init, to open and save the shared fd as a
+// static (see OpenAndSaveStaticKallsymsFd).
+// 2) whenever a service requesting a copy of the fd is being started, at which
+// point it will get a duplicated copy of the static fd.
+Result<Descriptor> Service::CreateSharedKallsymsFd() {
+ static constexpr char kallsyms_path[] = "/proc/kallsyms";
+ static int static_fd = open(kallsyms_path, O_RDONLY | O_NONBLOCK | O_CLOEXEC);
+ if (static_fd < 0) {
+ return ErrnoError() << "failed to open " << kallsyms_path;
+ }
+
+ unique_fd fd{fcntl(static_fd, F_DUPFD_CLOEXEC, /*min_fd=*/3)};
+ if (fd < 0) {
+ return ErrnoError() << "failed fcntl(F_DUPFD_CLOEXEC)";
+ }
+
+ // Use the same environment variable as if the service specified
+ // "file /proc/kallsyms r".
+ return Descriptor(std::string(ANDROID_FILE_ENV_PREFIX) + kallsyms_path, std::move(fd));
+}
+
void Service::SetStartedInFirstStage(pid_t pid) {
LOG(INFO) << "adding first-stage service '" << name_ << "'...";
diff --git a/init/service.h b/init/service.h
index ae75553..7193d7e 100644
--- a/init/service.h
+++ b/init/service.h
@@ -158,6 +158,7 @@
static int sigchld_fd = CreateSigchldFd().release();
return sigchld_fd;
}
+ static void OpenAndSaveStaticKallsymsFd();
private:
void NotifyStateChange(const std::string& new_state) const;
@@ -171,6 +172,7 @@
InterprocessFifo setsid_finished);
void SetMountNamespace();
static ::android::base::unique_fd CreateSigchldFd();
+ static Result<Descriptor> CreateSharedKallsymsFd();
static unsigned long next_start_order_;
static bool is_exec_service_running_;
@@ -188,6 +190,7 @@
std::optional<std::string> fatal_reboot_target_; // reboot target of fatal handler
bool was_last_exit_ok_ =
true; // true if the service never exited, or exited with status code 0
+ bool shared_kallsyms_file_ = false; // pass the service a pre-opened fd to /proc/kallsyms
std::optional<CapSet> capabilities_;
ProcessAttributes proc_attr_;
diff --git a/init/service_parser.cpp b/init/service_parser.cpp
index ec3b176..4c31718 100644
--- a/init/service_parser.cpp
+++ b/init/service_parser.cpp
@@ -309,6 +309,11 @@
return {};
}
+Result<void> ServiceParser::ParseSharedKallsyms(std::vector<std::string>&& args) {
+ service_->shared_kallsyms_file_ = true;
+ return {};
+}
+
Result<void> ServiceParser::ParseMemcgSwappiness(std::vector<std::string>&& args) {
if (!ParseInt(args[1], &service_->swappiness_, 0)) {
return Error() << "swappiness value must be equal or greater than 0";
@@ -603,6 +608,7 @@
{"rlimit", {3, 3, &ServiceParser::ParseProcessRlimit}},
{"seclabel", {1, 1, &ServiceParser::ParseSeclabel}},
{"setenv", {2, 2, &ServiceParser::ParseSetenv}},
+ {"shared_kallsyms", {0, 0, &ServiceParser::ParseSharedKallsyms}},
{"shutdown", {1, 1, &ServiceParser::ParseShutdown}},
{"sigstop", {0, 0, &ServiceParser::ParseSigstop}},
{"socket", {3, 6, &ServiceParser::ParseSocket}},
diff --git a/init/service_parser.h b/init/service_parser.h
index f06cfc4..e42b62b 100644
--- a/init/service_parser.h
+++ b/init/service_parser.h
@@ -67,6 +67,7 @@
Result<void> ParseRestartPeriod(std::vector<std::string>&& args);
Result<void> ParseSeclabel(std::vector<std::string>&& args);
Result<void> ParseSetenv(std::vector<std::string>&& args);
+ Result<void> ParseSharedKallsyms(std::vector<std::string>&& args);
Result<void> ParseShutdown(std::vector<std::string>&& args);
Result<void> ParseSigstop(std::vector<std::string>&& args);
Result<void> ParseSocket(std::vector<std::string>&& args);