Looper: Use sequence numbers in epoll_event to track requests

Previously, Looper tracked requests to add fds internally by the fd
value itself. It also kept an internal sequence number for each request
to add a callback, to avoid a situation where a callback is called
unexpectedly. However, because it registered events with epoll using
the fd value rather than the sequence number, unintended hangups could
still be delivered to the wrong callback.

The following sequence of events caused unintended behavior in Looper:
- An fd (FD) is added to the looper.
- Looper registers FD into epoll using its FD number.
- FD is closed.
- A hangup event arrives from epoll_wait while the Looper is polling.
  Before it can process the callback for FD, Looper has to wait for the
  lock, which is held while the next step runs:
- A new fd is created and added to the looper. Since the lowest
  available fd number is reused, this new fd has the same value as FD.
- The poll request for Looper is now unblocked, so it looks up the
callback associated with FD to process the hangup.
- Since FD is already associated with the new callback, the new callback
is called unintentionally.
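
The fd reuse that this sequence depends on can be reproduced in
isolation. The snippet below is a minimal sketch and not part of this
CL; reopenPipeReadFds is a hypothetical helper name. It assumes a
single-threaded process on Linux, where no other code opens fds between
the calls.

```cpp
#include <cassert>
#include <unistd.h>
#include <utility>

// Hypothetical helper for illustration only; not part of Looper.
// Creates a pipe, closes both ends, then creates a second pipe.
// Returns the read-end fd numbers of the first and the second pipe,
// or -1 in place of an fd whose pipe() call failed.
std::pair<int, int> reopenPipeReadFds() {
    int first[2];
    if (pipe(first) != 0) {
        return {-1, -1};
    }
    const int firstReadFd = first[0];
    close(first[0]);
    close(first[1]);

    // The kernel hands out the lowest available fd numbers, so the
    // numbers released above are expected to be reused here.
    int second[2];
    if (pipe(second) != 0) {
        return {firstReadFd, -1};
    }
    const int secondReadFd = second[0];
    close(second[0]);
    close(second[1]);

    return {firstReadFd, secondReadFd};
}
```

Under those assumptions, both values in the returned pair are expected
to be equal, which is why a lookup keyed only on the fd number cannot
tell the old registration from the new one.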
This CL uses the sequence number to register fds into epoll. That way,
when we get a hangup from epoll that is associated with a sequence
number, there is no way an unexpected callback will be called.
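
The idea can be sketched outside of Looper as follows. This is a
simplified, single-threaded illustration and not the code in this CL:
MiniLooper, its wait/dispatch split, and staleHangupIsIgnored are all
hypothetical names. The split between wait and dispatch stands in for
the window in which the real Looper is blocked on its lock.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <sys/epoll.h>
#include <unistd.h>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for Looper (illustration only).
class MiniLooper {
  public:
    using Callback = std::function<void(int fd, uint32_t events)>;

    MiniLooper() : mEpollFd(epoll_create1(EPOLL_CLOEXEC)) { assert(mEpollFd >= 0); }
    ~MiniLooper() { close(mEpollFd); }

    // Registers fd under a fresh sequence number. The sequence number,
    // not the fd, is stored in the epoll_event payload.
    bool addFd(int fd, Callback callback) {
        const uint64_t seq = mNextSeq++;
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.u64 = seq;
        if (epoll_ctl(mEpollFd, EPOLL_CTL_ADD, fd, &ev) != 0) return false;
        mRequests.emplace(seq, Request{fd, std::move(callback)});
        mSeqByFd[fd] = seq;
        return true;
    }

    void removeFd(int fd) {
        auto it = mSeqByFd.find(fd);
        if (it == mSeqByFd.end()) return;
        epoll_ctl(mEpollFd, EPOLL_CTL_DEL, fd, nullptr);
        mRequests.erase(it->second);
        mSeqByFd.erase(it);
    }

    // Collects ready events without running any callbacks.
    std::vector<epoll_event> wait(int timeoutMs) {
        epoll_event events[8];
        const int n = epoll_wait(mEpollFd, events, 8, timeoutMs);
        if (n <= 0) return {};
        return std::vector<epoll_event>(events, events + n);
    }

    // Runs callbacks for the collected events. An event whose sequence
    // number is no longer registered is stale and is dropped.
    // Returns the number of callbacks that were invoked.
    int dispatch(const std::vector<epoll_event>& events) {
        int called = 0;
        for (const epoll_event& ev : events) {
            auto it = mRequests.find(ev.data.u64);
            if (it == mRequests.end()) continue;
            Request request = it->second;  // copy: the callback may call removeFd
            request.callback(request.fd, ev.events);
            ++called;
        }
        return called;
    }

  private:
    struct Request {
        int fd;
        Callback callback;
    };
    int mEpollFd;
    uint64_t mNextSeq = 1;
    std::unordered_map<uint64_t, Request> mRequests;
    std::unordered_map<int, uint64_t> mSeqByFd;
};

// Replays the race described above on one thread. Returns true if the
// hangup for the first fd was collected and then dropped without
// invoking the callback registered for the second fd.
bool staleHangupIsIgnored() {
    MiniLooper looper;
    int first[2];
    if (pipe(first) != 0) return false;
    if (!looper.addFd(first[0], [](int, uint32_t) {})) return false;
    close(first[1]);                                         // hangup on the read end
    const std::vector<epoll_event> pending = looper.wait(100);  // collected, not dispatched
    looper.removeFd(first[0]);
    close(first[0]);

    int second[2];
    if (pipe(second) != 0) return false;                     // usually reuses the fd numbers
    bool secondCalled = false;
    looper.addFd(second[0], [&secondCalled](int, uint32_t) { secondCalled = true; });

    const int called = looper.dispatch(pending);
    looper.removeFd(second[0]);
    close(second[0]);
    close(second[1]);
    return !pending.empty() && called == 0 && !secondCalled;
}
```

If dispatch looked up requests by fd number instead, the pending hangup
would typically resolve to the second registration, which is the
failure this CL addresses.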
This CL also adds a test to verify this behavior. Due to the
nondeterministic nature of this multithreaded scenario, the test
repeats the scenario many times. Without the fix in Looper, the test is
flaky; with the fix, it should never fail.
Bug: 195020232
Bug: 189135695
Test: atest libutils_test --rerun-until-failure
Ignore-AOSP-First: Topic CL aosp/1799831 has a merge conflict with
internal master, resolved in ag/15613419.
Change-Id: Ib4edab7f2407adaef6a1708b29bc52634f25dbb6
diff --git a/libutils/Looper_test.cpp b/libutils/Looper_test.cpp
index 34f424b..c859f9c 100644
--- a/libutils/Looper_test.cpp
+++ b/libutils/Looper_test.cpp
@@ -8,6 +8,9 @@
#include <utils/Looper.h>
#include <utils/StopWatch.h>
#include <utils/Timers.h>
+#include <thread>
+#include <unordered_map>
+#include <utility>
#include "Looper_test_pipe.h"
#include <utils/threads.h>
@@ -710,4 +713,123 @@
<< "no more messages to handle";
}
+class LooperEventCallback : public LooperCallback {
+ public:
+ using Callback = std::function<int(int fd, int events)>;
+ explicit LooperEventCallback(Callback callback) : mCallback(std::move(callback)) {}
+ int handleEvent(int fd, int events, void* /*data*/) override { return mCallback(fd, events); }
+
+ private:
+ Callback mCallback;
+};
+
+// A utility class that allows for pipes to be added and removed from the looper, and polls the
+// looper from a different thread.
+class ThreadedLooperUtil {
+ public:
+ explicit ThreadedLooperUtil(const sp<Looper>& looper) : mLooper(looper), mRunning(true) {
+ mThread = std::thread([this]() {
+ while (mRunning) {
+ static constexpr std::chrono::milliseconds POLL_TIMEOUT(500);
+ mLooper->pollOnce(POLL_TIMEOUT.count());
+ }
+ });
+ }
+
+ ~ThreadedLooperUtil() {
+ mRunning = false;
+ mThread.join();
+ }
+
+ // Create a new pipe, and return the write end of the pipe and the id used to track the pipe.
+ // The read end of the pipe is added to the looper.
+ std::pair<int /*id*/, base::unique_fd> createPipe() {
+ int pipeFd[2];
+ if (pipe(pipeFd)) {
+ ADD_FAILURE() << "pipe() failed.";
+ return {};
+ }
+ const int readFd = pipeFd[0];
+ const int writeFd = pipeFd[1];
+
+ int id;
+ { // acquire lock
+ std::scoped_lock l(mLock);
+
+ id = mNextId++;
+ mFds.emplace(id, readFd);
+
+ auto removeCallback = [this, id, readFd](int fd, int events) {
+ EXPECT_EQ(readFd, fd) << "Received callback for incorrect fd.";
+ if ((events & Looper::EVENT_HANGUP) == 0) {
+ return 1; // Not a hangup, keep the callback.
+ }
+ removePipe(id);
+ return 0; // Remove the callback.
+ };
+
+ mLooper->addFd(readFd, 0, Looper::EVENT_INPUT,
+ new LooperEventCallback(std::move(removeCallback)), nullptr);
+ } // release lock
+
+ return {id, base::unique_fd(writeFd)};
+ }
+
+ // Remove the pipe with the given id.
+ void removePipe(int id) {
+ std::scoped_lock l(mLock);
+ if (mFds.find(id) == mFds.end()) {
+ return;
+ }
+ mLooper->removeFd(mFds[id].get());
+ mFds.erase(id);
+ }
+
+ // Check if the pipe with the given id exists and has not been removed.
+ bool hasPipe(int id) {
+ std::scoped_lock l(mLock);
+ return mFds.find(id) != mFds.end();
+ }
+
+ private:
+ sp<Looper> mLooper;
+ std::atomic<bool> mRunning;
+ std::thread mThread;
+
+ std::mutex mLock;
+ std::unordered_map<int, base::unique_fd> mFds GUARDED_BY(mLock);
+ int mNextId GUARDED_BY(mLock) = 0;
+};
+
+TEST_F(LooperTest, MultiThreaded_NoUnexpectedFdRemoval) {
+ ThreadedLooperUtil util(mLooper);
+
+ // Iterate repeatedly to try to recreate a flaky instance.
+ for (int i = 0; i < 1000; i++) {
+ auto [firstPipeId, firstPipeFd] = util.createPipe();
+ const int firstFdNumber = firstPipeFd.get();
+
+ // Close the first pipe's fd, causing an fd hangup.
+ firstPipeFd.reset();
+
+ // Request to remove the pipe from this test thread. This causes a race for pipe removal
+ // between the hangup in the looper's thread and this remove request from the test thread.
+ util.removePipe(firstPipeId);
+
+ // Create the second pipe. Since the fds for the first pipe are closed, this pipe should
+ // have the same fd numbers as the first pipe because the lowest unused fd number is used.
+ const auto [secondPipeId, fd] = util.createPipe();
+ EXPECT_EQ(firstFdNumber, fd.get())
+ << "The first and second fds must match for the purposes of this test.";
+
+ // Wait for unexpected hangup to occur.
+ std::this_thread::sleep_for(std::chrono::milliseconds(1));
+
+ ASSERT_TRUE(util.hasPipe(secondPipeId)) << "The second pipe was removed unexpectedly.";
+
+ util.removePipe(secondPipeId);
+ }
+ SUCCEED() << "No unexpectedly removed fds.";
+}
+
} // namespace android