SF/CE: Fix startup race in HwcAsyncWorker
If a HwcAsyncWorker is created, and shortly after `HwcAsyncWorker::send`
is called to queue up a task to run, there is a chance that the thread
that is created to run the tasks hasn't gotten to waiting on the
internal condition variable before the `notify_one` call is made to wake
up any waiting threads.
The result is that the HwcAsyncWorker thread doesn't get notified, and
goes to sleep. Meanwhile the SurfaceFlinger main thread waits forever
for the task to complete. Eventually the System Services start timing
out as SurfaceFlinger isn't processing requests.
The fix is simple -- switch to the two argument (predicate) form of
`std::condition_variable::wait()`, with the predicate checking for a
task (and the done flag). The predicate is checked before actually
waiting on the condition variable.
Additionally a unit test is added to verify this and a few other edge
conditions. The test fails (times out) without the implementation fix
when run with `atest`, and passes/completes with it.
Note that this is normally an uncommon error particularly on most
devices as HwcAsyncWorkers are only created when new outputs (virtual or
physical displays) are created, and there is usually a big gap between
when the HwcAsyncWorker is created and when it is actually used. It
ended up being more of a problem on Chromebooks as we create outputs for
every new app window, which happens much more frequently, and with a
much shorter delay between creation and use.
Flag: EXEMPT bugfix
Bug: 333824019
Test: atest libcompositionengine_test
Test: Smoke test of running DOTA Underlords on a Trogdor Chromebook
(cherry picked from https://googleplex-android-review.googlesource.com/q/commit:e0b2747abd28f4595a6f18f3f9fbe8aa15ae7000)
Merged-In: If3ad46a1297a0448dbdb505957081329f4cd2aca
Change-Id: If3ad46a1297a0448dbdb505957081329f4cd2aca
diff --git a/services/surfaceflinger/CompositionEngine/Android.bp b/services/surfaceflinger/CompositionEngine/Android.bp
index 7fa58df..610b10f 100644
--- a/services/surfaceflinger/CompositionEngine/Android.bp
+++ b/services/surfaceflinger/CompositionEngine/Android.bp
@@ -148,6 +148,7 @@
"tests/CompositionEngineTest.cpp",
"tests/DisplayColorProfileTest.cpp",
"tests/DisplayTest.cpp",
+ "tests/HwcAsyncWorkerTest.cpp",
"tests/HwcBufferCacheTest.cpp",
"tests/MockHWC2.cpp",
"tests/MockHWComposer.cpp",
diff --git a/services/surfaceflinger/CompositionEngine/src/HwcAsyncWorker.cpp b/services/surfaceflinger/CompositionEngine/src/HwcAsyncWorker.cpp
index 6086f0b..91385b4 100644
--- a/services/surfaceflinger/CompositionEngine/src/HwcAsyncWorker.cpp
+++ b/services/surfaceflinger/CompositionEngine/src/HwcAsyncWorker.cpp
@@ -24,6 +24,7 @@
#include <android-base/thread_annotations.h>
#include <cutils/sched_policy.h>
+#include <ftl/fake_guard.h>
namespace android::compositionengine::impl {
@@ -60,7 +61,7 @@
std::unique_lock<std::mutex> lock(mMutex);
android::base::ScopedLockAssertion assumeLock(mMutex);
while (!mDone) {
- mCv.wait(lock);
+ mCv.wait(lock, [this]() FTL_FAKE_GUARD(mMutex) { return mTaskRequested || mDone; });
if (mTaskRequested && mTask.valid()) {
mTask();
mTaskRequested = false;
diff --git a/services/surfaceflinger/CompositionEngine/tests/HwcAsyncWorkerTest.cpp b/services/surfaceflinger/CompositionEngine/tests/HwcAsyncWorkerTest.cpp
new file mode 100644
index 0000000..dd04df6
--- /dev/null
+++ b/services/surfaceflinger/CompositionEngine/tests/HwcAsyncWorkerTest.cpp
@@ -0,0 +1,71 @@
+/*
+ * Copyright 2024 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <future>
+
+#include <compositionengine/impl/HwcAsyncWorker.h>
+#include <gtest/gtest.h>
+
+namespace android::compositionengine {
+namespace {
+
+using namespace std::chrono_literals;
+
+// For the edge case tests below, how much real time should be spent trying to reproduce edge cases
+// problems in a loop.
+//
+// Larger values mean problems are more likely to be detected, at the cost of making the unit test
+// run slower.
+//
+// As we expect the tests to be run continuously, even a short loop will eventually catch
+// problems, though not necessarily from changes in the same build that introduce them.
+constexpr auto kWallTimeForEdgeCaseTests = 5ms;
+
+TEST(HwcAsyncWorker, continuousTasksEdgeCase) {
+ // Ensures that a single worker that is given multiple tasks in short succession will run them.
+
+ impl::HwcAsyncWorker worker;
+ const auto endTime = std::chrono::steady_clock::now() + kWallTimeForEdgeCaseTests;
+ while (std::chrono::steady_clock::now() < endTime) {
+ auto f1 = worker.send([] { return false; });
+ EXPECT_FALSE(f1.get());
+ auto f2 = worker.send([] { return true; });
+ EXPECT_TRUE(f2.get());
+ }
+}
+
+TEST(HwcAsyncWorker, constructAndDestroyEdgeCase) {
+ // Ensures that newly created HwcAsyncWorkers can be immediately destroyed.
+
+ const auto endTime = std::chrono::steady_clock::now() + kWallTimeForEdgeCaseTests;
+ while (std::chrono::steady_clock::now() < endTime) {
+ impl::HwcAsyncWorker worker;
+ }
+}
+
+TEST(HwcAsyncWorker, newlyCreatedRunsTasksEdgeCase) {
+ // Ensures that newly created HwcAsyncWorkers will run a task if given one immediately.
+
+ const auto endTime = std::chrono::steady_clock::now() + kWallTimeForEdgeCaseTests;
+ while (std::chrono::steady_clock::now() < endTime) {
+ impl::HwcAsyncWorker worker;
+ auto f = worker.send([] { return true; });
+ f.get();
+ }
+}
+
+} // namespace
+} // namespace android::compositionengine