libbinder - avoid pthread_cond_broadcast per call

The per-call broadcast currently accounts for:
- 0.4% of system_server CPU time
- 1% of com.android.bluetooth
- 0.8% of com.android.phone

The broadcast exists to implement
IPCThreadState::blockUntilThreadAvailable, but that API is only used by
WatchDog.java. Because waiters and the broadcaster both hold
mThreadCountLock, we can count the waiters ourselves, so we know better
than pthread does internally when a broadcast would actually be useful:
only when at least one thread is waiting.
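
For illustration, the guard on the broadcast can be sketched in
isolation as below. This is a minimal sketch under stated assumptions,
not the libbinder code: ThreadCounts, beginCall, finishCall,
blockUntilAvailable and the broadcasts counter are hypothetical names
introduced here; only the pthread calls are real.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the process-wide fields used by the patch.
struct ThreadCounts {
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t decrement = PTHREAD_COND_INITIALIZER;
    size_t executing = 0;   // threads currently handling a call
    size_t maxThreads = 1;  // threadpool limit
    size_t waiting = 0;     // threads inside blockUntilAvailable
    size_t broadcasts = 0;  // instrumentation for this sketch only
};

// Waiter side: register as a waiter for the whole duration of the wait.
void blockUntilAvailable(ThreadCounts& s) {
    pthread_mutex_lock(&s.lock);
    s.waiting++;
    while (s.executing >= s.maxThreads) {
        pthread_cond_wait(&s.decrement, &s.lock);
    }
    s.waiting--;
    pthread_mutex_unlock(&s.lock);
}

// A thread starts handling a call.
void beginCall(ThreadCounts& s) {
    pthread_mutex_lock(&s.lock);
    s.executing++;
    pthread_mutex_unlock(&s.lock);
}

// A thread finishes a call: broadcast only if someone is waiting.
void finishCall(ThreadCounts& s) {
    pthread_mutex_lock(&s.lock);
    assert(s.executing > 0);
    s.executing--;
    if (s.waiting > 0) {
        pthread_cond_broadcast(&s.decrement);
        s.broadcasts++;
    }
    pthread_mutex_unlock(&s.lock);
}
```

Since the waiter count is only read and written under the same mutex
the condition variable uses, a waiter cannot miss a wakeup: it either
sees the decremented count before waiting, or is already counted when
the finishing thread checks.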

Future considerations: this API is broken in the case of poll calls, or
if too many userspace threads manually call joinRpcThreadpool. We could
move the binder part of WatchDog.java into a separate process and remove
the associated infrastructure entirely. An external process could call
pingBinder (or similar) on different services, which would have the same
effect but would detect deadlocks through the existing path of
processing a transaction.

Bug: 168806193
Test: boot, manually check how often the broadcast now happens (only
    when the binder threadpool is saturated at the time
    blockUntilThreadAvailable is called, so at most once per 30 seconds
    given WatchDog's current implementation)
Change-Id: I44f8ff0d8ca2cdf236a9fa3ad1e3a0241663bfcd
diff --git a/libs/binder/IPCThreadState.cpp b/libs/binder/IPCThreadState.cpp
index 7d01e0b..be1b468 100644
--- a/libs/binder/IPCThreadState.cpp
+++ b/libs/binder/IPCThreadState.cpp
@@ -489,12 +489,14 @@
 void IPCThreadState::blockUntilThreadAvailable()
 {
     pthread_mutex_lock(&mProcess->mThreadCountLock);
+    mProcess->mWaitingForThreads++;
     while (mProcess->mExecutingThreadsCount >= mProcess->mMaxThreads) {
         ALOGW("Waiting for thread to be free. mExecutingThreadsCount=%lu mMaxThreads=%lu\n",
                 static_cast<unsigned long>(mProcess->mExecutingThreadsCount),
                 static_cast<unsigned long>(mProcess->mMaxThreads));
         pthread_cond_wait(&mProcess->mThreadCountDecrement, &mProcess->mThreadCountLock);
     }
+    mProcess->mWaitingForThreads--;
     pthread_mutex_unlock(&mProcess->mThreadCountLock);
 }
 
@@ -534,7 +536,12 @@
             }
             mProcess->mStarvationStartTimeMs = 0;
         }
-        pthread_cond_broadcast(&mProcess->mThreadCountDecrement);
+
+        // Cond broadcast can be expensive, so don't send it every time a binder
+        // call is processed. b/168806193
+        if (mProcess->mWaitingForThreads > 0) {
+            pthread_cond_broadcast(&mProcess->mThreadCountDecrement);
+        }
         pthread_mutex_unlock(&mProcess->mThreadCountLock);
     }