Improve memory coherence management in screenshot code [DO NOT MERGE]

The existing code worked in practice, but wasn't quite correct in
theory and relied on implementation details of other code. It's still
somewhat unusual and subtle, but is now correct in theory (I believe)
and a little better documented.

Bug: 16044767
Change-Id: I22b01d6640f0b7beca7cbfc74981795a3218b064
(cherry picked from commit c61576794e75898a829eac52fc524c8e907b4b02)
diff --git a/services/surfaceflinger/Barrier.h b/services/surfaceflinger/Barrier.h
index 6f8507e..3e9d443 100644
--- a/services/surfaceflinger/Barrier.h
+++ b/services/surfaceflinger/Barrier.h
@@ -28,15 +28,25 @@
 public:
     inline Barrier() : state(CLOSED) { }
     inline ~Barrier() { }
+
+    // Release any threads waiting at the Barrier.
+    // Provides release semantics: preceding loads and stores complete, and
+    // their effects are visible to the waiting threads, before those threads wake.
     void open() {
         Mutex::Autolock _l(lock);
         state = OPENED;
         cv.broadcast();
     }
+
+    // Reset the Barrier, so wait() will block until open() has been called.
     void close() {
         Mutex::Autolock _l(lock);
         state = CLOSED;
     }
+
+    // Wait until the Barrier is OPENED.
+    // Provides acquire semantics: no subsequent loads or stores will occur
+    // until wait() returns.
     void wait() const {
         Mutex::Autolock _l(lock);
         while (state == CLOSED) {