fastboot: Increase maximum usbfs bulk size for writes to 256KiB

With a device capable of saturating the bus at SuperSpeed+,
the next bottleneck is the fixed (size-independent) overhead of
the usbfs ioctl() system calls, which includes entering/exiting
the kernel, allocating/deallocating a contiguous buffer for DMA,
configuring/deconfiguring the IOMMU and issuing the DMA to the HC. In
order to saturate the bus from the host software perspective, we must
reach the schedule() call in reap_as() before the next interrupt from
the HC indicating the completion of the URB.

In my experimental setup, with an SS+ capable host and device
and 16 KiB URBs, we reach the schedule() call in 25us, but the
URB is serviced in an estimated 16us, so we lose roughly a third
of the bandwidth. Increasing the URB size to 64KiB there are
65us between interrupts and 55us until schedule(). This means
we usually reach schedule() in time but not always, so we lose a
bit of bandwidth. Increasing it again to 128KiB and we have 128us
between interrupts and 65us until schedule(), so we're now comfortably
saturating the bus. In order to account for differences between hosts,
this CL uses a doubled maximum of 256KiB.

With larger allocation sizes we now risk contiguous allocation
failures, so I implemented a fallback where we try smaller sizes if
a larger one fails.

With this CL download speeds on my hosts are now around 980 MB/s over
SS+ and 440 MB/s over SS.

Bug: 325128548
Change-Id: Ie5ad480c73f2f71a50ce7f75ffb4aaa93ded2f0b
diff --git a/fastboot/usb_linux.cpp b/fastboot/usb_linux.cpp
index b7fd5ed..2d23e07 100644
--- a/fastboot/usb_linux.cpp
+++ b/fastboot/usb_linux.cpp
@@ -83,7 +83,18 @@
 // be reliable.
 // 256KiB seems to work, but 1MiB bulk transfers lock up my z620 with a 3.13
 // kernel.
-#define MAX_USBFS_BULK_SIZE (16 * 1024)
+// 128KiB was experimentally found to be enough to saturate the bus at
+// SuperSpeed+, so we first try double that for writes. If the operation fails
+// due to a lack of contiguous regions (or an ancient kernel), try smaller sizes
+// until we find one that works (see LinuxUsbTransport::Write). Reads are less
+// performance critical so for now just use a known good size.
+#define MAX_USBFS_BULK_WRITE_SIZE (256 * 1024)
+#define MAX_USBFS_BULK_READ_SIZE (16 * 1024)
+
+// This size should pretty much always work (it's compatible with pre-3.3
+// kernels and it's what we used to use historically), so if it doesn't work
+// something has gone badly wrong.
+#define MIN_USBFS_BULK_WRITE_SIZE (16 * 1024)
 
 struct usb_handle
 {
@@ -108,6 +119,7 @@
   private:
     std::unique_ptr<usb_handle> handle_;
     const uint32_t ms_timeout_;
+    size_t max_usbfs_bulk_write_size_ = MAX_USBFS_BULK_WRITE_SIZE;
 
     DISALLOW_COPY_AND_ASSIGN(LinuxUsbTransport);
 };
@@ -412,26 +424,32 @@
     }
 
     auto submit_urb = [&](size_t i) {
-        int xfer = (len > MAX_USBFS_BULK_SIZE) ? MAX_USBFS_BULK_SIZE : len;
+        while (true) {
+            int xfer = (len > max_usbfs_bulk_write_size_) ? max_usbfs_bulk_write_size_ : len;
 
-        urb[i].type = USBDEVFS_URB_TYPE_BULK;
-        urb[i].endpoint = handle_->ep_out;
-        urb[i].buffer_length = xfer;
-        urb[i].buffer = data;
-        urb[i].usercontext = (void *)i;
+            urb[i].type = USBDEVFS_URB_TYPE_BULK;
+            urb[i].endpoint = handle_->ep_out;
+            urb[i].buffer_length = xfer;
+            urb[i].buffer = data;
+            urb[i].usercontext = (void *)i;
 
-        int n = ioctl(handle_->desc, USBDEVFS_SUBMITURB, &urb[i]);
-        if (n != 0) {
-            DBG("ioctl(USBDEVFS_SUBMITURB) failed\n");
-            return false;
+            int n = ioctl(handle_->desc, USBDEVFS_SUBMITURB, &urb[i]);
+            if (n != 0) {
+                if (errno == ENOMEM && max_usbfs_bulk_write_size_ > MIN_USBFS_BULK_WRITE_SIZE) {
+                    max_usbfs_bulk_write_size_ /= 2;
+                    continue;
+                }
+                DBG("ioctl(USBDEVFS_SUBMITURB) failed\n");
+                return false;
+            }
+
+            pending[i] = true;
+            count += xfer;
+            len -= xfer;
+            data += xfer;
+
+            return true;
         }
-
-        pending[i] = true;
-        count += xfer;
-        len -= xfer;
-        data += xfer;
-
-        return true;
     };
 
     auto reap_urb = [&](size_t i) {
@@ -497,7 +515,7 @@
     }
 
     while (len > 0) {
-        int xfer = (len > MAX_USBFS_BULK_SIZE) ? MAX_USBFS_BULK_SIZE : len;
+        int xfer = (len > MAX_USBFS_BULK_READ_SIZE) ? MAX_USBFS_BULK_READ_SIZE : len;
 
         bulk.ep = handle_->ep_in;
         bulk.len = xfer;