crtbegin: Correctly align ESP to 16 for __i386__

The (lowest) address of the argument area (aka ESP immediately prior to
the call instruction) must be aligned to 0 mod 16. Here, it is aligned to
12 mod 16.

From the SysV ABI doc (2.2.2 The Stack Frame)

"""The end of the input argument area shall be aligned on a 16 (32, if
__m256 is passed on stack) byte boundary. In other words, the value
(%esp + 4) is always a multiple of 16 (32) when control is transferred to
the function entry point."""

Test: extract code into a separate C file and verify stack alignment in a
  "start_main" function
Test: use the upcoming NDK r17-beta1 (with new Bionic crtbegin*.o files)
  with an M-23 x86 system image, check alignment in main (compiled with
  Clang not GCC, compiled w/o -mstackrealign)
Bug: b/73140672

Change-Id: Ia8d93fe5668d0a514a9fd22c40bf8362805111e6
diff --git a/libc/arch-common/bionic/crtbegin.c b/libc/arch-common/bionic/crtbegin.c
index 31ad621..c4d2a5a 100644
--- a/libc/arch-common/bionic/crtbegin.c
+++ b/libc/arch-common/bionic/crtbegin.c
@@ -53,7 +53,7 @@
 #elif defined(__arm__)
 __asm__(PRE "mov r0,sp; b _start_main" POST);
 #elif defined(__i386__)
-__asm__(PRE "movl %esp,%eax; andl $~0xf,%esp; pushl %eax; calll _start_main" POST);
+__asm__(PRE "movl %esp,%eax; andl $~0xf,%esp; subl $12,%esp; pushl %eax; calll _start_main" POST);
 #elif defined(__x86_64__)
 __asm__(PRE "movq %rsp,%rdi; andq $~0xf,%rsp; callq _start_main" POST);
 #else