Please do make sure that "better code on x86" doesn't equal worse
code on other platforms. It may be the "majority" platform, but it is
not the only platform.
We are aware of that. In fact I gave up on trying to trick GCC into
producing good code by rearranging the C source. A single
architecture-dependent #ifdef plus some inline assembly seems much cleaner. And it
can be removed quickly if it ever turns out to be a bad idea for newer x86
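
For illustration only, here is a minimal sketch of what "one #ifdef plus some inline assembly" could look like. The thread does not say which operation is involved, so rotl32 is a hypothetical stand-in (a 32-bit rotate), and the sketch assumes GCC-style extended asm. The point is the shape: every architecture other than x86 keeps the plain C path, and deleting the #if branch restores that path everywhere.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the thread: rotate x left by n bits. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* x86 only: ROL takes its count in CL and masks it to 5 bits. */
    __asm__("roll %%cl, %0" : "+r"(x) : "c"(n) : "cc");
    return x;
#else
    /* Portable fallback used on every other architecture or compiler. */
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
#endif
}
```

Both branches are meant to return the same value for any count, so callers never need to know which one was compiled in.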