Because we use a simple mutex to protect the rwlock, if the simple mutex is
improved, the rwlock should improve too as a side effect.
But rwlock will be significantly more expensive than a simple mutex
when uncontested, right?
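
For illustration only, here is a minimal sketch of the kind of rwlock being discussed, assuming POSIX threads: all state is guarded by one simple mutex plus a condition variable. The names (simple_rwlock and its functions) are hypothetical and not taken from any implementation mentioned in this thread. Note that even an uncontested read lock pays for a full mutex lock and unlock, on both acquire and release, which is why it cannot be cheaper than the simple mutex underneath it.

```c
#include <pthread.h>

/* Hypothetical names: simple_rwlock and its functions are illustrative only. */
typedef struct {
    pthread_mutex_t guard;   /* the "simple mutex" protecting all state below */
    pthread_cond_t  changed; /* signalled whenever the lock may have become free */
    int readers;             /* number of active readers */
    int writer;              /* 1 while a writer holds the lock */
} simple_rwlock;

static int simple_rwlock_init(simple_rwlock *l) {
    l->readers = 0;
    l->writer = 0;
    int rc = pthread_mutex_init(&l->guard, NULL);
    if (rc != 0) return rc;
    rc = pthread_cond_init(&l->changed, NULL);
    if (rc != 0) pthread_mutex_destroy(&l->guard);
    return rc;
}

static void simple_rwlock_rdlock(simple_rwlock *l) {
    pthread_mutex_lock(&l->guard);          /* paid even when uncontested */
    while (l->writer)                       /* wait out any active writer */
        pthread_cond_wait(&l->changed, &l->guard);
    l->readers++;
    pthread_mutex_unlock(&l->guard);
}

static void simple_rwlock_rdunlock(simple_rwlock *l) {
    pthread_mutex_lock(&l->guard);
    if (--l->readers == 0)                  /* last reader out wakes waiters */
        pthread_cond_broadcast(&l->changed);
    pthread_mutex_unlock(&l->guard);
}

static void simple_rwlock_wrlock(simple_rwlock *l) {
    pthread_mutex_lock(&l->guard);
    while (l->writer || l->readers > 0)     /* need exclusive access */
        pthread_cond_wait(&l->changed, &l->guard);
    l->writer = 1;
    pthread_mutex_unlock(&l->guard);
}

static void simple_rwlock_wrunlock(simple_rwlock *l) {
    pthread_mutex_lock(&l->guard);
    l->writer = 0;
    pthread_cond_broadcast(&l->changed);    /* wake readers and writers alike */
    pthread_mutex_unlock(&l->guard);
}
```

This sketch makes no attempt at fairness (a steady stream of readers can starve a writer) and is not recursive; it is only meant to show where the mutex cost enters.
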
I have a highly optimised rwlock implementation at
=markup. If uncontested, it requires no more than one variable
increment and one TLS variable increment for read locking (with
corresponding complexity for unlocking), and no more than four variable
increments, three variable stores, and one TLS variable read for write
locking. It's also