From | Sent On | Attachments |
---|---|---|
Raymond Hettinger | Apr 13, 2004 8:00 pm | |
Jeff Epler | Apr 13, 2004 9:10 pm | |
Bob Ippolito | Apr 13, 2004 9:26 pm | |
Jeff Epler | Apr 13, 2004 10:04 pm | |
Raymond Hettinger | Apr 13, 2004 10:17 pm | |
Jeff Epler | Apr 13, 2004 11:10 pm | |
Guido van Rossum | Apr 13, 2004 11:26 pm | |
Tim Peters | Apr 13, 2004 11:56 pm | |
Jeff Epler | Apr 14, 2004 9:08 am | |
Raymond Hettinger | Apr 14, 2004 12:06 pm | |
Andrew MacIntyre | Apr 14, 2004 3:23 pm | |
Jeff Epler | Apr 14, 2004 3:35 pm | |
Mike Pall | Apr 14, 2004 5:50 pm | |
Tim Peters | Apr 14, 2004 11:14 pm | |
Michael Hudson | Apr 15, 2004 7:05 am | |
Mike Pall | Apr 15, 2004 9:36 am | |
Guido van Rossum | Apr 15, 2004 10:27 am | |
Jeremy Hylton | Apr 15, 2004 10:38 am | |
Guido van Rossum | Apr 15, 2004 10:42 am | |
Mike Pall | Apr 15, 2004 11:56 am | |
Mike Pall | Apr 15, 2004 11:56 am | |
Skip Montanaro | Apr 15, 2004 11:59 am | |
Michael Hudson | Apr 15, 2004 1:27 pm | |
Raymond Hettinger | Apr 15, 2004 2:22 pm | |
Thomas Heller | Apr 15, 2004 2:31 pm | |
"Martin v. Löwis" | Apr 15, 2004 3:07 pm | |
Jeremy Hylton | Apr 15, 2004 11:26 pm | |
Tim Peters | Apr 16, 2004 12:18 am | |
"Martin v. Löwis" | Apr 16, 2004 2:00 am | |
Andrew MacIntyre | Apr 16, 2004 9:14 pm |
Subject: | [Python-Dev] String hash function multiplier | ![]() |
---|---|---|
From: | Tim Peters (tim_...@email.msn.com) | |
Date: | Apr 13, 2004 11:56:38 pm | |
List: | org.python.python-dev |
[Raymond]
Does anyone have any issues with changing the hash multiplier for the string and Unicode hash functions?
Don't touch it unless you can prove major benefits -- it's a remarkable fact of life that the current multiplier hasn't resulted in any real-life (but non-contrived) pathological cases.
Instead of 1000003, I would like to use Aho's 65599, a prime near 2**16 that is efficiently expressible as (x << 6) + (x << 16) - x. This replaces a multiply with fast adds and shifts (with the two shifts potentially being executed in parallel).
It's unclear why that's a good thing. Perhaps you think shifts and adds are faster? I wouldn't -- the imul instruction on modern Pentiums is very fast.
It is clear why it may be a bad thing: that it *can* be expressed as just a couple shifts and adds makes it suspect as a good scrambler (read Knuth).
Googling for "hash 65599" shows a long history of widespread use and testing without any problems.
Testing in the context of Python string hashing? I didn't think so <wink>. The current multiplier has been studied extensively *in* Python, both via real-life use and via large-scale focused statistical testing (we got some money to do that during BeOpen.com's brief life -- a surprise that came out of that was that CRC made a terrible string hash function as the number of input strings got large). The right thing to compare Python's string hash to is "the standard" Fowler-Noll-Vo string hash, which was developed independently, but which also ended up using a large multiplier.