atom feed30 messages in org.python.python-dev[Python-Dev] String hash function mul...
FromSent OnAttachments
Raymond HettingerApr 13, 2004 8:00 pm 
Jeff EplerApr 13, 2004 9:10 pm 
Bob IppolitoApr 13, 2004 9:26 pm 
Jeff EplerApr 13, 2004 10:04 pm 
Raymond HettingerApr 13, 2004 10:17 pm 
Jeff EplerApr 13, 2004 11:10 pm 
Guido van RossumApr 13, 2004 11:26 pm 
Tim PetersApr 13, 2004 11:56 pm 
Jeff EplerApr 14, 2004 9:08 am 
Raymond HettingerApr 14, 2004 12:06 pm 
Andrew MacIntyreApr 14, 2004 3:23 pm 
Jeff EplerApr 14, 2004 3:35 pm 
Mike PallApr 14, 2004 5:50 pm 
Tim PetersApr 14, 2004 11:14 pm 
Michael HudsonApr 15, 2004 7:05 am 
Mike PallApr 15, 2004 9:36 am 
Guido van RossumApr 15, 2004 10:27 am 
Jeremy HyltonApr 15, 2004 10:38 am 
Guido van RossumApr 15, 2004 10:42 am 
Mike PallApr 15, 2004 11:56 am 
Mike PallApr 15, 2004 11:56 am 
Skip MontanaroApr 15, 2004 11:59 am 
Michael HudsonApr 15, 2004 1:27 pm 
Raymond HettingerApr 15, 2004 2:22 pm 
Thomas HellerApr 15, 2004 2:31 pm 
"Martin v. Löwis"Apr 15, 2004 3:07 pm 
Jeremy HyltonApr 15, 2004 11:26 pm 
Tim PetersApr 16, 2004 12:18 am 
"Martin v. Löwis"Apr 16, 2004 2:00 am 
Andrew MacIntyreApr 16, 2004 9:14 pm 
Subject:[Python-Dev] String hash function multiplier
From:Bob Ippolito (bo@redivi.com)
Date:Apr 13, 2004 9:26:16 pm
List:org.python.python-dev

On Apr 13, 2004, at 9:09 PM, Jeff Epler wrote:

With -O2 -mcpu=i686 or newer, gcc uses "imul" for both 100003 and 65599, rather than shifts and adds.

There may be a few people who care about some other processor, but I wouldn't listen to them. (the only non-x86 CPU I program for on a weekly basis doesn't have hardware multiply, but it's also much too small for Python)

The current value goes back a long way: http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Objects/ stringobject.c#rev2.31 ... all the way back to when Python did string haching instead of hashing.

Other than some abstract beauty to 65599, are there some other practical advantages I'm missing?

It's not expected that GCC optimize an integer constant into shifts on its own. Anyways, a practical advantage is that with a sane instruction set, like PPC, it saves you a memory access or some instructions (depending on the compiler I guess). Both 100003 and 65599 are too big to be immediate values in a PPC instruction, but the shift constants are not.

I guess the real question for Raymond is, does it really make a measurable difference? And what effect does it have on pickled dicts (or other such hash-using data structures), if any?

-bob