atom feed30 messages in org.python.python-dev[Python-Dev] Optimization targets
FromSent OnAttachments
Raymond HettingerApr 13, 2004 8:00 pm 
Jeff EplerApr 13, 2004 9:10 pm 
Bob IppolitoApr 13, 2004 9:26 pm 
Jeff EplerApr 13, 2004 10:04 pm 
Raymond HettingerApr 13, 2004 10:17 pm 
Jeff EplerApr 13, 2004 11:10 pm 
Guido van RossumApr 13, 2004 11:26 pm 
Tim PetersApr 13, 2004 11:56 pm 
Jeff EplerApr 14, 2004 9:08 am 
Raymond HettingerApr 14, 2004 12:06 pm 
Andrew MacIntyreApr 14, 2004 3:23 pm 
Jeff EplerApr 14, 2004 3:35 pm 
Mike PallApr 14, 2004 5:50 pm 
Tim PetersApr 14, 2004 11:14 pm 
Michael HudsonApr 15, 2004 7:05 am 
Mike PallApr 15, 2004 9:36 am 
Guido van RossumApr 15, 2004 10:27 am 
Jeremy HyltonApr 15, 2004 10:38 am 
Guido van RossumApr 15, 2004 10:42 am 
Mike PallApr 15, 2004 11:56 am 
Mike PallApr 15, 2004 11:56 am 
Skip MontanaroApr 15, 2004 11:59 am 
Michael HudsonApr 15, 2004 1:27 pm 
Raymond HettingerApr 15, 2004 2:22 pm 
Thomas HellerApr 15, 2004 2:31 pm 
"Martin v. Löwis"Apr 15, 2004 3:07 pm 
Jeremy HyltonApr 15, 2004 11:26 pm 
Tim PetersApr 16, 2004 12:18 am 
"Martin v. Löwis"Apr 16, 2004 2:00 am 
Andrew MacIntyreApr 16, 2004 9:14 pm 
Subject:[Python-Dev] Optimization targets
From:Michael Hudson (mw@python.net)
Date:Apr 15, 2004 7:05:42 am
List:org.python.python-dev

Mike Pall <mike@mike.de> writes:

Hi,

Raymond wrote:

It looks like the best bet is to try to speedup the code without changing the multiplier.

Indeed. And while you are at it, there are other optimizations, that seem more promising:

I compiled a recent CVS Python with profiling and here is a list of the top CPU hogs (on a Pentium III, your mileage may vary):

I played this game recently, but on a G3 ibook, which is probably a much more boring processor from a scheduling point of view.

pystone:

CPU% Function Name ---------------------------- 55.44 eval_frame 7.30 lookdict_string 4.34 PyFrame_New 3.73 frame_dealloc 1.73 vgetargs1 1.65 PyDict_SetItem 1.42 string_richcompare 1.15 PyObject_GC_UnTrack 1.11 PyObject_RichCompare 1.08 PyInt_FromLong 1.08 tupledealloc 1.04 insertdict

I saw similar results to this, tho' I don't remember lookdict_string being so high on the list.

parrotbench:

CPU% Function Name ---------------------------- 23.65 eval_frame 8.68 l_divmod 4.43 lookdict_string 2.95 k_mul 2.27 PyType_IsSubtype 2.23 PyObject_Malloc 2.09 x_add 2.05 PyObject_Free 2.05 tupledealloc

Arguably parrotbench is a bit unrepresentative here. And beware: due to automatic inlining of static functions the real offender may be hidden (x_divmod is the hog, not l_divmod).

Probably a fine candidate function for rewriting in assembly too...

Anyway, this just confirms that the most important optimization targets are: 1. eval_frame 2. string keyed dictionaries 3. frame handling

[...]

But GCC has more to offer: read the man page entries for -fprofile-arcs and -fbranch-probabilities. Here is a short recipe:

I tried this on the ibook and I found that it made a small difference *on the program you ran to generate the profile data* (e.g. pystone), but made naff all difference for something else. I can well believe that it makes more difference on a P4 or G5.

[snippety]

I bet there are more, but I'm running out of time right now -- sorry.

I certainly don't want to discourage people from optimizing Python's current implementation, but...

Some months ago (just after I played with -fprofile-arcs, most likely) I wrote a rant about improving Python's performance, which I've finally got around to uploading:

http://starship.python.net/crew/mwh/hacks/speeding-python.html

Tell me what you think!

Cheers, mwh