atom feed69 messages in org.python.python-listIterator vs Lists Was: Speed of st...
FromSent OnAttachments
Mads Orbesen TroestApr 12, 2003 10:19 am 
Carl BanksApr 12, 2003 10:54 am 
Paul MooreApr 12, 2003 11:06 am 
Mads Orbesen TroestApr 12, 2003 11:37 am 
Alan McIntyreApr 12, 2003 11:40 am 
Alex MartelliApr 12, 2003 11:45 am 
Dave BrueckApr 12, 2003 12:31 pm 
Alan McIntyreApr 12, 2003 1:05 pm 
Carl BanksApr 12, 2003 3:35 pm 
Alex MartelliApr 12, 2003 3:41 pm 
Skip MontanaroApr 12, 2003 3:59 pm 
Roy SmithApr 12, 2003 5:14 pm 
Tim PetersApr 12, 2003 5:46 pm 
Daniel BerlinApr 12, 2003 5:52 pm 
Daniel BerlinApr 12, 2003 5:56 pm 
Carl BanksApr 12, 2003 6:04 pm 
Chris TavaresApr 12, 2003 6:08 pm 
Tim PetersApr 12, 2003 6:21 pm 
logistixApr 12, 2003 6:31 pm 
Alan McIntyreApr 12, 2003 6:57 pm 
CourageousApr 12, 2003 7:28 pm 
achrist at easystreet.comApr 12, 2003 7:58 pm 
Alex MartelliApr 13, 2003 1:37 am 
Alex MartelliApr 13, 2003 2:21 am 
Alex MartelliApr 13, 2003 2:26 am 
Martin v. LöwisApr 13, 2003 3:05 am 
Mads Orbesen TroestApr 13, 2003 3:08 am 
Martin v. LöwisApr 13, 2003 3:15 am 
Alex MartelliApr 13, 2003 9:15 am 
Carl BanksApr 13, 2003 9:18 am 
CourageousApr 13, 2003 10:26 am 
Greg Ewing (using news.cis.dfn.de)Apr 13, 2003 11:20 pm 
Greg Ewing (using news.cis.dfn.de)Apr 13, 2003 11:22 pm 
Alex MartelliApr 14, 2003 12:11 am 
Marcus AlanenApr 14, 2003 3:40 am 
Alex MartelliApr 14, 2003 3:53 am 
Marcus AlanenApr 14, 2003 6:47 am 
Alex MartelliApr 14, 2003 7:14 am 
Christopher A. CraigApr 14, 2003 7:24 am 
Daniel BerlinApr 14, 2003 10:38 am 
Tim PetersApr 14, 2003 11:04 am 
Skip MontanaroApr 14, 2003 11:28 am 
Marcus AlanenApr 14, 2003 11:36 am 
Raymond HettingerApr 14, 2003 4:31 pm 
Greg Ewing (using news.cis.dfn.de)Apr 14, 2003 8:20 pm 
Ben HutchingsApr 15, 2003 3:03 am 
Alex MartelliApr 15, 2003 5:21 am 
AahzApr 15, 2003 7:46 pm 
CourageousApr 15, 2003 7:55 pm 
AahzApr 15, 2003 8:03 pm 
Steve HoldenApr 16, 2003 6:57 am 
Gerhard HaeringApr 16, 2003 7:03 am 
Alex MartelliApr 16, 2003 8:14 am 
Gerhard HäringApr 16, 2003 8:58 am 
Steve HoldenApr 16, 2003 12:03 pm 
AahzApr 16, 2003 5:00 pm 
George KinneyApr 16, 2003 8:14 pm 
Parzival HerzogApr 16, 2003 9:27 pm 
yaipa h.Apr 16, 2003 11:00 pm 
AahzApr 16, 2003 11:25 pm 
Alex MartelliApr 17, 2003 3:39 am 
Drazen GemicApr 17, 2003 4:35 am 
Steve HoldenApr 17, 2003 5:02 am 
rzedApr 17, 2003 6:09 am 
Alex MartelliApr 17, 2003 7:53 am 
Alex MartelliApr 17, 2003 8:34 am 
Marcus AlanenApr 17, 2003 8:45 am 
yaipa h.Apr 17, 2003 9:11 am 
Andy JewellApr 17, 2003 3:09 pm 
Subject:Iterator vs Lists Was: Speed of string += string
From:Raymond Hettinger (vze4@verizon.net)
Date:Apr 14, 2003 4:31:21 pm
List:org.python.python-list

[Martelli bot]

d=dict.fromkeys(range(200))''' 'for x in d.keys(): pass' d=dict.fromkeys(range(200))''' 'for x in d.iterkeys(): pass' 10000 loops, best of 3: 28.2 usec per loop

That's most definitely NOT a significant performance difference, of course, but calling iteration on iterkeys "a lot faster" than iteration on keys would appear to be substantially misleading, at least without a lot of qualifications and hedging!-)

Here are a few thoughts about the performance differences between looping over an iterator versus building a list and looping over it.

* Constructing a list is especially fast when the underlying object can report its length. That enables the list constructors to pre-allocate the entire list at once. If not, the construction requires a series of appends and an occasional resize/copy operation.

* Iterators have a tiny constant size while lists take space proportional to the length of the list. The part that is not obvious is that looping over the iterator re-uses the same memory location again an again. So the relevant data is almost always in the hardware memory cache. The same is also true for small lists. However, large lists cannot all fit in the cache and accessing them will be slower than their iterator based counterparts. As processor speeds race ahead of memory speeds, the performance difference will become more pronounced over time.

* It takes time to allocate Python objects, but the smart allocator saves some of that time when it can reuse recently discarded objects. This technique works very well with iterators which tend to continuously create and abandon many same sized objects. However, list based approaches keep a reference to each element preventing their reuse.

Sometimes the difference is tiny; sometimes it is huge. There were some examples where itertools.izip() outperformed zip() by about ten to one. Timbot's subsequent optimizations to zip() made this much less likely. Still, izip() generally performs at least a little better than its list based counterpart.

The executive summary: For small datasets, iterator and list based approaches have similar performance. For larger datasets, iterators save both time and space.