18 messages in org.python.python-bugs-list[ python-Bugs-917055 ] add a stronger...
FromSent OnAttachments
SourceForge.netMar 15, 2004 9:46 pm 
SourceForge.netMar 16, 2004 5:44 am 
SourceForge.netMar 16, 2004 6:49 am 
SourceForge.netMar 16, 2004 6:52 am 
SourceForge.netMar 17, 2004 4:05 am 
SourceForge.netMar 17, 2004 6:27 am 
SourceForge.netMar 18, 2004 1:27 am 
SourceForge.netMar 18, 2004 3:51 am 
SourceForge.netMar 21, 2004 12:33 pm 
SourceForge.netMar 21, 2004 2:34 pm 
SourceForge.netMar 21, 2004 2:50 pm 
SourceForge.netMar 21, 2004 7:49 pm 
SourceForge.netMar 21, 2004 7:56 pm 
SourceForge.netMar 24, 2004 6:34 am 
SourceForge.netMar 25, 2004 12:15 am 
SourceForge.netMar 25, 2004 3:25 am 
SourceForge.netMar 25, 2004 4:08 am 
SourceForge.netMar 25, 2004 7:37 pm 
Actions with this message:
Paste this link in email or IM:
Paste this link in email or IM:
Atom feed for this thread
Paste this URL into your reader:
Subject:[ python-Bugs-917055 ] add a stronger PRNGActions...
From:SourceForge.net (nore@sourceforge.net)
Date:Mar 21, 2004 7:56:15 pm
List:org.python.python-bugs-list

Bugs item #917055, was opened at 2004-03-16 02:46 Message generated for change (Comment added) made by phr You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=917055&group_id=5470

Category: Python Library Group: Feature Request Status: Open Resolution: None Priority: 5 Submitted By: paul rubin (phr) Assigned to: Raymond Hettinger (rhettinger) Summary: add a stronger PRNG

Initial Comment: The default Mersenne Twister algorithm in the Random module is very fast but makes no serious attempt to generate output that stands up to adversarial analysis. Besides cryptography applications, this can be a serious problem in areas like computer games. Sites like www.partypoker.com routinely run online tournaments with prize funds of 100K USD or more. There's big financial incentives to find ways of guessing your opponent's cards with better than random chance probability. See bug #901285 for some discussion of possible correlations in Mersenne Twister's output.

Teukolsky et al discuss PRNG issues at some length in their book "Numerical Recipes". The original edition of Numerical Recipes had a full blown version of the FIPS Data Encryption Standard implemented horrendously in Fortran, as a way of making a PRNG with no easily discoverable output correlations. Later editions replaced the DES routine with a more efficient one based on similar principles.

Python already has an SHA module that makes a dandy PRNG. I coded a sample implementation:

http://www.nightsong.com/phr/python/sharandom.py

I'd like to ask that the Python lib include something like this as an alternative to MT. It would be similar to the existing whrandom module in that it's an alternative subclass to the regular Random class. The existing Random module wouldn't have to be changed.

I don't propose directly including the module above, since I think the Random API should also be extended to allow directly requesting pseudo-random integers from the generator subclass, rather than making them from floating-point output. That would allow making the above subclass work more cleanly. I'll make a separate post about this, but first will have to examine the Random module source code.

----------------------------------------------------------------------

Comment By: paul rubin (phr)

Date: 2004-03-22 00:56

Message: Logged In: YES user_id=72053

Two small corrections to below: 1) "in favor of an entropy" is an editing error--the intended meaning should be obvious. 2) I meant Bryan Mongeau, not Eric Mongeau. Bryan's lib is at <http://eevolved.com/cryptkit/>. Sorry for any confusion.

----------------------------------------------------------------------

Comment By: paul rubin (phr) Date: 2004-03-22 00:49

Message: Logged In: YES user_id=72053

I'm very much in favor of an entropy and Bram's suggested interface of entropy(x) supplying x bytes of randomness is fine. Perhaps it should really live in a cryptography API rather than in the Random API, but either way is ok. Mark Moraes wrote a Windows module that gets random numbers from the Windows CAPI; I put up a copy at

http://www.nightsong.com/phr/python/winrand.module

For Linux, Cygwin, and *BSD (including MacOS 10, I think), just read /dev/urandom for random bytes. However, various other systems (I think including Solaris) don't include anything like this. OpenSSL has an entropy gathering daemon that might be of some use in that situation. There's also the Yarrow generator (http://www.schneier.com/yarrow.html) and Eric Mongeau(?) wrote a pure-Python generator a while back that tried to gather entropy from thread racing, similar to Java's default SecureRandom class (I consider that method to be a bit bogus in both Python and Java).

I think, though, simply supporting /dev/*random and the Windows CAPI is a pretty good start, even if other OS's aren't supported. Providing that function in the Python lib will make quite a few people happy. A single module integrating both methods would be great. I don't have any Windows dev tools so can't test any wrappers for Mark Moraes's function but maybe one of you guys can do it.

I'm not too keen on the md5random.py patch for reasons discussed in the c.l.py thread. It depends too deeply on the precise characteristics of both md5 and Mersenne Twister. I think for a cryptography-based generator, it's better to stick to one cryptography-based primitive, and to use sha instead of md5. That also helps portability since it means other environments (maybe including Jython) can reproduce the PRNG stream without having to re-implement MT, as long as they have SHA (which is a US federal standard).

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger) Date: 2004-03-21 19:50

Message: Logged In: YES user_id=80475

Bram, if you have a patch, I would be happy to look at it. Please make it as platform independent as possible (its okay to have it conditionally compile differently so long as the API stays the same). Submit it as a separate patch and assign to me -- it doesn't have to be orginal, you can google around to determine prior art.

----------------------------------------------------------------------

Comment By: Bram Cohen (bram_cohen) Date: 2004-03-21 19:34

Message: Logged In: YES user_id=52561

The lack of a 'real' entropy source is the gap which can't be fixed with an application-level bit of code. I think there are simple hooks for this on all systems, such as /dev/random on linux, but they aren't cross platform. A unified API which always calls the native entropy hook would be a very good thing.

An example of a reasonable API would be to have a module named entropy, with a single function entropy(x) which returns a random string of length x. This is a problem which almost anyone writing a security-related application runs into, and lots of time is wasted writing dubious hacks to harvest entropy when a single simple library could magically solve it the right way.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger) Date: 2004-03-21 17:33

Message: Logged In: YES user_id=80475

Attaching my alternative. If it fulfills your use case, let me know and I'll apply it.

----------------------------------------------------------------------

Comment By: paul rubin (phr) Date: 2004-03-18 08:51

Message: Logged In: YES user_id=72053

Updated version of sharandom.py is at same url. It folds a counter into the hash and also includes a getrandbits method.

----------------------------------------------------------------------

Comment By: paul rubin (phr) Date: 2004-03-18 06:27

Message: Logged In: YES user_id=72053

I don't mind dropping the time() auto-seed but I think that means eliminating any auto-seed, and requiring a user-supplied seed.

There is no demonstrable minimum period for the SHA-OFB and it would be bad if there was, since it would then no longer act like a PRF. Note that the generator in the sample code actually comes from two different SHA instances and thus its expected period is about 2**160. Anyway, adding a simple counter (incrementing by 1 on every SHA call) to the SHA input removes any lingering chance of a repeating sequence. I'll update the code to do that. It's much less ugly than stirring in Mersenne Twister output.

I don't have Python 2.4 yet but will look at it when I get a chance.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger) Date: 2004-03-17 11:27

Message: Logged In: YES user_id=80475

It has some potential for being added. That being said, try to avoid a religious fervor.

I would like to see considerably more discussion and evolution first.

The auto-seeding with time *must* be dropped -- it is not harmonious with the goal of creating a secure random sequence. It is okay for the a subclass to deviate in this way.

Also, I was soliciting references stronger than (I don't know of any results ... It is generally considered ... ). If we put this in, people are going to rely on it. The docs *must* include references indicating the strengths and weaknesses of the approach. It should also concisely say why it works (a summary proof that makes it clear how a one-way digest function can be tranformed into a sequence generator that is cryptographicly strong to both the left and right with the latter being the one that is not obvious).

Not having a demonstrable minimum period is also bad. Nothing in the discussion so far precludes the existence of a bad seed that has a period of only 1 or 2. See my suggestion on comp.lang.python for a means of mitigating this issue.

With respect to the randint question, be sure to look at the current Py2.4 source for random.py. The API is expanded to include and an option method, getrandbits(). That in turn feeds the other integer methods without the int to float to int dance.

Upon further consideration, I think the export control question is moot since we're using an existing library function and not really expressing new art.

----------------------------------------------------------------------

Comment By: paul rubin (phr) Date: 2004-03-17 09:05

Message: Logged In: YES user_id=72053

There are no research results that I know of that can distinguish the output of sha1-ofb from random output in any practical way. To find such a distinguisher would be a significant result. It's a safe bet that cryptographers have searched for distinguishers, though I don't promise to have heard of every result that's been published. I'll ask on sci.crypt if anyone else has heard of such a thing, if you want. SHA1-HMAC is generally considered to be indistinguishable from a PRF (pseudorandom function, i.e. a function selected at random from the space of all functions from arbitrary strings to 160-bit outputs; this term is from concrete security theory). MD5 is deprecated these days and there's no point to using it instead of sha1 for this.

I'm not sure what happens if randint is added to the API. If you subclass Random and don't provide a randint method, you inherit from the base class, which can call self.random() to get floats to make the ints from.

US export restrictions basically don't exist any more. In principle, if you want to export something, you're supposed to send an email to an address at the commerce department, saying the name of the program and the url where it can be obtained and a few similar things. In practice, email to that address is ignored, they never check anything. I heard the address even stopped working for a while, though they may have fixed it since then. See http://www.bxa.doc.gov/Encryption/ for info. I've emailed notices to the address a few times and never heard back anything. Anyway, I don't think this should count as cryptography; it's simply using a cryptographic hash function as an PRNG to avoid the correlations in other PRNG's; scientific rationale for doing that is given in the Numerical Recipes book mentioned above.

The code that I linked uses the standard API but I wonder if the floating point output is optimally uniform, i.e. the N/2**56 calculation may not be exactly the right thing for an ieee float64.

Using the time of day is what the Random docs say to do by default. You're correct that any security application needs to supply a higher entropy seed.

I would like it very much if the std lib included a module that read some random bytes from the OS for OS's that support it. That means reading /dev/urandom on recent Un*x-ish systems or Cygwin, or calling CryptGenRandom on Windows. Reading /dev/urandom is trivial, and there's a guy on the pycrypt list who wrote a Windows function to call CryptGenRandom and return the output through the Python API. I forwarded the function to Guido with the author's permission but nothing seemed to happen with it. However, this gets away from this sharandom subclass. I'd like to make a few more improvements to the module but after that I think it can be dropped into the lib. Let me know what you think.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger) Date: 2004-03-16 11:52

Message: Logged In: YES user_id=80475

One other thought: if cryptographic strength is a goal, then seeding absolutely should require a long seed (key) as an input and the time should *never* be used.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger) Date: 2004-03-16 11:49

Message: Logged In: YES user_id=80475

It would have been ideal if the Random API had been designed with an integer generator at the core and floating point as a computed value, but that it the way it has been for a long time and efforts to switch it over would likely result in either incompatibility with existing subclasses or introducing new complexity (making it even harder to subclass). I think the API should be left alone until Py3.0.

The attached module would make a good recipe on ASPN where improvements and critiques can be posted. Do you have links to research showing that running SHA-1 in a cipher block feedback mode results in a cryptographically strong random number generator -- the result seems likely, but a research link would be great. Is there a link to research showing the RNG properties of the resulting generator (period, equidistribution, passing tests for randomness, etc)? Also, is there research showing the relative merits of this approach vs MD5, AES, or DES?

If something like this gets added to the library, I prefer it to be added to random.py using the existing API. Adding yet another random module would likely do more harm than good.

One other question (I don't know the answer to this): would including a cryptographically strong RNG trigger US export restrictions on the python distribution?

----------------------------------------------------------------------