Re: random() generates collisions too early - Mailing list pgsql-bugs

From Honza Horak
Subject Re: random() generates collisions too early
Date
Msg-id 5266726E.5030300@redhat.com
Whole thread Raw
In response to Re: random() generates collisions too early  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: random() generates collisions too early  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-bugs
On 10/21/2013 05:14 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> On 18.10.2013 14:55, Honza Horak wrote:
>>> The results covered only 181383 distinct values, and 68 values
>>> repeated four or five times each. We should at least consider using a
>>> higher-entropy seed.
>
>> Interesting. PostgreSQL's random() function just calls the underlying
>> libc random() function. I assume you tested this on with Linux and glibc.
>
> I agree with the theory that this probably isn't the fault of the random()
> function as such, but with our code to reset the random seed when forking
> a postmaster child process.  Note that the test case is only examining the
> first random value created in each process.  So basically what this is
> measuring is the number of different seed values we use.
>
>             regards, tom lane
>

After playing a bit more with random() value and the seed defined in
src/backend/postmaster/postmaster.c:4038 it seems really like the seed
doesn't have very good characteristic. The PID number used there ensures
the value to be different in child processes, but when only xor-ed with
usecs, it doesn't enlarge the total data domain of the seed, which stays
1M values. Then having in mind the birthday paradox and probably not
ideal PRNG, it doesn't seem unrealistic to get two same random numbers
in only hundreds/thousands of tries.

In order to enlarge possible data domain of the seed we can include sec
count as well and shift some xor-ed variables to use the whole range or
unsigned int. The attached patch uses a way that gave me much better
results (collision found approx. after a hundred thousands of values):

-       srandom((unsigned int) (MyProcPid ^ usecs));
+       srandom((unsigned int) (MyProcPid ^ (usecs << 12) ^ secs));

I'm also thinking of a reproducer -- does the test-suite allow to
"reconnect" the client during a test?

Regards,
Honza

Attachment

pgsql-bugs by date:

Previous
From: Honza Horak
Date:
Subject: Re: random() generates collisions too early
Next
From: Heikki Linnakangas
Date:
Subject: Re: random() generates collisions too early