Re: random() generates collisions too early - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: random() generates collisions too early
Date
Msg-id 526928D0.6090707@vmware.com
Whole thread Raw
In response to Re: random() generates collisions too early  (Honza Horak <hhorak@redhat.com>)
Responses Re: random() generates collisions too early
List pgsql-bugs
On 22.10.2013 15:41, Honza Horak wrote:
> On 10/21/2013 05:14 PM, Tom Lane wrote:
>> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>>> On 18.10.2013 14:55, Honza Horak wrote:
>>>> The results covered only 181383 distinct values, and 68 values
>>>> repeated four or five times each. We should at least consider using a
>>>> higher-entropy seed.
>>
>>> Interesting. PostgreSQL's random() function just calls the underlying
>>> libc random() function. I assume you tested this on with Linux and
>>> glibc.
>>
>> I agree with the theory that this probably isn't the fault of the
>> random()
>> function as such, but with our code to reset the random seed when forking
>> a postmaster child process. Note that the test case is only examining the
>> first random value created in each process. So basically what this is
>> measuring is the number of different seed values we use.
>>
>> regards, tom lane
>>
>
> After playing a bit more with random() value and the seed defined in
> src/backend/postmaster/postmaster.c:4038 it seems really like the seed
> doesn't have very good characteristic. The PID number used there ensures
> the value to be different in child processes, but when only xor-ed with
> usecs, it doesn't enlarge the total data domain of the seed, which stays
> 1M values. Then having in mind the birthday paradox and probably not
> ideal PRNG, it doesn't seem unrealistic to get two same random numbers
> in only hundreds/thousands of tries.
>
> In order to enlarge possible data domain of the seed we can include sec
> count as well and shift some xor-ed variables to use the whole range or
> unsigned int. The attached patch uses a way that gave me much better
> results (collision found approx. after a hundred thousands of values):
>
> - srandom((unsigned int) (MyProcPid ^ usecs));
> + srandom((unsigned int) (MyProcPid ^ (usecs << 12) ^ secs));

Seems reasonable, committed. Thanks!

- Heikki

pgsql-bugs by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: random() generates collisions too early
Next
From: Tom Lane
Date:
Subject: Re: random() generates collisions too early