Re: rand48 replacement - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: rand48 replacement
Date
Msg-id alpine.DEB.2.22.394.2107022340250.2359988@pseudo
Whole thread Raw
In response to Re: rand48 replacement  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: rand48 replacement  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
Hello Dean,

> It may be true that the bias is of the same magnitude as FP multiply, 
> but it is not of the same nature. With FP multiply, the 
> more-likely-to-be-chosen values are more-or-less evenly distributed 
> across the range, whereas modulo concentrates them all at one end, 
> making it more likely to bias test results.

Yes, that is true.

> It's worth paying attention to how other languages/libraries implement
> this, and basically no one chooses the modulo method, which ought to
> raise alarm bells. Of the biased methods, it has the worst kind of
> bias and the worst performance.

Hmmm. That is not exactly how I interpreted the figures in the blog.

> If a biased method is OK, then the biased integer multiply method
> seems to be the clear winner. This requires the high part of a
> 64x64-bit product, which is trivial if 128-bit integers are available,
> but would need a little more work otherwise. There's some code in
> common/d2s that might be suitable.

And yes, modulo is expensive. If we allow 128 bits integers operations, I 
would not choose this RNPG in the first place, I'd take PCG with a 128 bits state.
That does not change the discussion about bias, though.

> Most other implementations tend to use an unbiased method though, and I 
> think it's worth doing the same. It might be a bit slower, or even 
> faster depending on implementation and platform, but in the context of 
> the DB as a whole, I don't think a few extra cycles matters either way.

Ok ok ok, I surrender!

> The method recommended at the very end of that blog seems to be pretty 
> good all round, but any of the other commonly used unbiased methods 
> would probably be OK too.

That does not address my other issues with the proposed methods, in 
particular the fact that the generated sequence is less deterministic, but 
I think I have a simple way around that. I'm hesitating to skip to the the 
bitmask method, and give up performance uniformity. I'll try to come up 
with something over the week-end, and also address Tom's comments in 
passing.

-- 
Fabien.



pgsql-hackers by date:

Previous
From: Paul A Jungwirth
Date:
Subject: Re: SQL:2011 application time
Next
From: Jacob Champion
Date:
Subject: Re: Preventing abort() and exit() calls in libpq