Home > mailing lists

Re: rand48 replacement - Mailing list pgsql-hackers

From	Fabien COELHO
Subject	Re: rand48 replacement
Date	July 2, 2021 21:51:42
Msg-id	alpine.DEB.2.22.394.2107022340250.2359988@pseudo Whole thread Raw
In response to	Re: rand48 replacement (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses	Re: rand48 replacement
List	pgsql-hackers

Tree view

Hello Dean,

> It may be true that the bias is of the same magnitude as FP multiply, 
> but it is not of the same nature. With FP multiply, the 
> more-likely-to-be-chosen values are more-or-less evenly distributed 
> across the range, whereas modulo concentrates them all at one end, 
> making it more likely to bias test results.

Yes, that is true.

> It's worth paying attention to how other languages/libraries implement
> this, and basically no one chooses the modulo method, which ought to
> raise alarm bells. Of the biased methods, it has the worst kind of
> bias and the worst performance.

Hmmm. That is not exactly how I interpreted the figures in the blog.

> If a biased method is OK, then the biased integer multiply method
> seems to be the clear winner. This requires the high part of a
> 64x64-bit product, which is trivial if 128-bit integers are available,
> but would need a little more work otherwise. There's some code in
> common/d2s that might be suitable.

And yes, modulo is expensive. If we allow 128 bits integers operations, I 
would not choose this RNPG in the first place, I'd take PCG with a 128 bits state.
That does not change the discussion about bias, though.

> Most other implementations tend to use an unbiased method though, and I 
> think it's worth doing the same. It might be a bit slower, or even 
> faster depending on implementation and platform, but in the context of 
> the DB as a whole, I don't think a few extra cycles matters either way.

Ok ok ok, I surrender!

> The method recommended at the very end of that blog seems to be pretty 
> good all round, but any of the other commonly used unbiased methods 
> would probably be OK too.

That does not address my other issues with the proposed methods, in 
particular the fact that the generated sequence is less deterministic, but 
I think I have a simple way around that. I'm hesitating to skip to the the 
bitmask method, and give up performance uniformity. I'll try to come up 
with something over the week-end, and also address Tom's comments in 
passing.

-- 
Fabien.

pgsql-hackers by date:

From: Paul A Jungwirth
Date: 02 July 2021, 21:39:50
Subject: Re: SQL:2011 application time

From: Jacob Champion
Date: 02 July 2021, 22:15:45
Subject: Re: Preventing abort() and exit() calls in libpq

Re: rand48 replacement - Mailing list pgsql-hackers

Previous

Next