Re: rand48 replacement - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: rand48 replacement
Date
Msg-id CAEZATCVpnOjrwU=4712OPjNDqV2sVkkK7XHDiR-nzYDL8N6z=Q@mail.gmail.com
Whole thread Raw
In response to Re: rand48 replacement  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: rand48 replacement  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On Thu, 1 Jul 2021 at 22:18, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> > However, this patch seems to be replacing that with a simple
> > modulo operation, which is perhaps the worst possible way to do it.
>
> The modulo operation is biased for large ranges close to the limit, sure.
> Also, the bias is somehow of the same magnitude as the FP multiplication
> approach used previously, so the "worst" has not changed much, it is
> really the same as before.
>

It may be true that the bias is of the same magnitude as FP multiply,
but it is not of the same nature. With FP multiply, the
more-likely-to-be-chosen values are more-or-less evenly distributed
across the range, whereas modulo concentrates them all at one end,
making it more likely to bias test results.

It's worth paying attention to how other languages/libraries implement
this, and basically no one chooses the modulo method, which ought to
raise alarm bells. Of the biased methods, it has the worst kind of
bias and the worst performance.

If a biased method is OK, then the biased integer multiply method
seems to be the clear winner. This requires the high part of a
64x64-bit product, which is trivial if 128-bit integers are available,
but would need a little more work otherwise. There's some code in
common/d2s that might be suitable.

Most other implementations tend to use an unbiased method though, and
I think it's worth doing the same. It might be a bit slower, or even
faster depending on implementation and platform, but in the context of
the DB as a whole, I don't think a few extra cycles matters either
way. The method recommended at the very end of that blog seems to be
pretty good all round, but any of the other commonly used unbiased
methods would probably be OK too.

Regards,
Dean



pgsql-hackers by date:

Previous
From: Ronan Dunklau
Date:
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Next
From: David Rowley
Date:
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates