Thread: Curious behaviour with "order by random()"

Curious behaviour with "order by random()"

From
Erwin Sebastian Andreasen
Date:
I wanted to generate some test data based on a subset of rows in a table, thus used ORDER BY RANDOM(). I was surprised to see that if RANDOM() is used in ORDER BY it cannot also be used in column names: the same value is returned.

Compare the output of:

select random(), random();

which will return 2 separate random values with:

select random(), random() order by random();

which returns two of the same values (and the same value is also used in order by). While I use 9.6, I got the same results on db fiddle with 13.0: https://www.db-fiddle.com/f/hNofvnT44izEUmyPyEoWh4/0

What gives? Does using RANDOM() as an ORDER BY somehow turn it into per-row stable rather than volatile?

Re: Curious behaviour with "order by random()"

From
Tom Lane
Date:
Erwin Sebastian Andreasen <erwin@andreasen.org> writes:
> Compare the output of:
> select random(), random();
> which will return 2 separate random values with:
> select random(), random() order by random();
> which returns two of the same values (and the same value is also used in
> order by). While I use 9.6, I got the same results on db fiddle with 13.0:
> https://www.db-fiddle.com/f/hNofvnT44izEUmyPyEoWh4/0

> What gives? Does using RANDOM() as an ORDER BY somehow turn it into per-row
> stable rather than volatile?

No, but there is logic to merge ORDER BY values with select-list entries
if they're textually equivalent.  This comes in part from the old SQL92
ordering syntax

    select random() from ... order by 1;

where it was quite explicit that the ordering value was the same as some
select-list entry.  SQL99 dropped that syntax, but we (and perhaps other
RDBMSes; haven't checked) suppose that "select x ... order by x" is still
asking for only one computation of x.

There's room to argue about how many computations of x should be implied
by "select x, x ... order by x", no doubt.  And it looks like PG's answer
to that has changed over time.  But right now it seems to be "just one".

I'd counsel divorcing the ordering computation from the output value
to make it explicit what you want.  Perhaps

    select x from (select random() as x) ss order by random();

The merging only happens between order by/group by/select-list entries
of the same query level, so this will definitely give you two different
evaluations of random().

            regards, tom lane