Home > mailing lists

Re: Should we optimize the `ORDER BY random() LIMIT x` case? - Mailing list pgsql-hackers

From	Andrei Lepikhov
Subject	Re: Should we optimize the `ORDER BY random() LIMIT x` case?
Date	May 15, 2025 10:54:35
Msg-id	b7f18652-e40d-45c5-bbb1-134349c15244@gmail.com Whole thread Raw
In response to	Should we optimize the `ORDER BY random() LIMIT x` case? (Aleksander Alekseev <aleksander@timescale.com>)
List	pgsql-hackers

Tree view

On 15/5/2025 01:41, Aleksander Alekseev wrote:
> One can do `SELECT (the query above) ORDER BY random() LIMIT x` but
> this produces an inefficient plan. Alternatively one could create
> temporary tables using `CREATE TEMP TABLE ... AS SELECT * FROM tbl
> TABLESAMPLE BERNOULLI(20)` but this is inconvenient and would be
> suboptimal even if we supported global temporary tables.
> 
> 1. Do you think there might be value in addressing this issue?
> 2. If yes, how would you suggest addressing it from the UI point of
> view - by adding a special syntax, some sort of aggregate function, or
> ...?
I think I got your point, but just to be sure:
Do you want to have some random sampling from an arbitrary subquery with 
the guarantee that N distinct (by tid) tuples will be produced or all 
the tuples if the underlying subquery produces less than N?

What kind of optimisation trick may the optimiser use here to provide an 
optimal plan? As I see it, it will need to think that all the tuples 
should be returned from the subquery. The only profit is to skip sorting 
the massive sample.

As a palliative, you may laterally join your subquery with a stored 
procedure, which will process the incoming tuple and implement the logic 
of random sampling.

Implementation of that in the core will need a new "skip result" node 
and new syntax, which may be too much if a workaround is found.

-- 
regards, Andrei Lepikhov

pgsql-hackers by date:

From: Peter Smith
Date: 15 May 2025, 09:57:00
Subject: Re: Logical Replication of sequences

From: Jakub Wartak
Date: 15 May 2025, 10:57:01
Subject: xlogrecovery.c:WaitForWALToBecomeAvailable() - make "switched WAL source" visible by default?

Re: Should we optimize the `ORDER BY random() LIMIT x` case? - Mailing list pgsql-hackers

Previous

Next