Home > mailing lists

Re: Should we optimize the `ORDER BY random() LIMIT x` case? - Mailing list pgsql-hackers

From	wenhui qiu
Subject	Re: Should we optimize the `ORDER BY random() LIMIT x` case?
Date	May 15 05:06:38
Msg-id	CAGjGUAJf9tjVqp3TjazHAuxnne8jxkqqVXFKjtt2X=i=V-MW+Q@mail.gmail.com Whole thread Raw
In response to	Should we optimize the `ORDER BY random() LIMIT x` case? (Aleksander Alekseev <aleksander@timescale.com>)
List	pgsql-hackers

Tree view

Hi Aleksander

if we can optimize the query， that would be great，Then we won't need to pull a lot of data to the program end and randomly pick the needed data there.

On Thu, 15 May 2025 at 07:41, Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi,

If I didn't miss anything, currently we don't seem to support sampling
the result of an arbitrary SELECT query efficiently.

To give one specific example:

````
CREATE TABLE temperature(
ts TIMESTAMP NOT NULL,
city TEXT NOT NULL,
temperature INT NOT NULL);

CREATE TABLE humidity(
ts TIMESTAMP NOT NULL,
city TEXT NOT NULL,
humidity INT NOT NULL);

-- imagine having much more data ...
INSERT INTO temperature (ts, city, temperature)
SELECT ts + (INTERVAL '60 minutes' * random()), city, 30*random()
FROM generate_series('2022-01-01' :: TIMESTAMP,
'2022-01-31', '1 day') AS ts,
unnest(array['City A', 'City B']) AS city;

INSERT INTO humidity (ts, city, humidity)
SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
FROM generate_series('2022-01-01' :: TIMESTAMP,
'2022-01-31', '1 day') AS ts,
unnest(array['City A', 'City B']) AS city;

-- "AS OF" join:
SELECT t.ts, t.city, t.temperature, h.humidity
FROM temperature AS t
LEFT JOIN LATERAL
( SELECT * FROM humidity
WHERE city = t.city AND ts <= t.ts
ORDER BY ts DESC LIMIT 1
) AS h ON TRUE
WHERE t.ts < '2022-01-05';
```

One can do `SELECT (the query above) ORDER BY random() LIMIT x` but
this produces an inefficient plan. Alternatively one could create
temporary tables using `CREATE TEMP TABLE ... AS SELECT * FROM tbl
TABLESAMPLE BERNOULLI(20)` but this is inconvenient and would be
suboptimal even if we supported global temporary tables.

1. Do you think there might be value in addressing this issue?
2. If yes, how would you suggest addressing it from the UI point of
view - by adding a special syntax, some sort of aggregate function, or
...?

--
Best regards,
Aleksander Alekseev

pgsql-hackers by date:

From: Aleksander Alekseev
Date: 15 May, 02:41:15
Subject: Should we optimize the `ORDER BY random() LIMIT x` case?

From: Daniil Davydov
Date: 15 May, 08:39:38
Subject: [BUG] Skipped initialization of some xl_xact_parsed_prepare fields

Re: Should we optimize the `ORDER BY random() LIMIT x` case? - Mailing list pgsql-hackers

Previous

Next