Unless I missed something, we currently don't seem to support efficiently sampling the result of an arbitrary SELECT query.
To give one specific example:
```
CREATE TABLE temperature(
  ts TIMESTAMP NOT NULL,
  city TEXT NOT NULL,
  temperature INT NOT NULL);

CREATE TABLE humidity(
  ts TIMESTAMP NOT NULL,
  city TEXT NOT NULL,
  humidity INT NOT NULL);

-- imagine having much more data ...
INSERT INTO temperature (ts, city, temperature)
SELECT ts + (INTERVAL '60 minutes' * random()), city, 30*random()
FROM generate_series('2022-01-01' :: TIMESTAMP, '2022-01-31', '1 day') AS ts,
     unnest(array['City A', 'City B']) AS city;

INSERT INTO humidity (ts, city, humidity)
SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
FROM generate_series('2022-01-01' :: TIMESTAMP, '2022-01-31', '1 day') AS ts,
     unnest(array['City A', 'City B']) AS city;

-- "AS OF" join:
SELECT t.ts, t.city, t.temperature, h.humidity
FROM temperature AS t
LEFT JOIN LATERAL (
  SELECT * FROM humidity
  WHERE city = t.city AND ts <= t.ts
  ORDER BY ts DESC
  LIMIT 1
) AS h ON TRUE
WHERE t.ts < '2022-01-05';
```
One can wrap the query above in `SELECT * FROM (...) AS sub ORDER BY random() LIMIT x`, but this produces an inefficient plan: the entire result set must be computed and sorted just to keep x rows. Alternatively, one could create temporary tables using `CREATE TEMP TABLE ... AS SELECT * FROM tbl TABLESAMPLE BERNOULLI(20)` (since `TABLESAMPLE` can only be applied to a physical relation, not to a subquery), but this is inconvenient and would remain suboptimal even if we supported global temporary tables.
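To make this concrete against the example above (the sample size of 10 and the temp-table names are arbitrary choices of mine, and the second variant is just one reading of the temp-table approach):

```
-- workaround 1: sort the full result by random(); the whole join
-- result has to be materialized and sorted just to return 10 rows
SELECT *
FROM (
  SELECT t.ts, t.city, t.temperature, h.humidity
  FROM temperature AS t
  LEFT JOIN LATERAL (
    SELECT * FROM humidity
    WHERE city = t.city AND ts <= t.ts
    ORDER BY ts DESC
    LIMIT 1
  ) AS h ON TRUE
) AS sub
ORDER BY random()
LIMIT 10;

-- workaround 2: sample the inputs into temp tables, then run the
-- join against the samples; note this only approximates the original
-- query, since humidity rows are dropped independently of the join
CREATE TEMP TABLE temperature_s AS
  SELECT * FROM temperature TABLESAMPLE BERNOULLI(20);
CREATE TEMP TABLE humidity_s AS
  SELECT * FROM humidity TABLESAMPLE BERNOULLI(20);

SELECT t.ts, t.city, t.temperature, h.humidity
FROM temperature_s AS t
LEFT JOIN LATERAL (
  SELECT * FROM humidity_s
  WHERE city = t.city AND ts <= t.ts
  ORDER BY ts DESC
  LIMIT 1
) AS h ON TRUE;
```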
1. Do you think there might be value in addressing this issue?
2. If yes, how would you suggest addressing it from the UI point of view: by adding a special syntax, some sort of aggregate function, or something else?
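To make the "special syntax" option a bit more tangible, here is one purely hypothetical sketch; this is NOT valid syntax today, just an illustration of what allowing `TABLESAMPLE` on a derived table could look like:

```
-- hypothetical syntax, for illustration only: TABLESAMPLE
-- applied directly to a derived table instead of a base relation
SELECT *
FROM (
  SELECT t.ts, t.city, t.temperature, h.humidity
  FROM temperature AS t
  LEFT JOIN LATERAL ( ... ) AS h ON TRUE  -- same lateral join as above
) AS sub TABLESAMPLE BERNOULLI(20);
```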