Home > mailing lists

Re: FETCH FIRST clause PERCENT option - Mailing list pgsql-hackers

From	Ryan Lambert
Subject	Re: FETCH FIRST clause PERCENT option
Date	July 9, 2019 16:13:10
Msg-id	CAN-V+g89y6ch0UPDRUCPJ4=3n3EfTtDeU8NSiU1oGZtCaJuQUA@mail.gmail.com Whole thread Raw
In response to	Re: FETCH FIRST clause PERCENT option (Surafel Temesgen <surafel3000@gmail.com>)
Responses	Re: FETCH FIRST clause PERCENT option (Ryan Lambert <ryan@rustprooflabs.com>) Re: FETCH FIRST clause PERCENT option (Surafel Temesgen <surafel3000@gmail.com>)
List	pgsql-hackers

Tree view

Surafel,

> The cost of FITCH FIRST N PERCENT execution in current implementation is the cost of pulling the full table plus the cost of storing and fetching the tuple from tuplestore so it can > not perform better than pulling the full table in any case . This is because we can't determined the number of rows to return without executing the plan until the end. We can find the > estimation of rows that will be return in planner estimation but that is not exact.

Ok, I can live with that for the normal use cases. This example from the end of my previous message using 95% seems like a problem still, I don't like syntax that unexpectedly kills performance like this one. If this can't be improved in the initial release of the feature I'd suggest we at least make a strong disclaimer in the docs, along the lines of:

"It is possible for FETCH FIRST N PERCENT to create poorly performing query plans when the N supplied exceeds 50 percent. In these cases query execution can take an order of magnitude longer to execute than simply returning the full table. If performance is critical using an explicit row count for limiting is recommended."

I'm not certain the 50 percent is the true threshold of where things start to fall apart, I just used that as a likely guess for now. I can do some more testing this week to identify where things start falling apart performance wise. Thanks,

EXPLAIN (ANALYZE, COSTS)
WITH t AS (
SELECT id, v1, v2
FROM r10mwide
FETCH FIRST 95 PERCENT ROWS ONLY
) SELECT AVG(v1), MIN(v1), AVG(v1 + v2) FROM t
;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=651432.48..651432.49 rows=1 width=24) (actual time=58981.043..58981.044 rows=1 loops=1)
-> Limit (cost=230715.67..461431.34 rows=9500057 width=20) (actual time=0.017..55799.389 rows=9500000 loops=1)
-> Seq Scan on r10mwide (cost=0.00..242858.60 rows=10000060 width=20) (actual time=0.014..3847.146 rows=10000000 loops=1)
Planning Time: 0.117 ms
Execution Time: 59079.680 ms
(5 rows)

Ryan Lambert

pgsql-hackers by date:

From: Patrick McHardy
Date: 09 July 2019, 16:00:27
Subject: [PATCH] Fix trigger argument propagation to child partitions

From: Daniel Gustafsson
Date: 09 July 2019, 16:16:01
Subject: Re: Contribution to Perldoc for TestLib module in Postgres

Re: FETCH FIRST clause PERCENT option - Mailing list pgsql-hackers

Previous

Next