Re: FETCH FIRST clause PERCENT option - Mailing list pgsql-hackers

From Ryan Lambert
Subject Re: FETCH FIRST clause PERCENT option
Date
Msg-id CAN-V+g89y6ch0UPDRUCPJ4=3n3EfTtDeU8NSiU1oGZtCaJuQUA@mail.gmail.com
Whole thread Raw
In response to Re: FETCH FIRST clause PERCENT option  (Surafel Temesgen <surafel3000@gmail.com>)
Responses Re: FETCH FIRST clause PERCENT option  (Ryan Lambert <ryan@rustprooflabs.com>)
Re: FETCH FIRST clause PERCENT option  (Surafel Temesgen <surafel3000@gmail.com>)
List pgsql-hackers
Surafel,

> The cost of FITCH FIRST N PERCENT execution in current implementation is the cost of pulling the full table plus the cost of storing and fetching the tuple from tuplestore so it can > not perform better than pulling the full table in any case . This is because we can't determined the number of rows to return without executing the plan until the end. We can find the > estimation of rows that will be return in planner estimation but that is not exact.

Ok, I can live with that for the normal use cases.  This example from the end of my previous message using 95% seems like a problem still, I don't like syntax that unexpectedly kills performance like this one.  If this can't be improved in the initial release of the feature I'd suggest we at least make a strong disclaimer in the docs, along the lines of:

"It is possible for FETCH FIRST N PERCENT to create poorly performing query plans when the N supplied exceeds 50 percent.  In these cases query execution can take an order of magnitude longer to execute than simply returning the full table.  If performance is critical using an explicit row count for limiting is recommended."

I'm not certain the 50 percent is the true threshold of where things start to fall apart, I just used that as a likely guess for now.  I can do some more testing this week to identify where things start falling apart performance wise.  Thanks,

EXPLAIN (ANALYZE, COSTS)
WITH t AS (
SELECT id, v1, v2
    FROM r10mwide
    FETCH FIRST 95 PERCENT ROWS ONLY
) SELECT AVG(v1), MIN(v1), AVG(v1 + v2) FROM t
;
                                                               QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=651432.48..651432.49 rows=1 width=24) (actual time=58981.043..58981.044 rows=1 loops=1)
   ->  Limit  (cost=230715.67..461431.34 rows=9500057 width=20) (actual time=0.017..55799.389 rows=9500000 loops=1)
         ->  Seq Scan on r10mwide  (cost=0.00..242858.60 rows=10000060 width=20) (actual time=0.014..3847.146 rows=10000000 loops=1)
 Planning Time: 0.117 ms
 Execution Time: 59079.680 ms
(5 rows)  

Ryan Lambert

pgsql-hackers by date:

Previous
From: Patrick McHardy
Date:
Subject: [PATCH] Fix trigger argument propagation to child partitions
Next
From: Daniel Gustafsson
Date:
Subject: Re: Contribution to Perldoc for TestLib module in Postgres