Re: FETCH FIRST clause PERCENT option - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: FETCH FIRST clause PERCENT option
Date
Msg-id 768688e7-3957-8956-3ffb-6bbf383853a2@2ndquadrant.com
Whole thread Raw
In response to Re: FETCH FIRST clause PERCENT option  (Surafel Temesgen <surafel3000@gmail.com>)
Responses Re: FETCH FIRST clause PERCENT option  (Surafel Temesgen <surafel3000@gmail.com>)
List pgsql-hackers
On 1/3/19 1:00 PM, Surafel Temesgen wrote:
> Hi
> 
> On Tue, Jan 1, 2019 at 10:08 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
>     ...
> 
>     Firstly, the estimated number of rows should be about the same (10k) in
>     both cases, but the "percent" case incorrectly uses the total row count
>     (i.e. as if 100% rows were returned).
> 
> Ok I will correct it
> 
> 
>     It's correct that the total cost for the "percent" case is based on 100%
>     of rows, of course, because the code has to fetch everything and stash
>     it into the tuplestore in LIMIT_INITIAL phase.
> 
>     But that implies the startup cost for the "percent" case can't be 0,
>     because all this work has to be done before emitting the first row.
> 
> 
> Correct, startup cost must be equal to total cost of source data path
> and total cost have to be slightly greater than startup cost . I am
> planing to increase the total cost by limitCount * 0.1(similar to the
> parallel_tuple_cost) because getting tuple from tuplestore are almost
> similar operation to passing a tuple from worker to master backend.
> 

OK, sounds good in principle.

> 
>     So these costing aspects need fixing, IMHO.
> 
> 
>     The execution part of the patch seems to be working correctly, but I
>     think there's an improvement - we don't need to execute the outer plan
>     to completion before emitting the first row. For example, let's say the
>     outer plan produces 10000 rows in total and we're supposed to return the
>     first 1% of those rows. We can emit the first row after fetching the
>     first 100 rows, we don't have to wait for fetching all 10k rows.
> 
> 
> but total rows count is not given how can we determine safe to return row
> 

But you know how many rows were fetched from the outer plan, and this
number only grows grows. So the number of rows returned by FETCH FIRST
... PERCENT also only grows. For example with 10% of rows, you know that
once you reach 100 rows you should emit ~10 rows, with 200 rows you know
you should emit ~20 rows, etc. So you may track how many rows we're
supposed to return / returned so far, and emit them early.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Delay locking partitions during query execution
Next
From: Peter Eisentraut
Date:
Subject: Re: Python versions (was Re: RHEL 8.0 build)