Home > mailing lists

Re: FETCH FIRST clause PERCENT option - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: FETCH FIRST clause PERCENT option
Date	January 3, 2019 16:51:43
Msg-id	768688e7-3957-8956-3ffb-6bbf383853a2@2ndquadrant.com Whole thread Raw
In response to	Re: FETCH FIRST clause PERCENT option (Surafel Temesgen <surafel3000@gmail.com>)
Responses	Re: FETCH FIRST clause PERCENT option (Surafel Temesgen <surafel3000@gmail.com>)
List	pgsql-hackers

Tree view

On 1/3/19 1:00 PM, Surafel Temesgen wrote:
> Hi
> 
> On Tue, Jan 1, 2019 at 10:08 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
>     ...
> 
>     Firstly, the estimated number of rows should be about the same (10k) in
>     both cases, but the "percent" case incorrectly uses the total row count
>     (i.e. as if 100% rows were returned).
> 
> Ok I will correct it
> 
> 
>     It's correct that the total cost for the "percent" case is based on 100%
>     of rows, of course, because the code has to fetch everything and stash
>     it into the tuplestore in LIMIT_INITIAL phase.
> 
>     But that implies the startup cost for the "percent" case can't be 0,
>     because all this work has to be done before emitting the first row.
> 
> 
> Correct, startup cost must be equal to total cost of source data path
> and total cost have to be slightly greater than startup cost . I am
> planing to increase the total cost by limitCount * 0.1(similar to the
> parallel_tuple_cost) because getting tuple from tuplestore are almost
> similar operation to passing a tuple from worker to master backend.
> 

OK, sounds good in principle.

> 
>     So these costing aspects need fixing, IMHO.
> 
> 
>     The execution part of the patch seems to be working correctly, but I
>     think there's an improvement - we don't need to execute the outer plan
>     to completion before emitting the first row. For example, let's say the
>     outer plan produces 10000 rows in total and we're supposed to return the
>     first 1% of those rows. We can emit the first row after fetching the
>     first 100 rows, we don't have to wait for fetching all 10k rows.
> 
> 
> but total rows count is not given how can we determine safe to return row
> 

But you know how many rows were fetched from the outer plan, and this
number only grows grows. So the number of rows returned by FETCH FIRST
... PERCENT also only grows. For example with 10% of rows, you know that
once you reach 100 rows you should emit ~10 rows, with 200 rows you know
you should emit ~20 rows, etc. So you may track how many rows we're
supposed to return / returned so far, and emit them early.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

From: Tomas Vondra
Date: 03 January 2019, 16:40:39
Subject: Re: Delay locking partitions during query execution

From: Peter Eisentraut
Date: 03 January 2019, 17:00:41
Subject: Re: Python versions (was Re: RHEL 8.0 build)

Re: FETCH FIRST clause PERCENT option - Mailing list pgsql-hackers

Previous

Next