Re: FETCH FIRST clause PERCENT option - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: FETCH FIRST clause PERCENT option
Date
Msg-id b6a3bdfa-d5c8-534c-c1e8-fdabee061b04@2ndquadrant.com
In response to Re: FETCH FIRST clause PERCENT option  (Surafel Temesgen <surafel3000@gmail.com>)
Responses Re: FETCH FIRST clause PERCENT option  (Surafel Temesgen <surafel3000@gmail.com>)
List pgsql-hackers

On 1/4/19 7:40 AM, Surafel Temesgen wrote:
> 
> 
> On Thu, Jan 3, 2019 at 4:51 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> 
> 
>     On 1/3/19 1:00 PM, Surafel Temesgen wrote:
>     > Hi
>     >
>     > On Tue, Jan 1, 2019 at 10:08 PM Tomas Vondra
>     > <tomas.vondra@2ndquadrant.com> wrote:
> 
>     >     The execution part of the patch seems to be working correctly, but I
>     >     think there's an improvement - we don't need to execute the outer plan
>     >     to completion before emitting the first row. For example, let's say the
>     >     outer plan produces 10000 rows in total and we're supposed to return the
>     >     first 1% of those rows. We can emit the first row after fetching the
>     >     first 100 rows, we don't have to wait for fetching all 10k rows.
>     >
>     >
>     > but the total row count is not given, so how can we determine
>     > when it is safe to return a row?
>     >
> 
>     But you know how many rows were fetched from the outer plan, and this
>     number only grows. So the number of rows returned by FETCH FIRST
>     ... PERCENT also only grows. For example with 10% of rows, you know that
>     once you reach 100 rows you should emit ~10 rows, with 200 rows you know
>     you should emit ~20 rows, etc. So you can track how many rows you're
>     supposed to return and how many you have returned so far, and emit
>     them early.
> 
> 
> 
> 
> Yes, that is clear, but I don't find it easy to put that into a formula.
> Maybe someone good at mathematics can help.
> 

What formula? All the math remains exactly the same, you just need to
update the number of rows to return and track how many rows are already
returned.
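
To make the arithmetic concrete, here is a minimal sketch in plain C, not
code from the patch. The function name percent_row_target is hypothetical,
and rounding down is an assumption chosen to match the examples quoted
above; the patch may round differently.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: how many rows
 * FETCH FIRST <percent> PERCENT may have returned once `fetched`
 * rows have been read from the subplan.
 *
 * Assumption: the result is rounded down. Because `fetched` only
 * grows, the value returned here never decreases.
 */
static int64_t
percent_row_target(double percent, int64_t fetched)
{
    assert(percent >= 0.0 && percent <= 100.0);
    assert(fetched >= 0);

    /* the cast truncates toward zero, i.e. rounds down for values >= 0 */
    return (int64_t) (percent * (double) fetched / 100.0);
}
```

With 10 percent this gives 10 after 100 fetched rows and 20 after 200; with
1 percent the first row becomes available after 100 fetched rows.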

I haven't tried doing that, but AFAICS you'd need to tweak how/when
node->count is computed - instead of computing it only once it needs to
be updated after fetching each row from the subplan.

Furthermore, you'll need to stash the subplan rows somewhere (into a
tuplestore probably), and whenever the node->count value increments,
you'll need to grab a row from the tuplestore and return that (i.e.
tweak node->position and set node->subSlot).
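
Roughly, the two steps above could look like the following standalone
sketch. It is not executor code: a plain int array stands in for both the
subplan and the tuplestore, all names (PercentLimitSketch, sketch_target,
sketch_run) are hypothetical, and rounding down is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state, loosely analogous to the counters in the Limit node. */
typedef struct PercentLimitSketch
{
    double  percent;    /* requested percentage, 0..100 */
    int64_t fetched;    /* rows read from the "subplan" so far */
    int64_t emitted;    /* rows already returned (cf. node->position) */
} PercentLimitSketch;

/* Rows allowed so far (cf. node->count); rounding down is an assumption. */
static int64_t
sketch_target(const PercentLimitSketch *st)
{
    return (int64_t) (st->percent * (double) st->fetched / 100.0);
}

/*
 * Read `ninput` rows from input[]. Fetched rows simply stay in input[],
 * which plays the role of the tuplestore here. Whenever the target grows,
 * the oldest not-yet-returned row is copied to out[], and emitted_at[]
 * records how many rows had been fetched at that moment.
 * Returns the number of rows emitted.
 */
static size_t
sketch_run(double percent, const int *input, size_t ninput,
           int *out, int64_t *emitted_at)
{
    PercentLimitSketch st = {percent, 0, 0};
    size_t  i;

    assert(percent >= 0.0 && percent <= 100.0);

    for (i = 0; i < ninput; i++)
    {
        st.fetched++;           /* input[i] is now "stored" */

        /* recompute the target after every fetch and catch up */
        while (st.emitted < sketch_target(&st))
        {
            out[st.emitted] = input[st.emitted];
            emitted_at[st.emitted] = st.fetched;
            st.emitted++;
        }
    }

    return (size_t) st.emitted;
}
```

For 20 input rows and 10 percent, the first row is emitted once 10 rows have
been fetched and the second once all 20 have been, so nothing waits for the
subplan to finish unless the target requires it.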

The one thing I'm not quite sure about is whether tuplestore allows
adding and getting rows at the same time.

Does that make sense?

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

