Thread: PassDownLimitBound for ForeignScan/CustomScan [take-2]

PassDownLimitBound for ForeignScan/CustomScan [take-2]

From
Kouhei Kaigai
Date:
Hello,

The attached patch is revised version of the pass-down-bounds feature.
Its functionality is not changed from the previous version, however,
implementation was revised according to the discussion at the last CF.

This patch add a new fields (ps_numTuples) to the PlanState. This is
a hint for optimization when parent node needs first N-tuples only.
It shall be set prior to ExecProcNode() after ExecInitNode() or
ExecReScan() by the parent node, then child nodes can adjust its
execution behavior (e.g, Sort will take top-N heapsort if ps_numTuples
is set) and pass down the hint to its child nodes furthermore.

As an example, I enhanced postgres_fdw to understand the ps_numTuples
if it is set. If and when remote ORDER BY is pushed down, the latest
code tries to sort the entire remote table because it does not know
how many rows to be returned. Thus, it took larger execution time.
On the other hands, the patched one runs the remote query with LIMIT
clause according to the ps_numTuples; which is informed by the Limit
node on top of the ForeignScan node.

* without patch
=================
postgres=# explain (analyze,verbose) select * from ft order by x,y limit 10;
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=100.00..100.43 rows=10 width=52) (actual time=2332.548..2332.550 rows=10 loops=1)
   Output: id, x, y, z
   ->  Foreign Scan on public.ft  (cost=100.00..146.46 rows=1077 width=52) (actual time=2332.547..2332.548 rows=10
loops=1)
         Output: id, x, y, z
         Remote SQL: SELECT id, x, y, z FROM public.t ORDER BY x ASC NULLS LAST, y ASC NULLS LAST
 Planning time: 0.177 ms
 Execution time: 2445.590 ms
(7 rows)

* with patch
==============
postgres=# explain (analyze,verbose) select * from ft order by x,y limit 10;
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=100.00..100.43 rows=10 width=52) (actual time=579.469..579.471 rows=10 loops=1)
   Output: id, x, y, z
   ->  Foreign Scan on public.ft  (cost=100.00..146.46 rows=1077 width=52) (actual time=579.468..579.469 rows=10
loops=1)
         Output: id, x, y, z
         Remote SQL: SELECT id, x, y, z FROM public.t ORDER BY x ASC NULLS LAST, y ASC NULLS LAST
 Planning time: 0.123 ms
 Execution time: 579.858 ms
(7 rows)


Right now, I have a few concerns for this patch.
1. Because LIMIT clause can have expression not only constant value,
   we cannot determine the value of ps_numTuples until execution time.
   So, it is not possible to adjust remote query on planning time, and
   EXPLAIN does not show exact remote query even if LIMIT clause was
   attached actually.

2. Where is the best location to put the interface contract to set
   ps_numTuples field. It has to be set prior to the first ExecProcNode()
   after ExecInitNode() or ExecReScan().

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas@gmail.com]
> Sent: Friday, September 16, 2016 12:39 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Jeevan Chalke; pgsql-hackers@postgresql.org; Etsuro Fujita; Andres Freund
> Subject: ##freemail## Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan
> 
> On Tue, Sep 13, 2016 at 9:07 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > In the current implementation calls recompute_limits() on the first
> > invocation of ExecLimit and ExecReScanLimit. Do we expect the
> > ps->numTuples will be also passed down to the child nodes on the same
> > timing?
> 
> Sure, unless we find some reason why that's not good.
> 
> > I also think this new executor contract shall be considered as a hint
> > (but not a requirement) for the child nodes, because it allows the
> > parent nodes to re-distribute the upper limit regardless of the type
> > of the child nodes as long as the parent node can work correctly and
> > has benefit even if the child node returns a part of tuples. It makes
> > the decision whether the upper limit should be passed down much simple.
> > The child node "can" ignore the hint but can utilize for more optimization.
> 
> +1.
> 
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Attachment

Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

From
Jeevan Chalke
Date:
Hi,

I have reviewed the patch.

Patch applies cleanly. make/"make install"/"make check" all are fine.

I too see a good performance in execution time when LIMIT is pushed with
cursor query in postgres_fdw at execution time

However here are few comments on the changes:

1. ps_numTuples is declared as long, however offset and count members in
LimitState struct and bound member in SortState struct is int64.  However
long on 32 bit machine may be 32 bits and thus I think tuples_needed which
is long may have overflow hazards as it may store int64 + int64.  I think
ps_numTuples should be int64.

2. Robert suggested following in the previous discussion:
"For example, suppose we add a new PlanState member "long
numTuples" where 0 means that the number of tuples that will be needed
is unknown (so that most node types need not initialize it), a
positive value is an upper bound on the number of tuples that will be
fetched, and -1 means that it is known for certain that we will need
all of the tuples."

We should have 0 for the default case so that we don't need to initialize it
at most of the places.  But I see many such changes in the patch.  I think
this is not possible here since 0 can be a legal user provided value which
cannot be set as a default (default is all rows).

However do you think, can we avoid that? Is there any other way so that we
don't need every node having ps_numTuples to be set explicitly?

Apart from this patch look good to me.


Regards,
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

From
Robert Haas
Date:
On Mon, Oct 31, 2016 at 10:20 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> As an example, I enhanced postgres_fdw to understand the ps_numTuples
> if it is set. If and when remote ORDER BY is pushed down, the latest
> code tries to sort the entire remote table because it does not know
> how many rows to be returned. Thus, it took larger execution time.
> On the other hands, the patched one runs the remote query with LIMIT
> clause according to the ps_numTuples; which is informed by the Limit
> node on top of the ForeignScan node.

So there are two cases here.  If the user says LIMIT 12, we could in
theory know that at planner time and optimize accordingly.  If the
user says LIMIT twelve(), however, we will need to wait until
execution time unless twelve() happens to be capable of being
simplified to a constant by the planner.

Therefore, it's possible to imagine having two mechanisms here. In the
simple case where the LIMIT and OFFSET values are constants, we could
implement a system to get hold of that information during planning and
use it for whatever we like.   In addition, we can have an
execution-time system that optimizes based on values available at
execution (regardless of whether those values were also available
during planning).  Those are, basically, two separate things, and this
patch has enough to do just focusing on one of them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

From
Robert Haas
Date:
On Tue, Nov 8, 2016 at 6:54 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> 1. ps_numTuples is declared as long, however offset and count members in
> LimitState struct and bound member in SortState struct is int64.  However
> long on 32 bit machine may be 32 bits and thus I think tuples_needed which
> is long may have overflow hazards as it may store int64 + int64.  I think
> ps_numTuples should be int64.

I suggested long originally because that's what ExecutorRun() was
using at the time.  It seems that it got changed to uint64 in
23a27b039d94ba359286694831eafe03cd970eef, so I guess we should
probably use uint64.

> 2. Robert suggested following in the previous discussion:
> "For example, suppose we add a new PlanState member "long
> numTuples" where 0 means that the number of tuples that will be needed
> is unknown (so that most node types need not initialize it), a
> positive value is an upper bound on the number of tuples that will be
> fetched, and -1 means that it is known for certain that we will need
> all of the tuples."
>
> We should have 0 for the default case so that we don't need to initialize it
> at most of the places.  But I see many such changes in the patch.  I think
> this is not possible here since 0 can be a legal user provided value which
> cannot be set as a default (default is all rows).
>
> However do you think, can we avoid that? Is there any other way so that we
> don't need every node having ps_numTuples to be set explicitly?

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

From
Kouhei Kaigai
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
> Sent: Thursday, November 10, 2016 3:08 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: pgsql-hackers@postgresql.org; Jeevan Chalke; Etsuro Fujita; Andres Freund
> Subject: Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan [take-2]
> 
> On Mon, Oct 31, 2016 at 10:20 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > As an example, I enhanced postgres_fdw to understand the ps_numTuples
> > if it is set. If and when remote ORDER BY is pushed down, the latest
> > code tries to sort the entire remote table because it does not know
> > how many rows to be returned. Thus, it took larger execution time.
> > On the other hands, the patched one runs the remote query with LIMIT
> > clause according to the ps_numTuples; which is informed by the Limit
> > node on top of the ForeignScan node.
> 
> So there are two cases here.  If the user says LIMIT 12, we could in
> theory know that at planner time and optimize accordingly.  If the
> user says LIMIT twelve(), however, we will need to wait until
> execution time unless twelve() happens to be capable of being
> simplified to a constant by the planner.
> 
> Therefore, it's possible to imagine having two mechanisms here. In the
> simple case where the LIMIT and OFFSET values are constants, we could
> implement a system to get hold of that information during planning and
> use it for whatever we like.   In addition, we can have an
> execution-time system that optimizes based on values available at
> execution (regardless of whether those values were also available
> during planning).  Those are, basically, two separate things, and this
> patch has enough to do just focusing on one of them.
>
OK, we need to have a private value of postgres_fdw to indicate whether
LIMIT and OFFSET were supplied at the planner stage. If any, it has to
be matched with the ps_numTuples informed at the executor stage.

I'll revise the patch.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

From
Kouhei Kaigai
Date:
> On Tue, Nov 8, 2016 at 6:54 AM, Jeevan Chalke
> <jeevan.chalke@enterprisedb.com> wrote:
> > 1. ps_numTuples is declared as long, however offset and count members in
> > LimitState struct and bound member in SortState struct is int64.  However
> > long on 32 bit machine may be 32 bits and thus I think tuples_needed which
> > is long may have overflow hazards as it may store int64 + int64.  I think
> > ps_numTuples should be int64.
> 
> I suggested long originally because that's what ExecutorRun() was
> using at the time.  It seems that it got changed to uint64 in
> 23a27b039d94ba359286694831eafe03cd970eef, so I guess we should
> probably use uint64.
> 
> > 2. Robert suggested following in the previous discussion:
> > "For example, suppose we add a new PlanState member "long
> > numTuples" where 0 means that the number of tuples that will be needed
> > is unknown (so that most node types need not initialize it), a
> > positive value is an upper bound on the number of tuples that will be
> > fetched, and -1 means that it is known for certain that we will need
> > all of the tuples."
> >
> > We should have 0 for the default case so that we don't need to initialize it
> > at most of the places.  But I see many such changes in the patch.  I think
> > this is not possible here since 0 can be a legal user provided value which
> > cannot be set as a default (default is all rows).
> >
> > However do you think, can we avoid that? Is there any other way so that we
> > don't need every node having ps_numTuples to be set explicitly?
> 
> +1.
>
I thought we have to distinguish a case if LIMIT 0 is supplied.
However, in this case, ExecLimit() never goes down to the child nodes,
thus, its ps_numTuples shall not be referenced anywhere.

OK, I'll use uint64 for ps_numTuples, and 0 shall be a usual default
value that means no specific number of rows are given.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

From
Kouhei Kaigai
Date:
Hello,

The attached patch is a revised version of pass-down LIMIT to FDW/CSP.

Below is the updates from the last version.

'ps_numTuples' of PlanState was declared as uint64, instead of long
to avoid problems on 32bits machine when a large LIMIT clause is
supplied.

'ps_numTuples' is re-interpreted; 0 means that its upper node wants
to fetch all the tuples. It allows to eliminate a boring initialization
on ExecInit handler for each executor node.

Even though it was not suggested, estimate_path_cost_size() of postgres_fdw
adjusts number of rows if foreign-path is located on top-level of
the base-relations and LIMIT clause takes a constant value.
It will make more adequate plan as follows:

* WITHOUT this patch
--------------------
postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id and t_a.x < t_b.x LIMIT 100;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Limit  (cost=261.17..274.43 rows=100 width=88)
   Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
   ->  Hash Join  (cost=261.17..581.50 rows=2416 width=88)
         Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
         Hash Cond: (t_a.id = t_b.id)
         Join Filter: (t_a.x < t_b.x)
         ->  Foreign Scan on public.t_a  (cost=100.00..146.12 rows=1204 width=44)
               Output: t_a.id, t_a.x, t_a.y
               Remote SQL: SELECT id, x, y FROM public.t
         ->  Hash  (cost=146.12..146.12 rows=1204 width=44)
               Output: t_b.id, t_b.x, t_b.y
               ->  Foreign Scan on public.t_b  (cost=100.00..146.12 rows=1204 width=44)
                     Output: t_b.id, t_b.x, t_b.y
                     Remote SQL: SELECT id, x, y FROM public.t
(14 rows)

* WITH this patch
-----------------
postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id and t_a.x < t_b.x LIMIT 100;
                                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=100.00..146.58 rows=100 width=88)
   Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
   ->  Foreign Scan  (cost=100.00..146.58 rows=100 width=88)
         Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
         Relations: (public.t_a) INNER JOIN (public.t_b)
         Remote SQL: SELECT r1.id, r1.x, r1.y, r2.id, r2.x, r2.y FROM (public.t r1 INNER JOIN public.t r2 ON (((r1.x <
r2.x))AND ((r1.id = r2.id))))
 
(6 rows)


On the other hands, I noticed it is not safe to attach LIMIT clause at
the planner stage because root->limit_tuples is declared as double.
Even if LIMIT clause takes a constant value, it is potentially larger
than 2^53 which is the limitation we can represent accurately with
float64 data type but LIMIT clause allows up to 2^63-1.
So, postgres_fdw now attaches LIMIT clause on the remote query on
execution time only.

Thanks,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>


> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas@gmail.com]
> Sent: Thursday, November 10, 2016 3:08 AM
> To: Kaigai Kouhei(海外 浩平) <kaigai@ak.jp.nec.com>
> Cc: pgsql-hackers@postgresql.org; Jeevan Chalke
> <jeevan.chalke@enterprisedb.com>; Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>; Andres Freund <andres@anarazel.de>
> Subject: ##freemail## Re: PassDownLimitBound for ForeignScan/CustomScan
> [take-2]
> 
> On Mon, Oct 31, 2016 at 10:20 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
> wrote:
> > As an example, I enhanced postgres_fdw to understand the ps_numTuples
> > if it is set. If and when remote ORDER BY is pushed down, the latest
> > code tries to sort the entire remote table because it does not know
> > how many rows to be returned. Thus, it took larger execution time.
> > On the other hands, the patched one runs the remote query with LIMIT
> > clause according to the ps_numTuples; which is informed by the Limit
> > node on top of the ForeignScan node.
> 
> So there are two cases here.  If the user says LIMIT 12, we could in theory
> know that at planner time and optimize accordingly.  If the user says LIMIT
> twelve(), however, we will need to wait until execution time unless twelve()
> happens to be capable of being simplified to a constant by the planner.
> 
> Therefore, it's possible to imagine having two mechanisms here. In the
> simple case where the LIMIT and OFFSET values are constants, we could
> implement a system to get hold of that information during planning and
> use it for whatever we like.   In addition, we can have an
> execution-time system that optimizes based on values available at execution
> (regardless of whether those values were also available during planning).
> Those are, basically, two separate things, and this patch has enough to
> do just focusing on one of them.
> 
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
> Company

Attachment

Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

From
Haribabu Kommi
Date:


On Thu, Nov 10, 2016 at 10:59 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> On Tue, Nov 8, 2016 at 6:54 AM, Jeevan Chalke
> <jeevan.chalke@enterprisedb.com> wrote:
> > 1. ps_numTuples is declared as long, however offset and count members in
> > LimitState struct and bound member in SortState struct is int64.  However
> > long on 32 bit machine may be 32 bits and thus I think tuples_needed which
> > is long may have overflow hazards as it may store int64 + int64.  I think
> > ps_numTuples should be int64.
>
> I suggested long originally because that's what ExecutorRun() was
> using at the time.  It seems that it got changed to uint64 in
> 23a27b039d94ba359286694831eafe03cd970eef, so I guess we should
> probably use uint64.
>
> > 2. Robert suggested following in the previous discussion:
> > "For example, suppose we add a new PlanState member "long
> > numTuples" where 0 means that the number of tuples that will be needed
> > is unknown (so that most node types need not initialize it), a
> > positive value is an upper bound on the number of tuples that will be
> > fetched, and -1 means that it is known for certain that we will need
> > all of the tuples."
> >
> > We should have 0 for the default case so that we don't need to initialize it
> > at most of the places.  But I see many such changes in the patch.  I think
> > this is not possible here since 0 can be a legal user provided value which
> > cannot be set as a default (default is all rows).
> >
> > However do you think, can we avoid that? Is there any other way so that we
> > don't need every node having ps_numTuples to be set explicitly?
>
> +1.
>
I thought we have to distinguish a case if LIMIT 0 is supplied.
However, in this case, ExecLimit() never goes down to the child nodes,
thus, its ps_numTuples shall not be referenced anywhere.

OK, I'll use uint64 for ps_numTuples, and 0 shall be a usual default
value that means no specific number of rows are given.


Marked as "returned with feedback" in 2016-11 commitfest.
Please feel free to update the status when you submit the latest patch.

Regards,
Hari Babu
Fujitsu Australia