Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan [take-2] - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan [take-2]
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F8012B8A9D@BPXM15GP.gisp.nec.co.jp
Whole thread Raw
In response to Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan [take-2]  (Kohei KaiGai <kaigai@kaigai.gr.jp>)
Responses Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan[take-2]
List pgsql-hackers
The attached patch is rebased version of pass-down LIMIT clause patch,
which was forgotten to register on the last CF.

It allows to inform required number of rows to the sub-plans not only
ones we have individually handled at pass_down_bound().
Its primary target is control of number of remote tuple transfer over
the network connection by postgres_fdw.

According to the past discussion, we add a new field @ps_numTuples on
the PlanState to represent the required number of tuples.
Limit node assign a particular number on the field of sub-plan, then
this sub-plan can know its upper node does not require entire tuples,
and adjust its execution storategy.
Like MergeAppend, the sub-plan can also pass down the bounds to its
sub-plans again, if it makes sense and works correctly.

This feature is potentially a basis of GPU-based sorting on top of
CustomScan, because it has advantage for a workload to pick up the
top-N tuples if its data-size is enough small to load onto GPU-RAM.

Thanks,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>


> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai
> Sent: Tuesday, January 03, 2017 12:07 PM
> To: Kaigai Kouhei(海外 浩平) <kaigai@ak.jp.nec.com>
> Cc: Jeevan Chalke <jeevan.chalke@enterprisedb.com>; Robert Haas
> <robertmhaas@gmail.com>; pgsql-hackers@postgresql.org; Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>; Andres Freund <andres@anarazel.de>
> Subject: Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan
> [take-2]
> 
> Oops, I oversight this patch was marked as "returned with feedback", not
> "moved to the next CF".
> 
> Its status has not been changed since the last update. (Code was revised
> according to the last comment by Jeevan, but CF-Nov was time up at that
> time.)
> 
> How do I handle the patch?
> 
> 2016-12-05 16:49 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> > Hello,
> >
> > Sorry for my late response.
> > The attached patch reflects your comments.
> >
> >> Here are few comments on latest patch:
> >>
> >>
> >> 1.
> >> make/make check is fine, however I am getting regression failure in
> >> postgres_fdw contrib module (attached regression.diff).
> >> Please investigate and fix.
> >>
> > It was an incorrect interaction when postgres_fdw tries to push down
> > sorting to the remote side. We cannot attach LIMIT clause on the plain
> > scan path across SORT, however, the previous version estimated the
> > cost for the plain scan with LIMIT clause even if local sorting is needed.
> > If remote scan may return just 10 rows, estimated cost of the local
> > sort is very lightweight, thus, this unreasonable path was chosen.
> > (On the other hands, its query execution results were correct because
> > ps_numTuples is not delivered across Sort node, so ForeignScan
> > eventually scanned all the remote tuples. It made correct results but
> > not optimal from the viewpoint of performance.)
> >
> > The v3 patch estimates the cost with remote LIMIT clause only if
> > supplied pathkey strictly matches with the final output order of the
> > query, thus, no local sorting is expected.
> >
> > Some of the regression test cases still have different plans but due
> > to the new optimization by remote LIMIT clause.
> > Without remote LIMIT clause, some of regression test cases preferred
> > remote-JOIN + local-SORT then local-LIMIT.
> > Once we have remote-LIMIT option, it allows to discount the cost for
> > remote-SORT by choice of top-k heap sorting.
> > It changed the optimizer's decision on some test cases.
> >
> > Potential one big change is the test case below.
> >
> >  -- CROSS JOIN, not pushed down
> >  EXPLAIN (VERBOSE, COSTS OFF)
> >  SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1,
> > t2.c1 OFFSET 100 LIMIT 10;
> >
> > It assumed CROSS JOIN was not pushed down due to the cost for network
> > traffic, however, remote LIMIT reduced the estimated number of tuples
> > to be moved. So, all of the CROSS JOIN + ORDER BY + LIMIT became to
> > run on the remote side.
> >
> >> 2.
> >> +             *
> >> +             * MEMO: root->limit_tuples is not attached when query
> >> contains
> >> +             * grouping-clause or aggregate functions. So, we don's
> adjust
> >> +             * rows even if LIMIT <const> is supplied.
> >>
> >> Can you please explain why you are not doing this for grouping-clause
> >> or aggregate functions.
> >>
> > See grouping_planner() at optimizer/plan/planner.c It puts an invalid
> > value on the root->limit_tuples if query has GROUP BY clause, so we
> > cannot know number of tuples to be returned even if we have upper
> > limit actually.
> >
> >         /*
> >          * Figure out whether there's a hard limit on the number of rows
> that
> >          * query_planner's result subplan needs to return.  Even if we
> know a
> >          * hard limit overall, it doesn't apply if the query has any
> >          * grouping/aggregation operations, or SRFs in the tlist.
> >          */
> >         if (parse->groupClause ||
> >             parse->groupingSets ||
> >             parse->distinctClause ||
> >             parse->hasAggs ||
> >             parse->hasWindowFuncs ||
> >             parse->hasTargetSRFs ||
> >             root->hasHavingQual)
> >             root->limit_tuples = -1.0;
> >         else
> >             root->limit_tuples = limit_tuples;
> >
> >> 3.
> >> Typo:
> >>
> >> don's  => don't
> >>
> > Fixed,
> >
> > best regards,
> > ----
> > PG-Strom Project / NEC OSS Promotion Center KaiGai Kohei
> > <kaigai@ak.jp.nec.com>
> >
> >
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To
> > make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
> >
> 
> 
> 
> --
> KaiGai Kohei <kaigai@kaigai.gr.jp>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Jan Michálek
Date:
Subject: Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki
Next
From: Rahila Syed
Date:
Subject: [HACKERS] Adding support for Default partition in partitioning