Re: add_partial_path() may remove dominated path but still in use - Mailing list pgsql-hackers

From Kohei KaiGai
Subject Re: add_partial_path() may remove dominated path but still in use
Date
Msg-id CAOP8fzZYvQWpANOzF4acMgOee7T6p9m12a2ZKfhPv5nCXtFLZw@mail.gmail.com
Whole thread Raw
In response to Re: add_partial_path() may remove dominated path but still in use  (Kohei KaiGai <kaigai@heterodb.com>)
List pgsql-hackers
I tried to make a patch to have dual hooks at set_rel_pathlist(), and
adjusted PG-Strom for the new design. It stopped to create GatherPath
by itself, just added a partial path for the base relation.
It successfully made a plan using parallel custom-scan node, without
system crash.

As I mentioned above, it does not use the new "post_rel_pathlist_hook"
because we can add both of partial/regular path-node at the first hook
with no particular problems.

Thanks,

dbt3=# explain select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1994-01-01'
    and l_shipdate < cast(date '1994-01-01' + interval '1 year' as date)
    and l_discount between 0.08 - 0.01 and 0.08 + 0.01
    and l_quantity < 24 limit 1;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=144332.62..144332.63 rows=1 width=4)
   ->  Aggregate  (cost=144332.62..144332.63 rows=1 width=4)
         ->  Gather  (cost=144285.70..144329.56 rows=408 width=4)
               Workers Planned: 2
               ->  Parallel Custom Scan (GpuPreAgg) on lineitem
(cost=143285.70..143288.76 rows=204 width=4)
                     Reduction: NoGroup
                     Outer Scan: lineitem  (cost=1666.67..143246.16
rows=63254 width=8)
                     Outer Scan Filter: ((l_discount >= '0.07'::double
precision) AND
                                                   (l_discount <=
'0.09'::double precision) AND
                                                   (l_quantity <
'24'::double precision) AND
                                                   (l_shipdate <
'1995-01-01'::date) AND
                                                   (l_shipdate >=
'1994-01-01'::date))
(8 rows)

Thanks,

2019年1月2日(水) 22:34 Kohei KaiGai <kaigai@heterodb.com>:
>
> 2018年12月31日(月) 22:25 Amit Kapila <amit.kapila16@gmail.com>:
> >
> > On Mon, Dec 31, 2018 at 5:48 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > >
> > > 2018年12月31日(月) 13:10 Amit Kapila <amit.kapila16@gmail.com>:
> > > >
> > > > On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
> > > > > 2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
> > > > > >
> > > > > > Kohei KaiGai <kaigai@heterodb.com> writes:
> > > > > > > 2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
> > > > > > >> However, first I'd like to know why this situation is arising in the first
> > > > > > >> place.  To have the situation you're describing, we'd have to have
> > > > > > >> attempted to make some Gather paths before we have all the partial paths
> > > > > > >> for the relation they're for.  Why is that a good thing to do?  It seems
> > > > > > >> like such Gathers are necessarily being made with incomplete information,
> > > > > > >> and we'd be better off to fix things so that none are made till later.
> > > > > >
> > > > > > > Because of the hook location, Gather-node shall be constructed with built-in
> > > > > > > and foreign partial scan node first, then extension gets a chance to add its
> > > > > > > custom paths (partial and full).
> > > > > > > At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
> > > > > > > generate_gather_paths().
> > > > > >
> > > > > > Hmm.  I'm inclined to think that we should have a separate hook
> > > > > > in which extensions are allowed to add partial paths, and that
> > > > > > set_rel_pathlist_hook should only be allowed to add regular paths.
> > > >
> > > > +1.  This idea sounds sensible to me.
> > > >
> > > > > >
> > > > > I have almost same opinion, but the first hook does not need to be
> > > > > dedicated for partial paths. As like set_foreign_pathlist() doing, we can
> > > > > add both of partial and regular paths here, then generate_gather_paths()
> > > > > may generate a Gather-path on top of the best partial-path.
> > > > >
> > > >
> > > > Won't it be confusing for users if we allow both partial and full
> > > > paths in first hook and only full paths in the second hook?
> > > > Basically, in many cases, the second hook won't be of much use.  What
> > > > advantage you are seeing in allowing both partial and full paths in
> > > > the first hook?
> > > >
> > > Two advantages. The first one is, it follows same manner of
> > > set_foreign_pathlist()
> > > which allows to add both of full and partial path if FDW supports parallel-scan.
> > > The second one is practical. During the path construction, extension needs to
> > > check availability to run (e.g, whether operators in WHERE-clause is supported
> > > on GPU device...), calculate its estimated cost and so on. Not a small
> > > portion of
> > > them are common for both of full and partial path. So, if the first
> > > hook accepts to
> > > add both kind of paths at once, extension can share the common properties.
> > >
> >
> > You have a point, though I am not sure how much difference it can
> > create for cost computation as ideally, both will have different
> > costing model.  I understand there are some savings by avoiding some
> > common work, is there any way to cache the required information?
> >
> I have no idea for the clean way.
> We may be able to have an opaque pointer for extension usage, however,
> it may be problematic if multiple extension uses the hook.
>
> > > Probably, the second hook is only used for a few corner case where an extension
> > > wants to manipulate path-list already built, like pg_hint_plan.
> > >
> >
> > Okay, but it could be some work for extension authors who are using
> > the current hook, not sure they would like to divide the work between
> > first and second hook.
> >
> I guess they don't divide their code, but choose either of them.
> In case of PG-Strom, even if there are two hooks around the point, it will use
> the first hook only, unless it does not prohibit to call add_path() here.
> However, some adjustments are required. Its current implementation makes
> GatherPath node with partial CustomScanPath because set_rel_pathlist_hook()
> is called after the generate_gather_paths().
> Once we could choose the first hook, no need to make a GatherPath by itself,
> because PostgreSQL-core will make the path if partial custom-path is enough
> reasonable cost. Likely, this adjustment is more preferable one.
>
> Thanks,
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei <kaigai@heterodb.com>



--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: WIP: Avoid creation of the free space map for small tables
Next
From: Andrew Alsup
Date:
Subject: Re: SQL/JSON: functions