Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id CALj2ACW376px6jYXhmu4EPQNsOwf42J1S=9eKtoN-tCgOypVRg@mail.gmail.com
In response to Re: Parallel Inserts in CREATE TABLE AS  (Luc Vlaming <luc@swarm64.com>)
Responses Re: Parallel Inserts in CREATE TABLE AS  (Luc Vlaming <luc@swarm64.com>)
List pgsql-hackers
On Mon, Jan 4, 2021 at 5:44 PM Luc Vlaming <luc@swarm64.com> wrote:
> On 04-01-2021 12:16, Hou, Zhijie wrote:
> >> ================
> >> wrt v18-0002....patch:
> >>
> >> It looks like this introduces a state machine that goes like:
> >> - starts at CTAS_PARALLEL_INS_UNDEF
> >> - possibly moves to CTAS_PARALLEL_INS_SELECT
> >> - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
> >> - if both were added at some stage, we can go to
> >> CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
> >>
> >> what i'm wondering is why you opted to put logic around
> >> generate_useful_gather_paths and in cost_gather when to me it seems more
> >> logical to put it in create_gather_path? i'm probably missing something
> >> there?
> >
> > IMO, The reason is we want to make sure we only ignore the cost when Gather is the top node.
> > And it seems the generate_useful_gather_paths called in apply_scanjoin_target_to_paths is the right place which can
> > only create top node Gather.
> > So we change the flag in apply_scanjoin_target_to_paths around generate_useful_gather_paths to identify the top
> > node.

Right. We wanted to ignore parallel tuple cost for only the upper Gather path.

> I was wondering actually if we need the state machine. Reason is that as
> AFAICS the code could be placed in create_gather_path, where you can
> also check if it is a top gather node, whether the dest receiver is the
> right type, etc? To me that seems like a nicer solution as its makes
> that all logic that decides whether or not a parallel CTAS is valid is
> in a single place instead of distributed over various places.

IMO, we can't determine in create_gather_path that we are about to
generate the top Gather path. To decide whether the top Gather path is
being generated, I think it's not enough to check
root->query_level == 1; we also need to rely on where
generate_useful_gather_paths gets called from. For instance, for
query_level 1, generate_useful_gather_paths gets called from 2 places
in apply_scanjoin_target_to_paths. Likewise, create_gather_path also
gets called from many places. IMO, the current way, i.e. setting the
flag in apply_scanjoin_target_to_paths and ignoring the cost based on
that in cost_gather, seems safe.
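
To make the idea concrete, here is a minimal standalone sketch of the
"caller sets the flag, cost function checks it" approach. It is not the
patch code: the flag names are the ones mentioned in this thread, but
their bit values, the ToyPlannerInfo struct, the parallel_ins_info
field and the toy_* functions are all assumptions made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names come from the thread; the bit values are assumed. */
#define CTAS_PARALLEL_INS_UNDEF             0
#define CTAS_PARALLEL_INS_SELECT            (1 << 0)
#define CTAS_PARALLEL_INS_TUP_COST_CAN_IGN  (1 << 1)
#define CTAS_PARALLEL_INS_TUP_COST_IGNORED  (1 << 2)

/* Hypothetical stand-in for planner state; not the real PlannerInfo. */
typedef struct ToyPlannerInfo
{
    int query_level;        /* 1 means the outermost query */
    int parallel_ins_info;  /* bitmask of the flags above */
} ToyPlannerInfo;

/* Assumed per-tuple transfer cost, for illustration only. */
static const double toy_parallel_tuple_cost = 0.1;

/*
 * Toy version of the cost function. It cannot tell by itself whether it
 * is costing the top Gather, so it only trusts the flag set by its caller.
 */
static double
toy_cost_gather(ToyPlannerInfo *root, double rows)
{
    int want = CTAS_PARALLEL_INS_SELECT | CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;

    assert(rows >= 0.0);

    if ((root->parallel_ins_info & want) == want)
    {
        /* Remember that the tuple cost was actually ignored. */
        root->parallel_ins_info |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
        return 0.0;
    }
    return toy_parallel_tuple_cost * rows;
}

/*
 * Toy version of the one call site that knows it is building the top
 * Gather: set the flag just around the call, then clear it again so
 * other callers are unaffected.
 */
static double
toy_top_level_gather(ToyPlannerInfo *root, double rows)
{
    double cost;
    bool   is_top = (root->query_level == 1) &&
                    (root->parallel_ins_info & CTAS_PARALLEL_INS_SELECT) != 0;

    if (is_top)
        root->parallel_ins_info |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;

    cost = toy_cost_gather(root, rows);

    root->parallel_ins_info &= ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
    return cost;
}
```

In this sketch a direct call to toy_cost_gather (standing in for the
many other callers) still charges the per-tuple cost, while the call
routed through toy_top_level_gather at query level 1 does not.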

I may be wrong. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


