Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAJcOf-c2TrQFdhBgHMLs7HmzG1iLAUc3vnDmzYkmxek8Zjhfjg@mail.gmail.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)
List pgsql-hackers
On Wed, Feb 17, 2021 at 12:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Mon, Feb 15, 2021 at 4:39 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > On Sat, Feb 13, 2021 at 12:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > > Actually, I tried adding the following in the loop that checks the
> > > > parallel-safety of each partition and it seemed to work:
> > > >
> > > >             glob->relationOids =
> > > >                     lappend_oid(glob->relationOids, pdesc->oids[i]);
> > > >
> > > > Can you confirm, is that what you were referring to?
> > >
> > > Right.  I had mistakenly mentioned PlannerGlobal.invalItems, sorry.
> > >
> > > Although it gets the job done, I'm not sure if manipulating
> > > relationOids from max_parallel_hazard() or its subroutines is okay,
> > > but I will let the committer decide that.  As I mentioned above, the
> > > person who designed this decided for some reason that it is
> > > extract_query_dependencies()'s job to populate
> > > PlannerGlobal.relationOids/invalItems.
> >
> > Yes, it doesn't really seem right doing it within max_parallel_hazard().
> > I tried doing it in extract_query_dependencies() instead - see
> > attached patch - and it seems to work, but I'm not sure if there might
> > be any unintended side-effects.
>
> One issue I see with the patch is that it fails to consider
> multi-level partitioning, because it's looking up partitions only in
> the target table's PartitionDesc and no other.
>
> @@ -3060,8 +3066,36 @@ extract_query_dependencies_walker(Node *node,
> PlannerInfo *context)
>             RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
>
>             if (rte->rtekind == RTE_RELATION)
> -               context->glob->relationOids =
> -                   lappend_oid(context->glob->relationOids, rte->relid);
> +           {
> +               PlannerGlobal   *glob;
> +
> +               glob = context->glob;
> +               glob->relationOids =
> +                   lappend_oid(glob->relationOids, rte->relid);
> +               if (query->commandType == CMD_INSERT &&
> +                                   rte->relkind == RELKIND_PARTITIONED_TABLE)
>
> The RTE whose relkind is being checked here may not be the INSERT
> target relation's RTE, even though that's perhaps always true today.
> So, I suggest to pull the new block out of the loop over rtable and
> perform its deeds on the result RTE explicitly fetched using
> rt_fetch(), preferably using a separate recursive function.  I'm
> thinking something like the attached revised version.
>
>

Thanks. Yes, I'd forgotten about the fact a partition may itself be
partitioned, so it needs to be recursive (like in the parallel-safety
checks).
Your revised version seems OK, though I do have a concern:
Is the use of "table_close(rel, NoLock)'' intentional? That will keep
the lock (lockmode) until end-of-transaction.

Regards,
Greg Nancarrow
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Andy Fan
Date:
Subject: Re: How to get Relation tuples in C function
Next
From: Thomas Munro
Date:
Subject: Re: pg_collation_actual_version() ERROR: cache lookup failed for collation 123