Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAA4eK1JFY6bETF_vq3695dWSCrpz-nm=LH6AFb7W7PuCskDEtA@mail.gmail.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
List pgsql-hackers
On Mon, Jan 18, 2021 at 10:45 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Fri, Jan 15, 2021 at 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Here is an additional review of
> > v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are
> > quite a few comments raised on the V11-0001* patch. I suggest first
> > post a revised version of V11-0001* patch addressing those comments
> > and then you can separately post a revised version of
> > v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.
> >
>
> 1)
>
> >Here, it seems we are accessing the relation descriptor without any
> >lock on the table which is dangerous considering the table can be
> >modified in a parallel session. Is there a reason why you think this
> >is safe? Did you check anywhere else such a coding pattern?
>
> Yes, there's a very good reason and I certainly have checked for the
> same coding pattern elsewhere, and not just randomly decided that
> locking can be ignored.
> The table has ALREADY been locked (by the caller) during the
> parse/analyze phase.
>

Fair enough. I suggest adding a comment saying the same so that the
reader doesn't get confused about the same.

> (This is not the case for a partition, in which case the patch code
> uses AccessShareLock, as you will see).

Okay, but I see you release the lock on partition rel immediately
after checking parallel-safety. What if a user added some
parallel-unsafe constraint (via Alter Table) after that check?

>
> 4)
>
> >domain_max_parallel_hazard_for_modify()
> >{
> >..
> >+ if (isnull)
> >+ {
> >+ /*
> >+ * This shouldn't ever happen, but if it does, log a WARNING
> >+ * and return UNSAFE, rather than erroring out.
> >+ */
> >+ elog(WARNING, "null conbin for constraint %u", con->oid);
> >+ context->max_hazard = PROPARALLEL_UNSAFE;
> >+ break;
> >+ }
> >..
> >}
> >index_expr_max_parallel_hazard_for_modify()
> >{
> >..
> >+ if (index_expr_item == NULL) /* shouldn't happen */
> >+ {
> >+ index_close(index_rel, lockmode);
> >+ context->max_hazard = PROPARALLEL_UNSAFE;
> >+ return context->max_hazard;
> >+ }
> >..
> >}
>
> >It is not clear why the above two are shouldn't happen cases and if so
> >why we want to treat them as unsafe. Ideally, there should be an
> >Assert if these can't happen but it is difficult to decide without
> >knowing why you have considered them unsafe?
>
> The checks being done here for "should never happen" cases are THE
> SAME as other parts of the Postgres code.
> For example, search Postgres code for "null conbin" and you'll find 6
> other places in the Postgres code which actually ERROR out if conbin
> (binary representation of the constraint) in a pg_constraint tuple is
> found to be null.
> The cases that you point out in the patch used to also error out in
> the same way, but Vignesh suggested changing them to just return
> parallel-unsafe instead of erroring-out, which I agree with.
>

You have not raised a WARNING for the second case. But in the first
place what is the reasoning for making this different from other parts
of code? If we don't have a solid reason then I suggest keeping these
checks and errors the same as in other parts of the code.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: "Tang, Haiying"
Date:
Subject: RE: Parallel INSERT (INTO ... SELECT ...)
Next
From: "Tang, Haiying"
Date:
Subject: RE: Parallel INSERT (INTO ... SELECT ...)