RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..) - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)
Msg-id OS0PR01MB5716B3E950F4491E0944377A94EF9@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to Re: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)  (Greg Nancarrow <gregn4422@gmail.com>)
List pgsql-hackers
On August 2, 2021 2:04 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Mon, Aug 2, 2021 at 2:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jul 30, 2021 at 6:53 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > On Friday, July 30, 2021 2:52 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > > On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > > Besides, I think we need a new default value for parallel
> > > > > > DML safety, maybe 'auto' or 'null' (distinct from
> > > > > > safe/restricted/unsafe). Because users are likely to alter the
> > > > > > safety back to the default value to get the automatic safety
> > > > > > check, an independent default value would make this clearer.
> > > > > >
> > > > >
> > > > > Hmm, but auto won't work for partitioned tables, right? If so,
> > > > > that might appear like an inconsistency to the user and we need
> > > > > to document the same. Let me summarize the discussion so far in
> > > > > this thread so that it is helpful to others.
> > > > >
> > > >
> > > > To avoid that inconsistency, UNSAFE could be the default for
> > > > partitioned tables (and we would disallow setting AUTO for these).
> > > > So then AUTO is the default for non-partitioned tables only.
> > >
> > > I think this approach is reasonable, +1.
> > >
> >
> > I see the need to change back to the default via ALTER TABLE, but I am not
> > sure if AUTO is the most appropriate way to handle that. How about using
> > DEFAULT itself, as we do in the case of REPLICA IDENTITY? Then, if users
> > have to alter the parallel-safety value back to the default, they just need
> > to say PARALLEL DML DEFAULT. The default would mean automatic behavior for
> > non-partitioned relations and would ignore parallelism for partitioned
> > tables.
> >
> 
> Hmm, I'm not so sure I'm sold on that.
> I personally think "DEFAULT" here is vague, and users then need to know what
> DEFAULT equates to, based on the type of table (partitioned or non-partitioned
> table).
> Also, there would then be two ways to set the actual "default" DML
> parallel-safety for partitioned tables: DEFAULT or UNSAFE.
> At least "AUTO" is a meaningful default option name for non-partitioned tables
> - "automatic" parallel-safety checking, and the fact that it isn't the default (and
> can't be set) for partitioned tables highlights the difference in the way being
> proposed to treat them (i.e. use automatic checking only for non-partitioned
> tables).
> I'd be interested to hear what others think.
> I think a viable alternative would be to record whether an explicit DML
> parallel-safety has been specified, and if not, apply default behavior (i.e. by
> default use automatic checking for non-partitioned tables and treat partitioned
> tables as UNSAFE). I'm just not sure whether this kind of distinction (explicit vs
> implicit default) has been used before in Postgres options.

I think both approaches are fine, but using "DEFAULT" might have a disadvantage:
if we support automatic safety checks for partitioned tables in the future,
the meaning of "DEFAULT" for partitioned tables will change from UNSAFE to
automatic checking. That could also burden users, who would have to modify
their SQL scripts.
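To make the concern concrete, here is a minimal Python model of how an unset or DEFAULT parallel DML safety might be resolved. It is only a sketch under the semantics described in this thread (automatic checking for non-partitioned tables, UNSAFE for partitioned ones); all names (DmlSafety, resolve_default, effective_safety, auto_supports_partitions) are hypothetical and are not PostgreSQL code or syntax.

```python
from enum import Enum
from typing import Optional


class DmlSafety(Enum):
    SAFE = "safe"
    RESTRICTED = "restricted"
    UNSAFE = "unsafe"
    AUTO = "auto"  # hypothetical: run the automatic parallel-safety check


def resolve_default(is_partitioned: bool,
                    auto_supports_partitions: bool = False) -> DmlSafety:
    """What a DEFAULT setting would resolve to (hypothetical model).

    auto_supports_partitions models a possible future release in which
    automatic checking is also implemented for partitioned tables.
    """
    if is_partitioned and not auto_supports_partitions:
        return DmlSafety.UNSAFE
    return DmlSafety.AUTO


def effective_safety(explicit: Optional[DmlSafety],
                     is_partitioned: bool,
                     auto_supports_partitions: bool = False) -> DmlSafety:
    """Explicit-vs-implicit variant: None means no value was ever specified."""
    if explicit is not None:
        # An explicitly recorded value is used as-is, independent of release.
        return explicit
    return resolve_default(is_partitioned, auto_supports_partitions)
```

With auto_supports_partitions flipped to True, the same DEFAULT (or unset) value on a partitioned table resolves to AUTO instead of UNSAFE, which is the behavior change described above; only an explicitly recorded value such as UNSAFE stays stable across that change.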

Best regards,
houzj
