Thread: [bug?] Missed parallel safety checks, and wrong parallel safety

[bug?] Missed parallel safety checks, and wrong parallel safety

From
"tsunakawa.takay@fujitsu.com"
Date:
Hello,


I think we've found a few existing problems with handling the parallel safety of functions while doing an experiment.
Could I hear your opinions on what we should do?  I'd be willing to create and submit a patch to fix them.

The experiment is to add a parallel safety check in FunctionCallInvoke() and run the regression test with
force_parallel_mode=regress.  The added check errors out with ereport(ERROR) when the about-to-be-called function is
parallel unsafe and the process is currently in parallel mode.  6 test cases failed because the following
parallel-unsafe functions were called:

    dsnowball_init
    balkifnull
    int44out
    text_w_default_out
    widget_out

The first function is created in src/backend/snowball/snowball_create.sql for full text search.  The remaining
functions are created during the regression test run.
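
For reference, a minimal sketch of the kind of check that was added follows (fn_parallel is an assumed new FmgrInfo field carrying pg_proc.proparallel; it does not exist in FmgrInfo today, and the error wording is illustrative):

/*
 * Hypothetical guard in the FunctionCallInvoke() path: error out when a
 * parallel-unsafe function is reached while in parallel mode.
 */
if (IsInParallelMode() &&
    fcinfo->flinfo->fn_parallel == PROPARALLEL_UNSAFE)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
             errmsg("parallel-unsafe function \"%s\" called in parallel mode",
                    get_func_name(fcinfo->flinfo->fn_oid))));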

The relevant issues follow.


(1)
All the above functions are actually parallel safe looking at their implementations.  It seems that their CREATE
FUNCTION statements are just missing PARALLEL SAFE specifications, so I think I'll add them.  dsnowball_lexize() may
also be parallel safe.


(2)
I'm afraid the above phenomenon reveals that Postgres overlooks parallel safety checks in some places.  Specifically,
we noticed the following:

* User-defined aggregate
CREATE AGGREGATE allows specifying the parallel safety of the aggregate itself, and the planner checks it, but the support
function of the aggregate is not checked.  OTOH, the document clearly says:

https://www.postgresql.org/docs/devel/xaggr.html

"Worth noting also is that for an aggregate to be executed in parallel, the aggregate itself must be marked PARALLEL
SAFE.The parallel-safety markings on its support functions are not consulted." 

https://www.postgresql.org/docs/devel/sql-createaggregate.html

"An aggregate will not be considered for parallelization if it is marked PARALLEL UNSAFE (which is the default!) or
PARALLELRESTRICTED. Note that the parallel-safety markings of the aggregate's support functions are not consulted by
theplanner, only the marking of the aggregate itself." 

Can we check the parallel safety of aggregate support functions during statement execution and error out?  Is there any
reason not to do so?

* User-defined data type
The input, output, send,receive, and other functions of a UDT are not checked for parallel safety.  Is there any good
reasonto not check them other than the concern about performance? 

* Functions for full text search
Should CREATE TEXT SEARCH TEMPLATE ensure that the functions are parallel safe?  (Those functions could be changed to
parallelunsafe later with ALTER FUNCTION, though.) 


(3) Built-in UDFs are not checked for parallel safety
The functions defined in fmgr_builtins[], which are derived from pg_proc.dat, are not checked.  Most of them are marked
parallel safe, but some are parallel unsafe or restricted.

Besides, changing their parallel safety with ALTER FUNCTION PARALLEL does not affect the selection of the query plan.  This
is because fmgr_builtins[] does not have a member for parallel safety.

Should we add a member for parallel safety in fmgr_builtins[], and disallow ALTER FUNCTION to change the parallel
safety of builtin UDFs?


Regards
Takayuki Tsunakawa





Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Bharath Rupireddy
Date:
On Tue, Apr 20, 2021 at 2:23 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
> (2)
> I'm afraid the above phenomenon reveals that Postgres overlooks parallel safety checks in some places.  Specifically,
we noticed the following:
>
> * User-defined aggregate
> CREATE AGGREGATE allows specifying the parallel safety of the aggregate itself, and the planner checks it, but the support
function of the aggregate is not checked.  OTOH, the document clearly says:
>
> https://www.postgresql.org/docs/devel/xaggr.html
>
> "Worth noting also is that for an aggregate to be executed in parallel, the aggregate itself must be marked PARALLEL
SAFE.The parallel-safety markings on its support functions are not consulted." 
>
> https://www.postgresql.org/docs/devel/sql-createaggregate.html
>
> "An aggregate will not be considered for parallelization if it is marked PARALLEL UNSAFE (which is the default!) or
PARALLELRESTRICTED. Note that the parallel-safety markings of the aggregate's support functions are not consulted by
theplanner, only the marking of the aggregate itself." 

IMO, the reason for not checking the parallel safety of the support
functions is that the functions themselves can have a whole lot of other
functions (which can be nested as well) which might be quite hard to check
at planning time. That is why the job of marking an aggregate as
parallel safe is best left to the user. They have to mark the aggregate
parallel unsafe if at least one support function is parallel unsafe,
otherwise parallel safe.

> Can we check the parallel safety of aggregate support functions during statement execution and error out?  Is there
any reason not to do so?

And if we were to do the above, within the function execution API, we would
need to know where the function got called from(?). It is best left to the
user to decide whether a function/aggregate is parallel safe or not.
This is the main reason we have declarative constructs like parallel
safe/unsafe/restricted.

For core functions, we should definitely apply the proper parallel
safe/restricted/unsafe markings wherever possible.

Please correct me if I'm missing something.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Tom Lane
Date:
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> On Tue, Apr 20, 2021 at 2:23 PM tsunakawa.takay@fujitsu.com
> <tsunakawa.takay@fujitsu.com> wrote:
>> https://www.postgresql.org/docs/devel/xaggr.html
>>
>> "Worth noting also is that for an aggregate to be executed in parallel, the aggregate itself must be marked PARALLEL
SAFE.The parallel-safety markings on its support functions are not consulted." 

> IMO, the reason for not checking the parallel safety of the support
> functions is that the functions themselves can have a whole lot of other
> functions (which can be nested as well) which might be quite hard to check
> at planning time. That is why the job of marking an aggregate as
> parallel safe is best left to the user.

Yes.  I think the documentation is perfectly clear that this is
intentional; I don't see a need to change it.

>> Should we add a member for parallel safety in fmgr_builtins[], and disallow ALTER FUNCTION to change the parallel
safety of builtin UDFs?

No.  You'd have to be superuser anyway to do that, and we're not in the
habit of trying to put training wheels on superusers.

Don't have an opinion about the other points yet.

            regards, tom lane



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Tom Lane <tgl@sss.pgh.pa.us>
> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> > IMO, the reason for not checking the parallel safety of the support
> > functions is that the functions themselves can have a whole lot of other
> > functions (which can be nested as well) which might be quite hard to check
> > at planning time. That is why the job of marking an aggregate as
> > parallel safe is best left to the user.
>
> Yes.  I think the documentation is perfectly clear that this is
> intentional; I don't see a need to change it.

OK, that's what I expected.  I understood from this that Postgres's stance toward parallel safety is that Postgres
does its best to check parallel safety (as far as it doesn't hurt performance much, and perhaps the core code
doesn't get very complex), and the user should be responsible for the actual parallel safety of ancillary objects (in
this case, support functions for an aggregate) of the target object that he/she marked as parallel safe.


> >> Should we add a member for parallel safety in fmgr_builtins[], and disallow
> ALTER FUNCTION to change the parallel safety of builtin UDFs?
>
> No.  You'd have to be superuser anyway to do that, and we're not in the
> habit of trying to put training wheels on superusers.

Understood.  However, we may add the parallel safety member in fmgr_builtins[] in another thread for parallel INSERT
SELECT. I'd appreciate your comment on this if you see any concern. 


> Don't have an opinion about the other points yet.

I'd like to have your comments on them, too.  But I understand you must be so busy at least until the beta release of
PG14. 


Regards
Takayuki Tsunakawa





Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Tom Lane
Date:
"tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> writes:
> From: Tom Lane <tgl@sss.pgh.pa.us>
>> No.  You'd have to be superuser anyway to do that, and we're not in the
>> habit of trying to put training wheels on superusers.

> Understood.  However, we may add the parallel safety member in fmgr_builtins[] in another thread for parallel INSERT
SELECT. I'd appreciate your comment on this if you see any concern. 

[ raised eyebrow... ]  I find it very hard to understand why that would
be necessary, or even a good idea.  Not least because there's no spare
room there; you'd have to incur a substantial enlargement of the
array to add another flag.  But also, that would indeed lock down
the value of the parallel-safety flag, and that seems like a fairly
bad idea.

            regards, tom lane



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Tom Lane <tgl@sss.pgh.pa.us>
> [ raised eyebrow... ]  I find it very hard to understand why that would
> be necessary, or even a good idea.  Not least because there's no spare
> room there; you'd have to incur a substantial enlargement of the
> array to add another flag.  But also, that would indeed lock down
> the value of the parallel-safety flag, and that seems like a fairly
> bad idea.

You're right, FmgrBuiltins is already fully packed (24 bytes on 64-bit machines).  Enlarging the frequently accessed
fmgr_builtins array may have an unexpectedly large adverse effect on performance.

I wanted to check the parallel safety of functions, which various objects (data type, index, trigger, etc.) come down
to, in FunctionCallInvoke() and a few other places.  But maybe we skip the check for built-in functions.  That's a matter
of where we draw a line between where we check and where we don't.


Regards
Takayuki Tsunakawa





RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"houzj.fnst@fujitsu.com"
Date:
> I think we've found a few existing problems with handling the parallel safety of
> functions while doing an experiment.  Could I hear your opinions on what we
> should do?  I'd be willing to create and submit a patch to fix them.
>
> The experiment is to add a parallel safety check in FunctionCallInvoke() and run
> the regression test with force_parallel_mode=regress.  The added check
> errors out with ereport(ERROR) when the about-to-be-called function is
> parallel unsafe and the process is currently in parallel mode.  6 test cases failed
> because the following parallel-unsafe functions were called:
>
>     dsnowball_init
>     balkifnull
>     int44out
>     text_w_default_out
>     widget_out
>
> The first function is created in src/backend/snowball/snowball_create.sql for
> full text search.  The remaining functions are created during the regression
> test run.
>
> (1)
> All the above functions are actually parallel safe looking at their
> implementations.  It seems that their CREATE FUNCTION statements are just
> missing PARALLEL SAFE specifications, so I think I'll add them.
> dsnowball_lexize() may also be parallel safe.

I agree that it's better to mark the functions with the correct parallel safety label,
especially the above functions, which will be executed in parallel mode.
It will be friendly to developers and users who are working on something related to parallel testing.

So, I attached the patch to mark the above functions parallel safe.

Best regards,
houzj

Attachment

Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Wed, Apr 21, 2021 at 8:12 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Tom Lane <tgl@sss.pgh.pa.us>
> > [ raised eyebrow... ]  I find it very hard to understand why that would
> > be necessary, or even a good idea.  Not least because there's no spare
> > room there; you'd have to incur a substantial enlargement of the
> > array to add another flag.  But also, that would indeed lock down
> > the value of the parallel-safety flag, and that seems like a fairly
> > bad idea.
>
> You're right, FmgrBuiltins is already fully packed (24 bytes on 64-bit machines).  Enlarging the frequently accessed
fmgr_builtins array may have an unexpectedly large adverse effect on performance.
>
> I wanted to check the parallel safety of functions, which various objects (data type, index, trigger, etc.) come down
to, in FunctionCallInvoke() and a few other places.  But maybe we skip the check for built-in functions.  That's a matter
of where we draw a line between where we check and where we don't.
>

IIUC, the idea here is to check for parallel safety of functions at
someplace in the code during function invocation so that if we execute
any parallel unsafe/restricted function via parallel worker then we
error out. If so, isn't it possible to deal with built-in and
non-built-in functions in the same way?

I think we want to have some safety checks for functions as we have
for transaction id in AssignTransactionId(), command id in
CommandCounterIncrement(), for write operations in
heap_prepare_insert(), etc. Is that correct?

--
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Tom Lane
Date:
Amit Kapila <amit.kapila16@gmail.com> writes:
> On Wed, Apr 21, 2021 at 8:12 AM tsunakawa.takay@fujitsu.com
> <tsunakawa.takay@fujitsu.com> wrote:
>> From: Tom Lane <tgl@sss.pgh.pa.us>
>>> [ raised eyebrow... ]  I find it very hard to understand why that would
>>> be necessary, or even a good idea.

> IIUC, the idea here is to check for parallel safety of functions at
> someplace in the code during function invocation so that if we execute
> any parallel unsafe/restricted function via parallel worker then we
> error out. If so, isn't it possible to deal with built-in and
> non-built-in functions in the same way?

Yeah, one of the reasons I doubt this is a great idea is that you'd
still have to fetch the pg_proc row for non-built-in functions.

The obvious place to install such a check is fmgr_info(), which is
fetching said row anyway for other purposes, so it's really hard to
see how adding anything to FmgrBuiltin is going to help.

            regards, tom lane



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Tom Lane <tgl@sss.pgh.pa.us>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > IIUC, the idea here is to check for parallel safety of functions at
> > someplace in the code during function invocation so that if we execute
> > any parallel unsafe/restricted function via parallel worker then we
> > error out. If so, isn't it possible to deal with built-in and
> > non-built-in functions in the same way?
>
> Yeah, one of the reasons I doubt this is a great idea is that you'd
> still have to fetch the pg_proc row for non-built-in functions.
>
> The obvious place to install such a check is fmgr_info(), which is
> fetching said row anyway for other purposes, so it's really hard to
> see how adding anything to FmgrBuiltin is going to help.

Thank you, fmgr_info() looks like the best place to do the parallel safety check.  Having a quick look at its callers,
I didn't find any concerning place (of course, we can't be relieved until the regression test succeeds.)  Also, with
fmgr_info(), we don't have to find other places to add the check to deal with function calls in execExpr.c and
execExprInterp.c.  This is beautiful.

But the current fmgr_info() does not check the parallel safety of builtin functions.  It does not have information to
do that.  There are two options.  Which do you think is better?  I think 2.

1) fmgr_info() reads pg_proc like for non-builtin functions
This ruins the effort for the fast path for builtin functions.  I can't imagine how large the adverse impact on
performance would be, but I'm worried.

The benefit is that ALTER FUNCTION on builtin functions takes effect.  But such operations are nonsensical, so I don't
think we want to gain such a benefit.


2) Gen_fmgrtab.pl adds a member for proparallel in FmgrBuiltin
But we don't want to enlarge the FmgrBuiltin struct.  So, change the existing bool members strict and retset into one
member of type char, and represent the original values with some bit flags.  Then we use that member for proparallel as
well.  (As a result, one byte is left for future use.)
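
A sketch of what option 2) might look like (the field layout follows the current FmgrBuiltin in fmgrtab.h; the flag names are placeholders, not committed code):

#define BUILTIN_STRICT          0x01    /* replaces "bool strict" */
#define BUILTIN_RETSET          0x02    /* replaces "bool retset" */
#define BUILTIN_PARALLEL_SAFE   0x04    /* new: set when proparallel == 's';
                                         * restricted/unsafe could use a
                                         * second bit if needed */

typedef struct
{
    Oid         foid;       /* OID of the function */
    short       nargs;      /* 0..FUNC_MAX_ARGS, or -1 if variable count */
    char        flags;      /* bit flags above; one byte still spare */
    const char *funcName;   /* C name of the function */
    PGFunction  func;       /* pointer to compiled function */
} FmgrBuiltin;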


I think we'll try 2).  I'd be grateful if you could point out anything I need to be careful about.


Regards
Takayuki Tsunakawa





RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Hou, Zhijie/侯 志杰 <houzj.fnst@fujitsu.com>
> I agree that it's better to mark the function with correct parallel safety lable.
> Especially for the above functions which will be executed in parallel mode.
> It will be friendly to developer and user who is working on something related to
> parallel test.
>
> So, I attached the patch to mark the above functions parallel safe.

Thank you, the patch looks good.  Please register it with the next CF if you haven't already.


Regards
Takayuki Tsunakawa





RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"houzj.fnst@fujitsu.com"
Date:
> Thank you, fmgr_info() looks like the best place to do the parallel safety check.
> Having a quick look at its callers, I didn't find any concerning place (of course,
> we can't be relieved until the regression test succeeds.)  Also, with fmgr_info(),
> we don't have to find other places to add the check to deal with function calls
> in execExpr.c and execExprInterp.c.  This is beautiful.
>
> But the current fmgr_info() does not check the parallel safety of builtin
> functions.  It does not have information to do that.  There are two options.
> Which do you think is better?  I think 2.
>
> 1) fmgr_info() reads pg_proc like for non-builtin functions This ruins the effort
> for the fast path for builtin functions.  I can't imagine how large the adverse
> impact on performance would be, but I'm worried.

For approach 1): I think it could result in infinite recursion.

For example:
If we first access a built-in function A which has not been cached,
we need to access pg_proc.  When accessing pg_proc, we internally still need some built-in function B to perform the scan.
At this time, if B is not cached, we still need to fetch function B's parallel flag by accessing
pg_proc.proparallel.
Then it could result in infinite recursion.

So, I think we can consider approach 2).

Best regards,
houzj



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Hou, Zhijie/侯 志杰 <houzj.fnst@fujitsu.com>
> For approach 1): I think it could result in infinite recursion.
>
> For example:
> If we first access a built-in function A which has not been cached,
> we need to access pg_proc.  When accessing pg_proc, we internally still need
> some built-in function B to perform the scan.
> At this time, if B is not cached, we still need to fetch function B's parallel flag by
> accessing pg_proc.proparallel.
> Then it could result in infinite recursion.
>
> So, I think we can consider approach 2).

Hmm, that makes sense.  That's a problem structure similar to that of relcache.  Only one choice is left, then,
unless there's another better idea.



Regards
Takayuki Tsunakawa





Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Greg Nancarrow
Date:
On Wed, Apr 21, 2021 at 12:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> writes:
> > From: Tom Lane <tgl@sss.pgh.pa.us>
> >> No.  You'd have to be superuser anyway to do that, and we're not in the
> >> habit of trying to put training wheels on superusers.
>
> > Understood.  However, we may add the parallel safety member in fmgr_builtins[] in another thread for parallel
INSERT SELECT.  I'd appreciate your comment on this if you see any concern.
>
> [ raised eyebrow... ]  I find it very hard to understand why that would
> be necessary, or even a good idea.  Not least because there's no spare
> room there; you'd have to incur a substantial enlargement of the
> array to add another flag.  But also, that would indeed lock down
> the value of the parallel-safety flag, and that seems like a fairly
> bad idea.
>

I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
that would "lock down the value" of the strict flag, wouldn't it?

Regards,
Greg Nancarrow
Fujitsu Australia



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Wed, Apr 21, 2021 at 7:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > On Wed, Apr 21, 2021 at 8:12 AM tsunakawa.takay@fujitsu.com
> > <tsunakawa.takay@fujitsu.com> wrote:
> >> From: Tom Lane <tgl@sss.pgh.pa.us>
> >>> [ raised eyebrow... ]  I find it very hard to understand why that would
> >>> be necessary, or even a good idea.
>
> > IIUC, the idea here is to check for parallel safety of functions at
> > someplace in the code during function invocation so that if we execute
> > any parallel unsafe/restricted function via parallel worker then we
> > error out. If so, isn't it possible to deal with built-in and
> > non-built-in functions in the same way?
>
> Yeah, one of the reasons I doubt this is a great idea is that you'd
> still have to fetch the pg_proc row for non-built-in functions.
>

So, are you suggesting that we should fetch the pg_proc row for
built-in functions as well for this purpose? If not, then how do we
identify the parallel safety of built-in functions in fmgr_info()?

Another idea could be that we check the parallel safety of built-in
functions based on some static information. As we know the func_ids of
non-parallel-safe built-in functions, we can have a function
fmgr_builtin_parallel_safe() which checks whether the func_id is one
among the predefined func_ids of non-parallel-safe built-in functions:
if not, it returns true; otherwise, false. Then, we can call this new
function in fmgr_info() for built-in functions.
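
A minimal sketch of that idea, assuming the OID list would be generated
from pg_proc.dat (the names and placeholder entry below are made up, not
committed code):

static bool
fmgr_builtin_parallel_safe(Oid functionId)
{
    /* placeholder list; Gen_fmgrtab.pl would emit the real OIDs */
    static const Oid parallel_unsafe_builtins[] = {
        InvalidOid
    };
    int         i;

    for (i = 0; i < lengthof(parallel_unsafe_builtins); i++)
    {
        if (parallel_unsafe_builtins[i] == functionId)
            return false;
    }
    return true;
}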

Thoughts?

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Tom Lane
Date:
Greg Nancarrow <gregn4422@gmail.com> writes:
> I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
> that would "lock down the value" of the strict flag, wouldn't it?

It does, but that's much more directly a property of the function's
C code than parallel-safety is.

            regards, tom lane



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Fri, Apr 23, 2021 at 9:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Nancarrow <gregn4422@gmail.com> writes:
> > I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
> > that would "lock down the value" of the strict flag, wouldn't it?
>
> It does, but that's much more directly a property of the function's
> C code than parallel-safety is.

I'm not sure I agree with that, but I think having the "strict" flag
in FmgrBuiltin isn't that nice either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Apr 23, 2021 at 9:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Greg Nancarrow <gregn4422@gmail.com> writes:
>>> I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
>>> that would "lock down the value" of the strict flag, wouldn't it?

>> It does, but that's much more directly a property of the function's
>> C code than parallel-safety is.

> I'm not sure I agree with that, but I think having the "strict" flag
> in FmgrBuiltin isn't that nice either.

Yeah, if we could readily do without it, we probably would.  But the
function call mechanism itself is responsible for implementing strictness,
so it *has* to have that flag available.

            regards, tom lane



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Fri, Apr 23, 2021 at 6:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Greg Nancarrow <gregn4422@gmail.com> writes:
> > I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
> > that would "lock down the value" of the strict flag, wouldn't it?
>
> It does, but that's much more directly a property of the function's
> C code than parallel-safety is.
>

Isn't parallel safety also the C code property? I mean unless someone
changes the built-in function code, changing that property would be
dangerous. The other thing is even if a user is allowed to change one
function's property, how will they know which other functions are
called by that function and whether they are parallel-safe or not. For
example, say if the user wants to change the parallel safe property of
a built-in function brin_summarize_new_values, unless she changes its
code and the functions called by it like brin_summarize_range, it
would be dangerous. So, isn't it better to disallow changing parallel
safety for built-in functions?

Also, if the strict property of built-in functions is fixed
internally, why do we allow users to change it, and is that of any help?

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Greg Nancarrow
Date:
On Sat, Apr 24, 2021 at 12:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 23, 2021 at 6:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Greg Nancarrow <gregn4422@gmail.com> writes:
> > > I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
> > > that would "lock down the value" of the strict flag, wouldn't it?
> >
> > It does, but that's much more directly a property of the function's
> > C code than parallel-safety is.
> >
>
> Isn't parallel safety also the C code property? I mean unless someone
> changes the built-in function code, changing that property would be
> dangerous. The other thing is even if a user is allowed to change one
> function's property, how will they know which other functions are
> called by that function and whether they are parallel-safe or not. For
> example, say if the user wants to change the parallel safe property of
> a built-in function brin_summarize_new_values, unless she changes its
> code and the functions called by it like brin_summarize_range, it
> would be dangerous. So, isn't it better to disallow changing parallel
> safety for built-in functions?
>
> Also, if the strict property of built-in functions is fixed
> internally, why do we allow users to change it, and is that of any help?
>

Yes, I'd like to know too.
I think it would make more sense to disallow changing properties like
strict/parallel-safety on built-in functions.
Also, with sufficient privileges, a built-in function can be
redefined, yet the original function (whose info is cached in
FmgrBuiltins[], from build-time) is always invoked, not the
newly-defined version.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"houzj.fnst@fujitsu.com"
Date:
> >>> I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
> >>> that would "lock down the value" of the strict flag, wouldn't it?
>
> >> It does, but that's much more directly a property of the function's C
> >> code than parallel-safety is.
>
> > I'm not sure I agree with that, but I think having the "strict" flag
> > in FmgrBuiltin isn't that nice either.
>
> Yeah, if we could readily do without it, we probably would.  But the function
> call mechanism itself is responsible for implementing strictness, so it *has* to
> have that flag available.

So, if we do not want to lock down the parallel safety of built-in functions,
it seems we can try to fetch the proparallel from pg_proc for built-in functions
in fmgr_info_cxt_security too. To avoid a recursive safety check when fetching
proparallel from pg_proc, we can add a global variable to mark whether we are in a recursive state,
and skip the safety check while in that state. In this approach, parallel safety
will not be locked down, and there are no new members in FmgrBuiltin.
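
A rough sketch of the recursion-guard mechanism (this only illustrates the idea, it is not the attached patch itself; the variable name and error wording are made up):

/* file-level guard flag */
static bool in_parallel_safety_check = false;

/* ... inside fmgr_info_cxt_security(), for a built-in function: */
if (!in_parallel_safety_check)
{
    char        proparallel;

    /*
     * func_parallel() does a syscache lookup, which may itself invoke
     * built-in functions; the flag makes those nested calls skip the
     * check instead of recursing.  Resetting the flag after an error
     * raised inside the lookup would need extra care (e.g. PG_TRY).
     */
    in_parallel_safety_check = true;
    proparallel = func_parallel(functionId);
    in_parallel_safety_check = false;

    if (IsInParallelMode() && proparallel == PROPARALLEL_UNSAFE)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("parallel-unsafe function called in parallel mode")));
}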

Attaching the patch about this approach [0001-approach-1].
Thoughts?

I also attached another approach patch [0001-approach-2], which adds
parallel safety in FmgrBuiltin, because this approach seems faster and
we can combine some bool members into a bit flag to avoid enlarging the
FmgrBuiltin array, though this approach will lock down the parallel safety
of built-in functions.

Best regards,
houzj

Attachment

Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Fri, Apr 23, 2021 at 10:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Isn't parallel safety also the C code property?

In my opinion, yes.

> So, isn't it better to disallow changing parallel
> safety for built-in functions?

Superusers can do a lot of DML operations on the system catalogs that
are manifestly unsafe. I think we should really consider locking that
down somehow, but I doubt it makes sense to treat this case separately
from all the others. What do you think will happen if you change
proargtypes?

> Also, if the strict property of built-in functions is fixed
> internally, why do we allow users to change it, and is that of any help?

One real application of allowing these sorts of changes is letting
users correct things that were done wrong originally without waiting
for a new major release.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Wed, Apr 28, 2021 at 9:42 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> So, if we do not want to lock down the parallel safety of built-in functions,
> it seems we can try to fetch the proparallel from pg_proc for built-in functions
> in fmgr_info_cxt_security too. To avoid a recursive safety check when fetching
> proparallel from pg_proc, we can add a global variable to mark whether we are in a recursive state,
> and skip the safety check while in that state. In this approach, parallel safety
> will not be locked down, and there are no new members in FmgrBuiltin.
>
> Attaching the patch about this approach [0001-approach-1].
> Thoughts?

This seems to be full of complicated if-tests that don't seem
necessary and aren't explained by the comments. Also, introducing a
system cache lookup here seems completely unacceptable from a
reliability point of view, and I bet it's not too good for
performance, either.

> I also attached another approach patch [0001-approach-2] about adding
> parallel safety in FmgrBuiltin, because this approach seems faster and
> we can combine some bool member into a bitflag to avoid enlarging the
> FmgrBuiltin array, though this approach will lock down the parallel safety
> of built-in function.

This doesn't seem like a good idea either.

I really don't understand what problem any of this is intended to
solve. Bharath's analysis above seems right on point to me. I think if
anybody is writing a patch that requires that this be changed in this
way, that person is probably doing something wrong.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Greg Nancarrow
Date:
On Wed, May 5, 2021 at 5:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Apr 23, 2021 at 10:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Isn't parallel safety also the C code property?
>
> > Also, if the strict property of built-in functions is fixed
> > internally, why do we allow users to change it, and is that of any help?
>
> One real application of allowing these sorts of changes is letting
> users correct things that were done wrong originally without waiting
> for a new major release.
>

Problem is, for built-in functions, the changes are allowed, but for
some properties (like strict) the allowed changes don't actually take
effect (this is what Amit was referring to - so why allow those
changes?).
It's because some of the function properties are cached in
FmgrBuiltins[] (for a "fast-path" lookup for built-ins), according to
their state at build time (from pg_proc.dat), but ALTER FUNCTION is
just changing it in the system catalogs. Also, with sufficient
privileges, a built-in function can be redefined, yet the original
function (whose info is cached in FmgrBuiltins[]) is always invoked,
not the newly-defined version.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Tue, May 4, 2021 at 11:47 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> Problem is, for built-in functions, the changes are allowed, but for
> some properties (like strict) the allowed changes don't actually take
> effect (this is what Amit was referring to - so why allow those
> changes?).
> It's because some of the function properties are cached in
> FmgrBuiltins[] (for a "fast-path" lookup for built-ins), according to
> their state at build time (from pg_proc.dat), but ALTER FUNCTION is
> just changing it in the system catalogs. Also, with sufficient
> privileges, a built-in function can be redefined, yet the original
> function (whose info is cached in FmgrBuiltins[]) is always invoked,
> not the newly-defined version.

I agree. I think that's not ideal. I think we should consider putting
some more restrictions on updating system catalog changes, and I also
think that if we can get out of having strict need to be part of
FmgrBuiltins[] that would be good. But what I don't agree with is the
idea that since strict already has this problem, it's OK to do the
same thing with parallel-safety. That seems to me to be making a bad
situation worse, and I can't see what problem it actually solves.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Robert Haas <robertmhaas@gmail.com>
> On Tue, May 4, 2021 at 11:47 PM Greg Nancarrow <gregn4422@gmail.com>
> wrote:
> > Problem is, for built-in functions, the changes are allowed, but for
> > some properties (like strict) the allowed changes don't actually take
> > effect (this is what Amit was referring to - so why allow those
> > changes?).
> > It's because some of the function properties are cached in
> > FmgrBuiltins[] (for a "fast-path" lookup for built-ins), according to
> > their state at build time (from pg_proc.dat), but ALTER FUNCTION is
> > just changing it in the system catalogs. Also, with sufficient
> > privileges, a built-in function can be redefined, yet the original
> > function (whose info is cached in FmgrBuiltins[]) is always invoked,
> > not the newly-defined version.
> 
> I agree. I think that's not ideal. I think we should consider putting
> some more restrictions on updating system catalog changes, and I also
> think that if we can get out of having strict need to be part of
> FmgrBuiltins[] that would be good. But what I don't agree with is the
> idea that since strict already has this problem, it's OK to do the
> same thing with parallel-safety. That seems to me to be making a bad
> situation worse, and I can't see what problem it actually solves.

Let me divide the points:


(1) Is it better to get hardcoded function properties out of fmgr_builtins[]?
It's little worth doing so or thinking about that.  It's no business for users to change system objects, in this case
system functions.
 

Also, hardcoding is a worthwhile strategy for good performance or other unavoidable reasons.  Postgres already uses it, as in
the system catalog relcache below.
 

[relcache.c]
/*
 *      hardcoded tuple descriptors, contents generated by genbki.pl
 */
static const FormData_pg_attribute Desc_pg_class[Natts_pg_class] = {Schema_pg_class};
static const FormData_pg_attribute Desc_pg_attribute[Natts_pg_attribute] = {Schema_pg_attribute};
...


(2) Should it be disallowed for users to change system function properties with ALTER FUNCTION?
Maybe yes, but it's not an important issue for achieving parallel INSERT SELECT at the moment.  So, I think this can be
discussed in an independent separate thread.
 

As a reminder, Postgres has safeguards against modifying system objects as follows.

test=# drop table^C
test=# drop function pg_wal_replay_pause();
ERROR:  cannot drop function pg_wal_replay_pause() because it is required by the database system
test=# drop table pg_largeobject;
ERROR:  permission denied: "pg_largeobject" is a system catalog

OTOH, Postgres doesn't disallow changing the system table column values directly, such as UPDATE pg_proc SET ....  But
it's warned in the manual that such operations are dangerous.  So, we don't have to care about it.
 

Chapter 52. System Catalogs
https://www.postgresql.org/docs/devel/catalogs.html

"You can drop and recreate the tables, add columns, insert and update values, and severely mess up your system that
way.Normally, one should not change the system catalogs by hand, there are normally SQL commands to do that. (For
example,CREATE DATABASE inserts a row into the pg_database catalog — and actually creates the database on disk.) There
aresome exceptions for particularly esoteric operations, but many of those have been made available as SQL commands
overtime, and so the need for direct manipulation of the system catalogs is ever decreasing."
 


(3) Why do we want to have parallel-safety in fmgr_builtins[]?
As proposed in this thread and/or "Parallel INSERT SELECT take 2", we thought of detecting parallel unsafe function
execution during SQL statement execution, instead of imposing much overhead to check parallel safety during query
planning.  Specifically, we add a parallel safety check in fmgr_info() and/or FunctionCallInvoke().
 

(Alternatively, I think we can conclude that we assume parallel unsafe built-in functions won't be used in parallel
DML.  In that case, we don't change FmgrBuiltin and we just skip the parallel safety check for built-in functions when
the function is called.  Would you buy this?)
 


Regards
Takayuki Tsunakawa



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Wed, May 5, 2021 at 7:39 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, May 4, 2021 at 11:47 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > Problem is, for built-in functions, the changes are allowed, but for
> > some properties (like strict) the allowed changes don't actually take
> > effect (this is what Amit was referring to - so why allow those
> > changes?).
> > It's because some of the function properties are cached in
> > FmgrBuiltins[] (for a "fast-path" lookup for built-ins), according to
> > their state at build time (from pg_proc.dat), but ALTER FUNCTION is
> > just changing it in the system catalogs. Also, with sufficient
> > privileges, a built-in function can be redefined, yet the original
> > function (whose info is cached in FmgrBuiltins[]) is always invoked,
> > not the newly-defined version.
>
> I agree. I think that's not ideal. I think we should consider putting
> some more restrictions on updating system catalog changes, and I also
> think that if we can get out of having strict need to be part of
> FmgrBuiltins[] that would be good. But what I don't agree with is the
> idea that since strict already has this problem, it's OK to do the
> same thing with parallel-safety. That seems to me to be making a bad
> situation worse, and I can't see what problem it actually solves.
>

The idea here is to check for parallel safety of functions at
someplace in the code during function invocation so that if we execute
any parallel unsafe/restricted function via parallel worker then we
error out. I think that is a good safety net especially if we can do
it with some simple check. Now, we already have pg_proc information in
fmgr_info_cxt_security for non-built-in functions, so we can check
that and error out if the unsafe function is invoked in parallel mode.
It has been observed that we were calling some unsafe functions in
parallel-mode in the regression tests which is caught by such a check.
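
For non-built-in functions the pg_proc tuple is fetched there anyway, so the check could be as small as the following sketch (procedureStruct is the Form_pg_proc already looked up in fmgr_info_cxt_security(); the error wording is illustrative):

/* after fetching the pg_proc tuple for a non-built-in function */
if (IsInParallelMode() &&
    procedureStruct->proparallel == PROPARALLEL_UNSAFE)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
             errmsg("parallel-unsafe function \"%s\" called in parallel mode",
                    NameStr(procedureStruct->proname))));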

I think here the main challenge is to do a similar check for built-in
functions, and one of the ideas to do that was to extend FmgrBuiltins
to cache that information. I see why that idea is not good, and maybe
we can see if there is some other place where we already fetch pg_proc
for built-in functions and whether we can have such a check at that place. If
that is not feasible then we can probably have such a check just for
non-built-in functions, as that seems straightforward.

-- 
With Regards,
Amit Kapila.



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Robert Haas <robertmhaas@gmail.com>
> On Wed, Apr 28, 2021 at 9:42 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > So, If we do not want to lock down the parallel safety of built-in functions.
> > It seems we can try to fetch the proparallel from pg_proc for built-in function
> > in fmgr_info_cxt_security too. To avoid recursive safety check when fetching
> > proparallel from pg_proc, we can add a Global variable to mark is it in a
> recursive state.
> > And we skip safety check in a recursive state, In this approach, parallel safety
> > will not be locked, and there are no new members in FmgrBuiltin.
> >
> > Attaching the patch about this approach [0001-approach-1].
> > Thoughts ?
> 
> This seems to be full of complicated if-tests that don't seem
> necessary and aren't explained by the comments. Also, introducing a
> system cache lookup here seems completely unacceptable from a
> reliability point of view, and I bet it's not too good for
> performance, either.

Agreed.  Also, PG_TRY() would be relatively heavyweight here.  I'm inclined to avoid this approach.


> > I also attached another approach patch [0001-approach-2] about adding
> > parallel safety in FmgrBuiltin, because this approach seems faster and
> > we can combine some bool member into a bitflag to avoid enlarging the
> > FmgrBuiltin array, though this approach will lock down the parallel safety
> > of built-in function.
> 
> This doesn't seem like a good idea either.

This looks good to me.  What makes you think so?

That said, I actually think we want to avoid even this change.  That is, I'm wondering if we can skip the parallel
safety check for built-in functions.
 

Can anyone think of the need to check the parallel safety of built-in functions in the context of parallel INSERT
SELECT?  The planner already checks (or can check) the parallel safety of the SELECT part with max_parallel_hazard().
Regarding the INSERT part, we're trying to rely on the parallel safety of the target table that the user specified with
CREATE/ALTER TABLE.  I don't see where we need to check the parallel safety of built-in functions.
 


Regards
Takayuki Tsunakawa


Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Wed, May 5, 2021 at 10:54 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
> (1) Is it better to get hardcoded function properties out of fmgr_builtins[]?
> It's little worth doing so or thinking about that.  It's no business for users to change system objects, in this case
system functions.

I don't entirely agree with this. Whether or not users have any
business changing system functions, it's better to have one source of
truth than two. Now that being said, this is not a super-important
problem for us to go solve, and hard-coding a certain amount of stuff
is probably necessary to allow the system to bootstrap itself. So for
me it's one of those things that is in a grey area: if someone showed
up with a patch to make it better, I'd be happy. But I probably
wouldn't spend much time on writing such a patch unless it solved some
other problem that I cared about.

> (3) Why do we want to have parallel-safety in fmgr_builtins[]?
> As proposed in this thread and/or "Parallel INSERT SELECT take 2", we thought of detecting parallel unsafe function
execution during SQL statement execution, instead of imposing much overhead to check parallel safety during query
planning.  Specifically, we add a parallel safety check in fmgr_info() and/or FunctionCallInvoke().

I haven't read that thread, but I don't understand how that can work.
The reason we need to detect it at plan time is because we might need
to use a different plan. At execution time it's too late for that.

Also, it seems potentially quite expensive. A query may be planned
once and executed many times. Also, a single query execution may call
the same SQL function many times. I think we don't want to incur the
overhead of an extra syscache lookup every time anyone calls any
function. A very simple expression like a+b+c+d+e involves four
function calls, and + need not be a built-in, if the data type is
user-defined. And that might be happening for every row in a table
with millions of rows.

> (Alternatively, I think we can conclude that we assume parallel unsafe built-in functions won't be used in parallel
DML. In that case, we don't change FmgrBuiltin and we just skip the parallel safety check for built-in functions when
the function is called.  Would you buy this?)

I don't really understand this idea. There's no such thing as parallel
DML, is there? There's just DML, for which we must decide whether it can
be done in parallel or not based on, among other things, the
parallel-safety markings of the functions it contains. Maybe I am not
understanding you correctly, but it seems like you're suggesting that
in some cases we can just assume that the user hasn't done something
parallel-unsafe without making any attempt to check it. I don't think
I could support that.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Thu, May 6, 2021 at 3:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> The idea here is to check for parallel safety of functions at
> someplace in the code during function invocation so that if we execute
> any parallel unsafe/restricted function via parallel worker then we
> error out. I think that is a good safety net especially if we can do
> it with some simple check. Now, we already have pg_proc information in
> fmgr_info_cxt_security for non-built-in functions, so we can check
> that and error out if the unsafe function is invoked in parallel mode.
> It has been observed that we were calling some unsafe functions in
> parallel-mode in the regression tests which is caught by such a check.

I see your point, but I am not convinced. As I said to Tsunakawa-san,
doing the check here seems expensive. Also, I had the idea in mind
that parallel-safety should work like volatility. We don't check at
runtime whether a volatile function is being called in a context where
volatile functions are not supposed to be used. If for example you try
to call a volatile function directly from an index expression I
believe you will get an error. But if the index expression calls an
immutable function and then that function internally calls something
volatile, you don't get an error. Now it might not be a good idea: you
could end up with a broken index. But that's your fault for
mislabeling the function you used.

Sometimes this is actually quite useful. You might know that, while
the function is in general volatile, it is immutable in the particular
way that you are using it. Or, perhaps, you are using the volatile
function incidentally and it doesn't affect the output of your
function at all. Or, maybe you actually want to build an index that
might break, and then it's up to you to rebuild the index if and when
that is required. Users do this kind of thing all the time, I think,
and would be unhappy if we started checking it more rigorously than we
do today.

Now, I don't see why the same idea can't or shouldn't apply to
parallel-safety. If you call a parallel-unsafe function in a parallel
context, it's pretty likely that you are going to get an error, and so
you might not want to do it. If the function is written in C, it could
even cause horrible things to happen so that you crash the whole
backend or something, but I tried to set things up so that for
built-in functions you'll just get an error. But on the other hand,
maybe the parallel-unsafe function you are calling is not
parallel-unsafe in all cases. If you want to create a wrapper function
that is labelled parallel-safe and try to make sure that it only calls the
parallel-unsafe function in the cases where there's no safety problem,
that's up to you!

It's possible that I had the wrong idea here, so maybe the question
deserves more thought, but I wanted to explain what my thought process
was.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Greg Nancarrow
Date:
On Thu, May 6, 2021 at 5:26 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> Can anyone think of the need to check the parallel safety of built-in functions in the context of parallel INSERT
SELECT?  The planner already checks (or can check) the parallel safety of the SELECT part with max_parallel_hazard().
Regarding the INSERT part, we're trying to rely on the parallel safety of the target table that the user specified with
CREATE/ALTER TABLE.  I don't see where we need to check the parallel safety of built-in functions.
>

Yes, I certainly can think of a reason to do this.
The idea, for the approach being discussed, is to allow the user to
declare parallel-safety on a table, but then to catch any possible
violations of this at runtime (as opposed to adding additional
parallel-safety checks at planning time).
So for INSERT with parallel SELECT for example (which runs in
parallel-mode), then the execution of index expressions,
column-default expressions, check constraints etc. may end up invoking
functions (built-in or otherwise) that are NOT parallel-safe - so we
could choose to error-out in this case when these violations are
detected.
As far as I can see, this checking of function parallel-safety can be
done with little overhead to the current code - it already gets proc
information from the system cache for non-built-in functions, and for
built-in functions it could store the parallel-safety status in
FmgrBuiltin and simply get it from there (I don't think we should be
allowing changes to built-in function properties - currently it is
allowed, but it doesn't work properly).
The other option is to just blindly trust the parallel-safety
declaration on tables and whatever happens at runtime happens.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Thu, May 6, 2021 at 4:35 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, May 6, 2021 at 3:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > The idea here is to check for parallel safety of functions at
> > someplace in the code during function invocation so that if we execute
> > any parallel unsafe/restricted function via parallel worker then we
> > error out. I think that is a good safety net especially if we can do
> > it with some simple check. Now, we already have pg_proc information in
> > fmgr_info_cxt_security for non-built-in functions, so we can check
> > that and error out if the unsafe function is invoked in parallel mode.
> > It has been observed that we were calling some unsafe functions in
> > parallel-mode in the regression tests which is caught by such a check.
>
> I see your point, but I am not convinced. As I said to Tsunakawa-san,
> doing the check here seems expensive.
>

If I read your email correctly then you are saying it is expensive
based on the idea that we need to perform an extra syscache lookup, but
actually for non-built-in functions, we already have the parallel-safety
information, so such a check should not incur a significant cost.

> Also, I had the idea in mind
> that parallel-safety should work like volatility. We don't check at
> runtime whether a volatile function is being called in a context where
> volatile functions are not supposed to be used. If for example you try
> to call a volatile function directly from an index expression I
> believe you will get an error. But if the index expression calls an
> immutable function and then that function internally calls something
> volatile, you don't get an error. Now it might not be a good idea: you
> could end up with a broken index. But that's your fault for
> mislabeling the function you used.
>
> Sometimes this is actually quite useful. You might know that, while
> the function is in general volatile, it is immutable in the particular
> way that you are using it. Or, perhaps, you are using the volatile
> function incidentally and it doesn't affect the output of your
> function at all. Or, maybe you actually want to build an index that
> might break, and then it's up to you to rebuild the index if and when
> that is required. Users do this kind of thing all the time, I think,
> and would be unhappy if we started checking it more rigorously than we
> do today.
>
> Now, I don't see why the same idea can't or shouldn't apply to
> parallel-safety. If you call a parallel-unsafe function in a parallel
> context, it's pretty likely that you are going to get an error, and so
> you might not want to do it. If the function is written in C, it could
> even cause horrible things to happen so that you crash the whole
> backend or something, but I tried to set things up so that for
> built-in functions you'll just get an error. But on the other hand,
> maybe the parallel-unsafe function you are calling is not
> parallel-unsafe in all cases. If you want to create a wrapper function
> that is labelled parallel-safe and try to make sure that it only calls the
> parallel-unsafe function in the cases where there's no safety problem,
> that's up to you!
>

I think it is difficult to say for what purpose a parallel-unsafe
function got called in a parallel context, so if we give an error in
cases where it could otherwise lead to a crash or cause other
horrible things, users will probably appreciate us. OTOH, if the
parallel-safety labeling is wrong (a parallel-safe function is marked
parallel-unsafe) and we gave an error in such a case, the user can
always change the parallel-safety attribute by using ALTER FUNCTION.
Now, if adding such a check is costly or needs some major re-design
then probably it might not be worth it, whereas I don't think that is
the case for non-built-in function invocation.

-- 
With Regards,
Amit Kapila.



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Robert Haas <robertmhaas@gmail.com>
> On Wed, May 5, 2021 at 10:54 PM tsunakawa.takay@fujitsu.com
> <tsunakawa.takay@fujitsu.com> wrote:
> > As proposed in this thread and/or "Parallel INSERT SELECT take 2", we
> thought of detecting parallel unsafe function execution during SQL statement
> execution, instead of imposing much overhead to check parallel safety during
> query planning.  Specifically, we add parallel safety check in fmgr_info()
> and/or FunctionCallInvoke().
> 
> I haven't read that thread, but I don't understand how that can work.
> The reason we need to detect it at plan time is because we might need
> to use a different plan. At execution time it's too late for that.

(I forgot to say this in my previous email.  Robert-san, thank you very much for taking the time to look at this and giving
feedback.  It was sad that we had to revert our parallel INSERT SELECT for redesign at the very end of the last CF.  We
need advice and suggestions from knowledgeable and thoughtful people like Tom-san, Andres-san and you in early stages to
not repeat the tragedy.)
 

I'd really like you to have a look at the first mail in [1], and to get your feedback like "this part should be like
... instead" and "this part would probably work, I think."  Without feedback from leading developers, I'm somewhat at a
loss as to if and how we can proceed with the proposed approach.
 

To put it shortly, we found that it can take non-negligible time for the planner to check the parallel safety of the
target table of INSERT SELECT when it has many (hundreds or thousands of) partitions.  The check also added much
complicated code.  So, we got inclined to take Tom-san's suggestion -- let the user specify the parallel safety of
the target table with CREATE/ALTER TABLE, and the planner just decides a query plan based on it.  Caching the results
of parallel safety checks in the relcache or a new shared hash table didn't seem to work well to me, or it was beyond
my brain at least.
 

We may think that it's okay to just believe the user-specified parallel safety.  But I thought we could step up and do
our best to check the parallel safety during statement execution, if it's not very invasive in terms of performance
and code complexity.  The aforementioned idea is that if the parallel processes find the called functions parallel
unsafe, they error out.  All ancillary objects of the target table (data types, constraints, indexes, triggers, etc.)
come down to some UDF, so it should be enough to check the parallel safety when the UDF is called.
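
For illustration, here is a minimal sketch (not the actual patch) of what such a run-time check could look like in
fmgr.c, at the point in fmgr_info_cxt_security() where the pg_proc tuple (procedureStruct) has already been fetched
from the syscache for a non-built-in function; the exact placement and error wording are just assumptions:

    /* needs access/xact.h (IsInParallelMode) and catalog/pg_proc.h */
    if (IsInParallelMode() &&
        procedureStruct->proparallel == PROPARALLEL_UNSAFE)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("parallel-unsafe function \"%s\" cannot be executed in parallel mode",
                        NameStr(procedureStruct->proname))));

Since fmgr_info() runs once per FmgrInfo setup rather than once per function call, such a check should mostly stay
out of the per-row path.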
 


> Also, it seems potentially quite expensive. A query may be planned
> once and executed many times. Also, a single query execution may call
> the same SQL function many times. I think we don't want to incur the
> overhead of an extra syscache lookup every time anyone calls any
> function. A very simple expression like a+b+c+d+e involves four
> function calls, and + need not be a built-in, if the data type is
> user-defined. And that might be happening for every row in a table
> with millions of rows.

We (optimistically) expect that the overhead won't be serious, because the parallel safety information is already at
hand in the FmgrInfo struct when the function is called.  We don't have to look up the syscache every time the
function is called.
 

Of course, adding even a single if statement may lead to a disaster in a critical path, so we need to assess the
performance.  I'd also appreciate it if you could suggest some good workloads we should experiment with in the thread
above.
 



[1]
Parallel INSERT SELECT take 2
https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709@TYAPR01MB2990.jpnprd01.prod.outlook.com


Regards
Takayuki Tsunakawa


RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"houzj.fnst@fujitsu.com"
Date:
> > Sometimes this is actually quite useful. You might know that, while
> > the function is in general volatile, it is immutable in the particular
> > way that you are using it. Or, perhaps, you are using the volatile
> > function incidentally and it doesn't affect the output of your
> > function at all. Or, maybe you actually want to build an index that
> > might break, and then it's up to you to rebuild the index if and when
> > that is required. Users do this kind of thing all the time, I think,
> > and would be unhappy if we started checking it more rigorously than we
> > do today.
> >
> > Now, I don't see why the same idea can't or shouldn't apply to
> > parallel-safety. If you call a parallel-unsafe function in a parallel
> > context, it's pretty likely that you are going to get an error, and so
> > you might not want to do it. If the function is written in C, it could
> > even cause horrible things to happen so that you crash the whole
> > backend or something, but I tried to set things up so that for
> > built-in functions you'll just get an error. But on the other hand,
> > maybe the parallel-unsafe function you are calling is not
> > parallel-unsafe in all cases. If you want to create a wrapper function
> > that is labelled parallel-safe and try to make that it only calls the
> > parallel-unsafe function in the cases where there's no safety problem,
> > that's up to you!
> >
> 
> I think it is difficult to say for what purpose a parallel-unsafe function got called in
> a parallel context, so if we give an error in cases where it could otherwise lead to
> a crash or cause other horrible things, users will probably appreciate us.
> OTOH, if the parallel-safety labeling is wrong (a parallel-safe function is marked
> parallel-unsafe) and we gave an error in such a case, the user can always change
> the parallel-safety attribute by using ALTER FUNCTION.
> Now, if adding such a check is costly or needs some major re-design then
> it might not be worth it, but I don't think that is the case for
> non-built-in function invocation.

For now, just in case someone wants to take a look at the patch for the safety check:
I split the patch into 0001 (parallel safety check for user-defined functions), 0003 (parallel safety check for
built-in functions), and the fix for the test cases.

IMO, with such a check giving an error when a parallel-unsafe function is detected in parallel mode,
it will be easier for users to discover potential threats (parallel-unsafe functions) in parallel mode.

I think users are likely to invoke a parallel-unsafe function inside a parallel-safe function unintentionally.
Such a check can help them detect the problem more easily.

That said, the strict check limits some usages (intentional wrapper functions), as Robert-san said.
To mitigate the effect of the limit, I was thinking: can we do the safety check conditionally, such as only checking
the top-level function invocation, and/or introduce a GUC option to control whether to do the strict parallel safety
check?  Thoughts?
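
To make the GUC idea concrete, here is a rough, hypothetical sketch of what such a boolean GUC entry could look like
in guc.c; the name strict_parallel_safety_check, its default, and its grouping are invented for illustration, not part
of the patch:

    /* hypothetical GUC variable; name and default are illustrative only */
    static bool strict_parallel_safety_check = true;

    /* entry that would be added to ConfigureNamesBool[] in guc.c */
    {
        {"strict_parallel_safety_check", PGC_USERSET, QUERY_TUNING_OTHER,
            gettext_noop("Raises an error when a parallel-unsafe function is called in parallel mode."),
            NULL
        },
        &strict_parallel_safety_check,
        true,
        NULL, NULL, NULL
    },

The run-time check in fmgr would then simply be skipped when the GUC is off.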

Best regards,
houzj



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Tue, May 11, 2021 at 12:28 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> For now, just in case someone wants to take a look at the patch for the safety check:
>

I am not sure there is yet a consensus on exactly which cases need
to be dealt with. Let me try to summarize the discussion and see if
that helps. As per my understanding, the main reasons for this work
are:
a. Ensure parallel-unsafe functions don't get executed in parallel
mode. We do have checks to ensure that we don't select parallel mode
for most cases where a parallel-unsafe function is used, but we don't
have checks for input/output funcs, aggregate funcs, etc. This
proposal is to detect such cases during function invocation and return
an error. I think that if, for some cases like aggregates or other
types of functions, we allow selecting parallelism relying on the
user, it is not a bad idea to detect and return an error when a
parallel-unsafe function is executed in parallel mode.
b. Detect wrong parallel-safety markings. Say the user has declared
some function as parallel-safe but it invokes another parallel-unsafe
function.
c. The other motive is that this work can help us to enable
parallelism for inserts (and maybe update/delete in the future). As is
being discussed in another thread [1], we are considering allowing
parallel inserts on a table based on user input and then, at runtime,
detecting whether the insert invokes any parallel-unsafe expression.
The idea is that the user will be able to specify whether a write
operation is allowed in parallel on a specified relation, and we allow
selecting parallelism for such writes based on that while doing the
checks for Select as we are doing now. There are other options, like
determining the parallel-safety of writes in the planner and only then
allowing parallelism, but those all seem costly. Now, I think it is
not compulsory to have such checks for this particular reason, as we
are relying on user input, but it will be good if we have them.

I think the last purpose (c) is still debatable even though we
couldn't come up with anything better till now, but even if we leave
that aside, I think the other reasons are good enough to justify some
form of checks.

Now, the proposal being discussed is to add a parallel-safety check in
fmgr_info which seems to be invoked during all function executions. We
need to have access to proparallel attribute of the function to check
the parallel-safety and that is readily available in fmgr_info for
non-built-in functions because we already get the pg_proc information
from sys cache. So, I guess there is no harm in checking it when the
information is readily available. However, for built-in functions that
information is not readily available as we get required function
information from FmgrBuiltin (which doesn't have parallel-safety
information). For built-in functions, the following options have been
discussed:
a. Extend FmgrBuiltin, without increasing its size, to include
parallel information (a sketch follows below).
b. Query the pg_proc cache to get the information. Accessing this for
each invocation of a builtin could be costly. We can probably incur
this cost only when a builtin is invoked in parallel mode.
c. Don't add the check for builtins.
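
For option (a), a purely illustrative sketch of how the struct in fmgrtab.h could make room without growing: turn the
two existing bool flags into bitfields so that a 2-bit parallel-safety code fits in the same byte (the mapping of the
PROPARALLEL_* values onto 2 bits is an assumption here, not a committed design):

    typedef struct
    {
        Oid         foid;           /* OID of the function */
        short       nargs;          /* 0..FUNC_MAX_ARGS, or -1 if variable count */
        uint8       strict:1;       /* T if function is "strict" */
        uint8       retset:1;       /* T if function returns a set */
        uint8       parallel:2;     /* hypothetical: PROPARALLEL_* mapped to 2 bits */
        const char *funcName;       /* C name of the function */
        PGFunction  func;           /* pointer to compiled function */
    } FmgrBuiltin;

sizeof(FmgrBuiltin) should stay the same, since the three bitfields occupy one byte where the two bools previously
occupied two.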

I think if we can't think of any better way to have checks for
builtins and don't like either (a) or (b), then there is no harm in
(c). That will at least allow us to have a parallel-safety check for
user-defined functions.

Thoughts?

[1] -
https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709@TYAPR01MB2990.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Fri, Jun 4, 2021 at 6:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Thoughts?

As far as I can see, trying to error out at function call time if the
function is parallel-unsafe doesn't fix any problem we have, and just
makes the design of this part of the system less consistent with what
we've done elsewhere. For example, if you create a stable function
that internally calls a volatile function, you don't get an error. You
can use your stable function in an index definition if you wish. That
may break, but if so, that's your problem. Also, when it breaks, it
probably won't blow up the entire world; you'll just have a messed-up
index. Currently, the parallel-safety stuff works the same way. If we
notice that something is marked parallel-unsafe, we'll skip
parallelism. But you can lie to us and claim that things are safe when
they're not, and if you do, it may break, but that's your problem.
Most likely your query will just error out, and there will be no
worse consequences than that, though if your parallel-unsafe function
is written in C, it could do horrible things like crash, which is
unavoidable because C code can do anything.

Now, the reason for all of this work, as I understand it, is because
we want to enable parallel inserts, and the problem there is that a
parallel insert could involve a lot of different things: it might need
to compute expressions, or fire triggers, or check constraints, and
any of those things could be parallel-unsafe. If we enable parallelism
and then find out that we need to do one of those things, we have a
problem. Something probably will error out. The thing is, with this
proposal, that issue is not solved. Something will definitely error
out. You'll probably get the error in a different place, but nobody
fires off an INSERT hoping to get one error message rather than
another. What they want is for it to work. So I'm kind of confused how
we ended up going in this direction which seems to me at least to be a
tangent from the real issue, and somewhat at odds with the way the
rest of PostgreSQL is designed.

It seems to me that we could simply add a flag to each relation saying
whether or not we think that INSERT operations - or perhaps DML
operations generally - are believed to be parallel-safe for that
relation. Like the marking on functions, it would be the user's
responsibility to get that marking correct. If they don't, they might
call a parallel-unsafe function in parallel mode, and that will
probably error out. But that's no worse than what we already have in
existing cases, so I don't see why it requires doing what's proposed
here first. Now, it does have the advantage of being not very
convenient for users, who, I'm sure, would prefer that the system
figure out for them automatically whether or not parallel inserts are
likely to be safe, rather than making them declare it, especially
since presumably the default declaration would have to be "unsafe," as
it is for functions. But I don't have a better idea right now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Mon, Jun 7, 2021 at 7:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Jun 4, 2021 at 6:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Thoughts?
>
> As far as I can see, trying to error out at function call time if the
> function is parallel-unsafe doesn't fix any problem we have, and just
> makes the design of this part of the system less consistent with what
> we've done elsewhere. For example, if you create a stable function
> that internally calls a volatile function, you don't get an error. You
> can use your stable function in an index definition if you wish. That
> may break, but if so, that's your problem. Also, when it breaks, it
> probably won't blow up the entire world; you'll just have a messed-up
> index. Currently, the parallel-safety stuff works the same way. If we
> notice that something is marked parallel-unsafe, we'll skip
> parallelism.
>

This is not true in all cases, which is one of the reasons for this
thread. For example, we don't skip parallelism when I/O functions are
parallel-unsafe, as shown in the following case:

postgres=# CREATE FUNCTION text_w_default_in(cstring)
             RETURNS text_w_default
             AS 'textin'
             LANGUAGE internal STRICT IMMUTABLE;
NOTICE:  type "text_w_default" is not yet defined
DETAIL:  Creating a shell type definition.
CREATE FUNCTION

postgres=# CREATE FUNCTION text_w_default_out(text_w_default)
             RETURNS cstring
             AS 'textout'
             LANGUAGE internal STRICT IMMUTABLE;
NOTICE:  argument type text_w_default is only a shell
CREATE FUNCTION

postgres=# CREATE TYPE text_w_default (
             internallength = variable,
             input = text_w_default_in,
             output = text_w_default_out,
             alignment = int4,
             default = 'zippo');
CREATE TYPE

postgres=# CREATE TABLE default_test (f1 text_w_default, f2 int);
CREATE TABLE

postgres=# INSERT INTO default_test DEFAULT VALUES;
INSERT 0 1

postgres=# SELECT * FROM default_test;
ERROR:  parallel-safety execution violation of function "text_w_default_out" (u)

Note that the error is raised after applying the patch; without the patch,
the above won't show any error (the error message could be improved here).
Such cases can lead to unpredictable behavior without a patch because
we won't be able to detect the execution of parallel-unsafe functions.
There are similar examples from regression tests. Now, one way to deal
with similar cases could be that we document them and say we don't
consider parallel-safety in such cases and the other way is to detect
such cases and error out. Yet another way could be that we somehow try
to check these cases as well before enabling parallelism but I thought
these cases fall in the similar category as aggregate's support
functions.

> But you can lie to us and claim that things are safe when
> they're not, and if you do, it may break, but that's your problem.
> Most likely your query will just error out, and there will be no
> worse consequences than that, though if your parallel-unsafe function
> is written in C, it could do horrible things like crash, which is
> unavoidable because C code can do anything.
>

That is true, but I was worried about cases where users didn't lie to
us but we still allowed parallelism to be chosen.

> Now, the reason for all of this work, as I understand it, is because
> we want to enable parallel inserts, and the problem there is that a
> parallel insert could involve a lot of different things: it might need
> to compute expressions, or fire triggers, or check constraints, and
> any of those things could be parallel-unsafe. If we enable parallelism
> and then find out that we need to do one of those things, we have a
> problem. Something probably will error out. The thing is, with this
> proposal, that issue is not solved. Something will definitely error
> out. You'll probably get the error in a different place, but nobody
> fires off an INSERT hoping to get one error message rather than
> another. What they want is for it to work. So I'm kind of confused how
> we ended up going in this direction which seems to me at least to be a
> tangent from the real issue, and somewhat at odds with the way the
> rest of PostgreSQL is designed.
>
> It seems to me that we could simply add a flag to each relation saying
> whether or not we think that INSERT operations - or perhaps DML
> operations generally - are believed to be parallel-safe for that
> relation.
>

This is exactly the direction we are trying to pursue. The proposal
[1] has semantics like:

    CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE };
    ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE };

This property is recorded in pg_class's relparallel column as 'u',
'r', or 's', just like pg_proc's proparallel. The default is UNSAFE.
This might require some bike-shedding to decide how exactly we want to
expose it to the user, but I think it is along the lines of what you
have described here.

> Like the marking on functions, it would be the user's
> responsibility to get that marking correct. If they don't, they might
> call a parallel-unsafe function in parallel mode, and that will
> probably error out. But that's no worse than what we already have in
> existing cases, so I don't see why it requires doing what's proposed
> here first.
>

I agree it is not necessarily required if we give the responsibility
to the user, but this might give a better user experience. OTOH,
without this as well, as you said, it won't be any worse than the
current behavior. But that was not the sole motivation of this
proposal, as explained above in the email with an example.

> Now, it does have the advantage of being not very
> convenient for users, who, I'm sure, would prefer that the system
> figure out for them automatically whether or not parallel inserts are
> likely to be safe, rather than making them declare it, especially
> since presumably the default declaration would have to be "unsafe," as
> it is for functions.
>

To improve the user experience in this regard, the proposal [1]
provides a function pg_get_parallel_safety(oid) with which users can
determine whether it is safe to enable parallelism. Surely, after the
user has checked with that function, one can still add some unsafe
constraints to the table by altering it, but the function will still
be an aid in enabling parallelism on a relation.

[1] -
https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709@TYAPR01MB2990.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Mon, Jun 7, 2021 at 11:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Note that the error is raised after applying the patch; without the patch,
> the above won't show any error (the error message could be improved here).
> Such cases can lead to unpredictable behavior without a patch because
> we won't be able to detect the execution of parallel-unsafe functions.
> There are similar examples from regression tests. Now, one way to deal
> with similar cases could be that we document them and say we don't
> consider parallel-safety in such cases and the other way is to detect
> such cases and error out. Yet another way could be that we somehow try
> to check these cases as well before enabling parallelism but I thought
> these cases fall in the similar category as aggregate's support
> functions.

I'm not very excited about the idea of checking type input and type
output functions. It's hard to imagine someone wanting to do something
parallel-unsafe in such a function, unless they're just trying to
prove a point. So I don't think checking it would be a good investment
of CPU cycles. If we do anything at all, I'd vote for just documenting
that such functions should be parallel-safe and that their
parallel-safety marks are not checked when they are used as type
input/output functions. Perhaps we ought to document the same thing
with regard to opclass support functions, another place where it's
hard to imagine a realistic use case for doing something
parallel-unsafe.

In the case of aggregates, I see the issues slightly differently. I
don't know that it's super-likely that someone would want to create a
parallel-unsafe aggregate function, but I think there should be a way
to do it, just in case. However, if somebody wants that, they can just
mark the aggregate itself unsafe. There's no benefit for the user in
marking the aggregate safe and the support functions unsafe and hoping
that the system figures it out somehow.

In my opinion, you're basically taking too pure a view of this. We're
not trying to create a system that does such a good job checking
parallel safety markings that nobody can possibly find a thing that
isn't checked no matter how hard they poke around the dark corners of
the system. Or at least we shouldn't be trying to do that. We should
be trying to create a system that works well in practice, and gives
people the flexibility to easily avoid parallelism when they have a
query that is parallel-unsafe, while still getting the benefit of
parallelism the rest of the time.

I don't know what all the cases you've uncovered are, and maybe
there's something in there that I'd be more excited about changing if
I knew what it was, but the particular problems you're mentioning here
seem more theoretical than real to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, June 8, 2021 10:51 PM Robert Haas <robertmhaas@gmail.com>
> On Mon, Jun 7, 2021 at 11:33 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > Note that the error is raised after applying the patch; without the patch,
> > the above won't show any error (the error message could be improved here).
> > Such cases can lead to unpredictable behavior without a patch because
> > we won't be able to detect the execution of parallel-unsafe functions.
> > There are similar examples from regression tests. Now, one way to deal
> > with similar cases could be that we document them and say we don't
> > consider parallel-safety in such cases and the other way is to detect
> > such cases and error out. Yet another way could be that we somehow try
> > to check these cases as well before enabling parallelism but I thought
> > these cases fall in the similar category as aggregate's support
> > functions.
> 
> I'm not very excited about the idea of checking type input and type output
> functions. It's hard to imagine someone wanting to do something
> parallel-unsafe in such a function, unless they're just trying to prove a point. So
> I don't think checking it would be a good investment of CPU cycles. If we do
> anything at all, I'd vote for just documenting that such functions should be
> parallel-safe and that their parallel-safety marks are not checked when they are
> used as type input/output functions. Perhaps we ought to document the same
> thing with regard to opclass support functions, another place where it's hard to
> imagine a realistic use case for doing something parallel-unsafe.
> 
> In the case of aggregates, I see the issues slightly differently. I don't know that
> it's super-likely that someone would want to create a parallel-unsafe
> aggregate function, but I think there should be a way to do it, just in case.
> However, if somebody wants that, they can just mark the aggregate itself
> unsafe. There's no benefit for the user in marking the aggregate safe and the
> support functions unsafe and hoping that the system figures it out somehow.
> 
> In my opinion, you're basically taking too pure a view of this. We're not trying to
> create a system that does such a good job checking parallel safety markings
> that nobody can possibly find a thing that isn't checked no matter how hard
> they poke around the dark corners of the system. Or at least we shouldn't be
> trying to do that. We should be trying to create a system that works well in
> practice, and gives people the flexibility to easily avoid parallelism when they
> have a query that is parallel-unsafe, while still getting the benefit of parallelism
> the rest of the time.
> 
> I don't know what all the cases you've uncovered are, and maybe there's
> something in there that I'd be more excited about changing if I knew what it
> was, but the particular problems you're mentioning here seem more
> theoretical than real to me.

I think another case where a parallel-unsafe function could be invoked in parallel mode is
a TEXT SEARCH TEMPLATE's init_function or lexize_function, because currently
the planner does not check the safety of these functions. Please see the example below [1].

I am not sure whether users will use a parallel-unsafe function as an init_function or lexize_function,
but if they do, it could cause unexpected results.

Does it make sense to add some check for init_function or lexize_function,
or to document this together with type input/output and opclass support functions?

[1]----------------------------EXAMPLE------------------------------------
CREATE FUNCTION dsnowball_init(INTERNAL)
RETURNS INTERNAL AS '$libdir/dict_snowball', 'dsnowball_init'
LANGUAGE C STRICT;

CREATE FUNCTION dsnowball_lexize(INTERNAL, INTERNAL, INTERNAL, INTERNAL)
RETURNS INTERNAL AS '$libdir/dict_snowball', 'dsnowball_lexize'
LANGUAGE C STRICT;

CREATE TEXT SEARCH TEMPLATE snowball
(INIT = dsnowball_init,
LEXIZE = dsnowball_lexize);

COMMENT ON TEXT SEARCH TEMPLATE snowball IS 'snowball stemmer';

create table pendtest (ts tsvector);
create index pendtest_idx on pendtest using gin(ts);
insert into pendtest select (to_tsvector('Lore ipsum')) from generate_series(1,10000000,1);
analyze;

set enable_bitmapscan = off;

postgres=# explain select * from pendtest where to_tsquery('345&qwerty') @@ ts;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Gather  (cost=1000.00..1168292.86 rows=250 width=31)
   Workers Planned: 2
   ->  Parallel Seq Scan on pendtest  (cost=0.00..1167267.86 rows=104 width=31)
         Filter: (to_tsquery('345&qwerty'::text) @@ ts)

-- In the example above, dsnowball_init() and dsnowball_lexize() will be executed in parallel mode.

----------------------------EXAMPLE------------------------------------

Best regards,
houzj


Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Tom Lane
Date:
"houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com> writes:
> On Tuesday, June 8, 2021 10:51 PM Robert Haas <robertmhaas@gmail.com>
> wrote:
>> In my opinion, you're basically taking too pure a view of this. We're
>> not trying to create a system that does such a good job checking
>> parallel safety markings that nobody can possibly find a thing that
>> isn't checked no matter how hard they poke around the dark corners of
>> the system. Or at least we shouldn't be trying to do that.

> I think another case where a parallel-unsafe function could be invoked in
> parallel mode is a TEXT SEARCH TEMPLATE's init_function or
> lexize_function.

Another point worth making in this connection is what I cited earlier
today in ba2c6d6ce:

: ...  We could imagine prohibiting SCROLL when
: the query contains volatile functions, but that would be
: expensive to enforce.  Moreover, it could break applications
: that work just fine, if they have functions that are in fact
: stable but the user neglected to mark them so.  So settle for
: documenting the hazard.

If you break an application that used to work, because the
developer was careless about marking a function PARALLEL SAFE
even though it actually is, I do not think you have made any
friends or improved anyone's life.  In fact, you could easily
make things far worse, by encouraging people to mark things
PARALLEL SAFE that are not.  (We just had a thread about somebody
marking a function immutable because they wanted effect X of that,
and then whining because they also got effect Y.)

There are specific cases where there's a good reason to worry.
For example, if we assume blindly that domain_in() is parallel
safe, we will have cause to regret that.  But I don't find that
to be a reason why we need to lock down everything everywhere.
We need to understand the tradeoffs involved in what we check,
and apply checks that are likely to avoid problems, while not
being too nanny-ish.

            regards, tom lane



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Wed, Jun 9, 2021 at 2:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There are specific cases where there's a good reason to worry.
> For example, if we assume blindly that domain_in() is parallel
> safe, we will have cause to regret that.  But I don't find that
> to be a reason why we need to lock down everything everywhere.
> We need to understand the tradeoffs involved in what we check,
> and apply checks that are likely to avoid problems, while not
> being too nanny-ish.

Yeah, that's exactly how I feel about it, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Wed, Jun 9, 2021 at 9:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jun 9, 2021 at 2:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > There are specific cases where there's a good reason to worry.
> > For example, if we assume blindly that domain_in() is parallel
> > safe, we will have cause to regret that.  But I don't find that
> > to be a reason why we need to lock down everything everywhere.
> > We need to understand the tradeoffs involved in what we check,
> > and apply checks that are likely to avoid problems, while not
> > being too nanny-ish.
>
> Yeah, that's exactly how I feel about it, too.
>

Fair enough. So, I think there is a consensus to drop this patch, and
if one wants, we can document these cases. Also, we don't want to use
it to enable parallelism for Inserts, where we are instead trying to
pursue the approach of having a flag in pg_class which allows users to
specify whether parallel writes are allowed on a specified relation.

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Thu, Jun 10, 2021 at 12:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Fair enough. So, I think there is a consensus to drop this patch, and
> if one wants, we can document these cases. Also, we don't want to use
> it to enable parallelism for Inserts, where we are instead trying to
> pursue the approach of having a flag in pg_class which allows users to
> specify whether parallel writes are allowed on a specified relation.

+1. The question that's still on my mind a little bit is whether
there's a reasonable alternative to forcing users to set a flag
manually. It seems less convenient than having to do the same thing
for a function, because most users probably only create functions
occasionally, but creating tables seems like it's likely to be a more
common operation. Plus, a function is basically a program, so it sort
of feels reasonable that you might need to give the system some hints
about what the program does, but that doesn't apply to a table.

Now, if we forget about partitioned tables here for a moment, I don't
really see why we couldn't do this computation based on the relcache
entry, and then just cache the flag there? I think anything that would
change the state for a plain old table would also cause some
invalidation that we could notice. And I don't think that the cost of
walking over triggers, constraints, etc. and computing the value we
need on demand would be exorbitant.

For a partitioned table, things are a lot more difficult. For one
thing, the cost of computation can be a lot higher; there might be a
thousand or more partitions. For another thing, computing the value
could have scary side effects, like opening all the partitions, which
would also mean taking locks on them and building expensive relcache
entries. For a third thing, we'd have no way of knowing whether the
value was still current, because an event that produces an
invalidation for a partition doesn't necessarily produce any
invalidation for the partitioned table.

So one idea is maybe we only need an explicit flag for partitioned
tables, and regular tables we can just work it out automatically.
Another idea is maybe we try to solve the problems somehow so that it
can also work with partitioned tables. I don't really have a great
idea right at the moment, but maybe it's worth devoting some more
thought to the problem.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Thu, Jun 10, 2021 at 10:59 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jun 10, 2021 at 12:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Fair enough. So, I think there is a consensus to drop this patch and
> > if one wants then we can document these cases. Also, we don't want it
> > to enable parallelism for Inserts where we are trying to pursue the
> > approach to have a flag in pg_class which allows users to specify
> > whether writes are allowed on a specified relation.
>
> +1. The question that's still on my mind a little bit is whether
> there's a reasonable alternative to forcing users to set a flag
> manually. It seems less convenient than having to do the same thing
> for a function, because most users probably only create functions
> occasionally, but creating tables seems like it's likely to be a more
> common operation. Plus, a function is basically a program, so it sort
> of feels reasonable that you might need to give the system some hints
> about what the program does, but that doesn't apply to a table.
>
> Now, if we forget about partitioned tables here for a moment, I don't
> really see why we couldn't do this computation based on the relcache
> entry, and then just cache the flag there?
>

Do we invalidate relcache entry if someone changes say trigger or some
index AM function property via Alter Function (in our case from safe
to unsafe or vice-versa)? Tsunakawa-San has mentioned this as the
reason in his email [1] why we can't rely on caching this property in
relcache entry. I also don't see anything in AlterFunction which would
suggest that we invalidate the relation with which the function might
be associated via trigger.

The other idea in this regard was to validate the parallel safety
during DDL instead of relying completely on the user, but that also
seems to have similar hazards, as pointed out by Tom in his email [2].

I think it would be good if there is a way we can do this without
asking for user input, but if not, then we can try to provide
parallel-safety info about the relation, which will slightly ease the
user's job. Such a function would check the relation (and its
partitions) to see if there exists any parallel-unsafe clause and
return the result to the user. Now, again, if the user changes the
parallel-safety property later, we won't be able to automatically
reflect the same for the rel.

[1] -
https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709@TYAPR01MB2990.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/1030301.1616560249%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Fri, Jun 11, 2021 at 12:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Do we invalidate relcache entry if someone changes say trigger or some
> index AM function property via Alter Function (in our case from safe
> to unsafe or vice-versa)? Tsunakawa-San has mentioned this as the
> reason in his email [1] why we can't rely on caching this property in
> relcache entry. I also don't see anything in AlterFunction which would
> suggest that we invalidate the relation with which the function might
> be associated via trigger.

Hmm. I am not sure that index AM functions really need to be checked,
but triggers certainly do. I think you are correct that an ALTER
FUNCTION wouldn't invalidate the relcache entry, which is I guess
pretty much the same problem Tom was pointing out in the thread to
which you linked.

But ... thinking out of the box as Tom suggests, what if we came up
with some new kind of invalidation message that is only sent when a
function's parallel-safety marking is changed? And every backend in
the same database then needs to re-evaluate the parallel-safety of
every relation for which it has cached a value. Such recomputations
might be expensive, but they would probably also occur very
infrequently. And you might even be able to make it a bit more
fine-grained if it's worth the effort to worry about that: say that in
addition to caching the parallel-safety of the relation, we also cache
a list of the pg_proc OIDs upon which that determination depends. Then
when we hear that the flag's been changed for OID 123456, we only need
to invalidate the cached value for relations that depended on that
pg_proc entry. There are ways that a relation could become
parallel-unsafe without changing the parallel-safety marking of any
function, but perhaps all of the other ways involve a relcache
invalidation?
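
As a hedged sketch of that brainstorm (the message id, struct name, and field layout below are invented for
illustration; real sinval message kinds are defined in storage/sinval.h):

    /*
     * Hypothetical new shared-invalidation message kind, sent when ALTER
     * FUNCTION changes proparallel.  On receipt, a backend would mark any
     * cached relation-level parallel-safety values as stale.
     */
    #define SHAREDINVALPROPARALLEL_ID   (-9)    /* illustrative value only */

    typedef struct ProparallelInvalMsg
    {
        int8        id;         /* type field: SHAREDINVALPROPARALLEL_ID */
        Oid         dbId;       /* database OID, or InvalidOid if shared */
        Oid         funcId;     /* pg_proc OID whose proparallel changed */
    } ProparallelInvalMsg;

In the fine-grained variant, the receiver would look up funcId in each cached relation's list of depended-upon
pg_proc OIDs and invalidate only the matching entries.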

Just brainstorming here. I might be off-track.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Sat, Jun 12, 2021 at 1:56 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Jun 11, 2021 at 12:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Do we invalidate relcache entry if someone changes say trigger or some
> > index AM function property via Alter Function (in our case from safe
> > to unsafe or vice-versa)? Tsunakawa-San has mentioned this as the
> > reason in his email [1] why we can't rely on caching this property in
> > relcache entry. I also don't see anything in AlterFunction which would
> > suggest that we invalidate the relation with which the function might
> > be associated via trigger.
>
> Hmm. I am not sure that index AM functions really need to be checked,
> but triggers certainly do.
>

Why do you think we don't need to check index AM functions? Say we
have an index expression that uses a function: if its parallel safety
is changed, then that probably also impacts whether we can do an
insert in parallel, because otherwise we will end up executing some
parallel-unsafe function in parallel mode during index insertion.

> I think if you are correct that an ALTER
> FUNCTION wouldn't invalidate the relcache entry, which is I guess
> pretty much the same problem Tom was pointing out in the thread to
> which you linked.
>
> But ... thinking out of the box as Tom suggests, what if we came up
> with some new kind of invalidation message that is only sent when a
> function's parallel-safety marking is changed? And every backend in
> the same database then needs to re-evaluate the parallel-safety of
> every relation for which it has cached a value. Such recomputations
> might be expensive, but they would probably also occur very
> infrequently. And you might even be able to make it a bit more
> fine-grained if it's worth the effort to worry about that: say that in
> addition to caching the parallel-safety of the relation, we also cache
> a list of the pg_proc OIDs upon which that determination depends. Then
> when we hear that the flag's been changed for OID 123456, we only need
> to invalidate the cached value for relations that depended on that
> pg_proc entry.
>

Yeah, this could be one idea but I think even if we use pg_proc OID,
we still need to check all the rel cache entries to find which one
contains the invalidated OID and that could be expensive. I wonder
can't we directly find the relation involved and register invalidation
for the same? We are able to find the relation to which trigger
function is associated during drop function via findDependentObjects
by scanning pg_depend. Assuming, we are able to find the relation for
trigger function by scanning pg_depend, what kinds of problems do we
envision in registering the invalidation for the same?

I think we probably need to worry about the additional cost of finding
dependent objects, and whether there are any race conditions in doing
so, as pointed out by Tom in his email [1]. The concern about cost
could be addressed by your idea of registering such an invalidation
only when the user changes the parallel safety of the function, which
we don't expect to be a frequent operation. Now, the race condition I
could think of would be that by the time we change the parallel-safety
of a function (say making it unsafe), some of the other sessions might
have already started processing an insert on a relation where that
function is associated via a trigger or check constraint, in which
case there could be a problem. I think to avoid that we need to
acquire an Exclusive lock on the relation, as we do in Rename Policy
kinds of operations.


> There are ways that a relation could become
> parallel-unsafe without changing the parallel-safety marking of any
> function, but perhaps all of the other ways involve a relcache
> invalidation?
>

Probably, but I guess we can once investigate/test those cases as well
if we find/agree on the solution for the functions stuff.

[1] - https://www.postgresql.org/message-id/1030301.1616560249%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Tom Lane
Date:
Amit Kapila <amit.kapila16@gmail.com> writes:
> Why do you think we don't need to check index AM functions?

Primarily because index AMs and opclasses can only be defined by
superusers, and the superuser is assumed to know what she's doing.

More generally, we've never made any provisions for the properties
of index AMs or opclasses to change on-the-fly.  I doubt that doing
so could possibly be justified on a cost-benefit basis.

            regards, tom lane



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Mon, Jun 14, 2021 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Why do you think we don't need to check index AM functions? Say we
> have an index expression that uses a function: if its parallel safety
> is changed, then that probably also impacts whether we can do an
> insert in parallel, because otherwise we will end up executing some
> parallel-unsafe function in parallel mode during index insertion.

I'm not saying that we don't need to check index expressions. I agree
that we need to check those. The index AM functions are things like
btint4cmp(). I don't think that a function like that should ever be
parallel-unsafe.

> Yeah, this could be one idea but I think even if we use pg_proc OID,
> we still need to check all the rel cache entries to find which one
> contains the invalidated OID and that could be expensive. I wonder
> can't we directly find the relation involved and register invalidation
> for the same? We are able to find the relation to which trigger
> function is associated during drop function via findDependentObjects
> by scanning pg_depend. Assuming, we are able to find the relation for
> trigger function by scanning pg_depend, what kinds of problems do we
> envision in registering the invalidation for the same?

I don't think that finding the relation involved and registering an
invalidation for the same will work properly. Suppose there is a
concurrently-running transaction which has created a new table and
attached a trigger function to it. You can't see any of the catalog
entries for that relation yet, so you can't invalidate it, but
invalidation needs to happen. Even if you used some snapshot that can
see those catalog entries before they are committed, I doubt it fixes
the race condition. You can't hold any lock on that relation, because
the creating transaction holds AccessExclusiveLock, but the whole
invalidation mechanism is built around the assumption that the sender
puts messages into the shared queue first and then releases locks,
while the receiver first acquires a conflicting lock and then
processes messages from the queue. Without locks, that synchronization
algorithm can't work reliably. As a consequence of all that, I believe
that, not just in this particular case but in general, the
invalidation message needs to describe the thing that has actually
changed, rather than any derived property. We can make invalidations
that say "some function's parallel-safety flag has changed" or "this
particular function's parallel-safety flag has changed" or "this
particular function has changed in some way" (this one, we have
already), but anything that involves trying to figure out what the
consequences of such a change might be and saying "hey, you, please
update XYZ because I changed something somewhere that could affect
that" is not going to be correct.

> I think we probably need to worry about the additional cost of finding
> dependent objects, and whether there are any race conditions in doing
> so, as pointed out by Tom in his email [1]. The concern about cost
> could be addressed by your idea of registering such an invalidation
> only when the user changes the parallel safety of the function, which
> we don't expect to be a frequent operation. Now, the race condition I
> could think of would be that by the time we change the parallel-safety
> of a function (say making it unsafe), some of the other sessions might
> have already started processing an insert on a relation where that
> function is associated via a trigger or check constraint, in which
> case there could be a problem. I think to avoid that we need to
> acquire an Exclusive lock on the relation, as we do in Rename Policy
> kinds of operations.

Well, the big issue here is that we don't actually lock functions
while they are in use. So there's absolutely nothing that prevents a
function from being altered in any arbitrary way, or even dropped,
while code that uses it is running. I don't really know what happens
in practice if you do that sort of thing: can you get the same query
to run with one function definition for the first part of execution
and some other definition for the rest of execution? I tend to doubt
it, because I suspect we cache the function definition at some point.
If that's the case, caching the parallel-safety marking at the same
point seems OK too, or at least no worse than what we're doing
already. But on the other hand if it is possible for a query's notion
of the function definition to shift while the query is in flight, then
this is just another example of that and no worse than any other.
Instead of changing the parallel-safety flag, somebody could redefine
the function so that it divides by zero or produces a syntax error and
kaboom, running queries break. Either way, I don't see what the big
deal is. As long as we make the handling of parallel-safety consistent
with other ways the function could be concurrently redefined, it won't
suck any more than the current system already does, or in any
fundamentally new ways.

Even if this line of thinking is correct, there's a big issue for
partitioning hierarchies because there you need to know stuff about
relations that you don't have any other reason to open. I'm just
arguing that if there's no partitioning, the problem seems reasonably
solvable. Either you changed something about the relation, in which
case you've got to lock it and issue invalidations, or you've changed
something about the function, which could be handled via a new type of
invalidation. I don't really see why the cost would be particularly
bad. Suppose that for every relation, you have a flag which is either
PARALLEL_DML_SAFE, PARALLEL_DML_RESTRICTED, PARALLEL_DML_UNSAFE, or
PARALLEL_DML_SAFETY_UNKNOWN. When someone sends a message saying "some
existing function's parallel-safety changed!" you reset that flag for
every relation in the relcache to PARALLEL_DML_SAFETY_UNKNOWN. Then if
somebody does DML on that relation and we want to consider
parallelism, it's got to recompute that flag. None of that sounds
horribly expensive.
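
A hedged sketch of that scheme (the enum, the rd_paralleldml field, and the helper names below are invented for
illustration, not existing PostgreSQL code; rd_paralleldml would be a new member of RelationData):

    typedef enum ParallelDmlSafety
    {
        PARALLEL_DML_SAFETY_UNKNOWN = 0,    /* flag must be recomputed */
        PARALLEL_DML_SAFE,
        PARALLEL_DML_RESTRICTED,
        PARALLEL_DML_UNSAFE
    } ParallelDmlSafety;

    /* hypothetical helper, body not shown */
    extern ParallelDmlSafety ComputeParallelDmlSafety(Relation rel);

    /*
     * Invoked when a "some function's parallel-safety changed" invalidation
     * arrives: walk the relcache and reset every cached flag.
     */
    static void
    ResetParallelDmlSafetyFlags(void)
    {
        /* for each relcache entry: rel->rd_paralleldml = PARALLEL_DML_SAFETY_UNKNOWN; */
    }

    /* Invoked when DML on rel is being considered for parallelism. */
    static ParallelDmlSafety
    GetParallelDmlSafety(Relation rel)
    {
        if (rel->rd_paralleldml == PARALLEL_DML_SAFETY_UNKNOWN)
        {
            /*
             * Recompute by walking the relation's triggers, constraints,
             * index expressions, and column defaults, taking the minimum
             * safety of the functions they reach.
             */
            rel->rd_paralleldml = ComputeParallelDmlSafety(rel);
        }
        return rel->rd_paralleldml;
    }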

I mean, it could be somewhat annoying if you have 100k relations open
and sit around all day flipping parallel-safety markings on and off
and then doing a single-row insert after each flip, but if that's the
only scenario where we incur significant extra overhead from this kind
of design, it seems clearly better than forcing users to set a flag
manually. Maybe it isn't, but I don't really see what the other
problem would be right now. Except, of course, for partitioning, which
I'm not quite sure what to do about.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Mon, Jun 14, 2021 at 9:08 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jun 14, 2021 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Why do you think we don't need to check index AM functions? Say we
> > have an index expression that uses function and if its parallel safety
> > is changed then probably that also impacts whether we can do insert in
> > parallel. Because otherwise, we will end up executing some parallel
> > unsafe function in parallel mode during index insertion.
>
> I'm not saying that we don't need to check index expressions. I agree
> that we need to check those. The index AM functions are things like
> btint4cmp(). I don't think that a function like that should ever be
> parallel-unsafe.
>

Okay, but if we go with your suggested model where, whenever there is
a change in the parallel-safety of any function, we need to send the
new invalidation, then I think it won't matter whether the function is
associated with an index expression, a check constraint on the table,
or is used in any other way.

>
> Even if this line of thinking is correct, there's a big issue for
> partitioning hierarchies because there you need to know stuff about
> relations that you don't have any other reason to open. I'm just
> arguing that if there's no partitioning, the problem seems reasonably
> solvable. Either you changed something about the relation, in which
> case you've got to lock it and issue invalidations, or you've changed
> something about the function, which could be handled via a new type of
> invalidation. I don't really see why the cost would be particularly
> bad. Suppose that for every relation, you have a flag which is either
> PARALLEL_DML_SAFE, PARALLEL_DML_RESTRICTED, PARALLEL_DML_UNSAFE, or
> PARALLEL_DML_SAFETY_UNKNOWN. When someone sends a message saying "some
> existing function's parallel-safety changed!" you reset that flag for
> every relation in the relcache to PARALLEL_DML_SAFETY_UNKNOWN. Then if
> somebody does DML on that relation and we want to consider
> parallelism, it's got to recompute that flag. None of that sounds
> horribly expensive.
>

Sounds reasonable. I will think more on this and see if anything else
comes to mind apart from what you have mentioned.

> I mean, it could be somewhat annoying if you have 100k relations open
> and sit around all day flipping parallel-safety markings on and off
> and then doing a single-row insert after each flip, but if that's the
> only scenario where we incur significant extra overhead from this kind
> of design, it seems clearly better than forcing users to set a flag
> manually. Maybe it isn't, but I don't really see what the other
> problem would be right now. Except, of course, for partitioning, which
> I'm not quite sure what to do about.
>

Yeah, dealing with partitioned tables is tricky. I think if we don't
want to check upfront the parallel safety of all the partitions then
the other option as discussed could be to ask the user to specify the
parallel safety of partitioned tables. We can additionally check the
parallel safety of partitions when we are trying to insert into a
particular partition and error out if we detect any parallel-unsafe
clause and we are in parallel-mode. So, in this case, we won't be
completely relying on the users. Users can either change the parallel
safe option of the table or remove/change the parallel-unsafe clause
after error. The new invalidation message as we are discussing would
invalidate the parallel-safety for individual partitions but not the
root partition (partitioned table itself). For root partition, we will
rely on information specified by the user.

I am not sure if we have a simple way to check the parallel safety of
partitioned tables. In some way, we need to rely on the user, either
(a) by providing an option to specify whether parallel Inserts (and/or
other DMLs) can be performed, or (b) by providing a GUC and/or rel
option which indicates that we can check the parallel-safety of all
the partitions. Yet another option, which I don't like, could be to
parallelize inserts only on non-partitioned tables.

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Tue, Jun 15, 2021 at 7:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Okay, but if we go with your suggested model where, whenever there is
> a change in the parallel-safety of any function, we need to send the
> new invalidation, then I think it won't matter whether the function is
> associated with an index expression, a check constraint on the table,
> or is used in any other way.

No, it will still matter, because I'm proposing that the
parallel-safety of functions that we only access via operator classes
will not even be checked. Also, if we decided to make the system more
fine-grained - e.g. by invalidating on the specific OID of the
function that was changed rather than doing something that is
database-wide or global - then it matters even more.

> Yeah, dealing with partitioned tables is tricky. I think if we don't
> want to check upfront the parallel safety of all the partitions then
> the other option as discussed could be to ask the user to specify the
> parallel safety of partitioned tables.

Just to be clear here, I don't think it really matters what we *want*
to do. I don't think it's reasonably *possible* to check all the
partitions, because we don't hold locks on them. When we're assessing
a bunch of stuff related to an individual relation, we have a lock on
it. I think - though we should double-check tablecmds.c - that this is
enough to prevent all of the dependent objects - triggers,
constraints, etc. - from changing. So the stuff we care about is
stable. But the situation with a partitioned table is different. In
that case, we can't even examine that stuff without locking all the
partitions. And even if we do lock all the partitions, the stuff could
change immediately afterward and we wouldn't know. So I think it would
be difficult to make it correct.

Now, maybe it could be done, and I think that's worth a little more
thought. For example, perhaps whenever we invalidate a relation, we
could also somehow send some new, special kind of invalidation for its
parent saying, essentially, "hey, one of your children has changed in
a way you might care about." But that's not good enough, because it
only goes up one level. The grandparent would still be unaware that a
change it potentially cares about has occurred someplace down in the
partitioning hierarchy. That seems hard to patch up, again because of
the locking rules. The child can know the OID of its parent without
locking the parent, but it can't know the OID of its grandparent
without locking its parent. Walking up the whole partitioning
hierarchy might be an issue for a number of reasons, including
possible deadlocks, and possible race conditions where we don't emit
all of the right invalidations in the face of concurrent changes. So I
don't quite see a way around this part of the problem, but I may well
be missing something. In fact I hope I am missing something, because
solving this problem would be really nice.

> We can additionally check the
> parallel safety of partitions when we are trying to insert into a
> particular partition and error out if we detect any parallel-unsafe
> clause and we are in parallel-mode. So, in this case, we won't be
> completely relying on the users. Users can either change the parallel
> safe option of the table or remove/change the parallel-unsafe clause
> after error. The new invalidation message as we are discussing would
> invalidate the parallel-safety for individual partitions but not the
> root partition (partitioned table itself). For root partition, we will
> rely on information specified by the user.

Yeah, that may be the best we can do. Just to be clear, I think we
would want to check whether the relation is still parallel-safe at the
start of the operation, but not have a run-time check at each function
call.

> I am not sure if we have a simple way to check the parallel safety of
> partitioned tables. In some way, we need to rely on the user, either (a)
> by providing an option to specify whether parallel inserts (and/or other
> DMLs) can be performed, or (b) by providing a GUC and/or rel option
> which indicates that we can check the parallel-safety of all the
> partitions. Yet another option, which I don't like, could be to
> parallelize inserts only on non-partitioned tables.

If we figure out a way to check the partitions automatically that
actually works, we don't need a switch for it; we can (and should)
just do it that way all the time. But if we can't come up with a
correct algorithm for that, then we'll need to add some kind of option
where the user declares whether it's OK.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Mon, Jun 14, 2021 at 9:08 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jun 14, 2021 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Yeah, this could be one idea but I think even if we use pg_proc OID,
> > we still need to check all the rel cache entries to find which one
> > contains the invalidated OID and that could be expensive. I wonder
> > can't we directly find the relation involved and register invalidation
> > for the same? We are able to find the relation to which trigger
> > function is associated during drop function via findDependentObjects
> > by scanning pg_depend. Assuming, we are able to find the relation for
> > trigger function by scanning pg_depend, what kinds of problems do we
> > envision in registering the invalidation for the same?
>
> I don't think that finding the relation involved and registering an
> invalidation for the same will work properly. Suppose there is a
> concurrently-running transaction which has created a new table and
> attached a trigger function to it. You can't see any of the catalog
> entries for that relation yet, so you can't invalidate it, but
> invalidation needs to happen. Even if you used some snapshot that can
> see those catalog entries before they are committed, I doubt it fixes
> the race condition. You can't hold any lock on that relation, because
> the creating transaction holds AccessExclusiveLock, but the whole
> invalidation mechanism is built around the assumption that the sender
> puts messages into the shared queue first and then releases locks,
> while the receiver first acquires a conflicting lock and then
> processes messages from the queue.
>

Won't such messages be processed at start-transaction time
(AtStart_Cache->AcceptInvalidationMessages)?

> Without locks, that synchronization
> algorithm can't work reliably. As a consequence of all that, I believe
> that, not just in this particular case but in general, the
> invalidation message needs to describe the thing that has actually
> changed, rather than any derived property. We can make invalidations
> that say "some function's parallel-safety flag has changed" or "this
> particular function's parallel-safety flag has changed" or "this
> particular function has changed in some way" (this one, we have
> already), but anything that involves trying to figure out what the
> consequences of such a change might be and saying "hey, you, please
> update XYZ because I changed something somewhere that could affect
> that" is not going to be correct.
>
> > I think we probably need to worry about the additional cost to find
> > dependent objects and if there are any race conditions in doing so as
> > pointed out by Tom in his email [1]. The concern related to cost could
> > be addressed by your idea of registering such an invalidation only
> > when the user changes the parallel safety of the function which we
> > don't expect to be a frequent operation. Now, here the race condition,
> > I could think of could be that by the time we change parallel-safety
> > (say making it unsafe) of a function, some of the other sessions might
> > have already started processing insert on a relation where that
> > function is associated via trigger or check constraint in which case
> > there could be a problem. I think to avoid that we need to acquire an
> > Exclusive lock on the relation as we are doing in Rename Policy kind
> > of operations.
>
> Well, the big issue here is that we don't actually lock functions
> while they are in use. So there's absolutely nothing that prevents a
> function from being altered in any arbitrary way, or even dropped,
> while code that uses it is running. I don't really know what happens
> in practice if you do that sort of thing: can you get the same query
> to run with one function definition for the first part of execution
> and some other definition for the rest of execution? I tend to doubt
> it, because I suspect we cache the function definition at some point.
>

It is possible that within the same statement execution a different
function definition gets executed. Say, in session-1 we are inserting
three rows; on the first-row insertion, definition-1 of a function in an
index expression gets executed. Now, from session-2, we change the
definition of the function to definition-2. In session-1, on the
second-row insertion, while executing definition-1 of the function, we
insert into another table; locking that table accepts the invalidation
message registered by session-2. Then, on the third-row insertion, the
new definition (definition-2) of the function is executed.
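
A minimal sketch of that scenario (hypothetical names; the function body
that inserts into the other table is elided):

    -- Session 1
    CREATE FUNCTION f(int) RETURNS int IMMUTABLE LANGUAGE sql
        AS 'SELECT $1 + 1';                  -- definition-1
    CREATE TABLE t (a int);
    CREATE INDEX t_f_idx ON t (f(a));
    INSERT INTO t VALUES (1), (2), (3);      -- evaluates f() once per row

    -- Session 2, while session 1's INSERT is in progress
    CREATE OR REPLACE FUNCTION f(int) RETURNS int IMMUTABLE LANGUAGE sql
        AS 'SELECT $1 + 2';                  -- definition-2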

> If that's the case, caching the parallel-safety marking at the same
> point seems OK too, or at least no worse than what we're doing
> already. But on the other hand if it is possible for a query's notion
> of the function definition to shift while the query is in flight, then
> this is just another example of that and no worse than any other.
> Instead of changing the parallel-safety flag, somebody could redefine
> the function so that it divides by zero or produces a syntax error and
> kaboom, running queries break. Either way, I don't see what the big
> deal is. As long as we make the handling of parallel-safety consistent
> with other ways the function could be concurrently redefined, it won't
> suck any more than the current system already does, or in any
> fundamentally new ways.
>

Okay, so, in this scheme, we have allowed changing the function
definition during statement execution, but even though the rel's
parallel-safe property gets modified (say to parallel-unsafe), we will
still proceed in parallel mode as if it hadn't changed. I guess this may
not be a big deal, as we anyway allow breaking the running statement by
changing its definition, and users may be okay if the parallel statement
errors out or behaves in an unpredictable way in such corner cases.

-- 
With Regards,
Amit Kapila.



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, June 15, 2021 10:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jun 15, 2021 at 7:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Yeah, dealing with partitioned tables is tricky. I think if we don't
> > want to check upfront the parallel safety of all the partitions then
> > the other option as discussed could be to ask the user to specify the
> > parallel safety of partitioned tables.
> 
> Just to be clear here, I don't think it really matters what we *want* to do. I don't
> think it's reasonably *possible* to check all the partitions, because we don't
> hold locks on them. When we're assessing a bunch of stuff related to an
> individual relation, we have a lock on it. I think - though we should
> double-check tablecmds.c - that this is enough to prevent all of the dependent
> objects - triggers, constraints, etc. - from changing. So the stuff we care about
> is stable. But the situation with a partitioned table is different. In that case, we
> can't even examine that stuff without locking all the partitions. And even if we
> do lock all the partitions, the stuff could change immediately afterward and we
> wouldn't know. So I think it would be difficult to make it correct.
> 
> Now, maybe it could be done, and I think that's worth a little more thought. For
> example, perhaps whenever we invalidate a relation, we could also somehow
> send some new, special kind of invalidation for its parent saying, essentially,
> "hey, one of your children has changed in a way you might care about." But
> that's not good enough, because it only goes up one level. The grandparent
> would still be unaware that a change it potentially cares about has occurred
> someplace down in the partitioning hierarchy. That seems hard to patch up,
> again because of the locking rules. The child can know the OID of its parent
> without locking the parent, but it can't know the OID of its grandparent without
> locking its parent. Walking up the whole partitioning hierarchy might be an
> issue for a number of reasons, including possible deadlocks, and possible race
> conditions where we don't emit all of the right invalidations in the face of
> concurrent changes. So I don't quite see a way around this part of the problem,
> but I may well be missing something. In fact I hope I am missing something,
> because solving this problem would be really nice.

I think the check for partitions could be even more complicated if we need to
check the parallel safety of partition key expressions. If the user directly
inserts into a partition, then we need to invoke ExecPartitionCheck, which
will execute all of its parents' and grandparents' partition key expressions.
It means that if we change a parent table's partition key expression (by 1)
changing a function in the expression, or 2) attaching the parent table as a
partition of another parent table), then we need to invalidate all of its
children's relcache entries.
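
For example (a minimal sketch):

    CREATE FUNCTION bucket(int) RETURNS int IMMUTABLE PARALLEL SAFE
        LANGUAGE sql AS 'SELECT $1 % 10';

    CREATE TABLE pt (a int) PARTITION BY RANGE (bucket(a));
    CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (0) TO (5);

    -- A direct insert into the leaf still evaluates the parent's
    -- partition key expression, via ExecPartitionCheck:
    INSERT INTO pt1 VALUES (3);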

BTW, currently, if the user attaches a partitioned table 'A' as a partition
of another partitioned table 'B', the children of 'A' are not invalidated.

Best regards,
houzj

Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Tue, Jun 15, 2021 at 7:31 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Jun 15, 2021 at 7:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Okay, but I think if we go with your suggested model, where whenever
> > there is a change in the parallel-safety of any function we need to send
> > the new invalidation, then it won't matter whether the function is
> > associated with an index expression, a check constraint on the table, or
> > is used in any other way.
>
> No, it will still matter, because I'm proposing that the
> parallel-safety of functions that we only access via operator classes
> will not even be checked.
>

I am not very clear on what exactly you have in mind in this regard. I
understand that while computing parallel-safety for a rel we don't need
to consider functions that we only access via operator classes, but how
do we distinguish such functions during Alter Function? Is there a
simple way to deduce that a function is an operator class function, so
that we don't register an invalidation for it? Shall we check it via
pg_depend?

>
> > We can additionally check the
> > parallel safety of partitions when we are trying to insert into a
> > particular partition and error out if we detect any parallel-unsafe
> > clause and we are in parallel-mode. So, in this case, we won't be
> > completely relying on the users. Users can either change the parallel
> > safe option of the table or remove/change the parallel-unsafe clause
> > after error. The new invalidation message as we are discussing would
> > invalidate the parallel-safety for individual partitions but not the
> > root partition (partitioned table itself). For root partition, we will
> > rely on information specified by the user.
>
> Yeah, that may be the best we can do. Just to be clear, I think we
> would want to check whether the relation is still parallel-safe at the
> start of the operation, but not have a run-time check at each function
> call.
>

Agreed, that is what I also had in mind.

> > I am not sure if we have a simple way to check the parallel safety of
> > partitioned tables. In some way, we need to rely on the user, either (a)
> > by providing an option to specify whether parallel inserts (and/or other
> > DMLs) can be performed, or (b) by providing a GUC and/or rel option
> > which indicates that we can check the parallel-safety of all the
> > partitions. Yet another option, which I don't like, could be to
> > parallelize inserts only on non-partitioned tables.
>
> If we figure out a way to check the partitions automatically that
> actually works, we don't need a switch for it; we can (and should)
> just do it that way all the time. But if we can't come up with a
> correct algorithm for that, then we'll need to add some kind of option
> where the user declares whether it's OK.
>

Yeah, so let us think for some more time and see if we can come up
with something better for partitions, otherwise, we can sort out
things further in this direction.

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Tue, Jun 15, 2021 at 8:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jun 14, 2021 at 9:08 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Mon, Jun 14, 2021 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > Yeah, this could be one idea but I think even if we use pg_proc OID,
> > > we still need to check all the rel cache entries to find which one
> > > contains the invalidated OID and that could be expensive. I wonder
> > > can't we directly find the relation involved and register invalidation
> > > for the same? We are able to find the relation to which trigger
> > > function is associated during drop function via findDependentObjects
> > > by scanning pg_depend. Assuming, we are able to find the relation for
> > > trigger function by scanning pg_depend, what kinds of problems do we
> > > envision in registering the invalidation for the same?
> >
> > I don't think that finding the relation involved and registering an
> > invalidation for the same will work properly. Suppose there is a
> > concurrently-running transaction which has created a new table and
> > attached a trigger function to it. You can't see any of the catalog
> > entries for that relation yet, so you can't invalidate it, but
> > invalidation needs to happen. Even if you used some snapshot that can
> > see those catalog entries before they are committed, I doubt it fixes
> > the race condition. You can't hold any lock on that relation, because
> > the creating transaction holds AccessExclusiveLock, but the whole
> > invalidation mechanism is built around the assumption that the sender
> > puts messages into the shared queue first and then releases locks,
> > while the receiver first acquires a conflicting lock and then
> > processes messages from the queue.
> >
>
> Won't such messages be processed at start-transaction time
> (AtStart_Cache->AcceptInvalidationMessages)?
>

Even if we accept invalidations at start-transaction time, we need to
accept and execute them after taking a lock on the relation to ensure
that the relation doesn't change afterward. I think what I mentioned
doesn't break this assumption because, after finding the relation, we
will take a lock on it before registering the invalidation, so in the
above scenario, it should wait before registering the invalidation.

-- 
With Regards,
Amit Kapila.



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, June 15, 2021 10:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jun 15, 2021 at 7:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Okay, but I think if we go with your suggested model, where whenever
> > there is a change in the parallel-safety of any function we need to send
> > the new invalidation, then it won't matter whether the function is
> > associated with an index expression, a check constraint on the table, or
> > is used in any other way.
> 
> No, it will still matter, because I'm proposing that the parallel-safety of
> functions that we only access via operator classes will not even be checked.
> Also, if we decided to make the system more fine-grained - e.g. by invalidating
> on the specific OID of the function that was changed rather than doing
> something that is database-wide or global - then it matters even more.
> 
> > Yeah, dealing with partitioned tables is tricky. I think if we don't
> > want to check upfront the parallel safety of all the partitions then
> > the other option as discussed could be to ask the user to specify the
> > parallel safety of partitioned tables.
> 
> Just to be clear here, I don't think it really matters what we *want* to do. I don't
> think it's reasonably *possible* to check all the partitions, because we don't
> hold locks on them. When we're assessing a bunch of stuff related to an
> individual relation, we have a lock on it. I think - though we should
> double-check tablecmds.c - that this is enough to prevent all of the dependent
> objects - triggers, constraints, etc. - from changing. So the stuff we care about
> is stable. But the situation with a partitioned table is different. In that case, we
> can't even examine that stuff without locking all the partitions. And even if we
> do lock all the partitions, the stuff could change immediately afterward and we
> wouldn't know. So I think it would be difficult to make it correct.
> 
> Now, maybe it could be done, and I think that's worth a little more thought. For
> example, perhaps whenever we invalidate a relation, we could also somehow
> send some new, special kind of invalidation for its parent saying, essentially,
> "hey, one of your children has changed in a way you might care about." But
> that's not good enough, because it only goes up one level. The grandparent
> would still be unaware that a change it potentially cares about has occurred
> someplace down in the partitioning hierarchy. That seems hard to patch up,
> again because of the locking rules. The child can know the OID of its parent
> without locking the parent, but it can't know the OID of its grandparent without
> locking its parent. Walking up the whole partitioning hierarchy might be an
> issue for a number of reasons, including possible deadlocks, and possible race
> conditions where we don't emit all of the right invalidations in the face of
> concurrent changes. So I don't quite see a way around this part of the problem,
> but I may well be missing something. In fact I hope I am missing something,
> because solving this problem would be really nice.

For partitions, I think postgres already has the logic to recursively find
the parent table [1]. Can we copy that logic to send several invalidation
messages that invalidate the parent's and grandparent's (and so on) relcache
entries when a partition's parallel safety changes? Granted, it means we need
more locks (on its parents) when the parallel safety changes, but that
doesn't seem to be a frequent scenario and looks acceptable.

[1] In generate_partition_qual()
    parentrelid = get_partition_parent(RelationGetRelid(rel), true);
    parent = relation_open(parentrelid, AccessShareLock);
    ...
    /* Add the parent's quals to the list (if any) */
    if (parent->rd_rel->relispartition)
        result = list_concat(generate_partition_qual(parent), my_qual);


Besides, I have a possibly crazy idea: maybe it's not necessary to invalidate
the relcache when the parallel safety of a function is changed.

I took a look at how postgres currently behaves, and found that even if the
user changes a function (CREATE OR REPLACE/ALTER FUNCTION) which is used in
other objects (like a constraint, index expression, or partition key
expression), the data in the relation won't be rechecked. And as the docs say
[2], it is *not recommended* to change a function which is already used in
some other objects. The recommended way to handle such a change is to drop
the dependent object, adjust the function definition, and re-add the object.
Maybe we only need to care about parallel-safety changes when creating or
dropping an object (constraint, index, partition, or trigger). And we can
check the parallel safety when inserting into a particular table: if we find
a function not allowed in parallel mode, which means someone changed the
function's parallel safety, then we can invalidate the relcache and error out.

[2]https://www.postgresql.org/docs/14/ddl-constraints.html
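
In other words, the supported workflow for such a change is roughly
(hypothetical names):

    DROP INDEX t_f_idx;                     -- drop the dependent object
    CREATE OR REPLACE FUNCTION f(int) RETURNS int IMMUTABLE
        PARALLEL UNSAFE LANGUAGE sql
        AS 'SELECT $1 + 2';                 -- adjust the function
    CREATE INDEX t_f_idx ON t (f(a));       -- re-add the dependent object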

Best regards,
houzj

Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Tue, Jun 15, 2021 at 10:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I don't think that finding the relation involved and registering an
> > invalidation for the same will work properly. Suppose there is a
> > concurrently-running transaction which has created a new table and
> > attached a trigger function to it. You can't see any of the catalog
> > entries for that relation yet, so you can't invalidate it, but
> > invalidation needs to happen. Even if you used some snapshot that can
> > see those catalog entries before they are committed, I doubt it fixes
> > the race condition. You can't hold any lock on that relation, because
> > the creating transaction holds AccessExclusiveLock, but the whole
> > invalidation mechanism is built around the assumption that the sender
> > puts messages into the shared queue first and then releases locks,
> > while the receiver first acquires a conflicting lock and then
> > processes messages from the queue.
>
> Won't such messages be processed at start-transaction time
> (AtStart_Cache->AcceptInvalidationMessages)?

Only if they show up in the queue before that. But there's nothing
forcing that to happen. You don't seem to understand how important
heavyweight locking is to the whole shared invalidation message
system....

> Okay, so, in this scheme, we have allowed changing the function
> definition during statement execution but even though the rel's
> parallel-safe property gets modified (say to parallel-unsafe), we will
> still proceed in parallel-mode as if it's not changed. I guess this
> may not be a big deal as we can anyway allow breaking the running
> statement by changing its definition and users may be okay if the
> parallel statement errors out or behave in an unpredictable way in
> such corner cases.

Yeah, I mean, it's no different than leaving the parallel-safety
marking exactly as it was and changing the body of the function to
call some other function marked parallel-unsafe. I don't think we've
gotten any complaints about that, because I don't think it would
normally have any really bad consequences; most likely you'd just get
an error saying that something-or-other isn't allowed in parallel
mode. If it does have bad consequences, then I guess we'll have to fix
it when we find out about it, but in the meantime there's no reason to
hold the parallel-safety flag to a stricter standard than the function
body.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Wed, Jun 16, 2021 at 9:22 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Jun 15, 2021 at 10:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > I don't think that finding the relation involved and registering an
> > > invalidation for the same will work properly. Suppose there is a
> > > concurrently-running transaction which has created a new table and
> > > attached a trigger function to it. You can't see any of the catalog
> > > entries for that relation yet, so you can't invalidate it, but
> > > invalidation needs to happen. Even if you used some snapshot that can
> > > see those catalog entries before they are committed, I doubt it fixes
> > > the race condition. You can't hold any lock on that relation, because
> > > the creating transaction holds AccessExclusiveLock, but the whole
> > > invalidation mechanism is built around the assumption that the sender
> > > puts messages into the shared queue first and then releases locks,
> > > while the receiver first acquires a conflicting lock and then
> > > processes messages from the queue.
> >
> > Won't such messages be processed at start-transaction time
> > (AtStart_Cache->AcceptInvalidationMessages)?
>
> Only if they show up in the queue before that. But there's nothing
> forcing that to happen. You don't seem to understand how important
> heavyweight locking is to the whole shared invalidation message
> system....
>

I have responded about heavy-weight locking stuff in my next email [1]
and why I think the approach I mentioned will work. I don't deny that
I might be missing something here.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BT2CWqp40YqYttDA1Skk7wK6yDrkCD5GZ80QGr5ze-6g%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Thu, Jun 17, 2021 at 4:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I have responded about heavy-weight locking stuff in my next email [1]
> and why I think the approach I mentioned will work. I don't deny that
> I might be missing something here.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1%2BT2CWqp40YqYttDA1Skk7wK6yDrkCD5GZ80QGr5ze-6g%40mail.gmail.com

I mean I saw that but I don't see how it addresses the visibility
issue. There could be a relation that is not visible to your snapshot
and upon which AccessExclusiveLock is held which needs to be
invalidated. You can't lock it because it's AccessExclusiveLock'd
already.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Thu, Jun 17, 2021 at 8:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jun 17, 2021 at 4:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I have responded about heavy-weight locking stuff in my next email [1]
> > and why I think the approach I mentioned will work. I don't deny that
> > I might be missing something here.
> >
> > [1] - https://www.postgresql.org/message-id/CAA4eK1%2BT2CWqp40YqYttDA1Skk7wK6yDrkCD5GZ80QGr5ze-6g%40mail.gmail.com
>
> I mean I saw that but I don't see how it addresses the visibility
> issue.
>

I thought if we scan a system catalog using a DirtySnapshot, then we
should be able to find such a relation. But if the system catalog is
updated after our scan, then surely we won't be able to see it, and in
that case, we won't be able to send the invalidation. Now, say the rel
is not visible to us because of the snapshot we used or due to some race
condition; then we won't be able to send the invalidation, but why
should we consider that worse than the case where we miss such
invalidations (invalidations due to a change of the parallel-safe
property) while an insertion into the relation is in progress?

> There could be a relation that is not visible to your snapshot
> and upon which AccessExclusiveLock is held which needs to be
> invalidated. You can't lock it because it's AccessExclusiveLock'd
> already.
>

Yeah, the session in which we are doing Alter Function won't be able
to lock it but it will wait for the AccessExclusiveLock on the rel to
be released because it will also try to acquire it before sending
invalidation.
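
For example, under this scheme (hypothetical names; note that today's
ALTER FUNCTION takes no such lock):

    -- Session 1
    BEGIN;
    CREATE TABLE t2 (a int);
    CREATE TRIGGER t2_ins BEFORE INSERT ON t2
        FOR EACH ROW EXECUTE FUNCTION trigf();
    -- AccessExclusiveLock on t2 is held until commit

    -- Session 2, with the proposed behavior
    ALTER FUNCTION trigf() PARALLEL UNSAFE;
    -- would block until session 1 commits, because it tries to lock t2
    -- before registering the relcache invalidation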

-- 
With Regards,
Amit Kapila.



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"houzj.fnst@fujitsu.com"
Date:
On Wednesday, June 16, 2021 11:27 AM houzj.fnst@fujitsu.com wrote:
> On Tuesday, June 15, 2021 10:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > Just to be clear here, I don't think it really matters what we *want*
> > to do. I don't think it's reasonably *possible* to check all the
> > partitions, because we don't hold locks on them. When we're assessing
> > a bunch of stuff related to an individual relation, we have a lock on
> > it. I think - though we should double-check tablecmds.c - that this is
> > enough to prevent all of the dependent objects - triggers,
> > constraints, etc. - from changing. So the stuff we care about is
> > stable. But the situation with a partitioned table is different. In
> > that case, we can't even examine that stuff without locking all the
> > partitions. And even if we do lock all the partitions, the stuff could change
> > immediately afterward and we wouldn't know. So I think it would be difficult to
> > make it correct.
> > Now, maybe it could be done, and I think that's worth a little more
> > thought. For example, perhaps whenever we invalidate a relation, we
> > could also somehow send some new, special kind of invalidation for its
> > parent saying, essentially, "hey, one of your children has changed in
> > a way you might care about." But that's not good enough, because it
> > only goes up one level. The grandparent would still be unaware that a
> > change it potentially cares about has occurred someplace down in the
> > partitioning hierarchy. That seems hard to patch up, again because of
> > the locking rules. The child can know the OID of its parent without
> > locking the parent, but it can't know the OID of its grandparent
> > without locking its parent. Walking up the whole partitioning
> > hierarchy might be an issue for a number of reasons, including
> > possible deadlocks, and possible race conditions where we don't emit
> > all of the right invalidations in the face of concurrent changes. So I
> > don't quite see a way around this part of the problem, but I may well be
> > missing something. In fact I hope I am missing something, because solving this
> > problem would be really nice.
> 
> I think the check for partitions could be even more complicated if we need to
> check the parallel safety of partition key expressions. If the user directly
> inserts into a partition, then we need to invoke ExecPartitionCheck, which
> will execute all of its parents' and grandparents' partition key expressions.
> It means that if we change a parent table's partition key expression (by 1)
> changing a function in the expression, or 2) attaching the parent table as a
> partition of another parent table), then we need to invalidate all of its
> children's relcache entries.
> 
> BTW, currently, if the user attaches a partitioned table 'A' as a partition
> of another partitioned table 'B', the children of 'A' are not invalidated.

To be honest, I didn't find a cheap way to invalidate a partitioned table's
parallel safety automatically. For one thing, we would need to recurse up the
partition tree to invalidate all the parent tables' relcache entries (and
perhaps all their children's relcache entries), not only when a function's
parallel safety is altered, but also for DDLs which invalidate a partition's
relcache, such as CREATE/DROP INDEX/TRIGGER/CONSTRAINT. That seems too
expensive. For another, even if we could invalidate the partitioned table's
parallel safety automatically, we would still need to lock all the partitions
when checking the table's parallel safety, because a partition's parallel
safety could change after the check.

So, IMO, at least for partitioned tables, an explicit flag looks more
acceptable. For regular tables, it seems we can work it out automatically,
although I am not sure whether anyone thinks that looks a bit inconsistent.

Best regards,
houzj

Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Mon, Jun 21, 2021 at 12:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I thought if we scan a system catalog using a DirtySnapshot, then we
> should be able to find such a relation. But if the system catalog is
> updated after our scan, then surely we won't be able to see it, and in
> that case, we won't be able to send the invalidation. Now, say the rel
> is not visible to us because of the snapshot we used or due to some race
> condition; then we won't be able to send the invalidation, but why
> should we consider that worse than the case where we miss such
> invalidations (invalidations due to a change of the parallel-safe
> property) while an insertion into the relation is in progress?

A concurrent change is something quite different than a change that
happened some time in the past. We all know that DROP TABLE blocks if
it is run while the table is in use, and everybody considers that
acceptable, but if DROP TABLE were to block because the table was in
use at some previous time, everybody would complain, and rightly so.
The same principle applies here. It's not possible to react to a
change that happens in the middle of the query. Somebody could argue
that we ought to lock all the functions we're using against concurrent
changes so that attempts to change their properties block on a lock
rather than succeeding. But given that that's not how it works, we can
hardly go back in time and switch to a non-parallel plan after we've
already decided on a parallel one. On the other hand, we should be
able to notice a change that has *already completed* at the time we do
planning. I don't see how we can blame failure to do that on anything
other than bad coding.

> Yeah, the session in which we are doing Alter Function won't be able
> to lock it but it will wait for the AccessExclusiveLock on the rel to
> be released because it will also try to acquire it before sending
> invalidation.

I think users would not be very happy with such behavior. Users accept
that if they try to access a relation, they might end up needing to
wait for a lock on it, but what you are proposing here might make a
session block waiting for a lock on a relation which it never
attempted to access.

I think this whole line of attack is a complete dead-end. We can
invent new types of invalidations if we want, but they need to be sent
based on which objects actually got changed, not based on what we
think might be affected indirectly as a result of those changes. It's
reasonable to regard something like a trigger or constraint as a
property of the table because it is really a dependent object. It is
associated with precisely one table when it is created and the
association can never be changed. On the other hand, functions clearly
have their own existence. They can be created and dropped
independently of any table and the tables with which they are
associated can change at any time. In that kind of situation,
invalidating the table based on changes to the function is riddled
with problems which I am pretty convinced we're never going to be able
to solve. I'm not 100% sure what we ought to do here, but I'm pretty
sure that looking up the tables that happen to be associated with the
function in the session that is modifying the function is not it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"houzj.fnst@fujitsu.com"
Date:
On Monday, June 21, 2021 11:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jun 21, 2021 at 12:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Yeah, the session in which we are doing Alter Function won't be able
> > to lock it but it will wait for the AccessExclusiveLock on the rel to
> > be released because it will also try to acquire it before sending
> > invalidation.
> 
> I think users would not be very happy with such behavior. Users accept that if
> they try to access a relation, they might end up needing to wait for a lock on it,
> but what you are proposing here might make a session block waiting for a lock
> on a relation which it never attempted to access.
> 
> I think this whole line of attack is a complete dead-end. We can invent new
> types of invalidations if we want, but they need to be sent based on which
> objects actually got changed, not based on what we think might be affected
> indirectly as a result of those changes. It's reasonable to regard something like
> a trigger or constraint as a property of the table because it is really a
> dependent object. It is associated with precisely one table when it is created
> and the association can never be changed. On the other hand, functions clearly
> have their own existence. They can be created and dropped independently of
> any table and the tables with which they are associated can change at any time.
> In that kind of situation, invalidating the table based on changes to the function
> is riddled with problems which I am pretty convinced we're never going to be
> able to solve. I'm not 100% sure what we ought to do here, but I'm pretty sure
> that looking up the tables that happen to be associated with the function in the
> session that is modifying the function is not it.

I agree that we should send an invalidation message like "function OID's
parallel safety has changed". When each session accepts this invalidation
message, it needs to invalidate the related tables. Based on previous mails,
we only want to invalidate the tables that use this function in an index
expression/trigger/constraint. The problem is how to get all the related
tables. Robert-san suggested caching a list of pg_proc OIDs, which means we
need to rebuild the list every time the relcache is invalidated. The cost of
doing that could be expensive, especially for extracting pg_proc OIDs from
index expressions, because we need to invoke index_open(index, lock) to get
the index expression.

Or, maybe we can let each session use pg_depend to get the related tables and
invalidate them after accepting the new type of invalidation message.
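
For the trigger case, for instance, the lookup could be something like
(hypothetical function name):

    -- Relations whose triggers use a given function:
    SELECT DISTINCT t.tgrelid::regclass
    FROM pg_depend d
    JOIN pg_trigger t ON t.oid = d.objid
    WHERE d.classid = 'pg_trigger'::regclass
      AND d.refclassid = 'pg_proc'::regclass
      AND d.refobjid = 'f(int)'::regprocedure;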

Best regards,
houzj

Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Wed, Jun 16, 2021 at 8:57 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> I think the check for partitions could be even more complicated if we need to
> check the parallel safety of partition key expressions. If the user directly
> inserts into a partition, then we need to invoke ExecPartitionCheck, which
> will execute all of its parents' and grandparents' partition key expressions.
> It means that if we change a parent table's partition key expression (by 1)
> changing a function in the expression, or 2) attaching the parent table as a
> partition of another parent table), then we need to invalidate all of its
> children's relcache entries.
>

I think we already invalidate the child entries when we add/drop
constraints on a parent table. See ATAddCheckConstraint,
ATExecDropConstraint. If I am not missing anything then this case
shouldn't be a problem. Do you have something else in mind?
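
For instance, the following already recurses to all the partitions,
invalidating their relcache entries along the way (hypothetical table
name):

    ALTER TABLE parted_tbl ADD CONSTRAINT a_nonneg CHECK (a >= 0);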

-- 
With Regards,
Amit Kapila.



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, June 22, 2021 8:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jun 16, 2021 at 8:57 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> >
> > I think the check for partitions could be even more complicated if we
> > need to check the parallel safety of partition key expressions. If the
> > user directly inserts into a partition, then we need to invoke
> > ExecPartitionCheck, which will execute all of its parents' and
> > grandparents' partition key expressions. It means that if we change a
> > parent table's partition key expression (by 1) changing a function in
> > the expression, or 2) attaching the parent table as a partition of
> > another parent table), then we need to invalidate all of its children's
> > relcache entries.
> >
> 
> I think we already invalidate the child entries when we add/drop constraints on
> a parent table. See ATAddCheckConstraint, ATExecDropConstraint. If I am not
> missing anything then this case shouldn't be a problem. Do you have
> something else in mind?

Currently, attaching/detaching a partition doesn't invalidate the child
entries recursively, except when detaching a partition concurrently, which
adds a constraint to all the children. Do you mean we can add logic to
invalidate the child entries recursively when attaching/detaching a partition?

Best regards,
houzj

Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Wed, Jun 23, 2021 at 6:35 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Tuesday, June 22, 2021 8:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Wed, Jun 16, 2021 at 8:57 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> > >
> > > I think the check for partitions could be even more complicated if we
> > > need to check the parallel safety of partition key expressions. If the
> > > user directly inserts into a partition, then we need to invoke
> > > ExecPartitionCheck, which will execute all of its parents' and
> > > grandparents' partition key expressions. It means that if we change a
> > > parent table's partition key expression (by 1) changing a function in
> > > the expression, or 2) attaching the parent table as a partition of
> > > another parent table), then we need to invalidate all of its children's
> > > relcache entries.
> > >
> >
> > I think we already invalidate the child entries when we add/drop constraints on
> > a parent table. See ATAddCheckConstraint, ATExecDropConstraint. If I am not
> > missing anything then this case shouldn't be a problem. Do you have
> > something else in mind?
>
> Currently, attaching/detaching a partition doesn't invalidate the child
> entries recursively, except when detaching a partition concurrently, which
> adds a constraint to all the children. Do you mean we can add logic to
> invalidate the child entries recursively when attaching/detaching a
> partition?
>

I was talking about adding/dropping CHECK or other constraints on
partitioned tables via Alter Table. I think if attach/detach leads to a
change in the constraints of child tables, then either it should
invalidate the child rels to avoid problems in existing sessions, or, if
it is not doing so for some reason, then probably it might not matter. I
see that you have started a separate thread [1] to confirm the behavior
of attach/detach partition, and we might want to decide based on the
conclusion of that thread.

[1] -
https://www.postgresql.org/message-id/OS3PR01MB5718DA1C4609A25186D1FBF194089%40OS3PR01MB5718.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Wed, Jun 16, 2021 at 6:10 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Tuesday, June 15, 2021 10:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > Now, maybe it could be done, and I think that's worth a little more thought. For
> > example, perhaps whenever we invalidate a relation, we could also somehow
> > send some new, special kind of invalidation for its parent saying, essentially,
> > "hey, one of your children has changed in a way you might care about." But
> > that's not good enough, because it only goes up one level. The grandparent
> > would still be unaware that a change it potentially cares about has occurred
> > someplace down in the partitioning hierarchy. That seems hard to patch up,
> > again because of the locking rules. The child can know the OID of its parent
> > without locking the parent, but it can't know the OID of its grandparent without
> > locking its parent. Walking up the whole partitioning hierarchy might be an
> > issue for a number of reasons, including possible deadlocks, and possible race
> > conditions where we don't emit all of the right invalidations in the face of
> > concurrent changes. So I don't quite see a way around this part of the problem,
> > but I may well be missing something. In fact I hope I am missing something,
> > because solving this problem would be really nice.
>
> For partitions, I think postgres already has the logic to recursively find
> the parent table [1]. Can we copy that logic to send several invalidation
> messages that invalidate the parent's and grandparent's (and so on) relcache
> entries when a partition's parallel safety changes? Granted, it means we
> need more locks (on its parents) when the parallel safety changes, but that
> doesn't seem to be a frequent scenario and looks acceptable.
>
> [1] In generate_partition_qual()
>         parentrelid = get_partition_parent(RelationGetRelid(rel), true);
>         parent = relation_open(parentrelid, AccessShareLock);
>         ...
>         /* Add the parent's quals to the list (if any) */
>         if (parent->rd_rel->relispartition)
>                 result = list_concat(generate_partition_qual(parent), my_qual);
>

As I showed in another email [1], such a coding pattern can lead to
deadlock. It is because in some DDL operations we walk the partition
hierarchy from top to bottom, and if we also walk from the bottom
upwards, that can lead to deadlock. I think this is a dangerous coding
pattern and we shouldn't try to replicate it.

[1] - https://www.postgresql.org/message-id/CAA4eK1LsFpjK5gL%2B0HEvoqB2DJVOi19vGAWbZBEx8fACOi5%2B_A%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Zhihong Yu
Date:


Hi,
How about walking the partition hierarchy bottom up, recording the parents but not taking the locks?
Once the top-most parent is found, take the locks in reverse order (top down)?

Cheers

Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Greg Nancarrow
Date:
On Thu, Jun 24, 2021 at 1:38 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> How about walking the partition hierarchy bottom up, recording the parents but not taking the locks?
> Once the top-most parent is found, take the locks in reverse order (top down)?
>

Is it safe to walk up the partition hierarchy (to record the parents
for the eventual locking in reverse order) without taking locks?


Regards,
Greg Nancarrow
Fujitsu Australia



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"houzj.fnst@fujitsu.com"
Date:
On Thursday, June 24, 2021 11:44 AM Zhihong Yu <zyu@yugabyte.com> wrote:
> Hi,
> How about walking the partition hierarchy bottom up, recording the parents but not taking the locks?
> Once the top-most parent is found, take the locks in reverse order (top down)?

IMO, when we directly INSERT INTO a partition, postgres already locks the
partition as the target table before execution, which means we cannot
postpone the lock on the partition until after finding its parents.

Best regards,
houzj

Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Mon, Jun 21, 2021 at 4:40 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> To be honest, I didn't find a cheap way to invalidate a partitioned table's
> parallel safety automatically.
>

I also don't see the feasibility of doing parallelism checks for
partitioned tables, both because it is expensive due to
traversing/locking all the partitions and because the invalidations are
difficult to handle due to deadlock hazards, as discussed above.

Let me try to summarize the discussion so far and see if we can have
any better ideas than what we have discussed so far or we want to go
with one of the ideas discussed till now. I think we have broadly
discussed two approaches (a) to automatically decide whether
parallelism can be enabled for inserts, (b) provide an option to the
user to specify whether inserts can be parallelized on a relation.

For the first approach (a), we have evaluated both the partitioned and
non-partitioned relation cases. For non-partitioned relations, we can
compute the parallel-safety of the relation during planning and save it
in the relation cache entry. This is normally safe because we have a
lock on the relation and any change to the relation should raise an
invalidation, which will lead to re-computation of the parallel-safety
information for the relation. Now, there are cases where the
parallel-safety of some trigger function or a function used in an index
expression can be changed by the user without registering an
invalidation for the relation. To handle such cases, we can register a
new kind of invalidation only when a function's parallel-safety
information is changed. Every backend in the same database then needs
to re-evaluate the parallel-safety of every relation for which it has
cached a value. For partitioned relations, a similar idea won't work
for multiple reasons: (a) we would need to traverse and lock all the
partitions to compute the parallel-safety of the root relation, which
could be very expensive; (b) whenever we invalidate a particular
partition, we would need to invalidate its parent hierarchy as well,
and we can't traverse the parent hierarchy without taking locks on the
parent tables, which can lead to deadlock. The alternative could be
that for partitioned relations we rely on the user-specified
information about parallel-safety (like approach (b) mentioned in the
previous paragraph). We can additionally check the parallel safety of
partitions when we are trying to insert into a particular partition,
and error out if we detect any parallel-unsafe clause while in
parallel mode. So, in this case, we won't be completely relying on the
users. Users can either change the parallel-safe option of the table
or remove/change the parallel-unsafe clause after an error.

For the second approach (b), we can provide an option to the user to
specify whether inserts (or other DMLs) can be parallelized for a
relation. One of the ideas is to provide some options like below to
the user:
CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE };
ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE };

This property is recorded in pg_class's relparallel column as 'u',
'r', or 's', just like pg_proc's proparallel. The default is UNSAFE.
Additionally, provide a function pg_get_parallel_safety(oid) using
which users can determine whether it is safe to enable parallelism.
Surely, after the user has checked with that function, one can add
some unsafe constraints to the table by altering the table but it will
still be an aid to enable parallelism on a relation.
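
To illustrate the intended usage of approach (b) (to be clear, none of
this syntax exists today):

    CREATE TABLE sales (id int, amount numeric) PARALLEL DML SAFE;

    -- Verify what the system can deduce before (or after) declaring:
    SELECT pg_get_parallel_safety('sales'::regclass);

    -- Change the marking later if needed:
    ALTER TABLE sales PARALLEL DML RESTRICTED;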

The first approach (a) has an appeal because it would allow us to
automatically parallelize inserts in many cases, but it might have some
overhead in some cases due to the processing of relcache entries after a
parallel-safety change. The second approach (b) has an appeal because of
its consistent behavior for partitioned and non-partitioned relations.

Among the above options, I would personally prefer (b) mainly because
of the consistent handling for partition and non-partition table cases
but I am fine with approach (a) as well if that is what other people
feel is better.

Thoughts?

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Greg Nancarrow
Date:
On Mon, Jun 28, 2021 at 7:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Among the above options, I would personally prefer (b) mainly because
> of the consistent handling for partition and non-partition table cases
> but I am fine with approach (a) as well if that is what other people
> feel is better.
>
> Thoughts?
>

I personally think "(b) provide an option to the user to specify
whether inserts can be parallelized on a relation" is the preferable
option.
There seem to be too many issues with the alternative of trying to
determine the parallel-safety of a partitioned table automatically.
I think (b) is the simplest and most consistent approach, working the
same way for all table types, and without the overhead of (a).
Also, I don't think (b) is difficult for the user. At worst, the user
can use the provided utility-functions at development-time to verify
the intended declared table parallel-safety.
I can't really see some mixture of (a) and (b) being acceptable.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Wed, Jun 30, 2021 at 11:46 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> I personally think "(b) provide an option to the user to specify
> whether inserts can be parallelized on a relation" is the preferable
> option.
> There seem to be too many issues with the alternative of trying to
> determine the parallel-safety of a partitioned table automatically.
> I think (b) is the simplest and most consistent approach, working the
> same way for all table types, and without the overhead of (a).
> Also, I don't think (b) is difficult for the user. At worst, the user
> can use the provided utility-functions at development-time to verify
> the intended declared table parallel-safety.
> I can't really see some mixture of (a) and (b) being acceptable.

Yeah, I'd like to have it be automatic, but I don't have a clear idea
how to make that work nicely. It's possible somebody (Tom?) can
suggest something that I'm overlooking, though.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Dilip Kumar
Date:
On Fri, Jul 2, 2021 at 8:16 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jun 30, 2021 at 11:46 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > I personally think "(b) provide an option to the user to specify
> > whether inserts can be parallelized on a relation" is the preferable
> > option.
> > There seem to be too many issues with the alternative of trying to
> > determine the parallel-safety of a partitioned table automatically.
> > I think (b) is the simplest and most consistent approach, working the
> > same way for all table types, and without the overhead of (a).
> > Also, I don't think (b) is difficult for the user. At worst, the user
> > can use the provided utility-functions at development-time to verify
> > the intended declared table parallel-safety.
> > I can't really see some mixture of (a) and (b) being acceptable.
>
> Yeah, I'd like to have it be automatic, but I don't have a clear idea
> how to make that work nicely. It's possible somebody (Tom?) can
> suggest something that I'm overlooking, though.

In general, for non-partitioned tables, where we don't have much
overhead in checking parallel safety and invalidation is also not a
big problem, I am tempted to provide an automatic parallel-safety
check.  This would enable parallelism in more cases wherever it is
suitable, without user intervention.  OTOH, I understand that providing
automatic checking might be very costly if the number of partitions is
large.  Can't we provide some middle way where parallelism is enabled
by default for normal tables, but for partitioned tables it is
disabled by default and the user has to mark them safe to enable
parallelism?  I agree that such behavior might sound a bit hackish.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"houzj.fnst@fujitsu.com"
Date:
On Sunday, July 4, 2021 1:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Fri, Jul 2, 2021 at 8:16 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Wed, Jun 30, 2021 at 11:46 PM Greg Nancarrow <gregn4422@gmail.com>
> wrote:
> > > I personally think "(b) provide an option to the user to specify
> > > whether inserts can be parallelized on a relation" is the preferable
> > > option.
> > > There seems to be too many issues with the alternative of trying to
> > > determine the parallel-safety of a partitioned table automatically.
> > > I think (b) is the simplest and most consistent approach, working
> > > the same way for all table types, and without the overhead of (a).
> > > Also, I don't think (b) is difficult for the user. At worst, the
> > > user can use the provided utility-functions at development-time to
> > > verify the intended declared table parallel-safety.
> > > I can't really see some mixture of (a) and (b) being acceptable.
> >
> > Yeah, I'd like to have it be automatic, but I don't have a clear idea
> > how to make that work nicely. It's possible somebody (Tom?) can
> > suggest something that I'm overlooking, though.
> 
> In general, for the non-partitioned table, where we don't have much overhead
> of checking the parallel safety and invalidation is also not a big problem so I am
> tempted to provide an automatic parallel safety check.  This would enable
> parallelism for more cases wherever it is suitable without user intervention.
> OTOH, I understand that providing automatic checking might be very costly if
> the number of partitions is more.  Can't we provide some mid-way where the
> parallelism is enabled by default for the normal table but for the partitioned
> table it is disabled by default and the user has to set it safe for enabling
> parallelism?  I agree that such behavior might sound a bit hackish.

About the invalidation for non-partitioned tables, I think there is
still a problem: when a function's parallel safety changes, it's
expensive to judge via pg_depend whether the function is related to an
index, trigger, or some other table-related object, because we can
only make that judgement in each backend when it accepts an
invalidation message.  If we don't do that, it means that whenever a
function's parallel safety changes, we invalidate every relation's
cached safety, which does not look very nice to me.

So, I personally think "(b) provide an option to the user to specify whether
inserts can be parallelized on a relation" is the preferable option.

Best regards,
houzj

Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Sun, Jul 4, 2021 at 1:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> In general, for the non-partitioned table, where we don't have much
> overhead of checking the parallel safety and invalidation is also not
> a big problem so I am tempted to provide an automatic parallel safety
> check.  This would enable parallelism for more cases wherever it is
> suitable without user intervention.  OTOH, I understand that providing
> automatic checking might be very costly if the number of partitions is
> more.  Can't we provide some mid-way where the parallelism is enabled
> by default for the normal table but for the partitioned table it is
> disabled by default and the user has to set it safe for enabling
> parallelism?  I agree that such behavior might sound a bit hackish.

I think that's basically the proposal that Amit and I have been discussing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Wed, Jul 21, 2021 at 12:30 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sun, Jul 4, 2021 at 1:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > In general, for the non-partitioned table, where we don't have much
> > overhead of checking the parallel safety and invalidation is also not
> > a big problem so I am tempted to provide an automatic parallel safety
> > check.  This would enable parallelism for more cases wherever it is
> > suitable without user intervention.  OTOH, I understand that providing
> > automatic checking might be very costly if the number of partitions is
> > more.  Can't we provide some mid-way where the parallelism is enabled
> > by default for the normal table but for the partitioned table it is
> > disabled by default and the user has to set it safe for enabling
> > parallelism?  I agree that such behavior might sound a bit hackish.
>
> I think that's basically the proposal that Amit and I have been discussing.
>

I see here we have a mix of opinions from various people. Dilip seems
to favor the approach where we provide some option to the user for
partitioned tables and automatic behavior for non-partitioned tables,
but he also seems to have mild concerns about this behavior. OTOH,
Greg and Hou-San seem to favor an approach where we provide an option
to the user for both partitioned and non-partitioned tables. I am also
in favor of providing an option to the user, for the sake of
consistency in behavior and to avoid introducing a special kind of
invalidation, as it doesn't serve the purpose for partitioned tables.
Robert seems to be in favor of automatic behavior, but it is not very
clear to me if he is fine with handling partitioned and
non-partitioned relations differently. Robert, can you please provide
your opinion on what you think is the best way to move forward here?

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Wed, Jul 21, 2021 at 11:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I see here we have a mix of opinions from various people. Dilip seems
> to be favoring the approach where we provide some option to the user
> for partitioned tables and automatic behavior for non-partitioned
> tables but he also seems to have mild concerns about this behavior.
> OTOH, Greg and Hou-San seem to favor an approach where we can provide
> an option to the user for both partitioned and non-partitioned tables.
> I am also in favor of providing an option to the user for the sake of
> consistency in behavior and not trying to introduce a special kind of
> invalidation as it doesn't serve the purpose for partitioned tables.
> Robert seems to be in favor of automatic behavior but it is not very
> clear to me if he is fine with dealing differently for partitioned and
> non-partitioned relations. Robert, can you please provide your opinion
> on what do you think is the best way to move forward here?

I thought we had agreed on handling partitioned and unpartitioned
tables differently, but maybe I misunderstood the discussion.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Fri, Jul 23, 2021 at 6:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jul 21, 2021 at 11:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I see here we have a mix of opinions from various people. Dilip seems
> > to be favoring the approach where we provide some option to the user
> > for partitioned tables and automatic behavior for non-partitioned
> > tables but he also seems to have mild concerns about this behavior.
> > OTOH, Greg and Hou-San seem to favor an approach where we can provide
> > an option to the user for both partitioned and non-partitioned tables.
> > I am also in favor of providing an option to the user for the sake of
> > consistency in behavior and not trying to introduce a special kind of
> > invalidation as it doesn't serve the purpose for partitioned tables.
> > Robert seems to be in favor of automatic behavior but it is not very
> > clear to me if he is fine with dealing differently for partitioned and
> > non-partitioned relations. Robert, can you please provide your opinion
> > on what do you think is the best way to move forward here?
>
> I thought we had agreed on handling partitioned and unpartitioned
> tables differently, but maybe I misunderstood the discussion.
>

For the consistency argument, how about allowing users to specify a
parallel-safety option for both partitioned and non-partitioned
relations, but for non-partitioned relations, if users didn't specify
it, it would be computed automatically? If the user has specified a
parallel-safety option for a non-partitioned relation, then we would
consider that instead of computing the value ourselves.

Another reason for hesitating to do this automatically for
non-partitioned relations was the new invalidation, which would
invalidate the cached parallel safety for all relations in the
relcache for a particular database. As mentioned by Hou-San [1], it
seems we need to do this whenever any function's parallel safety is
changed. OTOH, changing a function's parallel safety is probably not
frequent enough to matter in practice, which I think is why you seem
to be fine with this idea. So, on that premise, I think it is okay to
go ahead with different handling for partitioned and non-partitioned
relations here.

[1] -
https://www.postgresql.org/message-id/OS0PR01MB5716EC1D07ACCA24373C2557941B9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Robert Haas
Date:
On Sat, Jul 24, 2021 at 5:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think for the consistency argument how about allowing users to
> specify a parallel-safety option for both partitioned and
> non-partitioned relations but for non-partitioned relations if users
> didn't specify, it would be computed automatically? If the user has
> specified parallel-safety option for non-partitioned relation then we
> would consider that instead of computing the value by ourselves.

Having the option for both partitioned and non-partitioned tables
doesn't seem like the worst idea ever, but I am also not entirely sure
that I understand the point.

> Another reason for hesitation to do automatically for non-partitioned
> relations was the new invalidation which will invalidate the cached
> parallel-safety for all relations in relcache for a particular
> database. As mentioned by Hou-San [1], it seems we need to do this
> whenever any function's parallel-safety is changed. OTOH, changing
> parallel-safety for a function is probably not that often to matter in
> practice which is why I think you seem to be fine with this idea.

Right. I think it should be quite rare, and invalidation events are
also not crazy expensive. We can test what the worst case is, but if
you have to sit there and run ALTER FUNCTION in a tight loop to see a
measurable performance impact, it's not a real problem. There may be a
code complexity argument against trying to figure it out
automatically, perhaps, but I don't think there's a big performance
issue.
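
For concreteness, a statement of the kind being discussed is the
existing ALTER FUNCTION form below (real syntax today; the function
name is an assumption, and the extra relation-level invalidation on
top of it is only the proposal under discussion):

    -- Existing syntax; under the proposal, running this would also
    -- invalidate cached relation-level parallel safety in the database.
    ALTER FUNCTION my_check_func(integer) PARALLEL RESTRICTED;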

What bothers me is that if this is something people have to set
manually then many people won't and will not get the benefit of the
feature. And some of them will also set it incorrectly and have
problems. So I am in favor of trying to determine it automatically
where possible, to make it easy for people. However, other people may
feel differently, and I'm not trying to say they're necessarily wrong.
I'm just telling you what I think.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Mon, Jul 26, 2021 at 8:33 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Jul 24, 2021 at 5:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I think for the consistency argument how about allowing users to
> > specify a parallel-safety option for both partitioned and
> > non-partitioned relations but for non-partitioned relations if users
> > didn't specify, it would be computed automatically? If the user has
> > specified parallel-safety option for non-partitioned relation then we
> > would consider that instead of computing the value by ourselves.
>
> Having the option for both partitioned and non-partitioned tables
> doesn't seem like the worst idea ever, but I am also not entirely sure
> that I understand the point.
>

Consider the following ways to allow the user to specify the parallel-safety option:

(a)
CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ...
ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ..

OR

(b)
CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true)
ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true)

The point was: what should we do if the user specifies the option for
a non-partitioned table? Do we just ignore it, or give an error that
this is not valid syntax/option when used with non-partitioned tables?
I find it slightly odd that this option would work for partitioned
tables but give an error for non-partitioned tables, but maybe we can
document it.

With the above syntax, even if the user doesn't specify the
parallelism option for non-partitioned relations, we will determine it
automatically. Now, in some situations, a user might want to force
parallelism even when we wouldn't have chosen it automatically. It is
possible that she might face an error due to some parallel-unsafe
function, but OTOH, she might have ensured that it is safe to choose
parallelism in her particular case.
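
Under the proposed syntax (a), a minimal sketch of such an override
might look like the following (the table name is an assumption, and
the PARALLEL DML clause itself is only a proposal, not an existing
feature):

    -- Proposed syntax sketch: force parallel DML even though the
    -- automatic detection would not have chosen it for this table.
    ALTER TABLE orders PARALLEL DML SAFE;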

> > Another reason for hesitation to do automatically for non-partitioned
> > relations was the new invalidation which will invalidate the cached
> > parallel-safety for all relations in relcache for a particular
> > database. As mentioned by Hou-San [1], it seems we need to do this
> > whenever any function's parallel-safety is changed. OTOH, changing
> > parallel-safety for a function is probably not that often to matter in
> > practice which is why I think you seem to be fine with this idea.
>
> Right. I think it should be quite rare, and invalidation events are
> also not crazy expensive. We can test what the worst case is, but if
> you have to sit there and run ALTER FUNCTION in a tight loop to see a
> measurable performance impact, it's not a real problem. There may be a
> code complexity argument against trying to figure it out
> automatically, perhaps, but I don't think there's a big performance
> issue.
>

True, there could be some code complexity, but I think we can assess
that once the patch is ready for review.

> What bothers me is that if this is something people have to set
> manually then many people won't and will not get the benefit of the
> feature. And some of them will also set it incorrectly and have
> problems. So I am in favor of trying to determine it automatically
> where possible, to make it easy for people. However, other people may
> feel differently, and I'm not trying to say they're necessarily wrong.
> I'm just telling you what I think.
>

Thanks for all your suggestions and feedback.

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Dilip Kumar
Date:
On Tue, Jul 27, 2021 at 10:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jul 26, 2021 at 8:33 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Consider below ways to allow the user to specify the parallel-safety option:
>
> (a)
> CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ...
> ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ..
>
> OR
>
> (b)
> CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true)
> ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true)
>
> The point was what should we do if the user specifies the option for a
> non-partitioned table. Do we just ignore it or give an error that this
> is not a valid syntax/option when used with non-partitioned tables? I
> find it slightly odd that this option works for partitioned tables but
> gives an error for non-partitioned tables but maybe we can document
> it.

IMHO, for non-partitioned tables, we should by default allow the
parallel safety checking so that users don't have to set it for
individual tables. OTOH, I don't think that there is any point in
blocking the syntax for non-partitioned tables. So I think, for a
non-partitioned table, if the user hasn't set it we should do the
automatic safety checking, and if the user has defined the safety
explicitly then we should respect that.  And for a partitioned table,
we will never do the automatic safety checking and should always
respect what the user has set.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Tue, Jul 27, 2021 at 11:28 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jul 27, 2021 at 10:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Jul 26, 2021 at 8:33 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > Consider below ways to allow the user to specify the parallel-safety option:
> >
> > (a)
> > CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ...
> > ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ..
> >
> > OR
> >
> > (b)
> > CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true)
> > ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true)
> >
> > The point was what should we do if the user specifies the option for a
> > non-partitioned table. Do we just ignore it or give an error that this
> > is not a valid syntax/option when used with non-partitioned tables? I
> > find it slightly odd that this option works for partitioned tables but
> > gives an error for non-partitioned tables but maybe we can document
> > it.
>
> IMHO, for a non-partitioned table, we should be default allow the
> parallel safely checking so that users don't have to set it for
> individual tables, OTOH, I don't think that there is any point in
> blocking the syntax for the non-partitioned table, So I think for the
> non-partitioned table if the user hasn't set it we should do automatic
> safety checking and if the user has defined the safety externally then
> we should respect that.  And for the partitioned table, we will never
> do the automatic safety checking and we should always respect what the
> user has set.
>

This is exactly what I am saying. BTW, do you have any preference for
the syntax among (a) or (b)?

-- 
With Regards,
Amit Kapila.



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Greg Nancarrow
Date:
On Tue, Jul 27, 2021 at 3:58 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> IMHO, for a non-partitioned table, we should be default allow the
> parallel safely checking so that users don't have to set it for
> individual tables, OTOH, I don't think that there is any point in
> blocking the syntax for the non-partitioned table, So I think for the
> non-partitioned table if the user hasn't set it we should do automatic
> safety checking and if the user has defined the safety externally then
> we should respect that.  And for the partitioned table, we will never
> do the automatic safety checking and we should always respect what the
> user has set.
>

Provided it is possible to distinguish between the default
parallel-safety (unsafe) and that default being explicitly specified
by the user, it should be OK.
In the case where the automatic parallel-safety checking is performed
and the table uses something that is parallel-unsafe, there will be a
performance degradation compared to the current code (hopefully only a
small one). That can be avoided by the user explicitly specifying that
the table is parallel-unsafe.


Regards,
Greg Nancarrow
Fujitsu Australia



Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Amit Kapila
Date:
On Tue, Jul 27, 2021 at 4:00 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> On Tue, Jul 27, 2021 at 3:58 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > IMHO, for a non-partitioned table, we should be default allow the
> > parallel safely checking so that users don't have to set it for
> > individual tables, OTOH, I don't think that there is any point in
> > blocking the syntax for the non-partitioned table, So I think for the
> > non-partitioned table if the user hasn't set it we should do automatic
> > safety checking and if the user has defined the safety externally then
> > we should respect that.  And for the partitioned table, we will never
> > do the automatic safety checking and we should always respect what the
> > user has set.
> >
>
> Provided it is possible to distinguish between the default
> parallel-safety (unsafe) and that default being explicitly specified
> by the user, it should be OK.
>

Offhand, I don't see any problem with this. Do you have something
specific in mind?

> In the case of performing the automatic parallel-safety checking and
> the table using something that is parallel-unsafe, there will be a
> performance degradation compared to the current code (hopefully only
> small). That can be avoided by the user explicitly specifying that
> it's parallel-unsafe.
>

True, but I guess this should be largely addressed by caching the
value of parallel safety at the relation level. Sure, there will be
some cost the first time we compute it, but on subsequent accesses it
should be quite cheap.

-- 
With Regards,
Amit Kapila.



RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From
"houzj.fnst@fujitsu.com"
Date:
On July 27, 2021 1:14 PM Amit Kapila <amit.kapila16@gmail.com>
> On Mon, Jul 26, 2021 at 8:33 PM Robert Haas <robertmhaas@gmail.com>
> wrote:
> >
> > On Sat, Jul 24, 2021 at 5:52 AM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > > I think for the consistency argument how about allowing users to
> > > specify a parallel-safety option for both partitioned and
> > > non-partitioned relations but for non-partitioned relations if users
> > > didn't specify, it would be computed automatically? If the user has
> > > specified parallel-safety option for non-partitioned relation then we
> > > would consider that instead of computing the value by ourselves.
> >
> > Having the option for both partitioned and non-partitioned tables
> > doesn't seem like the worst idea ever, but I am also not entirely sure
> > that I understand the point.
> >
> 
> Consider below ways to allow the user to specify the parallel-safety option:
> 
> (a)
> CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ...
> ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ..
> 
> OR
> 
> (b)
> CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true)
> ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true)

Personally, I think approach (a) might be better, since it's similar
to ALTER FUNCTION PARALLEL XXX, which users might be more familiar
with.

Besides, I think we need a new default value for parallel DML safety,
maybe 'auto' or 'null' (distinct from safe/restricted/unsafe). Because
a user is likely to alter the safety back to the default value to get
the automatic safety check, an independent default value can make it
clearer.
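
As a minimal sketch of that idea under the proposed syntax (a) (none
of this is existing syntax, and the table name is an assumption):

    -- Proposed sketch: an explicit marking, later reset to the
    -- automatic safety check via an independent default value.
    ALTER TABLE t PARALLEL DML UNSAFE;  -- user's explicit marking
    ALTER TABLE t PARALLEL DML AUTO;    -- back to automatic checking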

Best regards,
Houzj


Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From
Greg Nancarrow
Date:
On Wed, Jul 28, 2021 at 12:52 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> > Consider below ways to allow the user to specify the parallel-safety option:
> >
> > (a)
> > CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ...
> > ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ..
> >
> > OR
> >
> > (b)
> > CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true)
> > ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true)
>
> Personally, I think the approach (a) might be better. Since it's similar to
> ALTER FUNCTION PARALLEL XXX which user might be more familiar with.
>

I think so too.

> Besides, I think we need a new default value about parallel dml safety. Maybe
> 'auto' or 'null'(different from safe/restricted/unsafe). Because, user is
> likely to alter the safety to the default value to get the automatic safety
> check, a independent default value can make it more clear.
>

Yes, I was thinking something similar when I said "Provided it is
possible to distinguish between the default parallel-safety (unsafe)
and that default being explicitly specified by the user". If we don't
have a new default value, then we need to distinguish these cases, but
I'm not sure Postgres does something similar elsewhere (for example,
for function parallel-safety, it's not currently recorded whether
parallel-safety=unsafe is because of the default or because the user
specifically set it to what is the default value).
Opinions?


Regards,
Greg Nancarrow
Fujitsu Australia



Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

From
Amit Kapila
Date:
Note: I am changing the subject as I felt the topic has diverged from
the originally reported case, and it might also help others to pay
attention.

On Wed, Jul 28, 2021 at 8:22 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> >
> > Consider below ways to allow the user to specify the parallel-safety option:
> >
> > (a)
> > CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ...
> > ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ..
> >
> > OR
> >
> > (b)
> > CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true)
> > ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true)
>
> Personally, I think the approach (a) might be better. Since it's similar to
> ALTER FUNCTION PARALLEL XXX which user might be more familiar with.
>

Okay, and I think for (b) true/false won't be sufficient because one
might want to specify restricted.

> Besides, I think we need a new default value about parallel dml safety. Maybe
> 'auto' or 'null'(different from safe/restricted/unsafe). Because, user is
> likely to alter the safety to the default value to get the automatic safety
> check, a independent default value can make it more clear.
>

Hmm, but auto won't work for partitioned tables, right? If so, that
might appear to be an inconsistency to the user, and we would need to
document it. Let me summarize the discussion so far in this thread so
that it is helpful to others.

We would like to parallelize INSERT SELECT (as a first step, INSERT +
parallel SELECT, and then parallel (INSERT + SELECT)), and for that we
have explored a couple of ways. The first approach is to automatically
detect whether it is safe to parallelize the insert and then do it
without user intervention. To detect this automatically, we need to
determine the parallel safety of various expressions (like default
column expressions, check constraints, index expressions, etc.) at
planning time, which can be costly, but we can avoid most of the cost
if we cache the parallel safety for the relation. So, the cost needs
to be paid just once. Now, we can't cache this for partitioned
relations because it can be very costly (as we need to lock all the
partitions) and has deadlock risks (while processing invalidations);
this has been explained in email [1].

Now, as we can't think of a nice way to determine parallel safety
automatically for partitioned relations, we thought of providing an
option to the user. The next thing to decide is: if we are providing
an option to the user in one of the ways mentioned above in the email,
what should we do if the user uses that option for non-partitioned
relations? Shall we just ignore it, or give an error that this is not
valid syntax/option? The idea which Dilip and I are advocating is to
respect the user's input for non-partitioned relations, and if it is
not given, then compute the parallel safety and cache it.

To help users provide a parallel-safety option, we are thinking of
providing a utility function
"pg_get_table_parallel_dml_safety(regclass)" that returns records of
(objid, classid, parallel_safety) for all parallel unsafe/restricted
table-related objects from which the table's parallel DML safety is
determined. This will allow users to identify unsafe objects; if
required, the user can change the parallel safety of the relevant
functions and then use the parallel-safety option for the table.
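
A hypothetical usage sketch of that proposed function (the column
names follow the description above; the table and function names are
assumptions, and none of this is part of released PostgreSQL):

    -- List the parallel unsafe/restricted objects that determine the
    -- table's parallel DML safety:
    SELECT classid, objid, parallel_safety
    FROM pg_get_table_parallel_dml_safety('t'::regclass);

    -- If a reported function is in fact safe, fix its marking (this
    -- ALTER FUNCTION form is existing syntax) and then declare the
    -- table's safety using the proposed option:
    ALTER FUNCTION t_default_expr_func() PARALLEL SAFE;
    ALTER TABLE t PARALLEL DML SAFE;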

Thoughts?

Note - This topic has been discussed in another thread as well [2] but
as many of the key technical points have been discussed here I thought
it is better to continue here.

[1] - https://www.postgresql.org/message-id/CAA4eK1Jwz8xGss4b0-33eyX0i5W_1CnqT16DjB9snVC--DoOsQ%40mail.gmail.com
[2] -
https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709%40TYAPR01MB2990.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

From
Greg Nancarrow
Date:
On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Besides, I think we need a new default value about parallel dml safety. Maybe
> > 'auto' or 'null'(different from safe/restricted/unsafe). Because, user is
> > likely to alter the safety to the default value to get the automatic safety
> > check, a independent default value can make it more clear.
> >
>
> Hmm, but auto won't work for partitioned tables, right? If so, that
> might appear like an inconsistency to the user and we need to document
> the same. Let me summarize the discussion so far in this thread so
> that it is helpful to others.
>

To avoid that inconsistency, UNSAFE could be the default for
partitioned tables (and we would disallow setting AUTO for these).
So then AUTO is the default for non-partitioned tables only.
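
A sketch of how those defaults would behave under the proposed syntax
(a) (all of this is proposed behavior, with assumed table names):

    CREATE TABLE plain_t (a int);                        -- implicitly AUTO
    CREATE TABLE part_t (a int) PARTITION BY RANGE (a);  -- implicitly UNSAFE
    ALTER TABLE part_t PARALLEL DML SAFE;                -- explicit opt-in
    ALTER TABLE part_t PARALLEL DML AUTO;                -- disallowed: error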

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

From
"houzj.fnst@fujitsu.com"
Date:
On Friday, July 30, 2021 2:52 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > Besides, I think we need a new default value about parallel dml
> > > safety. Maybe 'auto' or 'null'(different from
> > > safe/restricted/unsafe). Because, user is likely to alter the safety
> > > to the default value to get the automatic safety check, a independent default
> > > value can make it more clear.
> > >
> >
> > Hmm, but auto won't work for partitioned tables, right? If so, that
> > might appear like an inconsistency to the user and we need to document
> > the same. Let me summarize the discussion so far in this thread so
> > that it is helpful to others.
> >
> 
> To avoid that inconsistency, UNSAFE could be the default for partitioned tables
> (and we would disallow setting AUTO for these).
> So then AUTO is the default for non-partitioned tables only.

I think this approach is reasonable, +1.

Best regards,
houzj 

Re: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

From
Amit Kapila
Date:
On Fri, Jul 30, 2021 at 6:53 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Friday, July 30, 2021 2:52 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > > Besides, I think we need a new default value about parallel dml
> > > > safety. Maybe 'auto' or 'null'(different from
> > > > safe/restricted/unsafe). Because, user is likely to alter the safety
> > > > to the default value to get the automatic safety check, a independent default
> > > > value can make it more clear.
> > > >
> > >
> > > Hmm, but auto won't work for partitioned tables, right? If so, that
> > > might appear like an inconsistency to the user and we need to document
> > > the same. Let me summarize the discussion so far in this thread so
> > > that it is helpful to others.
> > >
> >
> > To avoid that inconsistency, UNSAFE could be the default for partitioned tables
> > (and we would disallow setting AUTO for these).
> > So then AUTO is the default for non-partitioned tables only.
>
> I think this approach is reasonable, +1.
>

I see the need to change back to the default via ALTER TABLE, but I am
not sure if AUTO is the most appropriate way to handle that. How about
using DEFAULT itself, as we do in the case of REPLICA IDENTITY? So, if
users want to alter the parallel-safety value back to the default,
they just need to say PARALLEL DML DEFAULT. The default would mean
automatic behavior for non-partitioned relations and ignoring
parallelism for partitioned tables.
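
For comparison, the REPLICA IDENTITY precedent is existing syntax,
while the PARALLEL DML form is only the proposal being discussed (the
table name is an assumption):

    ALTER TABLE t REPLICA IDENTITY DEFAULT;  -- existing syntax precedent
    ALTER TABLE t PARALLEL DML DEFAULT;      -- proposed: automatic checking
                                             -- for non-partitioned tables,
                                             -- UNSAFE for partitioned ones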

-- 
With Regards,
Amit Kapila.



Re: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

From
Greg Nancarrow
Date:
On Mon, Aug 2, 2021 at 2:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jul 30, 2021 at 6:53 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Friday, July 30, 2021 2:52 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > > On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > > Besides, I think we need a new default value about parallel dml
> > > > > safety. Maybe 'auto' or 'null'(different from
> > > > > safe/restricted/unsafe). Because, user is likely to alter the safety
> > > > > to the default value to get the automatic safety check, a independent default
> > > > > value can make it more clear.
> > > > >
> > > >
> > > > Hmm, but auto won't work for partitioned tables, right? If so, that
> > > > might appear like an inconsistency to the user and we need to document
> > > > the same. Let me summarize the discussion so far in this thread so
> > > > that it is helpful to others.
> > > >
> > >
> > > To avoid that inconsistency, UNSAFE could be the default for partitioned tables
> > > (and we would disallow setting AUTO for these).
> > > So then AUTO is the default for non-partitioned tables only.
> >
> > I think this approach is reasonable, +1.
> >
>
> I see the need to change to default via Alter Table but I am not sure
> if Auto is the most appropriate way to handle that. How about using
> DEFAULT itself as we do in the case of REPLICA IDENTITY? So, if users
> have to alter parallel safety value to default, they need to just say
> Parallel DML DEFAULT. The default would mean automatic behavior for
> non-partitioned relations and ignore parallelism for partitioned
> tables.
>

Hmm, I'm not so sure I'm sold on that.
I personally think "DEFAULT" here is vague, and users then need to
know what DEFAULT equates to, based on the type of table (partitioned
or non-partitioned).
Also, there would then be two ways to set the actual "default" DML
parallel-safety for partitioned tables: DEFAULT or UNSAFE.
At least "AUTO" is a meaningful default option name for
non-partitioned tables - "automatic" parallel-safety checking - and
the fact that it isn't the default (and can't be set) for partitioned
tables highlights the difference in the way being proposed to treat
them (i.e. use automatic checking only for non-partitioned tables).
I'd be interested to hear what others think.
I think a viable alternative would be to record whether an explicit
DML parallel-safety has been specified, and if not, apply the default
behavior (i.e. by default use automatic checking for non-partitioned
tables and treat partitioned tables as UNSAFE). I'm just not sure
whether this kind of distinction (explicit vs implicit default) has
been used before in Postgres options.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

From
"houzj.fnst@fujitsu.com"
Date:
On August 2, 2021 2:04 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Mon, Aug 2, 2021 at 2:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jul 30, 2021 at 6:53 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > On Friday, July 30, 2021 2:52 PM Greg Nancarrow <gregn4422@gmail.com>
> wrote:
> > > > On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > > > >
> > > > > > Besides, I think we need a new default value about parallel
> > > > > > dml safety. Maybe 'auto' or 'null'(different from
> > > > > > safe/restricted/unsafe). Because, user is likely to alter the
> > > > > > safety to the default value to get the automatic safety check,
> > > > > > a independent default value can make it more clear.
> > > > > >
> > > > >
> > > > > Hmm, but auto won't work for partitioned tables, right? If so,
> > > > > that might appear like an inconsistency to the user and we need
> > > > > to document the same. Let me summarize the discussion so far in
> > > > > this thread so that it is helpful to others.
> > > > >
> > > >
> > > > To avoid that inconsistency, UNSAFE could be the default for
> > > > partitioned tables (and we would disallow setting AUTO for these).
> > > > So then AUTO is the default for non-partitioned tables only.
> > >
> > > I think this approach is reasonable, +1.
> > >
> >
> > I see the need to change to default via Alter Table but I am not sure
> > if Auto is the most appropriate way to handle that. How about using
> > DEFAULT itself as we do in the case of REPLICA IDENTITY? So, if users
> > have to alter parallel safety value to default, they need to just say
> > Parallel DML DEFAULT. The default would mean automatic behavior for
> > non-partitioned relations and ignore parallelism for partitioned
> > tables.
> >
> 
> Hmm, I'm not so sure I'm sold on that.
> I personally think "DEFAULT" here is vague, and users then need to know what
> DEFAULT equates to, based on the type of table (partitioned or non-partitioned
> table).
> Also, then there's two ways to set the actual "default" DML parallel-safety for
> partitioned tables: DEFAULT or UNSAFE.
> At least "AUTO" is a meaningful default option name for non-partitioned tables
> - "automatic" parallel-safety checking, and the fact that it isn't the default (and
> can't be set) for partitioned tables highlights the difference in the way being
> proposed to treat them (i.e. use automatic checking only for non-partitioned
> tables).
> I'd be interested to hear what others think.
> I think a viable alternative would be to record whether an explicit DML
> parallel-safety has been specified, and if not, apply default behavior (i.e. by
> default use automatic checking for non-partitioned tables and treat partitioned
> tables as UNSAFE). I'm just not sure whether this kind of distinction (explicit vs
> implicit default) has been used before in Postgres options.

I think both approaches are fine, but using "DEFAULT" might has a disadvantage
that if we somehow support automatic safety check for partitioned table in the
future, then the meaning of "DEFAULT" for partitioned table will change from
UNSAFE to automatic check. It could also bring some burden on the user to
modify their sql script.

Best regards,
houzj

RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

From
"houzj.fnst@fujitsu.com"
Date:
Based on the discussion here, I implemented the auto-safety-check
feature. Since most of the technical discussion happened here, I
attached the patches in this thread.

The patches allow users to specify a parallel-safety option for both
partitioned and non-partitioned relations; for non-partitioned
relations, if the user didn't specify one, it is computed
automatically. If the user has specified a parallel-safety option,
then we consider that instead of computing the value ourselves. But
for a partitioned table, if the user didn't specify the parallel DML
safety, it is treated as unsafe.

For non-partitioned relations, after computing the parallel safety of
a relation during planning, we save it in the relation cache entry,
and we invalidate the cached parallel safety for all relations in the
relcache for a particular database whenever any function's parallel
safety is changed.

To make it possible for the user to reset the safety to an unspecified
value and get the automatic safety check, the patches add a new
default option (temporarily named 'DEFAULT', in addition to
safe/unsafe/restricted) for parallel DML safety.

To help users provide a parallel-safety option, the patches provide a
utility function "pg_get_table_parallel_dml_safety(regclass)" that
returns records of (objid, classid, parallel_safety) for all parallel
unsafe/restricted table-related objects from which the table's
parallel DML safety is determined. This will allow users to identify
unsafe objects; if required, the user can change the parallel safety
of the relevant functions and then use the parallel-safety option for
the table.
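
A hypothetical end-to-end sketch of the patches' behavior (the object
names are assumptions; PARALLEL DML and the utility function exist
only in these patches):

    CREATE TABLE t (a int) PARALLEL DML SAFE;  -- explicit declaration
    ALTER TABLE t PARALLEL DML DEFAULT;        -- back to automatic checking
    SELECT classid, objid, parallel_safety
    FROM pg_get_table_parallel_dml_safety('t'::regclass);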

Best regards,
houzj

Attachment

RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

From
"houzj.fnst@fujitsu.com"
Date:
On Tues, August 3, 2021 3:40 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> Based on the discussion here, I implemented the auto-safety-check feature.
> Since most of the technical discussion happened here,I attatched the patches in
> this thread.
> 
> The patches allow users to specify a parallel-safety option for both partitioned
> and non-partitioned relations, and for non-partitioned relations if users didn't
> specify, it would be computed automatically. If the user has specified
> parallel-safety option then we would consider that instead of computing the
> value by ourselves. But for partitioned table, if users didn't specify the parallel
> dml safety, it will treat is as unsafe.
> 
> For non-partitioned relations, after computing the parallel-safety of relation
> during the planning, we save it in the relation cache entry and invalidate the
> cached parallel-safety for all relations in relcache for a particular database
> whenever any function's parallel-safety is changed.
> 
> To make it possible for user to alter the safety to a not specified value to get the
> automatic safety check, add a new default option(temporarily named 'DEFAULT'
> in addition to safe/unsafe/restricted) about parallel dml safety.
> 
> To facilitate users for providing a parallel-safety option, provide a utility
> functionr "pg_get_table_parallel_dml_safety(regclass)" that returns records of
> (objid, classid, parallel_safety) for all parallel unsafe/restricted table-related
> objects from which the table's parallel DML safety is determined.
> This will allow user to identify unsafe objects and if the required user can change
> the parallel safety of required functions and then use the parallel safety option
> for the table.

Update the commit message in patches to make it easier for others to review.

Best regards,
Houzj

Attachment

RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

From
"houzj.fnst@fujitsu.com"
Date:
On Fri, Aug 6, 2021 4:23 PM Hou zhijie <houzj.fnst@fujitsu.com> wrote:
> 
> Update the commit message in patches to make it easier for others to review.

CFbot reported a compile error due to recent commit 3aafc03.
Attach rebased patches which fix the error.

Best regards,
Hou zj

Attachment

RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

From
"houzj.fnst@fujitsu.com"
Date:
Thursday, August 19, 2021 4:16 PM Hou zhijie <houzj.fnst@fujitsu.com> wrote:
> On Fri, Aug 6, 2021 4:23 PM Hou zhijie <houzj.fnst@fujitsu.com> wrote:
> >
> > Update the commit message in patches to make it easier for others to review.
> 
> CFbot reported a compile error due to recent commit 3aafc03.
> Attach rebased patches which fix the error.

The patches can't be applied to the HEAD branch due to a recent commit.
Attached are rebased patches.

Best regards,
Hou zj

Attachment

RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

From
"houzj.fnst@fujitsu.com"
Date:
From: Wednesday, September 1, 2021 5:24 PM Hou Zhijie<houzj.fnst@fujitsu.com>
> Thursday, August 19, 2021 4:16 PM Hou zhijie <houzj.fnst@fujitsu.com> wrote:
> > On Fri, Aug 6, 2021 4:23 PM Hou zhijie <houzj.fnst@fujitsu.com> wrote:
> > >
> > > Update the commit message in patches to make it easier for others to
> review.
> >
> > CFbot reported a compile error due to recent commit 3aafc03.
> > Attach rebased patches which fix the error.
> 
> The patch can't apply to the HEAD branch due a recent commit.
> Attach rebased patches.

In the past, the rewriter could generate a rewritten query with a
modifying CTE that did not have the hasModifyingCTE flag set, and this
bug caused a regression test (force_parallel_mode=regress) failure
when enabling parallel SELECT for INSERT, so we had a workaround
0006.patch for it. But now the bug has been fixed in commit 362e2d and
we don't need the workaround patch anymore.

Attached is a new version of the patch set, which removes the
workaround patch.

Best regards,
Hou zj

Attachment

Re: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

From
Julien Rouhaud
Date:
Hi,

On Thu, Sep 09, 2021 at 02:12:08AM +0000, houzj.fnst@fujitsu.com wrote:
> 
> Attach new version patch set which remove the workaround patch.

This version of the patchset doesn't apply anymore:

http://cfbot.cputube.org/patch_36_3143.log
=== Applying patches on top of PostgreSQL commit ID a18b6d2dc288dfa6e7905ede1d4462edd6a8af47 ===
=== applying patch ./v19-0001-CREATE-ALTER-TABLE-PARALLEL-DML.patch
[...]
patching file src/backend/commands/tablecmds.c
Hunk #1 FAILED at 40.
Hunk #2 succeeded at 624 (offset 21 lines).
Hunk #3 succeeded at 670 (offset 21 lines).
Hunk #4 succeeded at 947 (offset 19 lines).
Hunk #5 succeeded at 991 (offset 19 lines).
Hunk #6 succeeded at 4256 (offset 40 lines).
Hunk #7 succeeded at 4807 (offset 40 lines).
Hunk #8 succeeded at 5217 (offset 40 lines).
Hunk #9 succeeded at 6193 (offset 42 lines).
Hunk #10 succeeded at 19278 (offset 465 lines).
1 out of 10 hunks FAILED -- saving rejects to file src/backend/commands/tablecmds.c.rej
[...]
patching file src/bin/pg_dump/pg_dump.c
Hunk #1 FAILED at 6253.
Hunk #2 FAILED at 6358.
Hunk #3 FAILED at 6450.
Hunk #4 FAILED at 6503.
Hunk #5 FAILED at 6556.
Hunk #6 FAILED at 6609.
Hunk #7 FAILED at 6660.
Hunk #8 FAILED at 6708.
Hunk #9 FAILED at 6756.
Hunk #10 FAILED at 6803.
Hunk #11 FAILED at 6872.
Hunk #12 FAILED at 6927.
Hunk #13 succeeded at 15524 (offset -1031 lines).
12 out of 13 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.c.rej
[...]
patching file src/bin/psql/describe.c
Hunk #1 succeeded at 1479 (offset -177 lines).
Hunk #2 succeeded at 1493 (offset -177 lines).
Hunk #3 succeeded at 1631 (offset -241 lines).
Hunk #4 succeeded at 3374 (offset -277 lines).
Hunk #5 succeeded at 3731 (offset -310 lines).
Hunk #6 FAILED at 4109.
1 out of 6 hunks FAILED -- saving rejects to file src/bin/psql/describe.c.rej

Could you send a rebased version?  In the meantime I will switch the entry to
Waiting on Author.



Re: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

From
Jacob Champion
Date:
On Thu, Jul 28, 2022 at 8:43 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> Could you send a rebased version?  In the meantime I will switch the entry to
> Waiting on Author.

By request in [1] I'm marking this Returned with Feedback for now.
Whenever you're ready, you can resurrect the patch entry by visiting

    https://commitfest.postgresql.org/38/3143/

and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)

Thanks,
--Jacob

[1]
https://www.postgresql.org/message-id/OS0PR01MB571696D623F35A09AB51903A94969%40OS0PR01MB5716.jpnprd01.prod.outlook.com