Thread: [bug?] Missed parallel safety checks, and wrong parallel safety
From: "tsunakawa.takay@fujitsu.com"
Hello,

I think we've found a few existing problems with handling the parallel safety of functions while doing an experiment. Could I hear your opinions on what we should do? I'd be willing to create and submit a patch to fix them.

The experiment is to add a parallel safety check in FunctionCallInvoke() and run the regression test with force_parallel_mode=regress. The added check errors out with ereport(ERROR) when the about-to-be-called function is parallel unsafe and the process is currently in parallel mode. 6 test cases failed because the following parallel-unsafe functions were called:

dsnowball_init
balkifnull
int44out
text_w_default_out
widget_out

The first function is created in src/backend/snowball/snowball_create.sql for full text search. The remaining functions are created during the regression test run.

The relevant issues follow.

(1)
All the above functions are actually parallel safe looking at their implementations. It seems that their CREATE FUNCTION statements are just missing PARALLEL SAFE specifications, so I think I'll add them. dsnowball_lexize() may also be parallel safe.

(2)
I'm afraid the above phenomenon reveals that postgres overlooks parallel safety checks in some places. Specifically, we noticed the following:

* User-defined aggregate

CREATE AGGREGATE allows to specify parallel safety of the aggregate itself and the planner checks it, but the support function of the aggregate is not checked. OTOH, the document clearly says:

https://www.postgresql.org/docs/devel/xaggr.html

"Worth noting also is that for an aggregate to be executed in parallel, the aggregate itself must be marked PARALLEL SAFE. The parallel-safety markings on its support functions are not consulted."

https://www.postgresql.org/docs/devel/sql-createaggregate.html

"An aggregate will not be considered for parallelization if it is marked PARALLEL UNSAFE (which is the default!) or PARALLEL RESTRICTED. Note that the parallel-safety markings of the aggregate's support functions are not consulted by the planner, only the marking of the aggregate itself."

Can we check the parallel safety of aggregate support functions during statement execution and error out? Is there any reason not to do so?

* User-defined data type

The input, output, send, receive, and other functions of a UDT are not checked for parallel safety. Is there any good reason to not check them other than the concern about performance?

* Functions for full text search

Should CREATE TEXT SEARCH TEMPLATE ensure that the functions are parallel safe? (Those functions could be changed to parallel unsafe later with ALTER FUNCTION, though.)

(3) Built-in UDFs are not checked for parallel safety

The functions defined in fmgr_builtins[], which are derived from pg_proc.dat, are not checked. Most of them are marked parallel safe, but some are parallel unsafe or restricted.

Besides, changing their parallel safety with ALTER FUNCTION PARALLEL does not affect the selection of a query plan. This is because fmgr_builtins[] does not have a member for parallel safety.

Should we add a member for parallel safety in fmgr_builtins[], and disallow ALTER FUNCTION to change the parallel safety of built-in UDFs?

Regards
Takayuki Tsunakawa
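
(For illustration, the experimental check described above amounts to something like the sketch below. This is a minimal sketch only: fn_parallel is a hypothetical FmgrInfo field assumed to cache pg_proc.proparallel, and the error wording is invented.)

#include "postgres.h"
#include "access/xact.h"
#include "catalog/pg_proc.h"
#include "fmgr.h"
#include "utils/lsyscache.h"

/*
 * Sketch of the experimental check: error out when a parallel-unsafe
 * function is about to be invoked while in parallel mode.  FmgrInfo
 * has no fn_parallel field today; it is assumed here.
 */
static inline void
check_parallel_safety_before_call(FmgrInfo *flinfo)
{
    if (IsInParallelMode() &&
        flinfo->fn_parallel == PROPARALLEL_UNSAFE)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("parallel-unsafe function \"%s\" called in parallel mode",
                        get_func_name(flinfo->fn_oid))));
}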
On Tue, Apr 20, 2021 at 2:23 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
> (2)
> I'm afraid the above phenomenon reveals that postgres overlooks parallel safety checks in some places. Specifically, we noticed the following:
>
> * User-defined aggregate
> CREATE AGGREGATE allows to specify parallel safety of the aggregate itself and the planner checks it, but the support function of the aggregate is not checked. OTOH, the document clearly says:
>
> https://www.postgresql.org/docs/devel/xaggr.html
>
> "Worth noting also is that for an aggregate to be executed in parallel, the aggregate itself must be marked PARALLEL SAFE. The parallel-safety markings on its support functions are not consulted."
>
> https://www.postgresql.org/docs/devel/sql-createaggregate.html
>
> "An aggregate will not be considered for parallelization if it is marked PARALLEL UNSAFE (which is the default!) or PARALLEL RESTRICTED. Note that the parallel-safety markings of the aggregate's support functions are not consulted by the planner, only the marking of the aggregate itself."

IMO, the reason for not checking the parallel safety of the support functions is that the functions themselves can call a whole lot of other functions (which can be nested as well), which might be quite hard to check at planning time. That is why the job of marking an aggregate as parallel safe is best left to the user. They have to mark the aggregate parallel unsafe if at least one support function is parallel unsafe, otherwise parallel safe.

> Can we check the parallel safety of aggregate support functions during statement execution and error out? Is there any reason not to do so?

And if we were to do the above, within the function execution API, we need to know where the function got called from(?). It is best left to the user to decide whether a function/aggregate is parallel safe or not. This is the main reason we have declarative constructs like parallel safe/unsafe/restricted. For core functions, we definitely should properly mark parallel safe/restricted/unsafe tags wherever possible.

Please correct me if I missed something.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> On Tue, Apr 20, 2021 at 2:23 PM tsunakawa.takay@fujitsu.com
> <tsunakawa.takay@fujitsu.com> wrote:
>> https://www.postgresql.org/docs/devel/xaggr.html
>>
>> "Worth noting also is that for an aggregate to be executed in parallel, the aggregate itself must be marked PARALLEL SAFE. The parallel-safety markings on its support functions are not consulted."

> IMO, the reason for not checking the parallel safety of the support
> functions is that the functions themselves can call a whole lot of other
> functions (which can be nested as well), which might be quite hard to check
> at planning time. That is why the job of marking an aggregate as
> parallel safe is best left to the user.

Yes. I think the documentation is perfectly clear that this is intentional; I don't see a need to change it.

>> Should we add a member for parallel safety in fmgr_builtins[], and disallow ALTER FUNCTION to change the parallel safety of builtin UDFs?

No. You'd have to be superuser anyway to do that, and we're not in the habit of trying to put training wheels on superusers.

Don't have an opinion about the other points yet.

        regards, tom lane

RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From: "tsunakawa.takay@fujitsu.com"
From: Tom Lane <tgl@sss.pgh.pa.us>
> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> > IMO, the reason for not checking the parallel safety of the support
> > functions is that the functions themselves can call a whole lot of other
> > functions (which can be nested as well), which might be quite hard to check
> > at planning time. That is why the job of marking an aggregate as
> > parallel safe is best left to the user.
>
> Yes. I think the documentation is perfectly clear that this is
> intentional; I don't see a need to change it.

OK, that's what I expected. I understood from this that Postgres's stance toward parallel safety is that Postgres does its best effort to check parallel safety (as far as it doesn't hurt performance much, and perhaps the core code doesn't get very complex), and the user should be responsible for the actual parallel safety of ancillary objects (in this case, support functions for an aggregate) of the target object that he/she marked as parallel safe.

> >> Should we add a member for parallel safety in fmgr_builtins[], and disallow
> ALTER FUNCTION to change the parallel safety of builtin UDFs?
>
> No. You'd have to be superuser anyway to do that, and we're not in the
> habit of trying to put training wheels on superusers.

Understood. However, we may add the parallel safety member in fmgr_builtins[] in another thread for parallel INSERT SELECT. I'd appreciate your comment on this if you see any concern.

> Don't have an opinion about the other points yet.

I'd like to have your comments on them, too. But I understand you must be so busy at least until the beta release of PG14.

Regards
Takayuki Tsunakawa

"tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> writes:
> From: Tom Lane <tgl@sss.pgh.pa.us>
>> No. You'd have to be superuser anyway to do that, and we're not in the
>> habit of trying to put training wheels on superusers.

> Understood. However, we may add the parallel safety member in fmgr_builtins[] in another thread for parallel INSERT SELECT. I'd appreciate your comment on this if you see any concern.

[ raised eyebrow... ] I find it very hard to understand why that would be necessary, or even a good idea. Not least because there's no spare room there; you'd have to incur a substantial enlargement of the array to add another flag. But also, that would indeed lock down the value of the parallel-safety flag, and that seems like a fairly bad idea.

        regards, tom lane

RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From: "tsunakawa.takay@fujitsu.com"
From: Tom Lane <tgl@sss.pgh.pa.us>
> [ raised eyebrow... ] I find it very hard to understand why that would
> be necessary, or even a good idea. Not least because there's no spare
> room there; you'd have to incur a substantial enlargement of the
> array to add another flag. But also, that would indeed lock down
> the value of the parallel-safety flag, and that seems like a fairly
> bad idea.

You're right, FmgrBuiltin is already fully packed (24 bytes on 64-bit machines). Enlarging the frequently accessed fmgr_builtins array may have an unexpectedly large adverse effect on performance.

I wanted to check the parallel safety of functions, which various objects (data type, index, trigger, etc.) come down to, in FunctionCallInvoke() and a few other places. But maybe we skip the check for built-in functions. That's a matter of where we draw a line between where we check and where we don't.

Regards
Takayuki Tsunakawa
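
(For reference, the struct under discussion looks like this around the time of this thread, paraphrased from src/include/utils/fmgrtab.h; the 4+2+1+1 bytes of scalar fields plus two 8-byte pointers account for the 24 bytes.)

typedef struct
{
    Oid         foid;           /* OID of the function */
    short       nargs;          /* 0..FUNC_MAX_ARGS, or -1 if variable count */
    bool        strict;         /* T if function is "strict" */
    bool        retset;         /* T if function returns a set */
    const char *funcName;       /* C name of the function */
    PGFunction  func;           /* pointer to compiled function */
} FmgrBuiltin;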

RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From: "houzj.fnst@fujitsu.com"
> I think we've found a few existing problems with handling the parallel safety of
> functions while doing an experiment. Could I hear your opinions on what we
> should do? I'd be willing to create and submit a patch to fix them.
>
> The experiment is to add a parallel safety check in FunctionCallInvoke() and run
> the regression test with force_parallel_mode=regress. The added check
> errors out with ereport(ERROR) when the about-to-be-called function is
> parallel unsafe and the process is currently in parallel mode. 6 test cases failed
> because the following parallel-unsafe functions were called:
>
> dsnowball_init
> balkifnull
> int44out
> text_w_default_out
> widget_out
>
> The first function is created in src/backend/snowball/snowball_create.sql for
> full text search. The remaining functions are created during the regression
> test run.
>
> (1)
> All the above functions are actually parallel safe looking at their
> implementations. It seems that their CREATE FUNCTION statements are just
> missing PARALLEL SAFE specifications, so I think I'll add them.
> dsnowball_lexize() may also be parallel safe.

I agree that it's better to mark the functions with the correct parallel safety label, especially the above functions, which will be executed in parallel mode. It will be friendly to developers and users working on something related to parallel testing.

So, I attached the patch to mark the above functions parallel safe.

Best regards,
houzj
On Wed, Apr 21, 2021 at 8:12 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> From: Tom Lane <tgl@sss.pgh.pa.us>
> > [ raised eyebrow... ] I find it very hard to understand why that would
> > be necessary, or even a good idea. Not least because there's no spare
> > room there; you'd have to incur a substantial enlargement of the
> > array to add another flag. But also, that would indeed lock down
> > the value of the parallel-safety flag, and that seems like a fairly
> > bad idea.
>
> You're right, FmgrBuiltin is already fully packed (24 bytes on 64-bit machines). Enlarging the frequently accessed fmgr_builtins array may have an unexpectedly large adverse effect on performance.
>
> I wanted to check the parallel safety of functions, which various objects (data type, index, trigger, etc.) come down to, in FunctionCallInvoke() and a few other places. But maybe we skip the check for built-in functions. That's a matter of where we draw a line between where we check and where we don't.
>

IIUC, the idea here is to check for parallel safety of functions at some place in the code during function invocation so that if we execute any parallel unsafe/restricted function via a parallel worker then we error out. If so, isn't it possible to deal with built-in and non-built-in functions in the same way? I think we want to have some safety checks for functions as we have for transaction id in AssignTransactionId(), command id in CommandCounterIncrement(), for write operations in heap_prepare_insert(), etc. Is that correct?

--
With Regards,
Amit Kapila.
Amit Kapila <amit.kapila16@gmail.com> writes:
> On Wed, Apr 21, 2021 at 8:12 AM tsunakawa.takay@fujitsu.com
> <tsunakawa.takay@fujitsu.com> wrote:
>> From: Tom Lane <tgl@sss.pgh.pa.us>
>>> [ raised eyebrow... ] I find it very hard to understand why that would
>>> be necessary, or even a good idea.

> IIUC, the idea here is to check for parallel safety of functions at
> some place in the code during function invocation so that if we execute
> any parallel unsafe/restricted function via a parallel worker then we
> error out. If so, isn't it possible to deal with built-in and
> non-built-in functions in the same way?

Yeah, one of the reasons I doubt this is a great idea is that you'd still have to fetch the pg_proc row for non-built-in functions.

The obvious place to install such a check is fmgr_info(), which is fetching said row anyway for other purposes, so it's really hard to see how adding anything to FmgrBuiltin is going to help.

        regards, tom lane

RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From: "tsunakawa.takay@fujitsu.com"
From: Tom Lane <tgl@sss.pgh.pa.us>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > IIUC, the idea here is to check for parallel safety of functions at
> > some place in the code during function invocation so that if we execute
> > any parallel unsafe/restricted function via a parallel worker then we
> > error out. If so, isn't it possible to deal with built-in and
> > non-built-in functions in the same way?
>
> Yeah, one of the reasons I doubt this is a great idea is that you'd
> still have to fetch the pg_proc row for non-built-in functions.
>
> The obvious place to install such a check is fmgr_info(), which is
> fetching said row anyway for other purposes, so it's really hard to
> see how adding anything to FmgrBuiltin is going to help.

Thank you, fmgr_info() looks like the best place to do the parallel safety check. Having a quick look at its callers, I didn't find any concerning place (of course, we can't be relieved until the regression test succeeds.) Also, with fmgr_info(), we don't have to find other places to add the check to deal with function calls in execExpr.c and execExprInterp.c. This is beautiful.

But the current fmgr_info() does not check the parallel safety of built-in functions. It does not have the information to do that. There are two options. Which do you think is better? I think 2.

1) fmgr_info() reads pg_proc like for non-built-in functions

This ruins the effort for the fast path for built-in functions. I can't imagine how large the adverse impact on performance would be, but I'm worried.

The benefit is that ALTER FUNCTION on built-in functions takes effect. But such operations are nonsensical, so I don't think we want to gain such a benefit.

2) Gen_fmgrtab.pl adds a member for proparallel in FmgrBuiltin

But we don't want to enlarge the FmgrBuiltin struct. So, change the existing bool members strict and retset into one member of type char, and represent the original values with some bit flags. Then we use that member for proparallel as well. (As a result, one byte is left for future use.)

I think we'll try 2). I'd be grateful if you could point out anything I need to be careful of.

Regards
Takayuki Tsunakawa
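
(A minimal sketch of what 2) could look like. The flag names and encoding below are assumptions for illustration, not from an actual patch.)

/* Pack the two existing bools plus a proparallel encoding into one byte. */
#define BUILTIN_STRICT              0x01    /* function is "strict" */
#define BUILTIN_RETSET              0x02    /* function returns a set */
#define BUILTIN_PARALLEL_MASK       0x0c    /* two bits encode proparallel */
#define BUILTIN_PARALLEL_SAFE       0x00
#define BUILTIN_PARALLEL_RESTRICTED 0x04
#define BUILTIN_PARALLEL_UNSAFE     0x08

typedef struct
{
    Oid         foid;           /* OID of the function */
    short       nargs;          /* 0..FUNC_MAX_ARGS, or -1 if variable count */
    char        flags;          /* BUILTIN_* bits; sizeof() stays at 24 */
    const char *funcName;       /* C name of the function */
    PGFunction  func;           /* pointer to compiled function */
} FmgrBuiltin;

(Gen_fmgrtab.pl would then derive the flags byte from pg_proc.dat's proisstrict, proretset and proparallel columns.)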

RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From: "tsunakawa.takay@fujitsu.com"
From: Hou, Zhijie/侯 志杰 <houzj.fnst@fujitsu.com>
> I agree that it's better to mark the functions with the correct parallel safety label,
> especially the above functions, which will be executed in parallel mode.
> It will be friendly to developers and users working on something related to
> parallel testing.
>
> So, I attached the patch to mark the above functions parallel safe.

Thank you, the patch looks good. Please register it with the next CF if not yet.

Regards
Takayuki Tsunakawa

RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From: "houzj.fnst@fujitsu.com"
> Thank you, fmgr_info() looks like the best place to do the parallel safety check.
> Having a quick look at its callers, I didn't find any concerning place (of course,
> we can't be relieved until the regression test succeeds.) Also, with fmgr_info(),
> we don't have to find other places to add the check to deal with function calls
> in execExpr.c and execExprInterp.c. This is beautiful.
>
> But the current fmgr_info() does not check the parallel safety of built-in
> functions. It does not have the information to do that. There are two options.
> Which do you think is better? I think 2.
>
> 1) fmgr_info() reads pg_proc like for non-built-in functions
> This ruins the effort for the fast path for built-in functions. I can't imagine
> how large the adverse impact on performance would be, but I'm worried.

For approach 1): I think it could result in infinite recursion.

For example: if we first access a built-in function A which has not been cached, we need to access pg_proc. When accessing pg_proc, we internally still need some built-in function B to scan it. At this point, if B is not cached, we still need to fetch function B's parallel flag by accessing pg_proc.proparallel. This can result in infinite recursion.

So, I think we can consider approach 2).

Best regards,
houzj

RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From: "tsunakawa.takay@fujitsu.com"
From: Hou, Zhijie/侯 志杰 <houzj.fnst@fujitsu.com>
> For approach 1): I think it could result in infinite recursion.
>
> For example: if we first access a built-in function A which has not been cached,
> we need to access pg_proc. When accessing pg_proc, we internally still need
> some built-in function B to scan it. At this point, if B is not cached, we still
> need to fetch function B's parallel flag by accessing pg_proc.proparallel.
> This can result in infinite recursion.
>
> So, I think we can consider approach 2)

Hmm, that makes sense. That's a problem structure similar to that of relcache. So only one choice is left, unless there's another better idea.

Regards
Takayuki Tsunakawa
On Wed, Apr 21, 2021 at 12:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> writes:
> > From: Tom Lane <tgl@sss.pgh.pa.us>
> >> No. You'd have to be superuser anyway to do that, and we're not in the
> >> habit of trying to put training wheels on superusers.
>
> > Understood. However, we may add the parallel safety member in fmgr_builtins[] in another thread for parallel INSERT SELECT. I'd appreciate your comment on this if you see any concern.
>
> [ raised eyebrow... ] I find it very hard to understand why that would
> be necessary, or even a good idea. Not least because there's no spare
> room there; you'd have to incur a substantial enlargement of the
> array to add another flag. But also, that would indeed lock down
> the value of the parallel-safety flag, and that seems like a fairly
> bad idea.
>

I'm curious. The FmgrBuiltin struct includes the "strict" flag, so that would "lock down the value" of the strict flag, wouldn't it?

Regards,
Greg Nancarrow
Fujitsu Australia
On Wed, Apr 21, 2021 at 7:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > On Wed, Apr 21, 2021 at 8:12 AM tsunakawa.takay@fujitsu.com
> > <tsunakawa.takay@fujitsu.com> wrote:
> >> From: Tom Lane <tgl@sss.pgh.pa.us>
> >>> [ raised eyebrow... ] I find it very hard to understand why that would
> >>> be necessary, or even a good idea.
>
> > IIUC, the idea here is to check for parallel safety of functions at
> > some place in the code during function invocation so that if we execute
> > any parallel unsafe/restricted function via a parallel worker then we
> > error out. If so, isn't it possible to deal with built-in and
> > non-built-in functions in the same way?
>
> Yeah, one of the reasons I doubt this is a great idea is that you'd
> still have to fetch the pg_proc row for non-built-in functions.
>

So, are you suggesting that we should fetch the pg_proc row for built-in functions as well for this purpose? If not, then how do we identify the parallel safety of built-in functions in fmgr_info()?

Another idea could be that we check the parallel safety of built-in functions based on some static information. As we know the func_ids of non-parallel-safe built-in functions, we can have a function fmgr_builtin_parallel_safe() which checks whether the func_id is among the predefined func_ids of non-parallel-safe built-in functions; if not, it returns true, otherwise false. Then, we can call this new function in fmgr_info() for built-in functions. Thoughts?

--
With Regards,
Amit Kapila.
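
(A sketch of what such a function might look like. The two entries shown are illustrative; the real list would be generated from pg_proc.dat's proparallel markings and kept sorted by OID.)

#include "postgres.h"
#include "utils/fmgroids.h"

/* func_ids of non-parallel-safe builtins; illustrative entries only */
static const Oid non_parallel_safe_builtins[] = {
    F_NEXTVAL,
    F_CURRVAL,
    /* ... generated, sorted ascending ... */
};

static bool
fmgr_builtin_parallel_safe(Oid func_id)
{
    int         low = 0;
    int         high = lengthof(non_parallel_safe_builtins) - 1;

    /* binary search; absence from the list means parallel safe */
    while (low <= high)
    {
        int         mid = (low + high) / 2;

        if (func_id == non_parallel_safe_builtins[mid])
            return false;
        if (func_id < non_parallel_safe_builtins[mid])
            high = mid - 1;
        else
            low = mid + 1;
    }
    return true;
}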
Greg Nancarrow <gregn4422@gmail.com> writes:
> I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
> that would "lock down the value" of the strict flag, wouldn't it?

It does, but that's much more directly a property of the function's C code than parallel-safety is.

        regards, tom lane
On Fri, Apr 23, 2021 at 9:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Nancarrow <gregn4422@gmail.com> writes:
> > I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
> > that would "lock down the value" of the strict flag, wouldn't it?
>
> It does, but that's much more directly a property of the function's
> C code than parallel-safety is.

I'm not sure I agree with that, but I think having the "strict" flag in FmgrBuiltin isn't that nice either.

--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Apr 23, 2021 at 9:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Greg Nancarrow <gregn4422@gmail.com> writes:
>>> I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
>>> that would "lock down the value" of the strict flag, wouldn't it?

>> It does, but that's much more directly a property of the function's
>> C code than parallel-safety is.

> I'm not sure I agree with that, but I think having the "strict" flag
> in FmgrBuiltin isn't that nice either.

Yeah, if we could readily do without it, we probably would. But the function call mechanism itself is responsible for implementing strictness, so it *has* to have that flag available.

        regards, tom lane
On Fri, Apr 23, 2021 at 6:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Greg Nancarrow <gregn4422@gmail.com> writes:
> > I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
> > that would "lock down the value" of the strict flag, wouldn't it?
>
> It does, but that's much more directly a property of the function's
> C code than parallel-safety is.
>

Isn't parallel safety also a C code property? I mean, unless someone changes the built-in function's code, changing that property would be dangerous. The other thing is, even if a user is allowed to change one function's property, how will they know which other functions are called by that function and whether they are parallel-safe or not? For example, if the user wants to change the parallel-safe property of a built-in function like brin_summarize_new_values, unless she changes its code and the functions called by it like brin_summarize_range, it would be dangerous. So, isn't it better to disallow changing parallel safety for built-in functions?

Also, if the strict property of built-in functions is fixed internally, why do we allow users to change it, and is that of any help?

--
With Regards,
Amit Kapila.
On Sat, Apr 24, 2021 at 12:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 23, 2021 at 6:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Greg Nancarrow <gregn4422@gmail.com> writes:
> > > I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
> > > that would "lock down the value" of the strict flag, wouldn't it?
> >
> > It does, but that's much more directly a property of the function's
> > C code than parallel-safety is.
>
> Isn't parallel safety also a C code property? I mean, unless someone
> changes the built-in function's code, changing that property would be
> dangerous. The other thing is, even if a user is allowed to change one
> function's property, how will they know which other functions are
> called by that function and whether they are parallel-safe or not? For
> example, if the user wants to change the parallel-safe property of
> a built-in function like brin_summarize_new_values, unless she changes its
> code and the functions called by it like brin_summarize_range, it
> would be dangerous. So, isn't it better to disallow changing parallel
> safety for built-in functions?
>
> Also, if the strict property of built-in functions is fixed
> internally, why do we allow users to change it, and is that of any help?
>

Yes, I'd like to know too. I think it would make more sense to disallow changing properties like strict/parallel-safety on built-in functions.

Also, with sufficient privileges, a built-in function can be redefined, yet the original function (whose info is cached in FmgrBuiltins[], from build time) is always invoked, not the newly-defined version.

Regards,
Greg Nancarrow
Fujitsu Australia

RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From: "houzj.fnst@fujitsu.com"
> >>> I'm curious. The FmgrBuiltin struct includes the "strict" flag, so
> >>> that would "lock down the value" of the strict flag, wouldn't it?
>
> >> It does, but that's much more directly a property of the function's C
> >> code than parallel-safety is.
>
> > I'm not sure I agree with that, but I think having the "strict" flag
> > in FmgrBuiltin isn't that nice either.
>
> Yeah, if we could readily do without it, we probably would. But the function
> call mechanism itself is responsible for implementing strictness, so it *has* to
> have that flag available.

So, if we do not want to lock down the parallel safety of built-in functions, it seems we can try to fetch proparallel from pg_proc for built-in functions in fmgr_info_cxt_security too. To avoid a recursive safety check when fetching proparallel from pg_proc, we can add a global variable to mark whether we are in a recursive state, and we skip the safety check while in that state. In this approach, parallel safety will not be locked down, and there are no new members in FmgrBuiltin.

Attaching the patch for this approach [0001-approach-1]. Thoughts?

I also attached a patch for another approach [0001-approach-2], which adds parallel safety to FmgrBuiltin. This approach seems faster, and we can combine some bool members into a bit flag to avoid enlarging the FmgrBuiltin array, though it will lock down the parallel safety of built-in functions.

Best regards,
houzj
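
(A sketch of the approach-1 recursion guard described above. Names are assumed for illustration and may differ from the attached patch; error recovery for the flag, which is where PG_TRY() comes in below, is glossed over.)

#include "postgres.h"
#include "access/parallel.h"
#include "access/xact.h"
#include "catalog/pg_proc.h"
#include "utils/lsyscache.h"

/* guard so the builtins used by the pg_proc lookup are not re-checked */
static bool in_parallel_safety_check = false;

static void
check_builtin_parallel_safety(Oid func_id)
{
    char        proparallel;

    if (!IsInParallelMode() || in_parallel_safety_check)
        return;

    /*
     * The syscache lookup below may itself invoke built-in functions;
     * the flag suppresses the check for those inner calls.  If the
     * lookup errors out the flag stays set, so a real patch would need
     * PG_TRY() or a reset at transaction abort.
     */
    in_parallel_safety_check = true;
    proparallel = func_parallel(func_id);
    in_parallel_safety_check = false;

    if (proparallel == PROPARALLEL_UNSAFE ||
        (proparallel == PROPARALLEL_RESTRICTED && IsParallelWorker()))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("parallel-unsafe function \"%s\" called in parallel mode",
                        get_func_name(func_id))));
}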
On Fri, Apr 23, 2021 at 10:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Isn't parallel safety also a C code property?

In my opinion, yes.

> So, isn't it better to disallow changing parallel
> safety for built-in functions?

Superusers can do a lot of DML operations on the system catalogs that are manifestly unsafe. I think we should really consider locking that down somehow, but I doubt it makes sense to treat this case separately from all the others. What do you think will happen if you change proargtypes?

> Also, if the strict property of built-in functions is fixed
> internally, why do we allow users to change it, and is that of any help?

One real application of allowing these sorts of changes is letting users correct things that were done wrong originally without waiting for a new major release.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Apr 28, 2021 at 9:42 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> So, if we do not want to lock down the parallel safety of built-in functions,
> it seems we can try to fetch proparallel from pg_proc for built-in functions
> in fmgr_info_cxt_security too. To avoid a recursive safety check when fetching
> proparallel from pg_proc, we can add a global variable to mark whether we are
> in a recursive state, and we skip the safety check while in that state. In this
> approach, parallel safety will not be locked down, and there are no new members
> in FmgrBuiltin.
>
> Attaching the patch for this approach [0001-approach-1]. Thoughts?

This seems to be full of complicated if-tests that don't seem necessary and aren't explained by the comments. Also, introducing a system cache lookup here seems completely unacceptable from a reliability point of view, and I bet it's not too good for performance, either.

> I also attached a patch for another approach [0001-approach-2], which adds
> parallel safety to FmgrBuiltin. This approach seems faster, and we can combine
> some bool members into a bit flag to avoid enlarging the FmgrBuiltin array,
> though it will lock down the parallel safety of built-in functions.

This doesn't seem like a good idea either.

I really don't understand what problem any of this is intended to solve. Bharath's analysis above seems right on point to me. I think if anybody is writing a patch that requires that this be changed in this way, that person is probably doing something wrong.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, May 5, 2021 at 5:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Apr 23, 2021 at 10:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Isn't parallel safety also a C code property?
>
> > Also, if the strict property of built-in functions is fixed
> > internally, why do we allow users to change it, and is that of any help?
>
> One real application of allowing these sorts of changes is letting
> users correct things that were done wrong originally without waiting
> for a new major release.
>

Problem is, for built-in functions, the changes are allowed, but for some properties (like strict) the allowed changes don't actually take effect (this is what Amit was referring to - so why allow those changes?).

It's because some of the function properties are cached in FmgrBuiltins[] (for a "fast-path" lookup for built-ins), according to their state at build time (from pg_proc.dat), but ALTER FUNCTION is just changing them in the system catalogs. Also, with sufficient privileges, a built-in function can be redefined, yet the original function (whose info is cached in FmgrBuiltins[]) is always invoked, not the newly-defined version.

Regards,
Greg Nancarrow
Fujitsu Australia
On Tue, May 4, 2021 at 11:47 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> Problem is, for built-in functions, the changes are allowed, but for
> some properties (like strict) the allowed changes don't actually take
> effect (this is what Amit was referring to - so why allow those
> changes?).
> It's because some of the function properties are cached in
> FmgrBuiltins[] (for a "fast-path" lookup for built-ins), according to
> their state at build time (from pg_proc.dat), but ALTER FUNCTION is
> just changing them in the system catalogs. Also, with sufficient
> privileges, a built-in function can be redefined, yet the original
> function (whose info is cached in FmgrBuiltins[]) is always invoked,
> not the newly-defined version.

I agree. I think that's not ideal. I think we should consider putting some more restrictions on updating the system catalogs, and I also think that if we can get strict out of needing to be part of FmgrBuiltins[] that would be good. But what I don't agree with is the idea that since strict already has this problem, it's OK to do the same thing with parallel-safety. That seems to me to be making a bad situation worse, and I can't see what problem it actually solves.

--
Robert Haas
EDB: http://www.enterprisedb.com

RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From: "tsunakawa.takay@fujitsu.com"
From: Robert Haas <robertmhaas@gmail.com>
> On Tue, May 4, 2021 at 11:47 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > Problem is, for built-in functions, the changes are allowed, but for
> > some properties (like strict) the allowed changes don't actually take
> > effect (this is what Amit was referring to - so why allow those
> > changes?).
> > It's because some of the function properties are cached in
> > FmgrBuiltins[] (for a "fast-path" lookup for built-ins), according to
> > their state at build time (from pg_proc.dat), but ALTER FUNCTION is
> > just changing them in the system catalogs. Also, with sufficient
> > privileges, a built-in function can be redefined, yet the original
> > function (whose info is cached in FmgrBuiltins[]) is always invoked,
> > not the newly-defined version.
>
> I agree. I think that's not ideal. I think we should consider putting
> some more restrictions on updating the system catalogs, and I also
> think that if we can get strict out of needing to be part of
> FmgrBuiltins[] that would be good. But what I don't agree with is the
> idea that since strict already has this problem, it's OK to do the
> same thing with parallel-safety. That seems to me to be making a bad
> situation worse, and I can't see what problem it actually solves.

Let me divide the points:

(1) Is it better to get hardcoded function properties out of fmgr_builtins[]?

There's little worth in doing so or thinking about it. It's no business for users to change system objects, in this case system functions.

Also, hardcoding is a worthwhile strategy for good performance or other inevitable reasons. Postgres is using it, as in the system catalog relcache below.

[relcache.c]
/*
 * hardcoded tuple descriptors, contents generated by genbki.pl
 */
static const FormData_pg_attribute Desc_pg_class[Natts_pg_class] = {Schema_pg_class};
static const FormData_pg_attribute Desc_pg_attribute[Natts_pg_attribute] = {Schema_pg_attribute};
...

(2) Should it be disallowed for users to change system function properties with ALTER FUNCTION?

Maybe yes, but it's not an important issue for achieving parallel INSERT SELECT at the moment. So, I think this can be discussed in an independent separate thread.

As a reminder, Postgres has safeguards against modifying system objects as follows.

test=# drop table^C
test=# drop function pg_wal_replay_pause();
ERROR:  cannot drop function pg_wal_replay_pause() because it is required by the database system
test=# drop table pg_largeobject;
ERROR:  permission denied: "pg_largeobject" is a system catalog

OTOH, Postgres doesn't disallow changing the system table column values directly, such as UPDATE pg_proc SET .... But it's warned in the manual that such operations are dangerous. So, we don't have to care about it.

Chapter 52. System Catalogs
https://www.postgresql.org/docs/devel/catalogs.html

"You can drop and recreate the tables, add columns, insert and update values, and severely mess up your system that way. Normally, one should not change the system catalogs by hand, there are normally SQL commands to do that. (For example, CREATE DATABASE inserts a row into the pg_database catalog — and actually creates the database on disk.) There are some exceptions for particularly esoteric operations, but many of those have been made available as SQL commands over time, and so the need for direct manipulation of the system catalogs is ever decreasing."

(3) Why do we want to have parallel-safety in fmgr_builtins[]?

As proposed in this thread and/or "Parallel INSERT SELECT take 2", we thought of detecting parallel unsafe function execution during SQL statement execution, instead of imposing much overhead to check parallel safety during query planning. Specifically, we add a parallel safety check in fmgr_info() and/or FunctionCallInvoke().

(Alternatively, I think we can conclude that we assume parallel unsafe built-in functions won't be used in parallel DML. In that case, we don't change FmgrBuiltin and we just skip the parallel safety check for built-in functions when the function is called. Would you buy this?)

Regards
Takayuki Tsunakawa
On Wed, May 5, 2021 at 7:39 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, May 4, 2021 at 11:47 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > Problem is, for built-in functions, the changes are allowed, but for
> > some properties (like strict) the allowed changes don't actually take
> > effect (this is what Amit was referring to - so why allow those
> > changes?).
> > It's because some of the function properties are cached in
> > FmgrBuiltins[] (for a "fast-path" lookup for built-ins), according to
> > their state at build time (from pg_proc.dat), but ALTER FUNCTION is
> > just changing them in the system catalogs. Also, with sufficient
> > privileges, a built-in function can be redefined, yet the original
> > function (whose info is cached in FmgrBuiltins[]) is always invoked,
> > not the newly-defined version.
>
> I agree. I think that's not ideal. I think we should consider putting
> some more restrictions on updating the system catalogs, and I also
> think that if we can get strict out of needing to be part of
> FmgrBuiltins[] that would be good. But what I don't agree with is the
> idea that since strict already has this problem, it's OK to do the
> same thing with parallel-safety. That seems to me to be making a bad
> situation worse, and I can't see what problem it actually solves.
>

The idea here is to check for parallel safety of functions at some place in the code during function invocation so that if we execute any parallel unsafe/restricted function via a parallel worker then we error out. I think that is a good safety net, especially if we can do it with some simple check. Now, we already have pg_proc information in fmgr_info_cxt_security for non-built-in functions, so we can check that and error out if the unsafe function is invoked in parallel mode. It has been observed that we were calling some unsafe functions in parallel mode in the regression tests, which is caught by such a check.

I think here the main challenge is to do a similar check for built-in functions, and one of the ideas to do that was to extend FmgrBuiltins to cache that information. I see why that idea is not good, and maybe we can see if there is some other place where we already fetch pg_proc for built-in functions and whether we can have such a check at that place. If that is not feasible then we can probably have such a check just for non-built-in functions, as that seems straightforward.

--
With Regards,
Amit Kapila.
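
(For the non-built-in path, such a check is cheap because fmgr_info_cxt_security() already has the pg_proc tuple in hand. A sketch, with placement and message wording assumed:)

/*
 * Inside fmgr_info_cxt_security(), the tuple is already fetched:
 *     procedureTuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(functionId));
 *     procedureStruct = (Form_pg_proc) GETSTRUCT(procedureTuple);
 * so the check can reuse it without any extra lookup.
 */
if (IsInParallelMode() &&
    (procedureStruct->proparallel == PROPARALLEL_UNSAFE ||
     (procedureStruct->proparallel == PROPARALLEL_RESTRICTED &&
      IsParallelWorker())))
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
             errmsg("parallel-unsafe function \"%s\" called in parallel mode",
                    NameStr(procedureStruct->proname))));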

RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From: "tsunakawa.takay@fujitsu.com"
From: Robert Haas <robertmhaas@gmail.com>
> On Wed, Apr 28, 2021 at 9:42 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > So, if we do not want to lock down the parallel safety of built-in functions,
> > it seems we can try to fetch proparallel from pg_proc for built-in functions
> > in fmgr_info_cxt_security too. To avoid a recursive safety check when fetching
> > proparallel from pg_proc, we can add a global variable to mark whether we are
> > in a recursive state, and we skip the safety check while in that state. In this
> > approach, parallel safety will not be locked down, and there are no new members
> > in FmgrBuiltin.
> >
> > Attaching the patch for this approach [0001-approach-1]. Thoughts?
>
> This seems to be full of complicated if-tests that don't seem
> necessary and aren't explained by the comments. Also, introducing a
> system cache lookup here seems completely unacceptable from a
> reliability point of view, and I bet it's not too good for
> performance, either.

Agreed. Also, PG_TRY() would be relatively heavyweight here. I'm inclined to avoid this approach.

> > I also attached a patch for another approach [0001-approach-2], which adds
> > parallel safety to FmgrBuiltin. This approach seems faster, and we can combine
> > some bool members into a bit flag to avoid enlarging the FmgrBuiltin array,
> > though it will lock down the parallel safety of built-in functions.
>
> This doesn't seem like a good idea either.

This looks good to me. What makes you think so?

That said, I actually think we want to avoid even this change. That is, I'm wondering if we can skip the parallel safety check of built-in functions.

Can anyone think of the need to check the parallel safety of built-in functions in the context of parallel INSERT SELECT? The planner already checks (or can check) the parallel safety of the SELECT part with max_parallel_hazard(). Regarding the INSERT part, we're trying to rely on the parallel safety of the target table that the user specified with CREATE/ALTER TABLE. I don't see where we need to check the parallel safety of built-in functions.

Regards
Takayuki Tsunakawa
On Wed, May 5, 2021 at 10:54 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
> (1) Is it better to get hardcoded function properties out of fmgr_builtins[]?
> There's little worth in doing so or thinking about it. It's no business for users to change system objects, in this case system functions.

I don't entirely agree with this. Whether or not users have any business changing system functions, it's better to have one source of truth than two. Now that being said, this is not a super-important problem for us to go solve, and hard-coding a certain amount of stuff is probably necessary to allow the system to bootstrap itself. So for me it's one of those things that is in a grey area: if someone showed up with a patch to make it better, I'd be happy. But I probably wouldn't spend much time on writing such a patch unless it solved some other problem that I cared about.

> (3) Why do we want to have parallel-safety in fmgr_builtins[]?
> As proposed in this thread and/or "Parallel INSERT SELECT take 2", we thought of detecting parallel unsafe function execution during SQL statement execution, instead of imposing much overhead to check parallel safety during query planning. Specifically, we add a parallel safety check in fmgr_info() and/or FunctionCallInvoke().

I haven't read that thread, but I don't understand how that can work. The reason we need to detect it at plan time is because we might need to use a different plan. At execution time it's too late for that.

Also, it seems potentially quite expensive. A query may be planned once and executed many times. Also, a single query execution may call the same SQL function many times. I think we don't want to incur the overhead of an extra syscache lookup every time anyone calls any function. A very simple expression like a+b+c+d+e involves four function calls, and + need not be a built-in, if the data type is user-defined. And that might be happening for every row in a table with millions of rows.

> (Alternatively, I think we can conclude that we assume parallel unsafe built-in functions won't be used in parallel DML. In that case, we don't change FmgrBuiltin and we just skip the parallel safety check for built-in functions when the function is called. Would you buy this?)

I don't really understand this idea. There's no such thing as parallel DML, is there? There's just DML, which we must decide whether can be done in parallel or not based on, among other things, the parallel-safety markings of the functions it contains. Maybe I am not understanding you correctly, but it seems like you're suggesting that in some cases we can just assume that the user hasn't done something parallel-unsafe without making any attempt to check it. I don't think I could support that.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, May 6, 2021 at 3:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> The idea here is to check for parallel safety of functions at
> some place in the code during function invocation so that if we execute
> any parallel unsafe/restricted function via a parallel worker then we
> error out. I think that is a good safety net, especially if we can do
> it with some simple check. Now, we already have pg_proc information in
> fmgr_info_cxt_security for non-built-in functions, so we can check
> that and error out if the unsafe function is invoked in parallel mode.
> It has been observed that we were calling some unsafe functions in
> parallel mode in the regression tests, which is caught by such a check.

I see your point, but I am not convinced. As I said to Tsunakawa-san, doing the check here seems expensive. Also, I had the idea in mind that parallel-safety should work like volatility. We don't check at runtime whether a volatile function is being called in a context where volatile functions are not supposed to be used. If for example you try to call a volatile function directly from an index expression I believe you will get an error. But if the index expression calls an immutable function and then that function internally calls something volatile, you don't get an error. Now it might not be a good idea: you could end up with a broken index. But that's your fault for mislabeling the function you used.

Sometimes this is actually quite useful. You might know that, while the function is in general volatile, it is immutable in the particular way that you are using it. Or, perhaps, you are using the volatile function incidentally and it doesn't affect the output of your function at all. Or, maybe you actually want to build an index that might break, and then it's up to you to rebuild the index if and when that is required. Users do this kind of thing all the time, I think, and would be unhappy if we started checking it more rigorously than we do today.

Now, I don't see why the same idea can't or shouldn't apply to parallel-safety. If you call a parallel-unsafe function in a parallel context, it's pretty likely that you are going to get an error, and so you might not want to do it. If the function is written in C, it could even cause horrible things to happen so that you crash the whole backend or something, but I tried to set things up so that for built-in functions you'll just get an error. But on the other hand, maybe the parallel-unsafe function you are calling is not parallel-unsafe in all cases. If you want to create a wrapper function that is labelled parallel-safe and try to make it so that it only calls the parallel-unsafe function in the cases where there's no safety problem, that's up to you!

It's possible that I had the wrong idea here, so maybe the question deserves more thought, but I wanted to explain what my thought process was.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, May 6, 2021 at 5:26 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> Can anyone think of the need to check the parallel safety of built-in functions in the context of parallel INSERT SELECT? The planner already checks (or can check) the parallel safety of the SELECT part with max_parallel_hazard(). Regarding the INSERT part, we're trying to rely on the parallel safety of the target table that the user specified with CREATE/ALTER TABLE. I don't see where we need to check the parallel safety of built-in functions.
>

Yes, I can certainly think of a reason to do this. The idea, for the approach being discussed, is to allow the user to declare parallel-safety on a table, but then to catch any possible violations of this at runtime (as opposed to adding additional parallel-safety checks at planning time). So for INSERT with parallel SELECT, for example (which runs in parallel mode), the execution of index expressions, column-default expressions, check constraints etc. may end up invoking functions (built-in or otherwise) that are NOT parallel-safe - so we could choose to error out in this case when these violations are detected.

As far as I can see, this checking of function parallel-safety can be done with little overhead to the current code - it already gets proc information from the system cache for non-built-in functions, and for built-in functions it could store the parallel-safety status in FmgrBuiltin and simply get it from there (I don't think we should be allowing changes to built-in function properties - currently it is allowed, but it doesn't work properly).

The other option is to just blindly trust the parallel-safety declaration on tables, and whatever happens at runtime happens.

Regards,
Greg Nancarrow
Fujitsu Australia
On Thu, May 6, 2021 at 4:35 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, May 6, 2021 at 3:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > The idea here is to check for parallel safety of functions at
> > some place in the code during function invocation so that if we execute
> > any parallel unsafe/restricted function via a parallel worker then we
> > error out. I think that is a good safety net, especially if we can do
> > it with some simple check. Now, we already have pg_proc information in
> > fmgr_info_cxt_security for non-built-in functions, so we can check
> > that and error out if the unsafe function is invoked in parallel mode.
> > It has been observed that we were calling some unsafe functions in
> > parallel mode in the regression tests, which is caught by such a check.
>
> I see your point, but I am not convinced. As I said to Tsunakawa-san,
> doing the check here seems expensive.

If I read your email correctly, then you are saying it is expensive based on the idea that we need to perform an extra syscache lookup, but actually for non-built-in functions we already have the parallel-safety information, so such a check should not incur a significant cost.

> Also, I had the idea in mind
> that parallel-safety should work like volatility. We don't check at
> runtime whether a volatile function is being called in a context where
> volatile functions are not supposed to be used. If for example you try
> to call a volatile function directly from an index expression I
> believe you will get an error. But if the index expression calls an
> immutable function and then that function internally calls something
> volatile, you don't get an error. Now it might not be a good idea: you
> could end up with a broken index. But that's your fault for
> mislabeling the function you used.
>
> Sometimes this is actually quite useful. You might know that, while
> the function is in general volatile, it is immutable in the particular
> way that you are using it. Or, perhaps, you are using the volatile
> function incidentally and it doesn't affect the output of your
> function at all. Or, maybe you actually want to build an index that
> might break, and then it's up to you to rebuild the index if and when
> that is required. Users do this kind of thing all the time, I think,
> and would be unhappy if we started checking it more rigorously than we
> do today.
>
> Now, I don't see why the same idea can't or shouldn't apply to
> parallel-safety. If you call a parallel-unsafe function in a parallel
> context, it's pretty likely that you are going to get an error, and so
> you might not want to do it. If the function is written in C, it could
> even cause horrible things to happen so that you crash the whole
> backend or something, but I tried to set things up so that for
> built-in functions you'll just get an error. But on the other hand,
> maybe the parallel-unsafe function you are calling is not
> parallel-unsafe in all cases. If you want to create a wrapper function
> that is labelled parallel-safe and try to make it so that it only calls
> the parallel-unsafe function in the cases where there's no safety
> problem, that's up to you!
>

I think it is difficult to say for what purpose a parallel-unsafe function got called in a parallel context, so if we give an error in cases where it could otherwise lead to a crash or cause other horrible things, users will probably appreciate us. OTOH, if the parallel-safety labeling is wrong (a parallel-safe function is marked parallel-unsafe) and we gave an error in such a case, the user can always change the parallel-safety attribute by using ALTER FUNCTION.

Now, if adding such a check is costly or needs some major re-design then it might not be worth it, whereas I don't think that is the case for non-built-in function invocation.

--
With Regards,
Amit Kapila.

RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From: "tsunakawa.takay@fujitsu.com"
From: Robert Haas <robertmhaas@gmail.com> > On Wed, May 5, 2021 at 10:54 PM tsunakawa.takay@fujitsu.com > <tsunakawa.takay@fujitsu.com> wrote: > > As proposed in this thread and/or "Parallel INSERT SELECT take 2", we > thought of detecting parallel unsafe function execution during SQL statement > execution, instead of imposing much overhead to check parallel safety during > query planning. Specifically, we add parallel safety check in fmgr_info() > and/or FunctionCallInvoke(). > > I haven't read that thread, but I don't understand how that can work. > The reason we need to detect it at plan time is because we might need > to use a different plan. At execution time it's too late for that. (I forgot to say this in my previous email. Robert-san, thank you very much for taking time to look at this and giving feedback. It was sad that we had to revert our parallel INSERT SELECT for redesign at the very end of the last CF. We needadvice and suggestions from knowledgeable and thoughtful people like Tom-san, Andres-san and you in early stages to notrepeat the tragedy.) I'd really like you to have a look at the first mail in [1], and to get your feedback like "this part should be like ...instead" and "this part would probably work, I think." Without feedback from leading developers, I'm somewhat at a lossif and how we can proceed with the proposed approach. To put it shortly, we found that it can take non-negligible time for the planner to check the parallel safety of the targettable of INSERT SELECT when it has many (hundreds or thousands of) partitions. The check also added much complicatedcode, too. So, we got inclined to take Tom-san's suggestion -- let the user specify the parallel safety of thetarget table with CREATE/ALTER TABLE and the planner just decides a query plan based on it. Caching the results of parallelsafety checks in relcache or a new shared hash table didn't seem to work well to me, or it should be beyond my brainat least. We may think that it's okay to just believe the user-specified parallel safety. But I thought we could step up and do ourbest to check the parallel safety during statement execution, if it's not very invasive in terms of performance and codecomplexity. The aforementioned idea is that if the parallel processes find the called functions parallel unsafe, theyerror out. All ancillary objects of the target table, data types, constraints, indexes, triggers, etc., come down tosome UDF, so it should be enough to check the parallel safety when the UDF is called. > Also, it seems potentially quite expensive. A query may be planned > once and executed many times. Also, a single query execution may call > the same SQL function many times. I think we don't want to incur the > overhead of an extra syscache lookup every time anyone calls any > function. A very simple expression like a+b+c+d+e involves four > function calls, and + need not be a built-in, if the data type is > user-defined. And that might be happening for every row in a table > with millions of rows. We (optimistically) expect that the overhead won't be serious, because the parallel safety information is already at handin the FmgrInfo struct when the function is called. We don't have to look up the syscache every time the function iscalled. Of course, adding even a single if statement may lead to a disaster in a critical path, so we need to assess the performance. I'd also appreciate if you could suggest some good workload we should experiment in the thread above. 
[1] Parallel INSERT SELECT take 2
https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709@TYAPR01MB2990.jpnprd01.prod.outlook.com

Regards
Takayuki Tsunakawa
RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From
"houzj.fnst@fujitsu.com"
Date:
> > Sometimes this is actually quite useful. You might know that, while
> > the function is in general volatile, it is immutable in the particular
> > way that you are using it. Or, perhaps, you are using the volatile
> > function incidentally and it doesn't affect the output of your
> > function at all. Or, maybe you actually want to build an index that
> > might break, and then it's up to you to rebuild the index if and when
> > that is required. Users do this kind of thing all the time, I think,
> > and would be unhappy if we started checking it more rigorously than we
> > do today.
> >
> > Now, I don't see why the same idea can't or shouldn't apply to
> > parallel-safety. If you call a parallel-unsafe function in a parallel
> > context, it's pretty likely that you are going to get an error, and so
> > you might not want to do it. If the function is written in C, it could
> > even cause horrible things to happen so that you crash the whole
> > backend or something, but I tried to set things up so that for
> > built-in functions you'll just get an error. But on the other hand,
> > maybe the parallel-unsafe function you are calling is not
> > parallel-unsafe in all cases. If you want to create a wrapper function
> > that is labelled parallel-safe and try to make sure that it only calls
> > the parallel-unsafe function in the cases where there's no safety
> > problem, that's up to you!
>
> I think it is difficult to say for what purpose a parallel-unsafe
> function got called in a parallel context, so if we give an error in
> cases where it could otherwise lead to a crash or cause other horrible
> things, users will probably appreciate us. OTOH, if the parallel-safety
> labeling is wrong (a parallel-safe function is marked parallel-unsafe)
> and we gave an error in such a case, the user can always change the
> parallel-safety attribute by using Alter Function.
> Now, if adding such a check is costly or needs some major re-design
> then probably it might not be worth it, whereas I don't think that is
> the case for non-built-in function invocation.

In the meantime, just in case someone wants to take a look at the patch for the safety check: I split the patch into 0001 (parallel safety check for user-defined functions), 0003 (parallel safety check for built-in functions), and the fix for test cases.

IMO, with such a check giving an error when a parallel-unsafe function is detected in parallel mode, it will be easier for users to discover potential threats (parallel-unsafe functions) in parallel mode. I think users are likely to invoke a parallel-unsafe function inside a parallel-safe function unintentionally, and such a check can help them detect the problem more easily.

However, the strict check limits some usages (intentional wrapper functions), as Robert-san said. To mitigate the effect of that limit, I was thinking: can we do the safety check conditionally, such as only checking the top-level function invocation, and/or introduce a GUC option to control whether to do the strict parallel safety check?
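For example, the conditional form could look roughly like this (the GUC name is hypothetical, and it reuses the hypothetical FmgrInfo.fn_parallel field sketched earlier in the thread):

/* hypothetical GUC: when off, skip the strict run-time check */
bool        strict_parallel_safety_check = true;

/* in the function-call path, e.g. fmgr_info()/FunctionCallInvoke() */
if (strict_parallel_safety_check &&
    IsInParallelMode() &&
    flinfo->fn_parallel == PROPARALLEL_UNSAFE)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
             errmsg("parallel-unsafe function called in parallel mode")));

Thoughts?

Best regards,
houzj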
Attachment
On Tue, May 11, 2021 at 12:28 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
>
> In the meantime, just in case someone wants to take a look at the patch
> for the safety check.
>

I am not sure there is yet a consensus on exactly which cases need to be dealt with. Let me try to summarize the discussion and see if that helps. As per my understanding, the main reasons for this work are:

a. Ensure parallel-unsafe functions don't get executed in parallel mode. We do have checks to ensure that we don't select parallel mode for most cases where a parallel-unsafe function is used, but we don't have checks for input/output funcs, aggregate funcs, etc. This proposal is to detect such cases during function invocation and return an error. I think if, for some cases like aggregates or other types of functions, we allow selecting parallelism relying on the user, it is not a bad idea to detect and return an error if some parallel-unsafe function is executed in parallel mode.

b. Detect wrong parallel-safety markings. Say the user has declared some function as parallel-safe but it invokes another parallel-unsafe function.

c. The other motive is that this work can help us to enable parallelism for inserts (and maybe update/delete in the future). As is being discussed in another thread [1], we are considering allowing parallel inserts on a table based on user input and then detecting at runtime whether the insert invokes any parallel-unsafe expression. The idea is that the user will be able to specify whether a write operation is allowed in parallel on a specified relation, and we allow selecting parallelism for such writes based on that, while doing the checks for Select as we do now. There are other options, like determining the parallel-safety of writes in the planner and only then allowing parallelism, but those all seem costly. Now, I think it is not compulsory to have such checks for this particular reason, as we are relying on user input, but it will be good if we have them.

I think the last purpose (c) is still debatable even though we couldn't come up with anything better till now, but even if we leave that aside for now, I think the other reasons are good enough to have some form of checks.

Now, the proposal being discussed is to add a parallel-safety check in fmgr_info, which seems to be invoked during all function executions. We need to have access to the proparallel attribute of the function to check the parallel-safety, and that is readily available in fmgr_info for non-built-in functions because we already get the pg_proc information from the sys cache. So, I guess there is no harm in checking it when the information is readily available. However, for built-in functions that information is not readily available, as we get the required function information from FmgrBuiltin (which doesn't have parallel-safety information). For built-in functions, the following options have been discussed:

a. Extend FmgrBuiltin without increasing its size to include parallel information.

b. Enquire the pg_proc cache to get the information. Accessing this for each invocation of a built-in could be costly. We can probably incur this cost only when the built-in is invoked in parallel mode.

c. Don't add the check for built-ins. I think if we can't think of any other better way to have checks for built-ins and don't like any of (a) or (b), then there is no harm in (c). This will at least allow us to have a parallel-safety check for user-defined functions.
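To make option (b) a bit more concrete, the lookup could be deferred until we are actually in parallel mode, something like the below (just a sketch of a hypothetical helper, not part of the posted patch):

static char
builtin_func_parallel(Oid funcid)
{
    HeapTuple   tp;
    char        proparallel;

    /*
     * Fetch proparallel from the syscache.  The caller would do this
     * only when IsInParallelMode(), to avoid paying the lookup cost on
     * every built-in call.
     */
    tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
    if (!HeapTupleIsValid(tp))
        elog(ERROR, "cache lookup failed for function %u", funcid);
    proparallel = ((Form_pg_proc) GETSTRUCT(tp))->proparallel;
    ReleaseSysCache(tp);
    return proparallel;
}

Thoughts?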
[1] - https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709@TYAPR01MB2990.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.
On Fri, Jun 4, 2021 at 6:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Thoughts?

As far as I can see, trying to error out at function call time if the function is parallel-unsafe doesn't fix any problem we have, and just makes the design of this part of the system less consistent with what we've done elsewhere. For example, if you create a stable function that internally calls a volatile function, you don't get an error. You can use your stable function in an index definition if you wish. That may break, but if so, that's your problem. Also, when it breaks, it probably won't blow up the entire world; you'll just have a messed-up index.

Currently, the parallel-safety stuff works the same way. If we notice that something is marked parallel-unsafe, we'll skip parallelism. But you can lie to us and claim that things are safe when they're not, and if you do, it may break, but that's your problem. Most likely your query will just error out, and there will be no worse consequences than that, though if your parallel-unsafe function is written in C, it could do horrible things like crash, which is unavoidable because C code can do anything.

Now, the reason for all of this work, as I understand it, is because we want to enable parallel inserts, and the problem there is that a parallel insert could involve a lot of different things: it might need to compute expressions, or fire triggers, or check constraints, and any of those things could be parallel-unsafe. If we enable parallelism and then find out that we need to do one of those things, we have a problem. Something probably will error out. The thing is, with this proposal, that issue is not solved. Something will definitely error out. You'll probably get the error in a different place, but nobody fires off an INSERT hoping to get one error message rather than another. What they want is for it to work. So I'm kind of confused how we ended up going in this direction, which seems to me at least to be a tangent from the real issue, and somewhat at odds with the way the rest of PostgreSQL is designed.

It seems to me that we could simply add a flag to each relation saying whether or not we think that INSERT operations - or perhaps DML operations generally - are believed to be parallel-safe for that relation. Like the marking on functions, it would be the user's responsibility to get that marking correct. If they don't, they might call a parallel-unsafe function in parallel mode, and that will probably error out. But that's no worse than what we already have in existing cases, so I don't see why it requires doing what's proposed here first.

Now, it does have the disadvantage of being not very convenient for users, who, I'm sure, would prefer that the system figure out for them automatically whether or not parallel inserts are likely to be safe, rather than making them declare it, especially since presumably the default declaration would have to be "unsafe," as it is for functions. But I don't have a better idea right now.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Jun 7, 2021 at 7:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Jun 4, 2021 at 6:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Thoughts?
>
> As far as I can see, trying to error out at function call time if the
> function is parallel-unsafe doesn't fix any problem we have, and just
> makes the design of this part of the system less consistent with what
> we've done elsewhere. For example, if you create a stable function
> that internally calls a volatile function, you don't get an error. You
> can use your stable function in an index definition if you wish. That
> may break, but if so, that's your problem. Also, when it breaks, it
> probably won't blow up the entire world; you'll just have a messed-up
> index. Currently, the parallel-safety stuff works the same way. If we
> notice that something is marked parallel-unsafe, we'll skip
> parallelism.
>

This is not true in all cases, which is one of the reasons for this thread. For example, we don't skip parallelism when I/O functions are parallel-unsafe, as is shown in the following case:

postgres=# CREATE FUNCTION text_w_default_in(cstring) RETURNS text_w_default
             AS 'textin' LANGUAGE internal STRICT IMMUTABLE;
NOTICE:  type "text_w_default" is not yet defined
DETAIL:  Creating a shell type definition.
CREATE FUNCTION
postgres=# CREATE FUNCTION text_w_default_out(text_w_default) RETURNS cstring
             AS 'textout' LANGUAGE internal STRICT IMMUTABLE;
NOTICE:  argument type text_w_default is only a shell
CREATE FUNCTION
postgres=# CREATE TYPE text_w_default (
             internallength = variable,
             input = text_w_default_in,
             output = text_w_default_out,
             alignment = int4,
             default = 'zippo');
CREATE TYPE
postgres=# CREATE TABLE default_test (f1 text_w_default, f2 int);
CREATE TABLE
postgres=# INSERT INTO default_test DEFAULT VALUES;
INSERT 0 1
postgres=# SELECT * FROM default_test;
ERROR:  parallel-safety execution violation of function "text_w_default_out" (u)

Note that the error is raised after applying the patch; without the patch, the above won't show any error (the error message could be improved here). Such cases can lead to unpredictable behavior without the patch because we won't be able to detect the execution of parallel-unsafe functions. There are similar examples in the regression tests. Now, one way to deal with similar cases could be to document them and say we don't consider parallel-safety in such cases, and the other way is to detect such cases and error out. Yet another way could be to somehow check these cases as well before enabling parallelism, but I thought these cases fall in a similar category to an aggregate's support functions.

> But you can lie to us and claim that things are safe when
> they're not, and if you do, it may break, but that's your problem.
> Most likely your query will just error out, and there will be no
> worse consequences than that, though if your parallel-unsafe function
> is written in C, it could do horrible things like crash, which is
> unavoidable because C code can do anything.
>

That is true, but I was worried about cases where users didn't lie to us but we still allowed those to choose parallelism.

> Now, the reason for all of this work, as I understand it, is because
> we want to enable parallel inserts, and the problem there is that a
> parallel insert could involve a lot of different things: it might need
> to compute expressions, or fire triggers, or check constraints, and
> any of those things could be parallel-unsafe.
> If we enable parallelism
> and then find out that we need to do one of those things, we have a
> problem. Something probably will error out. The thing is, with this
> proposal, that issue is not solved. Something will definitely error
> out. You'll probably get the error in a different place, but nobody
> fires off an INSERT hoping to get one error message rather than
> another. What they want is for it to work. So I'm kind of confused how
> we ended up going in this direction which seems to me at least to be a
> tangent from the real issue, and somewhat at odds with the way the
> rest of PostgreSQL is designed.
>
> It seems to me that we could simply add a flag to each relation saying
> whether or not we think that INSERT operations - or perhaps DML
> operations generally - are believed to be parallel-safe for that
> relation.
>

This is exactly the direction we are trying to pursue. The proposal [1] has semantics like:

CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE };
ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE };

This property is recorded in pg_class's relparallel column as 'u', 'r', or 's', just like pg_proc's proparallel. The default is UNSAFE. This might require some bike-shedding to decide how exactly we want to expose it to the user, but I think it is along the lines of what you have described here.

> Like the marking on functions, it would be the user's
> responsibility to get that marking correct. If they don't, they might
> call a parallel-unsafe function in parallel mode, and that will
> probably error out. But that's no worse than what we already have in
> existing cases, so I don't see why it requires doing what's proposed
> here first.
>

I agree it is not necessarily required if we give the responsibility to the user, but this might give a better user experience. OTOH, without this as well, as you said, it won't be any worse than the current behavior. But that was not the sole motivation of this proposal, as explained by example earlier in this email.

> Now, it does have the disadvantage of being not very
> convenient for users, who, I'm sure, would prefer that the system
> figure out for them automatically whether or not parallel inserts are
> likely to be safe, rather than making them declare it, especially
> since presumably the default declaration would have to be "unsafe," as
> it is for functions.
>

To improve the user experience in this regard, the proposal [1] provides a function pg_get_parallel_safety(oid) with which users can determine whether it is safe to enable parallelism. Of course, after the user has checked with that function, one could still add some unsafe constraints to the table by altering it, but it will still be an aid in enabling parallelism on a relation.
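To show how the planner side would consume the proposed marking, the check could be as simple as the following (sketch only; get_rel_parallel_dml() is a hypothetical lsyscache-style accessor for the proposed relparallel column):

    /* In the planner, when considering parallelism for INSERT: */
    if (parse->commandType == CMD_INSERT)
    {
        RangeTblEntry *rte = rt_fetch(parse->resultRelation, parse->rtable);

        /* 'u'/'r' would disable or restrict parallelism, as for functions */
        if (get_rel_parallel_dml(rte->relid) != 's')
            glob->parallelModeOK = false;
    }

[1] - https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709@TYAPR01MB2990.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.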
On Mon, Jun 7, 2021 at 11:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > Note the error is raised after applying the patch, without the patch, > the above won't show any error (error message could be improved here). > Such cases can lead to unpredictable behavior without a patch because > we won't be able to detect the execution of parallel-unsafe functions. > There are similar examples from regression tests. Now, one way to deal > with similar cases could be that we document them and say we don't > consider parallel-safety in such cases and the other way is to detect > such cases and error out. Yet another way could be that we somehow try > to check these cases as well before enabling parallelism but I thought > these cases fall in the similar category as aggregate's support > functions. I'm not very excited about the idea of checking type input and type output functions. It's hard to imagine someone wanting to do something parallel-unsafe in such a function, unless they're just trying to prove a point. So I don't think checking it would be a good investment of CPU cycles. If we do anything at all, I'd vote for just documenting that such functions should be parallel-safe and that their parallel-safety marks are not checked when they are used as type input/output functions. Perhaps we ought to document the same thing with regard to opclass support functions, another place where it's hard to imagine a realistic use case for doing something parallel-unsafe. In the case of aggregates, I see the issues slightly differently. I don't know that it's super-likely that someone would want to create a parallel-unsafe aggregate function, but I think there should be a way to do it, just in case. However, if somebody wants that, they can just mark the aggregate itself unsafe. There's no benefit for the user to marking the aggregate safe and the support functions unsafe and hoping that the system figures it out somehow. In my opinion, you're basically taking too pure a view of this. We're not trying to create a system that does such a good job checking parallel safety markings that nobody can possibly find a thing that isn't checked no matter how hard they poke around the dark corners of the system. Or at least we shouldn't be trying to do that. We should be trying to create a system that works well in practice, and gives people the flexibility to easily avoid parallelism when they have a query that is parallel-unsafe, while still getting the benefit of parallelism the rest of the time. I don't know what all the cases you've uncovered are, and maybe there's something in there that I'd be more excited about changing if I knew what it was, but the particular problems you're mentioning here seem more theoretical than real to me. -- Robert Haas EDB: http://www.enterprisedb.com
RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, June 8, 2021 10:51 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jun 7, 2021 at 11:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Note the error is raised after applying the patch, without the patch,
> > the above won't show any error (error message could be improved here).
> > Such cases can lead to unpredictable behavior without a patch because
> > we won't be able to detect the execution of parallel-unsafe functions.
> > There are similar examples from regression tests. Now, one way to deal
> > with similar cases could be that we document them and say we don't
> > consider parallel-safety in such cases and the other way is to detect
> > such cases and error out. Yet another way could be that we somehow try
> > to check these cases as well before enabling parallelism but I thought
> > these cases fall in the similar category as aggregate's support
> > functions.
>
> I'm not very excited about the idea of checking type input and type output
> functions. It's hard to imagine someone wanting to do something
> parallel-unsafe in such a function, unless they're just trying to prove a point. So
> I don't think checking it would be a good investment of CPU cycles. If we do
> anything at all, I'd vote for just documenting that such functions should be
> parallel-safe and that their parallel-safety marks are not checked when they are
> used as type input/output functions. Perhaps we ought to document the same
> thing with regard to opclass support functions, another place where it's hard to
> imagine a realistic use case for doing something parallel-unsafe.
>
> In the case of aggregates, I see the issues slightly differently. I don't know that
> it's super-likely that someone would want to create a parallel-unsafe
> aggregate function, but I think there should be a way to do it, just in case.
> However, if somebody wants that, they can just mark the aggregate itself
> unsafe. There's no benefit for the user to marking the aggregate safe and the
> support functions unsafe and hoping that the system figures it out somehow.
>
> In my opinion, you're basically taking too pure a view of this. We're not trying to
> create a system that does such a good job checking parallel safety markings
> that nobody can possibly find a thing that isn't checked no matter how hard
> they poke around the dark corners of the system. Or at least we shouldn't be
> trying to do that. We should be trying to create a system that works well in
> practice, and gives people the flexibility to easily avoid parallelism when they
> have a query that is parallel-unsafe, while still getting the benefit of parallelism
> the rest of the time.
>
> I don't know what all the cases you've uncovered are, and maybe there's
> something in there that I'd be more excited about changing if I knew what it
> was, but the particular problems you're mentioning here seem more
> theoretical than real to me.

I think another case where a parallel-unsafe function could be invoked in parallel mode is the TEXT SEARCH TEMPLATE's init_function or lexize_function, because currently the planner does not check the safety of these functions. Please see the example below [1].

I am not sure whether users would use a parallel-unsafe function as an init_function or lexize_function, but if they do, it could cause unexpected results. Does it make sense to add some check for the init_function or lexize_function, or to document this together with the type input/output and opclass support functions?
[1]----------------------------EXAMPLE------------------------------------

CREATE FUNCTION dsnowball_init(INTERNAL)
    RETURNS INTERNAL AS '$libdir/dict_snowball', 'dsnowball_init'
LANGUAGE C STRICT;

CREATE FUNCTION dsnowball_lexize(INTERNAL, INTERNAL, INTERNAL, INTERNAL)
    RETURNS INTERNAL AS '$libdir/dict_snowball', 'dsnowball_lexize'
LANGUAGE C STRICT;

CREATE TEXT SEARCH TEMPLATE snowball
    (INIT = dsnowball_init, LEXIZE = dsnowball_lexize);

COMMENT ON TEXT SEARCH TEMPLATE snowball IS 'snowball stemmer';

create table pendtest (ts tsvector);
create index pendtest_idx on pendtest using gin(ts);
insert into pendtest select (to_tsvector('Lore ipsum')) from generate_series(1,10000000,1);
analyze;
set enable_bitmapscan = off;

postgres=# explain select * from pendtest where to_tsquery('345&qwerty') @@ ts;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Gather  (cost=1000.00..1168292.86 rows=250 width=31)
   Workers Planned: 2
   ->  Parallel Seq Scan on pendtest  (cost=0.00..1167267.86 rows=104 width=31)
         Filter: (to_tsquery('345&qwerty'::text) @@ ts)

-- In the example above, dsnowball_init() and dsnowball_lexize() will be
-- executed in parallel mode.
----------------------------EXAMPLE------------------------------------

Best regards,
houzj
"houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com> writes: > On Tuesday, June 8, 2021 10:51 PM Robert Haas <robertmhaas@gmail.com> > wrote: >> In my opinion, you're basically taking too pure a view of this. We're >> not trying to create a system that does such a good job checking >> parallel safety markings that nobody can possibly find a thing that >> isn't checked no matter how hard they poke around the dark corners of >> the system. Or at least we shouldn't be trying to do that. > I think another case that parallel unsafe function could be invoked in > parallel mode is the TEXT SEARCH TEMPLATE's init_function or > lexize_function. Another point worth making in this connection is what I cited earlier today in ba2c6d6ce: : ... We could imagine prohibiting SCROLL when : the query contains volatile functions, but that would be : expensive to enforce. Moreover, it could break applications : that work just fine, if they have functions that are in fact : stable but the user neglected to mark them so. So settle for : documenting the hazard. If you break an application that used to work, because the developer was careless about marking a function PARALLEL SAFE even though it actually is, I do not think you have made any friends or improved anyone's life. In fact, you could easily make things far worse, by encouraging people to mark things PARALLEL SAFE that are not. (We just had a thread about somebody marking a function immutable because they wanted effect X of that, and then whining because they also got effect Y.) There are specific cases where there's a good reason to worry. For example, if we assume blindly that domain_in() is parallel safe, we will have cause to regret that. But I don't find that to be a reason why we need to lock down everything everywhere. We need to understand the tradeoffs involved in what we check, and apply checks that are likely to avoid problems, while not being too nanny-ish. regards, tom lane
On Wed, Jun 9, 2021 at 2:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > There are specific cases where there's a good reason to worry. > For example, if we assume blindly that domain_in() is parallel > safe, we will have cause to regret that. But I don't find that > to be a reason why we need to lock down everything everywhere. > We need to understand the tradeoffs involved in what we check, > and apply checks that are likely to avoid problems, while not > being too nanny-ish. Yeah, that's exactly how I feel about it, too. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Jun 9, 2021 at 9:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jun 9, 2021 at 2:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > There are specific cases where there's a good reason to worry.
> > For example, if we assume blindly that domain_in() is parallel
> > safe, we will have cause to regret that. But I don't find that
> > to be a reason why we need to lock down everything everywhere.
> > We need to understand the tradeoffs involved in what we check,
> > and apply checks that are likely to avoid problems, while not
> > being too nanny-ish.
>
> Yeah, that's exactly how I feel about it, too.
>

Fair enough. So, I think there is a consensus to drop this patch, and if one wants, we can document these cases. Also, we don't need it to enable parallelism for Inserts, where we are trying to pursue the approach of having a flag in pg_class which allows users to specify whether writes are allowed on a specified relation.

--
With Regards,
Amit Kapila.
On Thu, Jun 10, 2021 at 12:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > Fair enough. So, I think there is a consensus to drop this patch and > if one wants then we can document these cases. Also, we don't want it > to enable parallelism for Inserts where we are trying to pursue the > approach to have a flag in pg_class which allows users to specify > whether writes are allowed on a specified relation. +1. The question that's still on my mind a little bit is whether there's a reasonable alternative to forcing users to set a flag manually. It seems less convenient than having to do the same thing for a function, because most users probably only create functions occasionally, but creating tables seems like it's likely to be a more common operation. Plus, a function is basically a program, so it sort of feels reasonable that you might need to give the system some hints about what the program does, but that doesn't apply to a table. Now, if we forget about partitioned tables here for a moment, I don't really see why we couldn't do this computation based on the relcache entry, and then just cache the flag there? I think anything that would change the state for a plain old table would also cause some invalidation that we could notice. And I don't think that the cost of walking over triggers, constraints, etc. and computing the value we need on demand would be exorbitant. For a partitioned table, things are a lot more difficult. For one thing, the cost of computation can be a lot higher; there might be a thousand or more partitions. For another thing, computing the value could have scary side effects, like opening all the partitions, which would also mean taking locks on them and building expensive relcache entries. For a third thing, we'd have no way of knowing whether the value was still current, because an event that produces an invalidation for a partition doesn't necessarily produce any invalidation for the partitioned table. So one idea is maybe we only need an explicit flag for partitioned tables, and regular tables we can just work it out automatically. Another idea is maybe we try to solve the problems somehow so that it can also work with partitioned tables. I don't really have a great idea right at the moment, but maybe it's worth devoting some more thought to the problem. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Jun 10, 2021 at 10:59 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jun 10, 2021 at 12:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Fair enough. So, I think there is a consensus to drop this patch, and
> > if one wants, we can document these cases. Also, we don't need it to
> > enable parallelism for Inserts, where we are trying to pursue the
> > approach of having a flag in pg_class which allows users to specify
> > whether writes are allowed on a specified relation.
>
> +1. The question that's still on my mind a little bit is whether
> there's a reasonable alternative to forcing users to set a flag
> manually. It seems less convenient than having to do the same thing
> for a function, because most users probably only create functions
> occasionally, but creating tables seems like it's likely to be a more
> common operation. Plus, a function is basically a program, so it sort
> of feels reasonable that you might need to give the system some hints
> about what the program does, but that doesn't apply to a table.
>
> Now, if we forget about partitioned tables here for a moment, I don't
> really see why we couldn't do this computation based on the relcache
> entry, and then just cache the flag there?
>

Do we invalidate the relcache entry if someone changes, say, a trigger or some index AM function property via Alter Function (in our case from safe to unsafe, or vice versa)? Tsunakawa-San has mentioned this in his email [1] as the reason why we can't rely on caching this property in the relcache entry. I also don't see anything in AlterFunction which would suggest that we invalidate the relation with which the function might be associated via a trigger.

The other idea in this regard was to validate the parallel safety during DDL instead of relying completely on the user, but that also seems to have similar hazards, as pointed out by Tom in his email [2].

I think it would be good if there is a way we can do this without asking for user input, but if not, then we can try to provide parallel-safety info about a relation, which will slightly ease the user's job. Such a function would check the relation (and its partitions) to see whether any parallel-unsafe clause exists and return that information to the user accordingly. Now, again, if the user changes the parallel-safety property later, we won't be able to automatically reflect the same for the rel.

[1] - https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709@TYAPR01MB2990.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/1030301.1616560249%40sss.pgh.pa.us

--
With Regards,
Amit Kapila.
On Fri, Jun 11, 2021 at 12:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Do we invalidate the relcache entry if someone changes, say, a trigger
> or some index AM function property via Alter Function (in our case
> from safe to unsafe, or vice versa)? Tsunakawa-San has mentioned this
> in his email [1] as the reason why we can't rely on caching this
> property in the relcache entry. I also don't see anything in
> AlterFunction which would suggest that we invalidate the relation with
> which the function might be associated via a trigger.

Hmm. I am not sure that index AM functions really need to be checked, but triggers certainly do. I think you are correct that an ALTER FUNCTION wouldn't invalidate the relcache entry, which is I guess pretty much the same problem Tom was pointing out in the thread to which you linked.

But ... thinking out of the box as Tom suggests, what if we came up with some new kind of invalidation message that is only sent when a function's parallel-safety marking is changed? And every backend in the same database then needs to re-evaluate the parallel-safety of every relation for which it has cached a value. Such recomputations might be expensive, but they would probably also occur very infrequently. And you might even be able to make it a bit more fine-grained if it's worth the effort to worry about that: say that in addition to caching the parallel-safety of the relation, we also cache a list of the pg_proc OIDs upon which that determination depends. Then when we hear that the flag's been changed for OID 123456, we only need to invalidate the cached value for relations that depended on that pg_proc entry.

There are ways that a relation could become parallel-unsafe without changing the parallel-safety marking of any function, but perhaps all of the other ways involve a relcache invalidation?

Just brainstorming here. I might be off-track.
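To sketch the shape of it in code (entirely hypothetical: the closest existing mechanism is a PROCOID syscache callback, which fires on any pg_proc change, whereas what I'm describing is a new, rarer message, but the mechanics would look similar):

static void
parallel_safety_inval_callback(Datum arg, int cacheid, uint32 hashvalue)
{
    /*
     * Degrade every cached relation's parallel-safety state to
     * "unknown", forcing lazy recomputation on next use.  A
     * finer-grained version could compare hashvalue against a cached
     * list of pg_proc OIDs the relation's value depends on.
     */
    ResetAllParallelSafetyFlags();      /* hypothetical */
}

/* during backend cache initialization: */
CacheRegisterSyscacheCallback(PROCOID, parallel_safety_inval_callback,
                              (Datum) 0);

--
Robert Haas
EDB: http://www.enterprisedb.com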
On Sat, Jun 12, 2021 at 1:56 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Jun 11, 2021 at 12:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Do we invalidate the relcache entry if someone changes, say, a trigger
> > or some index AM function property via Alter Function (in our case
> > from safe to unsafe, or vice versa)? Tsunakawa-San has mentioned this
> > in his email [1] as the reason why we can't rely on caching this
> > property in the relcache entry. I also don't see anything in
> > AlterFunction which would suggest that we invalidate the relation with
> > which the function might be associated via a trigger.
>
> Hmm. I am not sure that index AM functions really need to be checked,
> but triggers certainly do.
>

Why do you think we don't need to check index AM functions? Say we have an index expression that uses a function, and its parallel safety is changed; that probably also impacts whether we can do an insert in parallel, because otherwise we will end up executing some parallel-unsafe function in parallel mode during index insertion.

> I think you are correct that an ALTER
> FUNCTION wouldn't invalidate the relcache entry, which is I guess
> pretty much the same problem Tom was pointing out in the thread to
> which you linked.
>
> But ... thinking out of the box as Tom suggests, what if we came up
> with some new kind of invalidation message that is only sent when a
> function's parallel-safety marking is changed? And every backend in
> the same database then needs to re-evaluate the parallel-safety of
> every relation for which it has cached a value. Such recomputations
> might be expensive, but they would probably also occur very
> infrequently. And you might even be able to make it a bit more
> fine-grained if it's worth the effort to worry about that: say that in
> addition to caching the parallel-safety of the relation, we also cache
> a list of the pg_proc OIDs upon which that determination depends. Then
> when we hear that the flag's been changed for OID 123456, we only need
> to invalidate the cached value for relations that depended on that
> pg_proc entry.
>

Yeah, this could be one idea, but I think even if we use the pg_proc OID, we still need to check all the relcache entries to find which ones contain the invalidated OID, and that could be expensive. I wonder whether we can't directly find the relation involved and register an invalidation for it? We are able to find the relation with which a trigger function is associated during drop function via findDependentObjects, by scanning pg_depend. Assuming we are able to find the relation for the trigger function by scanning pg_depend, what kinds of problems do we envision in registering the invalidation for it?

I think we probably need to worry about the additional cost to find dependent objects, and about whether there are any race conditions in doing so, as pointed out by Tom in his email [1]. The concern related to cost could be addressed by your idea of registering such an invalidation only when the user changes the parallel safety of the function, which we don't expect to be a frequent operation. Now, the race condition I could think of here is that by the time we change the parallel-safety of a function (say making it unsafe), some other sessions might have already started processing an insert on a relation with which that function is associated via a trigger or check constraint, in which case there could be a problem.
I think to avoid that we need to acquire an Exclusive lock on the relation, as we do in Rename Policy kind of operations.

> There are ways that a relation could become
> parallel-unsafe without changing the parallel-safety marking of any
> function, but perhaps all of the other ways involve a relcache
> invalidation?
>

Probably, but I guess we can investigate/test those cases as well once we find/agree on the solution for the functions stuff.
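For reference, by "registering the invalidation" I mean roughly the following (a sketch only; it ignores the locking and race-condition issues under discussion, includes are omitted, and get_trigger_rel() is a hypothetical helper that maps a pg_trigger OID to its owning relation):

static void
invalidate_rels_using_function(Oid funcoid)
{
    Relation    depRel;
    ScanKeyData key[2];
    SysScanDesc scan;
    HeapTuple   tup;

    /* scan pg_depend for objects depending on this function */
    depRel = table_open(DependRelationId, AccessShareLock);
    ScanKeyInit(&key[0],
                Anum_pg_depend_refclassid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(ProcedureRelationId));
    ScanKeyInit(&key[1],
                Anum_pg_depend_refobjid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(funcoid));
    scan = systable_beginscan(depRel, DependReferenceIndexId, true,
                              NULL, 2, key);
    while (HeapTupleIsValid(tup = systable_getnext(scan)))
    {
        Form_pg_depend dep = (Form_pg_depend) GETSTRUCT(tup);

        /* e.g., the function is used by a trigger on some relation */
        if (dep->classid == TriggerRelationId)
            CacheInvalidateRelcacheByRelid(get_trigger_rel(dep->objid));
    }
    systable_endscan(scan);
    table_close(depRel, AccessShareLock);
}

[1] - https://www.postgresql.org/message-id/1030301.1616560249%40sss.pgh.pa.us

--
With Regards,
Amit Kapila.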
Amit Kapila <amit.kapila16@gmail.com> writes: > Why do you think we don't need to check index AM functions? Primarily because index AMs and opclasses can only be defined by superusers, and the superuser is assumed to know what she's doing. More generally, we've never made any provisions for the properties of index AMs or opclasses to change on-the-fly. I doubt that doing so could possibly be justified on a cost-benefit basis. regards, tom lane
On Mon, Jun 14, 2021 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > Why do you think we don't need to check index AM functions? Say we > have an index expression that uses function and if its parallel safety > is changed then probably that also impacts whether we can do insert in > parallel. Because otherwise, we will end up executing some parallel > unsafe function in parallel mode during index insertion. I'm not saying that we don't need to check index expressions. I agree that we need to check those. The index AM functions are things like btint4cmp(). I don't think that a function like that should ever be parallel-unsafe. > Yeah, this could be one idea but I think even if we use pg_proc OID, > we still need to check all the rel cache entries to find which one > contains the invalidated OID and that could be expensive. I wonder > can't we directly find the relation involved and register invalidation > for the same? We are able to find the relation to which trigger > function is associated during drop function via findDependentObjects > by scanning pg_depend. Assuming, we are able to find the relation for > trigger function by scanning pg_depend, what kinds of problems do we > envision in registering the invalidation for the same? I don't think that finding the relation involved and registering an invalidation for the same will work properly. Suppose there is a concurrently-running transaction which has created a new table and attached a trigger function to it. You can't see any of the catalog entries for that relation yet, so you can't invalidate it, but invalidation needs to happen. Even if you used some snapshot that can see those catalog entries before they are committed, I doubt it fixes the race condition. You can't hold any lock on that relation, because the creating transaction holds AccessExclusiveLock, but the whole invalidation mechanism is built around the assumption that the sender puts messages into the shared queue first and then releases locks, while the receiver first acquires a conflicting lock and then processes messages from the queue. Without locks, that synchronization algorithm can't work reliably. As a consequence of all that, I believe that, not just in this particular case but in general, the invalidation message needs to describe the thing that has actually changed, rather than any derived property. We can make invalidations that say "some function's parallel-safety flag has changed" or "this particular function's parallel-safety flag has changed" or "this particular function has changed in some way" (this one, we have already), but anything that involves trying to figure out what the consequences of such a change might be and saying "hey, you, please update XYZ because I changed something somewhere that could affect that" is not going to be correct. > I think we probably need to worry about the additional cost to find > dependent objects and if there are any race conditions in doing so as > pointed out by Tom in his email [1]. The concern related to cost could > be addressed by your idea of registering such an invalidation only > when the user changes the parallel safety of the function which we > don't expect to be a frequent operation. 
> Now, the race condition I could think of here is that by the time we
> change the parallel-safety of a function (say making it unsafe), some
> other sessions might have already started processing an insert on a
> relation with which that function is associated via a trigger or check
> constraint, in which case there could be a problem. I think to avoid
> that we need to acquire an Exclusive lock on the relation, as we do in
> Rename Policy kind of operations.

Well, the big issue here is that we don't actually lock functions while they are in use. So there's absolutely nothing that prevents a function from being altered in any arbitrary way, or even dropped, while code that uses it is running. I don't really know what happens in practice if you do that sort of thing: can you get the same query to run with one function definition for the first part of execution and some other definition for the rest of execution? I tend to doubt it, because I suspect we cache the function definition at some point. If that's the case, caching the parallel-safety marking at the same point seems OK too, or at least no worse than what we're doing already. But on the other hand if it is possible for a query's notion of the function definition to shift while the query is in flight, then this is just another example of that and no worse than any other. Instead of changing the parallel-safety flag, somebody could redefine the function so that it divides by zero or produces a syntax error and kaboom, running queries break. Either way, I don't see what the big deal is. As long as we make the handling of parallel-safety consistent with other ways the function could be concurrently redefined, it won't suck any more than the current system already does, or in any fundamentally new ways.

Even if this line of thinking is correct, there's a big issue for partitioning hierarchies because there you need to know stuff about relations that you don't have any other reason to open. I'm just arguing that if there's no partitioning, the problem seems reasonably solvable. Either you changed something about the relation, in which case you've got to lock it and issue invalidations, or you've changed something about the function, which could be handled via a new type of invalidation. I don't really see why the cost would be particularly bad. Suppose that for every relation, you have a flag which is either PARALLEL_DML_SAFE, PARALLEL_DML_RESTRICTED, PARALLEL_DML_UNSAFE, or PARALLEL_DML_SAFETY_UNKNOWN. When someone sends a message saying "some existing function's parallel-safety changed!" you reset that flag for every relation in the relcache to PARALLEL_DML_SAFETY_UNKNOWN. Then if somebody does DML on that relation and we want to consider parallelism, it's got to recompute that flag. None of that sounds horribly expensive.

I mean, it could be somewhat annoying if you have 100k relations open and sit around all day flipping parallel-safety markings on and off and then doing a single-row insert after each flip, but if that's the only scenario where we incur significant extra overhead from this kind of design, it seems clearly better than forcing users to set a flag manually. Maybe it isn't, but I don't really see what the other problem would be right now. Except, of course, for partitioning, which I'm not quite sure what to do about.
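In code terms, I'm imagining something like this (hypothetical names; nothing below exists today):

typedef enum ParallelDmlSafety
{
    PARALLEL_DML_SAFETY_UNKNOWN = 0,    /* not computed yet / invalidated */
    PARALLEL_DML_SAFE,
    PARALLEL_DML_RESTRICTED,
    PARALLEL_DML_UNSAFE
} ParallelDmlSafety;

/* new field in RelationData: ParallelDmlSafety rd_paralleldml; */

static ParallelDmlSafety
relation_parallel_dml_safety(Relation rel)
{
    if (rel->rd_paralleldml == PARALLEL_DML_SAFETY_UNKNOWN)
    {
        /*
         * Walk the relation's triggers, constraints, index expressions,
         * and column defaults, combining the proparallel markings of
         * every function found.  Expensive, but only redone after an
         * invalidation.
         */
        rel->rd_paralleldml = compute_parallel_dml_safety(rel); /* hypothetical */
    }
    return rel->rd_paralleldml;
}

--
Robert Haas
EDB: http://www.enterprisedb.com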
On Mon, Jun 14, 2021 at 9:08 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jun 14, 2021 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Why do you think we don't need to check index AM functions? Say we
> > have an index expression that uses a function, and its parallel safety
> > is changed; that probably also impacts whether we can do an insert in
> > parallel, because otherwise we will end up executing some
> > parallel-unsafe function in parallel mode during index insertion.
>
> I'm not saying that we don't need to check index expressions. I agree
> that we need to check those. The index AM functions are things like
> btint4cmp(). I don't think that a function like that should ever be
> parallel-unsafe.
>

Okay, but if we go with your suggested model where, whenever there is a change in the parallel-safety of any function, we need to send the new invalidation, then I think it won't matter whether the function is associated with an index expression, a check constraint in the table, or is used in any other way.

> Even if this line of thinking is correct, there's a big issue for
> partitioning hierarchies because there you need to know stuff about
> relations that you don't have any other reason to open. I'm just
> arguing that if there's no partitioning, the problem seems reasonably
> solvable. Either you changed something about the relation, in which
> case you've got to lock it and issue invalidations, or you've changed
> something about the function, which could be handled via a new type of
> invalidation. I don't really see why the cost would be particularly
> bad. Suppose that for every relation, you have a flag which is either
> PARALLEL_DML_SAFE, PARALLEL_DML_RESTRICTED, PARALLEL_DML_UNSAFE, or
> PARALLEL_DML_SAFETY_UNKNOWN. When someone sends a message saying "some
> existing function's parallel-safety changed!" you reset that flag for
> every relation in the relcache to PARALLEL_DML_SAFETY_UNKNOWN. Then if
> somebody does DML on that relation and we want to consider
> parallelism, it's got to recompute that flag. None of that sounds
> horribly expensive.
>

Sounds reasonable. I will think more on this and see if anything else comes to mind apart from what you have mentioned.

> I mean, it could be somewhat annoying if you have 100k relations open
> and sit around all day flipping parallel-safety markings on and off
> and then doing a single-row insert after each flip, but if that's the
> only scenario where we incur significant extra overhead from this kind
> of design, it seems clearly better than forcing users to set a flag
> manually. Maybe it isn't, but I don't really see what the other
> problem would be right now. Except, of course, for partitioning, which
> I'm not quite sure what to do about.
>

Yeah, dealing with partitioned tables is tricky. I think if we don't want to check upfront the parallel safety of all the partitions, then the other option, as discussed, could be to ask the user to specify the parallel safety of partitioned tables. We can additionally check the parallel safety of partitions when we are trying to insert into a particular partition, and error out if we detect any parallel-unsafe clause while in parallel mode. So, in this case, we won't be completely relying on the user. Users can either change the parallel-safe option of the table or remove/change the parallel-unsafe clause after the error.
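In code terms, that additional runtime check could be as simple as the following, at the point where a tuple is routed into a partition (reusing the hypothetical relation_parallel_dml_safety() from your sketch; partRel stands for the already-opened partition):

    if (IsInParallelMode() &&
        relation_parallel_dml_safety(partRel) == PARALLEL_DML_UNSAFE)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("cannot insert into relation \"%s\" in parallel mode",
                        RelationGetRelationName(partRel))));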
The new invalidation message as we are discussing would invalidate the parallel-safety for individual partitions, but not for the root partition (the partitioned table itself). For the root partition, we will rely on the information specified by the user.

I am not sure if we have a simple way to check the parallel safety of partitioned tables. In some way, we need to rely on the user, either (a) by providing an option to specify whether parallel Inserts (and/or other DMLs) can be performed, or (b) by providing a GUC and/or rel option which indicates that we can check the parallel-safety of all the partitions. Yet another option, which I don't like, could be to parallelize inserts only on non-partitioned tables.

--
With Regards,
Amit Kapila.
On Tue, Jun 15, 2021 at 7:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > Okay, but I think if we go with your suggested model where whenever > there is a change in parallel-safety of any function, we need to send > the new invalidation then I think it won't matter whether the function > is associated with index expression, check constraint in the table, or > is used in any other way. No, it will still matter, because I'm proposing that the parallel-safety of functions that we only access via operator classes will not even be checked. Also, if we decided to make the system more fine-grained - e.g. by invalidating on the specific OID of the function that was changed rather than doing something that is database-wide or global - then it matters even more. > Yeah, dealing with partitioned tables is tricky. I think if we don't > want to check upfront the parallel safety of all the partitions then > the other option as discussed could be to ask the user to specify the > parallel safety of partitioned tables. Just to be clear here, I don't think it really matters what we *want* to do. I don't think it's reasonably *possible* to check all the partitions, because we don't hold locks on them. When we're assessing a bunch of stuff related to an individual relation, we have a lock on it. I think - though we should double-check tablecmds.c - that this is enough to prevent all of the dependent objects - triggers, constraints, etc. - from changing. So the stuff we care about is stable. But the situation with a partitioned table is different. In that case, we can't even examine that stuff without locking all the partitions. And even if we do lock all the partitions, the stuff could change immediately afterward and we wouldn't know. So I think it would be difficult to make it correct. Now, maybe it could be done, and I think that's worth a little more thought. For example, perhaps whenever we invalidate a relation, we could also somehow send some new, special kind of invalidation for its parent saying, essentially, "hey, one of your children has changed in a way you might care about." But that's not good enough, because it only goes up one level. The grandparent would still be unaware that a change it potentially cares about has occurred someplace down in the partitioning hierarchy. That seems hard to patch up, again because of the locking rules. The child can know the OID of its parent without locking the parent, but it can't know the OID of its grandparent without locking its parent. Walking up the whole partitioning hierarchy might be an issue for a number of reasons, including possible deadlocks, and possible race conditions where we don't emit all of the right invalidations in the face of concurrent changes. So I don't quite see a way around this part of the problem, but I may well be missing something. In fact I hope I am missing something, because solving this problem would be really nice. > We can additionally check the > parallel safety of partitions when we are trying to insert into a > particular partition and error out if we detect any parallel-unsafe > clause and we are in parallel-mode. So, in this case, we won't be > completely relying on the users. Users can either change the parallel > safe option of the table or remove/change the parallel-unsafe clause > after error. The new invalidation message as we are discussing would > invalidate the parallel-safety for individual partitions but not the > root partition (partitioned table itself). 
> For the root partition, we will rely on the information specified by
> the user.

Yeah, that may be the best we can do. Just to be clear, I think we would want to check whether the relation is still parallel-safe at the start of the operation, but not have a run-time check at each function call.

> I am not sure if we have a simple way to check the parallel safety of
> partitioned tables. In some way, we need to rely on the user, either
> (a) by providing an option to specify whether parallel Inserts (and/or
> other DMLs) can be performed, or (b) by providing a GUC and/or rel
> option which indicates that we can check the parallel-safety of all
> the partitions. Yet another option, which I don't like, could be to
> parallelize inserts only on non-partitioned tables.

If we figure out a way to check the partitions automatically that actually works, we don't need a switch for it; we can (and should) just do it that way all the time. But if we can't come up with a correct algorithm for that, then we'll need to add some kind of option where the user declares whether it's OK.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Jun 14, 2021 at 9:08 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jun 14, 2021 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Yeah, this could be one idea, but I think even if we use the pg_proc
> > OID, we still need to check all the relcache entries to find which
> > ones contain the invalidated OID, and that could be expensive. I
> > wonder whether we can't directly find the relation involved and
> > register an invalidation for it? We are able to find the relation
> > with which a trigger function is associated during drop function via
> > findDependentObjects, by scanning pg_depend. Assuming we are able to
> > find the relation for the trigger function by scanning pg_depend,
> > what kinds of problems do we envision in registering the invalidation
> > for it?
>
> I don't think that finding the relation involved and registering an
> invalidation for the same will work properly. Suppose there is a
> concurrently-running transaction which has created a new table and
> attached a trigger function to it. You can't see any of the catalog
> entries for that relation yet, so you can't invalidate it, but
> invalidation needs to happen. Even if you used some snapshot that can
> see those catalog entries before they are committed, I doubt it fixes
> the race condition. You can't hold any lock on that relation, because
> the creating transaction holds AccessExclusiveLock, but the whole
> invalidation mechanism is built around the assumption that the sender
> puts messages into the shared queue first and then releases locks,
> while the receiver first acquires a conflicting lock and then
> processes messages from the queue.
>

Won't such messages be processed at start transaction time (AtStart_Cache->AcceptInvalidationMessages)?

> Without locks, that synchronization
> algorithm can't work reliably. As a consequence of all that, I believe
> that, not just in this particular case but in general, the
> invalidation message needs to describe the thing that has actually
> changed, rather than any derived property. We can make invalidations
> that say "some function's parallel-safety flag has changed" or "this
> particular function's parallel-safety flag has changed" or "this
> particular function has changed in some way" (this one, we have
> already), but anything that involves trying to figure out what the
> consequences of such a change might be and saying "hey, you, please
> update XYZ because I changed something somewhere that could affect
> that" is not going to be correct.
>
> > I think we probably need to worry about the additional cost to find
> > dependent objects, and about whether there are any race conditions in
> > doing so, as pointed out by Tom in his email [1]. The concern related
> > to cost could be addressed by your idea of registering such an
> > invalidation only when the user changes the parallel safety of the
> > function, which we don't expect to be a frequent operation. Now, the
> > race condition I could think of here is that by the time we change
> > the parallel-safety of a function (say making it unsafe), some other
> > sessions might have already started processing an insert on a
> > relation with which that function is associated via a trigger or
> > check constraint, in which case there could be a problem. I think to
> > avoid that we need to acquire an Exclusive lock on the relation, as
> > we do in Rename Policy kind of operations.
>
> Well, the big issue here is that we don't actually lock functions
> while they are in use.
> So there's absolutely nothing that prevents a > function from being altered in any arbitrary way, or even dropped, > while code that uses it is running. I don't really know what happens > in practice if you do that sort of thing: can you get the same query > to run with one function definition for the first part of execution > and some other definition for the rest of execution? I tend to doubt > it, because I suspect we cache the function definition at some point. > It is possible that in the same statement execution a different function definition can be executed. Say, in session-1 we are inserting three rows; on first-row execution, definition-1 of a function in an index expression gets executed. Now, from session-2, we change the definition of the function to definition-2. Now, in session-1, on second-row insertion, while executing definition-1 of the function, we insert into another table, which will accept the invalidation message registered in session-2. Now, on third-row insertion, the new definition (definition-2) of the function will be executed. > If that's the case, caching the parallel-safety marking at the same > point seems OK too, or at least no worse than what we're doing > already. But on the other hand if it is possible for a query's notion > of the function definition to shift while the query is in flight, then > this is just another example of that and no worse than any other. > Instead of changing the parallel-safety flag, somebody could redefine > the function so that it divides by zero or produces a syntax error and > kaboom, running queries break. Either way, I don't see what the big > deal is. As long as we make the handling of parallel-safety consistent > with other ways the function could be concurrently redefined, it won't > suck any more than the current system already does, or in any > fundamentally new ways. > Okay, so, in this scheme, we have allowed changing the function definition during statement execution but even though the rel's parallel-safe property gets modified (say to parallel-unsafe), we will still proceed in parallel-mode as if it's not changed. I guess this may not be a big deal as we can anyway allow breaking the running statement by changing its definition and users may be okay if the parallel statement errors out or behaves in an unpredictable way in such corner cases. -- With Regards, Amit Kapila.
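To make the three-row scenario above concrete, here is a minimal sketch (all object names are hypothetical; note that a pure function like this one would normally keep using its cached definition for the whole statement — in Amit's scenario the function additionally writes to another table, which is what gives the session a chance to accept the invalidation mid-statement):

    -- session-1: a function used in an index expression
    CREATE FUNCTION f(a int) RETURNS int
        AS $$ SELECT a + 1 $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;
    CREATE TABLE t (a int);
    CREATE INDEX t_f_idx ON t (f(a));
    -- a multi-row insert evaluates f() once per row
    INSERT INTO t VALUES (1), (2), (3);

    -- session-2, concurrently: redefine the function while the insert runs
    CREATE OR REPLACE FUNCTION f(a int) RETURNS int
        AS $$ SELECT a + 2 $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;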
RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, June 15, 2021 10:01 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jun 15, 2021 at 7:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Yeah, dealing with partitioned tables is tricky. I think if we don't > > want to check upfront the parallel safety of all the partitions then > > the other option as discussed could be to ask the user to specify the > > parallel safety of partitioned tables. > > Just to be clear here, I don't think it really matters what we *want* to do. I don't > think it's reasonably *possible* to check all the partitions, because we don't > hold locks on them. When we're assessing a bunch of stuff related to an > individual relation, we have a lock on it. I think - though we should > double-check tablecmds.c - that this is enough to prevent all of the dependent > objects - triggers, constraints, etc. - from changing. So the stuff we care about > is stable. But the situation with a partitioned table is different. In that case, we > can't even examine that stuff without locking all the partitions. And even if we > do lock all the partitions, the stuff could change immediately afterward and we > wouldn't know. So I think it would be difficult to make it correct. > > Now, maybe it could be done, and I think that's worth a little more thought. For > example, perhaps whenever we invalidate a relation, we could also somehow > send some new, special kind of invalidation for its parent saying, essentially, > "hey, one of your children has changed in a way you might care about." But > that's not good enough, because it only goes up one level. The grandparent > would still be unaware that a change it potentially cares about has occurred > someplace down in the partitioning hierarchy. That seems hard to patch up, > again because of the locking rules. The child can know the OID of its parent > without locking the parent, but it can't know the OID of its grandparent without > locking its parent. Walking up the whole partitioning hierarchy might be an > issue for a number of reasons, including possible deadlocks, and possible race > conditions where we don't emit all of the right invalidations in the face of > concurrent changes. So I don't quite see a way around this part of the problem, > but I may well be missing something. In fact I hope I am missing something, > because solving this problem would be really nice. I think the check of partitions could be even more complicated if we need to check the parallel safety of partition key expressions. If a user directly inserts into a partition, then we need to invoke ExecPartitionCheck, which will execute all of its parent's and grandparent's partition key expressions. It means if we change a parent table's partition key expression (by 1) changing a function in the expression or 2) attaching the parent table as a partition of another parent table), then we need to invalidate all its children's relcache entries. BTW, currently, if a user attaches a partitioned table 'A' as a partition of another partitioned table 'B', the children of 'A' will not be invalidated. Best regards, houzj
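As a concrete illustration of that point (a sketch; all names are hypothetical), a direct insert into a leaf still evaluates the partition constraints derived from every ancestor's partition key expression:

    CREATE FUNCTION part_key(v int) RETURNS int
        AS $$ SELECT v % 10 $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;
    CREATE TABLE gp (a int) PARTITION BY RANGE (part_key(a));
    CREATE TABLE p PARTITION OF gp FOR VALUES FROM (0) TO (5)
        PARTITION BY RANGE (a);
    CREATE TABLE leaf PARTITION OF p FOR VALUES FROM (0) TO (100);
    -- ExecPartitionCheck on this direct insert runs part_key(42) as part of
    -- the constraint inherited from gp, in addition to p's range check on a
    INSERT INTO leaf VALUES (42);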
On Tue, Jun 15, 2021 at 7:31 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Jun 15, 2021 at 7:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Okay, but I think if we go with your suggested model where whenever > > there is a change in parallel-safety of any function, we need to send > > the new invalidation then I think it won't matter whether the function > > is associated with index expression, check constraint in the table, or > > is used in any other way. > > No, it will still matter, because I'm proposing that the > parallel-safety of functions that we only access via operator classes > will not even be checked. > I am not very clear on what exactly you have in your mind in this regard. I understand that while computing parallel-safety for a rel we don't need to consider functions that we only access via operator class but how do we distinguish such functions during Alter Function? Is there a simple way to deduce that this is an operator class function so that we don't register invalidation for it? Shall we check it via pg_depend? > > > We can additionally check the > > parallel safety of partitions when we are trying to insert into a > > particular partition and error out if we detect any parallel-unsafe > > clause and we are in parallel-mode. So, in this case, we won't be > > completely relying on the users. Users can either change the parallel > > safe option of the table or remove/change the parallel-unsafe clause > > after error. The new invalidation message as we are discussing would > > invalidate the parallel-safety for individual partitions but not the > > root partition (partitioned table itself). For root partition, we will > > rely on information specified by the user. > > Yeah, that may be the best we can do. Just to be clear, I think we > would want to check whether the relation is still parallel-safe at the > start of the operation, but not have a run-time check at each function > call. > Agreed, that is what I also had in mind. > > I am not sure if we have a simple way to check the parallel safety of > > partitioned tables. In some way, we need to rely on user either (a) by > > providing an option to specify whether parallel Inserts (and/or other > > DMLs) can be performed, or (b) by providing a guc and/or rel option > > which indicate that we can check the parallel-safety of all the > > partitions. Yet another option that I don't like could be to > > parallelize inserts on non-partitioned tables. > > If we figure out a way to check the partitions automatically that > actually works, we don't need a switch for it; we can (and should) > just do it that way all the time. But if we can't come up with a > correct algorithm for that, then we'll need to add some kind of option > where the user declares whether it's OK. > Yeah, so let us think for some more time and see if we can come up with something better for partitions, otherwise, we can sort out things further in this direction. -- With Regards, Amit Kapila.
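One cheap form of that check, sketched at the SQL level (the real check would presumably be done in C during ALTER FUNCTION), is simply to ask whether the function appears as an operator class/family support procedure at all:

    -- true if the function being altered is an opclass support procedure
    -- ('btint4cmp' stands in for the function in question)
    SELECT EXISTS (
        SELECT 1
          FROM pg_amproc
         WHERE amproc = 'btint4cmp'::regproc
    );

For user-defined operator classes the same information is also recorded in pg_depend (entries with classid = 'pg_amproc'::regclass referencing the function), which matches the pg_depend idea above.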
On Tue, Jun 15, 2021 at 8:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jun 14, 2021 at 9:08 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Mon, Jun 14, 2021 at 2:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > Yeah, this could be one idea but I think even if we use pg_proc OID, > > > we still need to check all the rel cache entries to find which one > > > contains the invalidated OID and that could be expensive. I wonder > > > can't we directly find the relation involved and register invalidation > > > for the same? We are able to find the relation to which trigger > > > function is associated during drop function via findDependentObjects > > > by scanning pg_depend. Assuming, we are able to find the relation for > > > trigger function by scanning pg_depend, what kinds of problems do we > > > envision in registering the invalidation for the same? > > > > I don't think that finding the relation involved and registering an > > invalidation for the same will work properly. Suppose there is a > > concurrently-running transaction which has created a new table and > > attached a trigger function to it. You can't see any of the catalog > > entries for that relation yet, so you can't invalidate it, but > > invalidation needs to happen. Even if you used some snapshot that can > > see those catalog entries before they are committed, I doubt it fixes > > the race condition. You can't hold any lock on that relation, because > > the creating transaction holds AccessExclusiveLock, but the whole > > invalidation mechanism is built around the assumption that the sender > > puts messages into the shared queue first and then releases locks, > > while the receiver first acquires a conflicting lock and then > > processes messages from the queue. > > > > Won't such messages be processed at transaction start time > (AtStart_Cache->AcceptInvalidationMessages)? > Even if we accept the invalidation at transaction start time, we need to accept and execute it after taking a lock on the relation to ensure that the relation doesn't change afterward. I think what I mentioned didn't break this assumption because after finding a relation we will take a lock on it before registering the invalidation, so in the above scenario, it should wait before registering the invalidation. -- With Regards, Amit Kapila.
RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, June 15, 2021 10:01 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jun 15, 2021 at 7:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Okay, but I think if we go with your suggested model where whenever > > there is a change in parallel-safety of any function, we need to send > > the new invalidation then I think it won't matter whether the function > > is associated with index expression, check constraint in the table, or > > is used in any other way. > > No, it will still matter, because I'm proposing that the parallel-safety of > functions that we only access via operator classes will not even be checked. > Also, if we decided to make the system more fine-grained - e.g. by invalidating > on the specific OID of the function that was changed rather than doing > something that is database-wide or global - then it matters even more. > > > Yeah, dealing with partitioned tables is tricky. I think if we don't > > want to check upfront the parallel safety of all the partitions then > > the other option as discussed could be to ask the user to specify the > > parallel safety of partitioned tables. > > Just to be clear here, I don't think it really matters what we *want* to do. I don't > think it's reasonably *possible* to check all the partitions, because we don't > hold locks on them. When we're assessing a bunch of stuff related to an > individual relation, we have a lock on it. I think - though we should > double-check tablecmds.c - that this is enough to prevent all of the dependent > objects - triggers, constraints, etc. - from changing. So the stuff we care about > is stable. But the situation with a partitioned table is different. In that case, we > can't even examine that stuff without locking all the partitions. And even if we > do lock all the partitions, the stuff could change immediately afterward and we > wouldn't know. So I think it would be difficult to > > make it correct. > > Now, maybe it could be done, and I think that's worth a little more thought. For > example, perhaps whenever we invalidate a relation, we could also somehow > send some new, special kind of invalidation for its parent saying, essentially, > "hey, one of your children has changed in a way you might care about." But > that's not good enough, because it only goes up one level. The grandparent > would still be unaware that a change it potentially cares about has occurred > someplace down in the partitioning hierarchy. That seems hard to patch up, > again because of the locking rules. The child can know the OID of its parent > without locking the parent, but it can't know the OID of its grandparent without > locking its parent. Walking up the whole partitioning hierarchy might be an > issue for a number of reasons, including possible deadlocks, and possible race > conditions where we don't emit all of the right invalidations in the face of > concurrent changes. So I don't quite see a way around this part of the problem, > but I may well be missing something. In fact I hope I am missing something, > because solving this problem would be really nice. For partitions, I think postgres already has the logic for recursively finding the parent table [1]. Can we copy that logic to send several invalidation messages that invalidate the parent's and grandparent's... relcache entries if a partition's parallel safety changes? Although it means we need more locks (on its parents) when the parallel safety changes, it seems that's not a frequent scenario and looks acceptable.
[1] In generate_partition_qual():

    parentrelid = get_partition_parent(RelationGetRelid(rel), true);
    parent = relation_open(parentrelid, AccessShareLock);
    ...
    /* Add the parent's quals to the list (if any) */
    if (parent->rd_rel->relispartition)
        result = list_concat(generate_partition_qual(parent), my_qual);

Besides, I have a possibly crazy idea: maybe it's not necessary to invalidate the relcache when the parallel safety of a function is changed. I took a look at how postgres currently behaves, and found that even if a user changes a function (CREATE OR REPLACE/ALTER FUNCTION) which is used in other objects (like a constraint, index expression, or partition key expression), the data in the relation won't be rechecked. And as the docs say [2], it is *not recommended* to change a function which is already used in some other objects. The recommended way to handle such a change is to drop the object, adjust the function definition, and re-add the object. Maybe we only need to care about a parallel safety change when creating or dropping an object (constraint, index, partition, or trigger). And we can check the parallel safety when inserting into a particular table; if we find functions not allowed in parallel mode, which means someone changed a function's parallel safety, then we can invalidate the relcache and error out.

[2] https://www.postgresql.org/docs/14/ddl-constraints.html

Best regards, houzj
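For reference, the recommended sequence from [2] looks like this (a sketch; the table, constraint, and function names are hypothetical):

    -- instead of redefining chk_fn() in place while a constraint uses it:
    ALTER TABLE t DROP CONSTRAINT t_positive;
    CREATE OR REPLACE FUNCTION chk_fn(v int) RETURNS boolean
        AS $$ SELECT v > 0 $$ LANGUAGE sql IMMUTABLE;
    -- re-adding the CHECK re-validates the existing rows, which an
    -- in-place redefinition of chk_fn() would not do
    ALTER TABLE t ADD CONSTRAINT t_positive CHECK (chk_fn(a));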
On Tue, Jun 15, 2021 at 10:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I don't think that finding the relation involved and registering an > > invalidation for the same will work properly. Suppose there is a > > concurrently-running transaction which has created a new table and > > attached a trigger function to it. You can't see any of the catalog > > entries for that relation yet, so you can't invalidate it, but > > invalidation needs to happen. Even if you used some snapshot that can > > see those catalog entries before they are committed, I doubt it fixes > > the race condition. You can't hold any lock on that relation, because > > the creating transaction holds AccessExclusiveLock, but the whole > > invalidation mechanism is built around the assumption that the sender > > puts messages into the shared queue first and then releases locks, > > while the receiver first acquires a conflicting lock and then > > processes messages from the queue. > > Won't such messages be processed at transaction start time > (AtStart_Cache->AcceptInvalidationMessages)? Only if they show up in the queue before that. But there's nothing forcing that to happen. You don't seem to understand how important heavyweight locking is to the whole shared invalidation message system.... > Okay, so, in this scheme, we have allowed changing the function > definition during statement execution but even though the rel's > parallel-safe property gets modified (say to parallel-unsafe), we will > still proceed in parallel-mode as if it's not changed. I guess this > may not be a big deal as we can anyway allow breaking the running > statement by changing its definition and users may be okay if the > parallel statement errors out or behaves in an unpredictable way in > such corner cases. Yeah, I mean, it's no different than leaving the parallel-safety marking exactly as it was and changing the body of the function to call some other function marked parallel-unsafe. I don't think we've gotten any complaints about that, because I don't think it would normally have any really bad consequences; most likely you'd just get an error saying that something-or-other isn't allowed in parallel mode. If it does have bad consequences, then I guess we'll have to fix it when we find out about it, but in the meantime there's no reason to hold the parallel-safety flag to a stricter standard than the function body. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Jun 16, 2021 at 9:22 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Jun 15, 2021 at 10:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > I don't think that finding the relation involved and registering an > > > invalidation for the same will work properly. Suppose there is a > > > concurrently-running transaction which has created a new table and > > > attached a trigger function to it. You can't see any of the catalog > > > entries for that relation yet, so you can't invalidate it, but > > > invalidation needs to happen. Even if you used some snapshot that can > > > see those catalog entries before they are committed, I doubt it fixes > > > the race condition. You can't hold any lock on that relation, because > > > the creating transaction holds AccessExclusiveLock, but the whole > > > invalidation mechanism is built around the assumption that the sender > > > puts messages into the shared queue first and then releases locks, > > > while the receiver first acquires a conflicting lock and then > > > processes messages from the queue. > > > > Won't such messages be processed at transaction start time > > (AtStart_Cache->AcceptInvalidationMessages)? > > Only if they show up in the queue before that. But there's nothing > forcing that to happen. You don't seem to understand how important > heavyweight locking is to the whole shared invalidation message > system.... > I have responded about the heavy-weight locking stuff in my next email [1], and why I think the approach I mentioned will work. I don't deny that I might be missing something here. [1] - https://www.postgresql.org/message-id/CAA4eK1%2BT2CWqp40YqYttDA1Skk7wK6yDrkCD5GZ80QGr5ze-6g%40mail.gmail.com -- With Regards, Amit Kapila.
On Thu, Jun 17, 2021 at 4:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > I have responded about heavy-weight locking stuff in my next email [1] > and why I think the approach I mentioned will work. I don't deny that > I might be missing something here. > > [1] - https://www.postgresql.org/message-id/CAA4eK1%2BT2CWqp40YqYttDA1Skk7wK6yDrkCD5GZ80QGr5ze-6g%40mail.gmail.com I mean I saw that but I don't see how it addresses the visibility issue. There could be a relation that is not visible to your snapshot and upon which AccessExclusiveLock is held which needs to be invalidated. You can't lock it because it's AccessExclusiveLock'd already. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Jun 17, 2021 at 8:29 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Jun 17, 2021 at 4:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I have responded about heavy-weight locking stuff in my next email [1] > > and why I think the approach I mentioned will work. I don't deny that > > I might be missing something here. > > > > [1] - https://www.postgresql.org/message-id/CAA4eK1%2BT2CWqp40YqYttDA1Skk7wK6yDrkCD5GZ80QGr5ze-6g%40mail.gmail.com > > I mean I saw that but I don't see how it addresses the visibility > issue. > I thought if we scan a system catalog using DirtySnapshot, then we should be able to find such a relation. But, if the system catalog is updated after our scan then surely we won't be able to see it and in that case, we won't be able to send invalidation. Now, say if the rel is not visible to us because of the snapshot we used or due to some race condition, then we won't be able to send the invalidation, but why should we consider that worse than the case where we miss such invalidations (invalidations due to a change of the parallel-safe property) when the insertion into the relation is in progress? > There could be a relation that is not visible to your snapshot > and upon which AccessExclusiveLock is held which needs to be > invalidated. You can't lock it because it's AccessExclusiveLock'd > already. > Yeah, the session in which we are doing Alter Function won't be able to lock it but it will wait for the AccessExclusiveLock on the rel to be released because it will also try to acquire it before sending invalidation. -- With Regards, Amit Kapila.
RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From
"houzj.fnst@fujitsu.com"
Date:
On Wednesday, June 16, 2021 11:27 AM houzj.fnst@fujitsu.com wrote: > On Tuesday, June 15, 2021 10:01 PM Robert Haas <robertmhaas@gmail.com> wrote: > > Just to be clear here, I don't think it really matters what we *want* > > to do. I don't think it's reasonably *possible* to check all the > > partitions, because we don't hold locks on them. When we're assessing > > a bunch of stuff related to an individual relation, we have a lock on > > it. I think - though we should double-check tablecmds.c - that this is > > enough to prevent all of the dependent objects - triggers, > > constraints, etc. - from changing. So the stuff we care about is > > stable. But the situation with a partitioned table is different. In > > that case, we can't even examine that stuff without locking all the > > partitions. And even if we do lock all the partitions, the stuff could change > immediately afterward and we wouldn't know. So I think it would be difficult to > > make it correct. > > Now, maybe it could be done, and I think that's worth a little more > > thought. For example, perhaps whenever we invalidate a relation, we > > could also somehow send some new, special kind of invalidation for its > > parent saying, essentially, "hey, one of your children has changed in > > a way you might care about." But that's not good enough, because it > > only goes up one level. The grandparent would still be unaware that a > > change it potentially cares about has occurred someplace down in the > > partitioning hierarchy. That seems hard to patch up, again because of > > the locking rules. The child can know the OID of its parent without > > locking the parent, but it can't know the OID of its grandparent > > without locking its parent. Walking up the whole partitioning > > hierarchy might be an issue for a number of reasons, including > > possible deadlocks, and possible race conditions where we don't emit > > all of the right invalidations in the face of concurrent changes. So I > > don't quite see a way around this part of the problem, but I may well be > missing something. In fact I hope I am missing something, because solving this > problem would be really nice. > > I think the check of partitions could be even more complicated if we need to > check the parallel safety of partition key expressions. If a user directly inserts into a > partition, then we need to invoke ExecPartitionCheck which will execute all of its > parent's and grandparent's partition key expressions. It means if we change a > parent table's partition key expression (by 1) changing a function in the expression or 2) > attaching the parent table as a partition of another parent table), then we need to > invalidate all its children's relcache entries. > > BTW, currently, if a user attaches a partitioned table 'A' as a partition of another > partitioned table 'B', the children of 'A' will not be invalidated. To be honest, I didn't find a cheap way to invalidate a partitioned table's parallel safety automatically. For one thing, we need to recurse higher in the partition tree to invalidate all the parent tables' relcache entries (and perhaps all their children's relcache entries) not only when a function's parallel safety is altered, but also for DDLs which invalidate a partition's relcache, such as CREATE/DROP INDEX/TRIGGER/CONSTRAINT. That seems too expensive. For another, even if we could invalidate a partitioned table's parallel safety automatically, we would still need to lock all the partitions when checking the table's parallel safety, because a partition's parallel safety could change after the check.
So, IMO, at least for partitioned tables, an explicit flag looks more acceptable. For regular tables, it seems we can work it out automatically, although I am not sure whether anyone thinks that looks a bit inconsistent. Best regards, houzj
On Mon, Jun 21, 2021 at 12:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > I thought if we scan a system catalog using DirtySnapshot, then we > should be able to find such a relation. But, if the system catalog is > updated after our scan then surely we won't be able to see it and in > that case, we won't be able to send invalidation. Now, say if the rel > is not visible to us because of the snapshot we used or due to some > race condition then we won't be able to send the invalidation but why > we want to consider it worse than the case where we miss such > invalidations (invalidations due to change of parallel-safe property) > when the insertion into relation is in-progress. A concurrent change is something quite different than a change that happened some time in the past. We all know that DROP TABLE blocks if it is run while the table is in use, and everybody considers that acceptable, but if DROP TABLE were to block because the table was in use at some previous time, everybody would complain, and rightly so. The same principle applies here. It's not possible to react to a change that happens in the middle of the query. Somebody could argue that we ought to lock all the functions we're using against concurrent changes so that attempts to change their properties block on a lock rather than succeeding. But given that that's not how it works, we can hardly go back in time and switch to a non-parallel plan after we've already decided on a parallel one. On the other hand, we should be able to notice a change that has *already completed* at the time we do planning. I don't see how we can blame failure to do that on anything other than bad coding. > Yeah, the session in which we are doing Alter Function won't be able > to lock it but it will wait for the AccessExclusiveLock on the rel to > be released because it will also try to acquire it before sending > invalidation. I think users would not be very happy with such behavior. Users accept that if they try to access a relation, they might end up needing to wait for a lock on it, but what you are proposing here might make a session block waiting for a lock on a relation which it never attempted to access. I think this whole line of attack is a complete dead-end. We can invent new types of invalidations if we want, but they need to be sent based on which objects actually got changed, not based on what we think might be affected indirectly as a result of those changes. It's reasonable to regard something like a trigger or constraint as a property of the table because it is really a dependent object. It is associated with precisely one table when it is created and the association can never be changed. On the other hand, functions clearly have their own existence. They can be created and dropped independently of any table and the tables with which they are associated can change at any time. In that kind of situation, invalidating the table based on changes to the function is riddled with problems which I am pretty convinced we're never going to be able to solve. I'm not 100% sure what we ought to do here, but I'm pretty sure that looking up the tables that happen to be associated with the function in the session that is modifying the function is not it. -- Robert Haas EDB: http://www.enterprisedb.com
RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From
"houzj.fnst@fujitsu.com"
Date:
On Monday, June 21, 2021 11:23 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Jun 21, 2021 at 12:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Yeah, the session in which we are doing Alter Function won't be able > > to lock it but it will wait for the AccessExclusiveLock on the rel to > > be released because it will also try to acquire it before sending > > invalidation. > > I think users would not be very happy with such behavior. Users accept that if > they try to access a relation, they might end up needing to wait for a lock on it, > but what you are proposing here might make a session block waiting for a lock > on a relation which it never attempted to access. > > I think this whole line of attack is a complete dead-end. We can invent new > types of invalidations if we want, but they need to be sent based on which > objects actually got changed, not based on what we think might be affected > indirectly as a result of those changes. It's reasonable to regard something like > a trigger or constraint as a property of the table because it is really a > dependent object. It is associated with precisely one table when it is created > and the association can never be changed. On the other hand, functions clearly > have their own existence. They can be created and dropped independently of > any table and the tables with which they are associated can change at any time. > In that kind of situation, invalidating the table based on changes to the function > is riddled with problems which I am pretty convinced we're never going to be > able to solve. I'm not 100% sure what we ought to do here, but I'm pretty sure > that looking up the tables that happen to be associated with the function in the > session that is modifying the function is not it. I agree that we should send an invalidation message like "function OID's parallel safety has changed". And when each session accepts this invalidation message, it needs to invalidate the related tables. Based on previous mails, we only want to invalidate the tables that use this function in an index expression/trigger/constraint. The problem is how to get all the related tables. Robert-san suggested caching a list of pg_proc OIDs, which means we need to rebuild the list every time the relcache is invalidated. The cost to do that could be high, especially for extracting pg_proc OIDs from index expressions, because we need to invoke index_open(index, lock) to get the index expression. Or maybe we can let each session use pg_depend to get the related tables and invalidate them after accepting the new type of invalidation message (a sketch of such a lookup is below). Best regards, houzj
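At the SQL level, that per-session pg_depend lookup could look something like this (a sketch only; 12345 stands for the altered function's OID, and real code would use the C dependency-walking APIs instead):

    SELECT i.indrelid::regclass AS rel
      FROM pg_depend d
      JOIN pg_index i ON i.indexrelid = d.objid
     WHERE d.classid = 'pg_class'::regclass
       AND d.refclassid = 'pg_proc'::regclass
       AND d.refobjid = 12345            -- function used in an index expression
    UNION
    SELECT t.tgrelid::regclass
      FROM pg_depend d
      JOIN pg_trigger t ON t.oid = d.objid
     WHERE d.classid = 'pg_trigger'::regclass
       AND d.refclassid = 'pg_proc'::regclass
       AND d.refobjid = 12345            -- function used by a trigger
    UNION
    SELECT c.conrelid::regclass
      FROM pg_depend d
      JOIN pg_constraint c ON c.oid = d.objid
     WHERE d.classid = 'pg_constraint'::regclass
       AND d.refclassid = 'pg_proc'::regclass
       AND d.refobjid = 12345            -- function used in a CHECK constraint
       AND c.conrelid <> 0;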
On Wed, Jun 16, 2021 at 8:57 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > I think the check of partition could be even more complicated if we need to > check the parallel safety of partition key expression. If user directly insert into > a partition, then we need invoke ExecPartitionCheck which will execute all it's > parent's and grandparent's partition key expressions. It means if we change a > parent table's partition key expression(by 1) change function in expr or 2) attach > the parent table as partition of another parent table), then we need to invalidate > all its child's relcache. > I think we already invalidate the child entries when we add/drop constraints on a parent table. See ATAddCheckConstraint, ATExecDropConstraint. If I am not missing anything then this case shouldn't be a problem. Do you have something else in mind? -- With Regards, Amit Kapila.
RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From
"houzj.fnst@fujitsu.com"
Date:
On Tuesday, June 22, 2021 8:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Wed, Jun 16, 2021 at 8:57 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > > > I think the check of partitions could be even more complicated if we > > need to check the parallel safety of partition key expressions. If a user > > directly inserts into a partition, then we need to invoke > > ExecPartitionCheck which will execute all of its parent's and > > grandparent's partition key expressions. It means if we change a > > parent table's partition key expression (by 1) changing a function in the expression > > or 2) attaching the parent table as a partition of another parent table), then we > need to invalidate all its children's relcache entries. > > > > I think we already invalidate the child entries when we add/drop constraints on > a parent table. See ATAddCheckConstraint, ATExecDropConstraint. If I am not > missing anything then this case shouldn't be a problem. Do you have > something else in mind? Currently, attaching/detaching a partition doesn't invalidate the child entries recursively, except when detaching a partition concurrently, which will add a constraint to all the children. Do you mean we can add logic to invalidate the child entries recursively when attaching/detaching a partition? Best regards, houzj
On Wed, Jun 23, 2021 at 6:35 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > On Tuesday, June 22, 2021 8:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jun 16, 2021 at 8:57 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > > > > > I think the check of partitions could be even more complicated if we > > > need to check the parallel safety of partition key expressions. If a user > > > directly inserts into a partition, then we need to invoke > > > ExecPartitionCheck which will execute all of its parent's and > > > grandparent's partition key expressions. It means if we change a > > > parent table's partition key expression (by 1) changing a function in the expression > > > or 2) attaching the parent table as a partition of another parent table), then we > > need to invalidate all its children's relcache entries. > > > > > > > I think we already invalidate the child entries when we add/drop constraints on > > a parent table. See ATAddCheckConstraint, ATExecDropConstraint. If I am not > > missing anything then this case shouldn't be a problem. Do you have > > something else in mind? > > Currently, attaching/detaching a partition doesn't invalidate the child entries > recursively, except when detaching a partition concurrently, which will add a > constraint to all the children. Do you mean we can add logic to > invalidate the child entries recursively when attaching/detaching a partition? > I was talking about adding/dropping CHECK or other constraints on partitioned tables via Alter Table. I think if attach/detach leads to a change in the constraints of child tables, then either it should invalidate the child rels to avoid problems in existing sessions, or, if it is not doing so for a reason, then probably it might not matter. I see that you have started a separate thread [1] to confirm the behavior of attach/detach partition and we might want to decide based on the conclusion of that thread. [1] - https://www.postgresql.org/message-id/OS3PR01MB5718DA1C4609A25186D1FBF194089%40OS3PR01MB5718.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
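If it does matter, enumerating the child entries that would need invalidation is at least straightforward, e.g. with the existing helper ('parent_tbl' is a placeholder):

    SELECT relid, level
      FROM pg_partition_tree('parent_tbl');   -- the parent itself is level 0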
On Wed, Jun 16, 2021 at 6:10 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > On Tuesday, June 15, 2021 10:01 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > Now, maybe it could be done, and I think that's worth a little more thought. For > > example, perhaps whenever we invalidate a relation, we could also somehow > > send some new, special kind of invalidation for its parent saying, essentially, > > "hey, one of your children has changed in a way you might care about." But > > that's not good enough, because it only goes up one level. The grandparent > > would still be unaware that a change it potentially cares about has occurred > > someplace down in the partitioning hierarchy. That seems hard to patch up, > > again because of the locking rules. The child can know the OID of its parent > > without locking the parent, but it can't know the OID of its grandparent without > > locking its parent. Walking up the whole partitioning hierarchy might be an > > issue for a number of reasons, including possible deadlocks, and possible race > > conditions where we don't emit all of the right invalidations in the face of > > concurrent changes. So I don't quite see a way around this part of the problem, > > but I may well be missing something. In fact I hope I am missing something, > > because solving this problem would be really nice. > > For partitions, I think postgres already has the logic for recursively finding > the parent table [1]. Can we copy that logic to send several invalidation messages that > invalidate the parent's and grandparent's... relcache entries if a partition's parallel safety changes? > Although it means we need more locks (on its parents) when the parallel safety > changes, it seems that's not a frequent scenario and looks acceptable. > > [1] In generate_partition_qual() > parentrelid = get_partition_parent(RelationGetRelid(rel), true); > parent = relation_open(parentrelid, AccessShareLock); > ... > /* Add the parent's quals to the list (if any) */ > if (parent->rd_rel->relispartition) > result = list_concat(generate_partition_qual(parent), my_qual); > As shown by me in another email [1], such a coding pattern can lead to deadlock. It is because in some DDL operations we walk the partition hierarchy from top to bottom, and if we also walk from bottom to top, that can lead to deadlock. I think this is a dangerous coding pattern and we shouldn't try to replicate it. [1] - https://www.postgresql.org/message-id/CAA4eK1LsFpjK5gL%2B0HEvoqB2DJVOi19vGAWbZBEx8fACOi5%2B_A%40mail.gmail.com -- With Regards, Amit Kapila.
On Wed, Jun 23, 2021 at 8:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jun 16, 2021 at 6:10 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Tuesday, June 15, 2021 10:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > >
> > > Now, maybe it could be done, and I think that's worth a little more thought. For
> > > example, perhaps whenever we invalidate a relation, we could also somehow
> > > send some new, special kind of invalidation for its parent saying, essentially,
> > > "hey, one of your children has changed in a way you might care about." But
> > > that's not good enough, because it only goes up one level. The grandparent
> > > would still be unaware that a change it potentially cares about has occurred
> > > someplace down in the partitioning hierarchy. That seems hard to patch up,
> > > again because of the locking rules. The child can know the OID of its parent
> > > without locking the parent, but it can't know the OID of its grandparent without
> > > locking its parent. Walking up the whole partitioning hierarchy might be an
> > > issue for a number of reasons, including possible deadlocks, and possible race
> > > conditions where we don't emit all of the right invalidations in the face of
> > > concurrent changes. So I don't quite see a way around this part of the problem,
> > > but I may well be missing something. In fact I hope I am missing something,
> > > because solving this problem would be really nice.
> >
> > For partitions, I think postgres already has the logic for recursively finding
> > the parent table [1]. Can we copy that logic to send several invalidation messages that
> > invalidate the parent's and grandparent's... relcache entries if a partition's parallel safety changes?
> > Although it means we need more locks (on its parents) when the parallel safety
> > changes, it seems that's not a frequent scenario and looks acceptable.
> >
> > [1] In generate_partition_qual()
> > parentrelid = get_partition_parent(RelationGetRelid(rel), true);
> > parent = relation_open(parentrelid, AccessShareLock);
> > ...
> > /* Add the parent's quals to the list (if any) */
> > if (parent->rd_rel->relispartition)
> > result = list_concat(generate_partition_qual(parent), my_qual);
> >
> As shown by me in another email [1], such a coding pattern can lead to
> deadlock. It is because in some DDL operations we walk the partition
> hierarchy from top to bottom, and if we also walk from bottom to top,
> that can lead to deadlock. I think this is a dangerous coding pattern
> and we shouldn't try to replicate it.
> [1] - https://www.postgresql.org/message-id/CAA4eK1LsFpjK5gL%2B0HEvoqB2DJVOi19vGAWbZBEx8fACOi5%2B_A%40mail.gmail.com
> --
> With Regards,
> Amit Kapila.
Hi,
How about walking the partition hierarchy bottom up, recording the parents but not taking the locks.
Once the top-most parent is found, take the locks in reverse order (top down)?
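At the SQL level, such a walk would look something like this sketch ('leaf_part' is a placeholder; the real walk would read pg_inherits in C without locks, then lock the recorded OIDs starting from the top):

    WITH RECURSIVE ancestors AS (
        SELECT inhparent AS relid, 1 AS dist
          FROM pg_inherits
         WHERE inhrelid = 'leaf_part'::regclass
        UNION ALL
        SELECT i.inhparent, a.dist + 1
          FROM pg_inherits i
          JOIN ancestors a ON i.inhrelid = a.relid
    )
    SELECT relid::regclass AS ancestor
      FROM ancestors
     ORDER BY dist DESC;   -- top-most parent first, i.e. the locking order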
Cheers
On Thu, Jun 24, 2021 at 1:38 PM Zhihong Yu <zyu@yugabyte.com> wrote: > > How about walking the partition hierarchy bottom up, recording the parents but not taking the locks. > Once top-most parent is found, take the locks in reverse order (top down) ? > Is it safe to walk up the partition hierarchy (to record the parents for the eventual locking in reverse order) without taking locks? Regards, Greg Nancarrow Fujitsu Australia
RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From
"houzj.fnst@fujitsu.com"
Date:
On Thursday, June 24, 2021 11:44 AM Zhihong Yu <zyu@yugabyte.com> wrote: > Hi, > How about walking the partition hierarchy bottom up, recording the parents but not taking the locks. > Once the top-most parent is found, take the locks in reverse order (top down)? IMO, when we directly INSERT INTO a partition, postgres already locks the partition as the target table before execution, which means we cannot postpone locking the partition until we have found the parent tables. Best regards, houzj
On Mon, Jun 21, 2021 at 4:40 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > To be honest, I didn't find a cheap way to invalidate a partitioned table's > parallel safety automatically. > I also don't see the feasibility of doing parallelism checks for partitioned tables, both because it is expensive (traversing/locking all the partitions) and because the invalidations are difficult to handle due to deadlock hazards, as discussed above. Let me try to summarize the discussion so far and see if we can have any better ideas than what we have discussed so far or whether we want to go with one of the ideas discussed till now.

I think we have broadly discussed two approaches: (a) automatically decide whether parallelism can be enabled for inserts, or (b) provide an option to the user to specify whether inserts can be parallelized on a relation.

For the first approach (a), we have evaluated both the partitioned and non-partitioned relation cases. For non-partitioned relations, we can compute the parallel-safety of a relation during planning and save it in the relation cache entry. This is normally safe because we have a lock on the relation and any change to the relation should raise an invalidation which will lead to re-computation of parallel-safety information for the relation. Now, there are cases where the parallel-safety of some trigger function or a function used in an index expression can be changed by the user which won't register an invalidation for a relation. To handle such cases, we can register a new kind of invalidation only when a function's parallel-safety information is changed. And every backend in the same database then needs to re-evaluate the parallel-safety of every relation for which it has cached a value.

For partitioned relations, a similar idea won't work for multiple reasons: (a) We need to traverse and lock all the partitions to compute the parallel-safety of the root relation, which could be very expensive; (b) Whenever we invalidate a particular partition, we need to invalidate its parent hierarchy as well. We can't traverse the parent hierarchy without taking locks on the parent tables, which can lead to deadlock. The alternative could be that for partitioned relations we rely on user-specified information about parallel-safety (like approach (b) in the previous paragraph). We can additionally check the parallel safety of partitions when we are trying to insert into a particular partition and error out if we detect any parallel-unsafe clause while we are in parallel-mode. So, in this case, we won't be completely relying on the users. Users can either change the parallel safe option of the table or remove/change the parallel-unsafe clause after an error.

For the second approach (b), we can provide an option to the user to specify whether inserts (or other DMLs) can be parallelized for a relation. One of the ideas is to provide some options like below to the user:

CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE };
ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE };

This property is recorded in pg_class's relparallel column as 'u', 'r', or 's', just like pg_proc's proparallel. The default is UNSAFE. Additionally, provide a function pg_get_parallel_safety(oid) using which users can determine whether it is safe to enable parallelism.
Surely, after the user has checked with that function, one can still add some unsafe constraints to the table by altering it, but the function will still be an aid to enabling parallelism on a relation. The first approach (a) has an appeal because it would allow us to automatically parallelize inserts in many cases, but it might have some overhead in some cases due to processing of relcache entries after the parallel-safety of a relation is changed. The second approach (b) has an appeal because of its consistent behavior for partitioned and non-partitioned relations. Among the above options, I would personally prefer (b), mainly because of the consistent handling of the partitioned and non-partitioned table cases, but I am fine with approach (a) as well if that is what other people feel is better. Thoughts? -- With Regards, Amit Kapila.
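For reference, the per-function marking that the proposed relparallel column would mirror can already be inspected today ('my_func' is a placeholder):

    SELECT proname, proparallel   -- 's' = safe, 'r' = restricted, 'u' = unsafe
      FROM pg_proc
     WHERE proname = 'my_func';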
On Mon, Jun 28, 2021 at 7:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Among the above options, I would personally prefer (b), mainly because > of the consistent handling of the partitioned and non-partitioned table cases, > but I am fine with approach (a) as well if that is what other people > feel is better. > > Thoughts? > I personally think "(b) provide an option to the user to specify whether inserts can be parallelized on a relation" is the preferable option. There seem to be too many issues with the alternative of trying to determine the parallel-safety of a partitioned table automatically. I think (b) is the simplest and most consistent approach, working the same way for all table types, and without the overhead of (a). Also, I don't think (b) is difficult for the user. At worst, the user can use the provided utility functions at development time to verify the intended declared table parallel-safety. I can't really see some mixture of (a) and (b) being acceptable. Regards, Greg Nancarrow Fujitsu Australia
On Wed, Jun 30, 2021 at 11:46 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > I personally think "(b) provide an option to the user to specify > whether inserts can be parallelized on a relation" is the preferable > option. > There seems to be too many issues with the alternative of trying to > determine the parallel-safety of a partitioned table automatically. > I think (b) is the simplest and most consistent approach, working the > same way for all table types, and without the overhead of (a). > Also, I don't think (b) is difficult for the user. At worst, the user > can use the provided utility-functions at development-time to verify > the intended declared table parallel-safety. > I can't really see some mixture of (a) and (b) being acceptable. Yeah, I'd like to have it be automatic, but I don't have a clear idea how to make that work nicely. It's possible somebody (Tom?) can suggest something that I'm overlooking, though. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Jul 2, 2021 at 8:16 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Jun 30, 2021 at 11:46 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > I personally think "(b) provide an option to the user to specify > > whether inserts can be parallelized on a relation" is the preferable > > option. > > There seems to be too many issues with the alternative of trying to > > determine the parallel-safety of a partitioned table automatically. > > I think (b) is the simplest and most consistent approach, working the > > same way for all table types, and without the overhead of (a). > > Also, I don't think (b) is difficult for the user. At worst, the user > > can use the provided utility-functions at development-time to verify > > the intended declared table parallel-safety. > > I can't really see some mixture of (a) and (b) being acceptable. > > Yeah, I'd like to have it be automatic, but I don't have a clear idea > how to make that work nicely. It's possible somebody (Tom?) can > suggest something that I'm overlooking, though. In general, for the non-partitioned table, where we don't have much overhead of checking the parallel safety and invalidation is also not a big problem so I am tempted to provide an automatic parallel safety check. This would enable parallelism for more cases wherever it is suitable without user intervention. OTOH, I understand that providing automatic checking might be very costly if the number of partitions is more. Can't we provide some mid-way where the parallelism is enabled by default for the normal table but for the partitioned table it is disabled by default and the user has to set it safe for enabling parallelism? I agree that such behavior might sound a bit hackish. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From
"houzj.fnst@fujitsu.com"
Date:
On Sunday, July 4, 2021 1:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > On Fri, Jul 2, 2021 at 8:16 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Wed, Jun 30, 2021 at 11:46 PM Greg Nancarrow <gregn4422@gmail.com> > wrote: > > > I personally think "(b) provide an option to the user to specify > > > whether inserts can be parallelized on a relation" is the preferable > > > option. > > > There seem to be too many issues with the alternative of trying to > > > determine the parallel-safety of a partitioned table automatically. > > > I think (b) is the simplest and most consistent approach, working > > > the same way for all table types, and without the overhead of (a). > > > Also, I don't think (b) is difficult for the user. At worst, the > > > user can use the provided utility functions at development time to > > > verify the intended declared table parallel-safety. > > > I can't really see some mixture of (a) and (b) being acceptable. > > > > Yeah, I'd like to have it be automatic, but I don't have a clear idea > > how to make that work nicely. It's possible somebody (Tom?) can > > suggest something that I'm overlooking, though. > > In general, for the non-partitioned table, where we don't have much overhead > of checking the parallel safety and invalidation is also not a big problem so I am > tempted to provide an automatic parallel safety check. This would enable > parallelism for more cases wherever it is suitable without user intervention. > OTOH, I understand that providing automatic checking might be very costly if > the number of partitions is more. Can't we provide some mid-way where the > parallelism is enabled by default for the normal table but for the partitioned > table it is disabled by default and the user has to set it safe for enabling > parallelism? I agree that such behavior might sound a bit hackish. About the invalidation for non-partitioned tables, I think it still has a problem: when a function's parallel safety changes, it's expensive to judge whether the function is related to an index, a trigger, or some other table-related object by using pg_depend, because we can only do that judgement in each backend when accepting an invalidation message. If we don't do that, it means whenever a function's parallel safety changes, we invalidate every relation's cached safety, which looks not very nice to me. So, I personally think "(b) provide an option to the user to specify whether inserts can be parallelized on a relation" is the preferable option. Best regards, houzj
On Sun, Jul 4, 2021 at 1:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > In general, for the non-partitioned table, where we don't have much > overhead of checking the parallel safety and invalidation is also not > a big problem so I am tempted to provide an automatic parallel safety > check. This would enable parallelism for more cases wherever it is > suitable without user intervention. OTOH, I understand that providing > automatic checking might be very costly if the number of partitions is > more. Can't we provide some mid-way where the parallelism is enabled > by default for the normal table but for the partitioned table it is > disabled by default and the user has to set it safe for enabling > parallelism? I agree that such behavior might sound a bit hackish. I think that's basically the proposal that Amit and I have been discussing. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Jul 21, 2021 at 12:30 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Sun, Jul 4, 2021 at 1:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > In general, for the non-partitioned table, where we don't have much > > overhead of checking the parallel safety and invalidation is also not > > a big problem so I am tempted to provide an automatic parallel safety > > check. This would enable parallelism for more cases wherever it is > > suitable without user intervention. OTOH, I understand that providing > > automatic checking might be very costly if the number of partitions is > > more. Can't we provide some mid-way where the parallelism is enabled > > by default for the normal table but for the partitioned table it is > > disabled by default and the user has to set it safe for enabling > > parallelism? I agree that such behavior might sound a bit hackish. > > I think that's basically the proposal that Amit and I have been discussing. > I see here we have a mix of opinions from various people. Dilip seems to be favoring the approach where we provide some option to the user for partitioned tables and automatic behavior for non-partitioned tables but he also seems to have mild concerns about this behavior. OTOH, Greg and Hou-San seem to favor an approach where we can provide an option to the user for both partitioned and non-partitioned tables. I am also in favor of providing an option to the user for the sake of consistency in behavior and not trying to introduce a special kind of invalidation as it doesn't serve the purpose for partitioned tables. Robert seems to be in favor of automatic behavior but it is not very clear to me if he is fine with dealing differently for partitioned and non-partitioned relations. Robert, can you please provide your opinion on what do you think is the best way to move forward here? -- With Regards, Amit Kapila.
On Wed, Jul 21, 2021 at 11:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > I see here we have a mix of opinions from various people. Dilip seems > to be favoring the approach where we provide some option to the user > for partitioned tables and automatic behavior for non-partitioned > tables but he also seems to have mild concerns about this behavior. > OTOH, Greg and Hou-San seem to favor an approach where we can provide > an option to the user for both partitioned and non-partitioned tables. > I am also in favor of providing an option to the user for the sake of > consistency in behavior and not trying to introduce a special kind of > invalidation as it doesn't serve the purpose for partitioned tables. > Robert seems to be in favor of automatic behavior but it is not very > clear to me if he is fine with dealing differently for partitioned and > non-partitioned relations. Robert, can you please provide your opinion > on what do you think is the best way to move forward here? I thought we had agreed on handling partitioned and unpartitioned tables differently, but maybe I misunderstood the discussion. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Jul 23, 2021 at 6:55 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Jul 21, 2021 at 11:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I see here we have a mix of opinions from various people. Dilip seems > > to be favoring the approach where we provide some option to the user > > for partitioned tables and automatic behavior for non-partitioned > > tables but he also seems to have mild concerns about this behavior. > > OTOH, Greg and Hou-San seem to favor an approach where we can provide > > an option to the user for both partitioned and non-partitioned tables. > > I am also in favor of providing an option to the user for the sake of > > consistency in behavior and not trying to introduce a special kind of > > invalidation as it doesn't serve the purpose for partitioned tables. > > Robert seems to be in favor of automatic behavior but it is not very > > clear to me if he is fine with dealing differently for partitioned and > > non-partitioned relations. Robert, can you please provide your opinion > > on what do you think is the best way to move forward here? > > I thought we had agreed on handling partitioned and unpartitioned > tables differently, but maybe I misunderstood the discussion. > I think, for the consistency argument, how about allowing users to specify a parallel-safety option for both partitioned and non-partitioned relations, but for non-partitioned relations, if users didn't specify, it would be computed automatically? If the user has specified the parallel-safety option for a non-partitioned relation then we would consider that instead of computing the value ourselves. Another reason for hesitating to do this automatically for non-partitioned relations was the new invalidation, which will invalidate the cached parallel-safety for all relations in the relcache for a particular database. As mentioned by Hou-San [1], it seems we need to do this whenever any function's parallel-safety is changed. OTOH, changing parallel-safety for a function is probably rare enough not to matter in practice, which is why I think you seem to be fine with this idea. So, I think, on that premise, it is okay to go ahead with different handling for partitioned and non-partitioned relations here. [1] - https://www.postgresql.org/message-id/OS0PR01MB5716EC1D07ACCA24373C2557941B9%40OS0PR01MB5716.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
On Sat, Jul 24, 2021 at 5:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > I think for the consistency argument how about allowing users to > specify a parallel-safety option for both partitioned and > non-partitioned relations but for non-partitioned relations if users > didn't specify, it would be computed automatically? If the user has > specified parallel-safety option for non-partitioned relation then we > would consider that instead of computing the value by ourselves. Having the option for both partitioned and non-partitioned tables doesn't seem like the worst idea ever, but I am also not entirely sure that I understand the point. > Another reason for hesitation to do automatically for non-partitioned > relations was the new invalidation which will invalidate the cached > parallel-safety for all relations in relcache for a particular > database. As mentioned by Hou-San [1], it seems we need to do this > whenever any function's parallel-safety is changed. OTOH, changing > parallel-safety for a function is probably not that often to matter in > practice which is why I think you seem to be fine with this idea. Right. I think it should be quite rare, and invalidation events are also not crazy expensive. We can test what the worst case is, but if you have to sit there and run ALTER FUNCTION in a tight loop to see a measurable performance impact, it's not a real problem. There may be a code complexity argument against trying to figure it out automatically, perhaps, but I don't think there's a big performance issue. What bothers me is that if this is something people have to set manually then many people won't and will not get the benefit of the feature. And some of them will also set it incorrectly and have problems. So I am in favor of trying to determine it automatically where possible, to make it easy for people. However, other people may feel differently, and I'm not trying to say they're necessarily wrong. I'm just telling you what I think. -- Robert Haas EDB: http://www.enterprisedb.com
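To make the worst case above concrete, a stress test along the following lines could be used to measure the invalidation overhead. This is only a sketch; the function name and loop count are arbitrary and not from any posted patch:

    -- Toggle a function's parallel-safety marking in a tight loop. Under the
    -- proposed scheme, each ALTER FUNCTION would invalidate the cached
    -- relation-level parallel-safety, so timing this loop bounds the overhead.
    CREATE FUNCTION toggle_me(int) RETURNS int LANGUAGE sql AS 'SELECT $1';

    DO $$
    BEGIN
        FOR i IN 1..10000 LOOP
            ALTER FUNCTION toggle_me(int) PARALLEL UNSAFE;
            ALTER FUNCTION toggle_me(int) PARALLEL SAFE;
        END LOOP;
    END $$;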
On Mon, Jul 26, 2021 at 8:33 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Sat, Jul 24, 2021 at 5:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I think for the consistency argument how about allowing users to > > specify a parallel-safety option for both partitioned and > > non-partitioned relations but for non-partitioned relations if users > > didn't specify, it would be computed automatically? If the user has > > specified parallel-safety option for non-partitioned relation then we > > would consider that instead of computing the value by ourselves. > > Having the option for both partitioned and non-partitioned tables > doesn't seem like the worst idea ever, but I am also not entirely sure > that I understand the point. > Consider below ways to allow the user to specify the parallel-safety option: (a) CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ... ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } .. OR (b) CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true) ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true) The point was what should we do if the user specifies the option for a non-partitioned table. Do we just ignore it or give an error that this is not a valid syntax/option when used with non-partitioned tables? I find it slightly odd that this option works for partitioned tables but gives an error for non-partitioned tables but maybe we can document it. With the above syntax, even if the user doesn't specify the parallelism option for non-partitioned relations, we will determine it automatically. Now, in some situations, users might want to force parallelism even when we wouldn't have chosen it automatically. It is possible that they might face an error due to some parallel-unsafe function, but OTOH, they might have ensured that it is safe to choose parallelism in their particular case. > > Another reason for hesitation to do automatically for non-partitioned > > relations was the new invalidation which will invalidate the cached > > parallel-safety for all relations in relcache for a particular > > database. As mentioned by Hou-San [1], it seems we need to do this > > whenever any function's parallel-safety is changed. OTOH, changing > > parallel-safety for a function is probably not that often to matter in > > practice which is why I think you seem to be fine with this idea. > > Right. I think it should be quite rare, and invalidation events are > also not crazy expensive. We can test what the worst case is, but if > you have to sit there and run ALTER FUNCTION in a tight loop to see a > measurable performance impact, it's not a real problem. There may be a > code complexity argument against trying to figure it out > automatically, perhaps, but I don't think there's a big performance > issue. > True, there could be some code complexity, but I think we can assess that once the patch is ready for review. > What bothers me is that if this is something people have to set > manually then many people won't and will not get the benefit of the > feature. And some of them will also set it incorrectly and have > problems. So I am in favor of trying to determine it automatically > where possible, to make it easy for people. However, other people may > feel differently, and I'm not trying to say they're necessarily wrong. > I'm just telling you what I think. > Thanks for all your suggestions and feedback. -- With Regards, Amit Kapila.
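For concreteness, the two candidate syntaxes might look as follows in use. Neither exists in PostgreSQL; the table names are illustrative, and (b) is shown with ALTER TABLE ... SET, the form that existing storage parameters use:

    -- Option (a): a dedicated PARALLEL DML clause (proposed syntax only).
    CREATE TABLE t1 (a int, b text) PARALLEL DML SAFE;
    ALTER TABLE t1 PARALLEL DML RESTRICTED;

    -- Option (b): storage-parameter style (proposed option name only).
    CREATE TABLE t2 (a int, b text) WITH (parallel_dml_enabled = true);
    ALTER TABLE t2 SET (parallel_dml_enabled = false);

One visible limitation of (b) is that a boolean cannot express RESTRICTED, a point raised later in the thread.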
On Tue, Jul 27, 2021 at 10:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jul 26, 2021 at 8:33 PM Robert Haas <robertmhaas@gmail.com> wrote: > Consider below ways to allow the user to specify the parallel-safety option: > > (a) > CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ... > ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } .. > > OR > > (b) > CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true) > ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true) > > The point was what should we do if the user specifies the option for a > non-partitioned table. Do we just ignore it or give an error that this > is not a valid syntax/option when used with non-partitioned tables? I > find it slightly odd that this option works for partitioned tables but > gives an error for non-partitioned tables but maybe we can document > it. IMHO, for a non-partitioned table, we should by default allow the parallel-safety checking so that users don't have to set it for individual tables. OTOH, I don't think that there is any point in blocking the syntax for the non-partitioned table. So I think for the non-partitioned table, if the user hasn't set it, we should do automatic safety checking, and if the user has defined the safety explicitly then we should respect that. And for the partitioned table, we will never do the automatic safety checking and we should always respect what the user has set. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Jul 27, 2021 at 11:28 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Jul 27, 2021 at 10:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Jul 26, 2021 at 8:33 PM Robert Haas <robertmhaas@gmail.com> wrote: > > Consider below ways to allow the user to specify the parallel-safety option: > > > > (a) > > CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ... > > ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } .. > > > > OR > > > > (b) > > CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true) > > ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true) > > > > The point was what should we do if the user specifies the option for a > > non-partitioned table. Do we just ignore it or give an error that this > > is not a valid syntax/option when used with non-partitioned tables? I > > find it slightly odd that this option works for partitioned tables but > > gives an error for non-partitioned tables but maybe we can document > > it. > > IMHO, for a non-partitioned table, we should be default allow the > parallel safely checking so that users don't have to set it for > individual tables, OTOH, I don't think that there is any point in > blocking the syntax for the non-partitioned table, So I think for the > non-partitioned table if the user hasn't set it we should do automatic > safety checking and if the user has defined the safety externally then > we should respect that. And for the partitioned table, we will never > do the automatic safety checking and we should always respect what the > user has set. > This is exactly what I am saying. BTW, do you have any preference for the syntax between (a) and (b)? -- With Regards, Amit Kapila.
On Tue, Jul 27, 2021 at 3:58 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > IMHO, for a non-partitioned table, we should be default allow the > parallel safely checking so that users don't have to set it for > individual tables, OTOH, I don't think that there is any point in > blocking the syntax for the non-partitioned table, So I think for the > non-partitioned table if the user hasn't set it we should do automatic > safety checking and if the user has defined the safety externally then > we should respect that. And for the partitioned table, we will never > do the automatic safety checking and we should always respect what the > user has set. > Provided it is possible to distinguish between the default parallel-safety (unsafe) and that default being explicitly specified by the user, it should be OK. In the case where the automatic parallel-safety checking is performed and the table uses something that is parallel-unsafe, there will be a performance degradation compared to the current code (hopefully only a small one). That can be avoided by the user explicitly specifying that it's parallel-unsafe. Regards, Greg Nancarrow Fujitsu Australia
On Tue, Jul 27, 2021 at 4:00 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Tue, Jul 27, 2021 at 3:58 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > IMHO, for a non-partitioned table, we should be default allow the > > parallel safely checking so that users don't have to set it for > > individual tables, OTOH, I don't think that there is any point in > > blocking the syntax for the non-partitioned table, So I think for the > > non-partitioned table if the user hasn't set it we should do automatic > > safety checking and if the user has defined the safety externally then > > we should respect that. And for the partitioned table, we will never > > do the automatic safety checking and we should always respect what the > > user has set. > > > > Provided it is possible to distinguish between the default > parallel-safety (unsafe) and that default being explicitly specified > by the user, it should be OK. > Offhand, I don't see any problem with this. Do you have something specific in mind? > In the case of performing the automatic parallel-safety checking and > the table using something that is parallel-unsafe, there will be a > performance degradation compared to the current code (hopefully only > small). That can be avoided by the user explicitly specifying that > it's parallel-unsafe. > True, but I guess this should be largely addressed by caching the value of parallel safety at the relation level. Sure, there will be some cost the first time we compute it, but on subsequent accesses, it should be quite cheap. -- With Regards, Amit Kapila.
RE: [bug?] Missed parallel safety checks, and wrong parallel safety
From
"houzj.fnst@fujitsu.com"
Date:
On July 27, 2021 1:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Jul 26, 2021 at 8:33 PM Robert Haas <robertmhaas@gmail.com> > wrote: > > > > On Sat, Jul 24, 2021 at 5:52 AM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > I think for the consistency argument how about allowing users to > > > specify a parallel-safety option for both partitioned and > > > non-partitioned relations but for non-partitioned relations if users > > > didn't specify, it would be computed automatically? If the user has > > > specified parallel-safety option for non-partitioned relation then we > > > would consider that instead of computing the value by ourselves. > > > > Having the option for both partitioned and non-partitioned tables > > doesn't seem like the worst idea ever, but I am also not entirely sure > > that I understand the point. > > > > Consider below ways to allow the user to specify the parallel-safety option: > > (a) > CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ... > ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } .. > > OR > > (b) > CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true) > ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true) Personally, I think approach (a) might be better, since it's similar to ALTER FUNCTION PARALLEL XXX, which users might be more familiar with. Besides, I think we need a new default value for parallel DML safety, maybe 'auto' or 'null' (different from safe/restricted/unsafe). Because users are likely to alter the safety back to the default value to get the automatic safety check, an independent default value can make it clearer. Best regards, Houzj
On Wed, Jul 28, 2021 at 12:52 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > > Consider below ways to allow the user to specify the parallel-safety option: > > > > (a) > > CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ... > > ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } .. > > > > OR > > > > (b) > > CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true) > > ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true) > > Personally, I think the approach (a) might be better. Since it's similar to > ALTER FUNCTION PARALLEL XXX which user might be more familiar with. > I think so too. > Besides, I think we need a new default value about parallel dml safety. Maybe > 'auto' or 'null'(different from safe/restricted/unsafe). Because, user is > likely to alter the safety to the default value to get the automatic safety > check, a independent default value can make it more clear. > Yes, I was thinking something similar when I said "Provided it is possible to distinguish between the default parallel-safety (unsafe) and that default being explicitly specified by the user". If we don't have a new default value, then we need to distinguish these cases, but I'm not sure Postgres does something similar elsewhere (for example, for function parallel-safety, it's not currently recorded whether parallel-safety=unsafe is because of the default or because the user specifically set it to the default value). Opinions? Regards, Greg Nancarrow Fujitsu Australia
Note: Changing the subject as I felt the topic has diverged from the originally reported case and also it might help others to pay attention. On Wed, Jul 28, 2021 at 8:22 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > > > Consider below ways to allow the user to specify the parallel-safety option: > > > > (a) > > CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ... > > ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } .. > > > > OR > > > > (b) > > CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true) > > ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true) > > Personally, I think the approach (a) might be better. Since it's similar to > ALTER FUNCTION PARALLEL XXX which user might be more familiar with. > Okay, and I think for (b) true/false won't be sufficient because one might want to specify restricted. > Besides, I think we need a new default value about parallel dml safety. Maybe > 'auto' or 'null'(different from safe/restricted/unsafe). Because, user is > likely to alter the safety to the default value to get the automatic safety > check, a independent default value can make it more clear. > Hmm, but auto won't work for partitioned tables, right? If so, that might appear like an inconsistency to the user and we need to document the same. Let me summarize the discussion so far in this thread so that it is helpful to others. We would like to parallelize INSERT SELECT (first step INSERT + parallel SELECT and then Parallel (INSERT + SELECT)) and for that, we have explored a couple of ways. The first approach is to automatically detect if it is safe to parallelize insert and then do it without user intervention. To detect automatically, we need to determine the parallel-safety of various expressions (like default column expressions, check constraints, index expressions, etc.) at planning time, which can be costly, but we can avoid most of the cost if we cache the parallel safety for the relation. So, the cost needs to be paid just once. Now, we can't cache this for partitioned relations because it can be very costly (as we need to lock all the partitions) and has deadlock risks (while processing invalidations); this has been explained in email [1]. Now, as we can't think of a nice way to determine parallel safety automatically for partitioned relations, we thought of providing an option to the user. The next thing to decide is: if we are providing an option to the user in one of the ways mentioned above, what should we do if the user uses that option for non-partitioned relations? Shall we just ignore it, or give an error that this is not a valid syntax/option? The one idea which Dilip and I are advocating is to respect the user's input for non-partitioned relations, and if it is not given, then compute the parallel safety and cache it. To help users provide a parallel-safety option, we are thinking of providing a utility function "pg_get_table_parallel_dml_safety(regclass)" that returns records of (objid, classid, parallel_safety) for all parallel unsafe/restricted table-related objects from which the table's parallel DML safety is determined. This will allow users to identify unsafe objects; if required, the user can change the parallel safety of those functions and then use the parallel-safety option for the table. Thoughts? 
Note - This topic has been discussed in another thread as well [2], but as many of the key technical points have been discussed here, I thought it better to continue here. [1] - https://www.postgresql.org/message-id/CAA4eK1Jwz8xGss4b0-33eyX0i5W_1CnqT16DjB9snVC--DoOsQ%40mail.gmail.com [2] - https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709%40TYAPR01MB2990.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
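As a sketch of how the proposed pieces might fit together (the utility function and the PARALLEL DML clause are proposals from this thread, not existing features; names and output columns are illustrative):

    -- A table whose DEFAULT expression calls a parallel-unsafe function;
    -- this part is ordinary, working SQL.
    CREATE FUNCTION next_code() RETURNS int LANGUAGE sql PARALLEL UNSAFE
        AS 'SELECT 42';
    CREATE TABLE items (id int DEFAULT next_code(), note text);

    -- Hypothetical use of the proposed utility function: it would report
    -- next_code() as the object making DML on "items" parallel-unsafe.
    SELECT classid::regclass AS catalog, objid, parallel_safety
    FROM pg_get_table_parallel_dml_safety('items'::regclass);

    -- If the user knows the function is in fact safe, they could fix its
    -- marking (real syntax) and then mark the table (proposed syntax):
    ALTER FUNCTION next_code() PARALLEL SAFE;
    -- ALTER TABLE items PARALLEL DML SAFE;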
On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Besides, I think we need a new default value about parallel dml safety. Maybe > > 'auto' or 'null'(different from safe/restricted/unsafe). Because, user is > > likely to alter the safety to the default value to get the automatic safety > > check, a independent default value can make it more clear. > > > > Hmm, but auto won't work for partitioned tables, right? If so, that > might appear like an inconsistency to the user and we need to document > the same. Let me summarize the discussion so far in this thread so > that it is helpful to others. > To avoid that inconsistency, UNSAFE could be the default for partitioned tables (and we would disallow setting AUTO for these). So then AUTO is the default for non-partitioned tables only. Regards, Greg Nancarrow Fujitsu Australia
RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)
From
"houzj.fnst@fujitsu.com"
Date:
On Friday, July 30, 2021 2:52 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > Besides, I think we need a new default value about parallel dml > > > safety. Maybe 'auto' or 'null'(different from > > > safe/restricted/unsafe). Because, user is likely to alter the safety > > > to the default value to get the automatic safety check, a independent default > > > value can make it more clear. > > > > > > > Hmm, but auto won't work for partitioned tables, right? If so, that > > might appear like an inconsistency to the user and we need to document > > the same. Let me summarize the discussion so far in this thread so > > that it is helpful to others. > > > > To avoid that inconsistency, UNSAFE could be the default for partitioned tables > (and we would disallow setting AUTO for these). > So then AUTO is the default for non-partitioned tables only. I think this approach is reasonable, +1. Best regards, houzj
On Fri, Jul 30, 2021 at 6:53 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > On Friday, July 30, 2021 2:52 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Besides, I think we need a new default value about parallel dml > > > > safety. Maybe 'auto' or 'null'(different from > > > > safe/restricted/unsafe). Because, user is likely to alter the safety > > > > to the default value to get the automatic safety check, a independent default > > > > value can make it more clear. > > > > > > > > Hmm, but auto won't work for partitioned tables, right? If so, that > > > might appear like an inconsistency to the user and we need to document > > > the same. Let me summarize the discussion so far in this thread so > > > that it is helpful to others. > > > > > > > To avoid that inconsistency, UNSAFE could be the default for partitioned tables > > (and we would disallow setting AUTO for these). > > So then AUTO is the default for non-partitioned tables only. > > I think this approach is reasonable, +1. > I see the need to change back to the default via Alter Table, but I am not sure if Auto is the most appropriate way to handle that. How about using DEFAULT itself, as we do in the case of REPLICA IDENTITY? So, if users have to alter the parallel-safety value back to the default, they just need to say Parallel DML DEFAULT. The default would mean automatic behavior for non-partitioned relations and ignoring parallelism for partitioned tables. -- With Regards, Amit Kapila.
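Side by side, the existing REPLICA IDENTITY precedent and the proposed analogue would look like this (the first statement is real syntax today; the second is proposed only, and the table name is illustrative):

    -- Existing precedent: reset replica identity to its default behavior.
    ALTER TABLE t1 REPLICA IDENTITY DEFAULT;

    -- Proposed analogue: reset parallel DML safety to the default, meaning
    -- automatic checking for plain tables, no parallelism for partitioned ones.
    ALTER TABLE t1 PARALLEL DML DEFAULT;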
On Mon, Aug 2, 2021 at 2:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jul 30, 2021 at 6:53 PM houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: > > > > On Friday, July 30, 2021 2:52 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > > > On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > Besides, I think we need a new default value about parallel dml > > > > > safety. Maybe 'auto' or 'null'(different from > > > > > safe/restricted/unsafe). Because, user is likely to alter the safety > > > > > to the default value to get the automatic safety check, a independent default > > > > > value can make it more clear. > > > > > > > > > > > > > Hmm, but auto won't work for partitioned tables, right? If so, that > > > > might appear like an inconsistency to the user and we need to document > > > > the same. Let me summarize the discussion so far in this thread so > > > > that it is helpful to others. > > > > > > > > > > To avoid that inconsistency, UNSAFE could be the default for partitioned tables > > > (and we would disallow setting AUTO for these). > > > So then AUTO is the default for non-partitioned tables only. > > > > I think this approach is reasonable, +1. > > > > I see the need to change to default via Alter Table but I am not sure > if Auto is the most appropriate way to handle that. How about using > DEFAULT itself as we do in the case of REPLICA IDENTITY? So, if users > have to alter parallel safety value to default, they need to just say > Parallel DML DEFAULT. The default would mean automatic behavior for > non-partitioned relations and ignore parallelism for partitioned > tables. > Hmm, I'm not so sure I'm sold on that. I personally think "DEFAULT" here is vague, and users then need to know what DEFAULT equates to, based on the type of table (partitioned or non-partitioned table). Also, then there are two ways to set the actual "default" DML parallel-safety for partitioned tables: DEFAULT or UNSAFE. At least "AUTO" is a meaningful default option name for non-partitioned tables - "automatic" parallel-safety checking, and the fact that it isn't the default (and can't be set) for partitioned tables highlights the difference in the way being proposed to treat them (i.e. use automatic checking only for non-partitioned tables). I'd be interested to hear what others think. I think a viable alternative would be to record whether an explicit DML parallel-safety has been specified, and if not, apply default behavior (i.e. by default use automatic checking for non-partitioned tables and treat partitioned tables as UNSAFE). I'm just not sure whether this kind of distinction (explicit vs implicit default) has been used before in Postgres options. Regards, Greg Nancarrow Fujitsu Australia
RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)
From
"houzj.fnst@fujitsu.com"
Date:
On August 2, 2021 2:04 PM Greg Nancarrow <gregn4422@gmail.com> wrote: > On Mon, Aug 2, 2021 at 2:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jul 30, 2021 at 6:53 PM houzj.fnst@fujitsu.com > > <houzj.fnst@fujitsu.com> wrote: > > > > > > On Friday, July 30, 2021 2:52 PM Greg Nancarrow <gregn4422@gmail.com> > wrote: > > > > On Fri, Jul 30, 2021 at 4:02 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > > > > > > Besides, I think we need a new default value about parallel > > > > > > dml safety. Maybe 'auto' or 'null'(different from > > > > > > safe/restricted/unsafe). Because, user is likely to alter the > > > > > > safety to the default value to get the automatic safety check, > > > > > > a independent default value can make it more clear. > > > > > > > > > > > > > > > > Hmm, but auto won't work for partitioned tables, right? If so, > > > > > that might appear like an inconsistency to the user and we need > > > > > to document the same. Let me summarize the discussion so far in > > > > > this thread so that it is helpful to others. > > > > > > > > > > > > > To avoid that inconsistency, UNSAFE could be the default for > > > > partitioned tables (and we would disallow setting AUTO for these). > > > > So then AUTO is the default for non-partitioned tables only. > > > > > > I think this approach is reasonable, +1. > > > > > > > I see the need to change to default via Alter Table but I am not sure > > if Auto is the most appropriate way to handle that. How about using > > DEFAULT itself as we do in the case of REPLICA IDENTITY? So, if users > > have to alter parallel safety value to default, they need to just say > > Parallel DML DEFAULT. The default would mean automatic behavior for > > non-partitioned relations and ignore parallelism for partitioned > > tables. > > > > Hmm, I'm not so sure I'm sold on that. > I personally think "DEFAULT" here is vague, and users then need to know what > DEFAULT equates to, based on the type of table (partitioned or non-partitioned > table). > Also, then there's two ways to set the actual "default" DML parallel-safety for > partitioned tables: DEFAULT or UNSAFE. > At least "AUTO" is a meaningful default option name for non-partitioned tables > - "automatic" parallel-safety checking, and the fact that it isn't the default (and > can't be set) for partitioned tables highlights the difference in the way being > proposed to treat them (i.e. use automatic checking only for non-partitioned > tables). > I'd be interested to hear what others think. > I think a viable alternative would be to record whether an explicit DML > parallel-safety has been specified, and if not, apply default behavior (i.e. by > default use automatic checking for non-partitioned tables and treat partitioned > tables as UNSAFE). I'm just not sure whether this kind of distinction (explicit vs > implicit default) has been used before in Postgres options. I think both approaches are fine, but using "DEFAULT" might have a disadvantage: if we somehow support automatic safety checking for partitioned tables in the future, then the meaning of "DEFAULT" for partitioned tables will change from UNSAFE to automatic checking. It could also put some burden on users to modify their SQL scripts. Best regards, houzj
RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)
From
"houzj.fnst@fujitsu.com"
Date:
Based on the discussion here, I implemented the auto-safety-check feature. Since most of the technical discussion happened here, I attached the patches in this thread. The patches allow users to specify a parallel-safety option for both partitioned and non-partitioned relations, and for non-partitioned relations, if users didn't specify, it would be computed automatically. If the user has specified the parallel-safety option then we would consider that instead of computing the value by ourselves. But for a partitioned table, if users didn't specify the parallel DML safety, it will be treated as unsafe. For non-partitioned relations, after computing the parallel-safety of the relation during planning, we save it in the relation cache entry and invalidate the cached parallel-safety for all relations in the relcache for a particular database whenever any function's parallel-safety is changed. To make it possible for users to alter the safety back to an unspecified value to get the automatic safety check, we add a new default option (temporarily named 'DEFAULT', in addition to safe/unsafe/restricted) for parallel DML safety. To facilitate users in providing a parallel-safety option, we provide a utility function "pg_get_table_parallel_dml_safety(regclass)" that returns records of (objid, classid, parallel_safety) for all parallel unsafe/restricted table-related objects from which the table's parallel DML safety is determined. This will allow users to identify unsafe objects; if required, the user can change the parallel safety of those functions and then use the parallel-safety option for the table. Best regards, houzj
Attachment
RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)
From
"houzj.fnst@fujitsu.com"
Date:
On Tues, August 3, 2021 3:40 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > Based on the discussion here, I implemented the auto-safety-check feature. > Since most of the technical discussion happened here,I attatched the patches in > this thread. > > The patches allow users to specify a parallel-safety option for both partitioned > and non-partitioned relations, and for non-partitioned relations if users didn't > specify, it would be computed automatically. If the user has specified > parallel-safety option then we would consider that instead of computing the > value by ourselves. But for partitioned table, if users didn't specify the parallel > dml safety, it will treat is as unsafe. > > For non-partitioned relations, after computing the parallel-safety of relation > during the planning, we save it in the relation cache entry and invalidate the > cached parallel-safety for all relations in relcache for a particular database > whenever any function's parallel-safety is changed. > > To make it possible for user to alter the safety to a not specified value to get the > automatic safety check, add a new default option(temporarily named 'DEFAULT' > in addition to safe/unsafe/restricted) about parallel dml safety. > > To facilitate users for providing a parallel-safety option, provide a utility > functionr "pg_get_table_parallel_dml_safety(regclass)" that returns records of > (objid, classid, parallel_safety) for all parallel unsafe/restricted table-related > objects from which the table's parallel DML safety is determined. > This will allow user to identify unsafe objects and if the required user can change > the parallel safety of required functions and then use the parallel safety option > for the table. Update the commit message in patches to make it easier for others to review. Best regards, Houzj
Attachment
RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)
From
"houzj.fnst@fujitsu.com"
Date:
On Fri, Aug 6, 2021 4:23 PM Hou zhijie <houzj.fnst@fujitsu.com> wrote: > > Update the commit message in patches to make it easier for others to review. CFbot reported a compile error due to the recent commit 3aafc03. Attached are rebased patches which fix the error. Best regards, Hou zj
Attachment
RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)
From
"houzj.fnst@fujitsu.com"
Date:
On Thursday, August 19, 2021 4:16 PM Hou zhijie <houzj.fnst@fujitsu.com> wrote: > On Fri, Aug 6, 2021 4:23 PM Hou zhijie <houzj.fnst@fujitsu.com> wrote: > > > > Update the commit message in patches to make it easier for others to review. > > CFbot reported a compile error due to recent commit 3aafc03. > Attach rebased patches which fix the error. The patch can't be applied to the HEAD branch due to a recent commit. Attached are rebased patches. Best regards, Hou zj
Attachment
RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)
From
"houzj.fnst@fujitsu.com"
Date:
On Wednesday, September 1, 2021 5:24 PM Hou Zhijie <houzj.fnst@fujitsu.com> wrote: > Thursday, August 19, 2021 4:16 PM Hou zhijie <houzj.fnst@fujitsu.com> wrote: > > On Fri, Aug 6, 2021 4:23 PM Hou zhijie <houzj.fnst@fujitsu.com> wrote: > > > > > > Update the commit message in patches to make it easier for others to > review. > > > > CFbot reported a compile error due to recent commit 3aafc03. > > Attach rebased patches which fix the error. > > The patch can't apply to the HEAD branch due a recent commit. > Attach rebased patches. In the past, the rewriter could generate a rewritten query in which a modifying CTE did not have the hasModifyingCTE flag set, and this bug caused a regression test (force_parallel_mode=regress) failure when parallel SELECT for INSERT was enabled, so we had a workaround (0006.patch) for it. But now the bug has been fixed in commit 362e2d and we don't need the workaround patch anymore. Attached is a new version of the patch set, which removes the workaround patch. Best regards, Hou zj
Attachment
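For context, the query shape at issue is a data-modifying CTE. A query like the sketch below must never run in parallel mode, which is what a correctly set hasModifyingCTE flag guarantees (the table names are illustrative):

    -- A data-modifying CTE: the planner must treat the whole statement as
    -- parallel-unsafe, which relies on the rewriter setting hasModifyingCTE.
    WITH moved AS (
        DELETE FROM pending_items RETURNING *
    )
    INSERT INTO archived_items SELECT * FROM moved;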
Hi, On Thu, Sep 09, 2021 at 02:12:08AM +0000, houzj.fnst@fujitsu.com wrote: > > Attach new version patch set which remove the workaround patch. This version of the patchset doesn't apply anymore: http://cfbot.cputube.org/patch_36_3143.log === Applying patches on top of PostgreSQL commit ID a18b6d2dc288dfa6e7905ede1d4462edd6a8af47 === === applying patch ./v19-0001-CREATE-ALTER-TABLE-PARALLEL-DML.patch [...] patching file src/backend/commands/tablecmds.c Hunk #1 FAILED at 40. Hunk #2 succeeded at 624 (offset 21 lines). Hunk #3 succeeded at 670 (offset 21 lines). Hunk #4 succeeded at 947 (offset 19 lines). Hunk #5 succeeded at 991 (offset 19 lines). Hunk #6 succeeded at 4256 (offset 40 lines). Hunk #7 succeeded at 4807 (offset 40 lines). Hunk #8 succeeded at 5217 (offset 40 lines). Hunk #9 succeeded at 6193 (offset 42 lines). Hunk #10 succeeded at 19278 (offset 465 lines). 1 out of 10 hunks FAILED -- saving rejects to file src/backend/commands/tablecmds.c.rej [...] patching file src/bin/pg_dump/pg_dump.c Hunk #1 FAILED at 6253. Hunk #2 FAILED at 6358. Hunk #3 FAILED at 6450. Hunk #4 FAILED at 6503. Hunk #5 FAILED at 6556. Hunk #6 FAILED at 6609. Hunk #7 FAILED at 6660. Hunk #8 FAILED at 6708. Hunk #9 FAILED at 6756. Hunk #10 FAILED at 6803. Hunk #11 FAILED at 6872. Hunk #12 FAILED at 6927. Hunk #13 succeeded at 15524 (offset -1031 lines). 12 out of 13 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.c.rej [...] patching file src/bin/psql/describe.c Hunk #1 succeeded at 1479 (offset -177 lines). Hunk #2 succeeded at 1493 (offset -177 lines). Hunk #3 succeeded at 1631 (offset -241 lines). Hunk #4 succeeded at 3374 (offset -277 lines). Hunk #5 succeeded at 3731 (offset -310 lines). Hunk #6 FAILED at 4109. 1 out of 6 hunks FAILED -- saving rejects to file src/bin/psql/describe.c.rej Could you send a rebased version? In the meantime I will switch the entry to Waiting on Author.
On Thu, Jul 28, 2022 at 8:43 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > Could you send a rebased version? In the meantime I will switch the entry to > Waiting on Author. By request in [1] I'm marking this Returned with Feedback for now. Whenever you're ready, you can resurrect the patch entry by visiting https://commitfest.postgresql.org/38/3143/ and changing the status to "Needs Review", and then changing the status again to "Move to next CF". (Don't forget the second step; hopefully we will have streamlined this in the near future!) Thanks, --Jacob [1] https://www.postgresql.org/message-id/OS0PR01MB571696D623F35A09AB51903A94969%40OS0PR01MB5716.jpnprd01.prod.outlook.com