Thread: Parallel safety of binary_upgrade_create_empty_extension
Hi, My colleague Richard Yen came across this situation: pg_restore: [archiver (db)] could not execute query: ERROR: cannot assign XIDs during a parallel operation Command was: -- For binary upgrade, create an empty extension and insert objects into it DROP EXTENSION IF EXISTS "btree_gin"; SELECT pg_catalog.binary_upgrade_create_empty_extension('btree_gin', 'public', true, '1.0', NULL, NULL, ARRAY[]::pg_catalog.text[]); It turned out that the target cluster was running with force_parallel_mode = on. Here's a single character patch to mark that function PARALLEL UNSAFE. Obviously that'll affect only newly initdb'd clusters after this patch, but that's what people have in a pg_upgrade scenario. This goes back to d89f06f0482 so I think it should probably be back-patched to 9.6 and 10. -- Thomas Munro http://www.enterprisedb.com
Attachment
Thomas Munro <thomas.munro@enterprisedb.com> writes: > Here's a single character patch to mark > that function PARALLEL UNSAFE. Ugh. Clearly a bug. > Obviously that'll affect only newly > initdb'd clusters after this patch, but that's what people have in a > pg_upgrade scenario. We're a bit fortunate on that. I wonder if there's some less trial-and-error way of discovering that functions are or are not parallel safe. > This goes back to d89f06f0482 so I think it should probably be > back-patched to 9.6 and 10. Roger, will do. regards, tom lane
... BTW: # select proname, proparallel from pg_proc where proname like 'binary_upg%'; proname | proparallel --------------------------------------------+------------- binary_upgrade_create_empty_extension | r binary_upgrade_set_next_array_pg_type_oid | r binary_upgrade_set_next_heap_pg_class_oid | r binary_upgrade_set_next_index_pg_class_oid | r binary_upgrade_set_next_pg_authid_oid | r binary_upgrade_set_next_pg_enum_oid | r binary_upgrade_set_next_pg_type_oid | r binary_upgrade_set_next_toast_pg_class_oid | r binary_upgrade_set_next_toast_pg_type_oid | r binary_upgrade_set_record_init_privs | r (10 rows) I wonder whether we shouldn't mark *all* of these parallel-unsafe. I'm not exactly convinced that 'restricted' is sufficient for the others, and even if it is, there's certainly little if any upside for letting them be executed in parallel-enabled mode. regards, tom lane
On Tue, Mar 27, 2018 at 11:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > # select proname, proparallel from pg_proc where proname like 'binary_upg%'; > proname | proparallel > --------------------------------------------+------------- > binary_upgrade_create_empty_extension | r > binary_upgrade_set_next_array_pg_type_oid | r > binary_upgrade_set_next_heap_pg_class_oid | r > binary_upgrade_set_next_index_pg_class_oid | r > binary_upgrade_set_next_pg_authid_oid | r > binary_upgrade_set_next_pg_enum_oid | r > binary_upgrade_set_next_pg_type_oid | r > binary_upgrade_set_next_toast_pg_class_oid | r > binary_upgrade_set_next_toast_pg_type_oid | r > binary_upgrade_set_record_init_privs | r > (10 rows) > > I wonder whether we shouldn't mark *all* of these parallel-unsafe. > I'm not exactly convinced that 'restricted' is sufficient for the > others, and even if it is, there's certainly little if any upside > for letting them be executed in parallel-enabled mode. Yeah, I almost sent a patch to do exactly that, but then I realised the others all just set a global variable so they're technically fine as 'r'. I have no strong preference either way; these functions will only actually be run in parallel in the weird situation of force_parallel_mode = on. -- Thomas Munro http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes: > On Tue, Mar 27, 2018 at 11:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I wonder whether we shouldn't mark *all* of these parallel-unsafe. >> I'm not exactly convinced that 'restricted' is sufficient for the >> others, and even if it is, there's certainly little if any upside >> for letting them be executed in parallel-enabled mode. > Yeah, I almost sent a patch to do exactly that, but then I realised > the others all just set a global variable so they're technically fine > as 'r'. Yeah, so I see after looking closer. > I have no strong preference either way; these functions will > only actually be run in parallel in the weird situation of > force_parallel_mode = on. While the minimal patch would be fine for now, what I'm worried about is preventing future mistakes. It seems highly likely that the reason binary_upgrade_create_empty_extension is marked 'r' is that somebody copied-and-pasted that from another binary_upgrade_foo function without thinking very hard. Marking them all 'u' would help to forestall future mistakes of the same sort, while leaving them as 'r' doesn't seem to buy us anything much (beyond a smaller catalog patch today). Querying for other functions marked 'r' leaves me with some other related doubts: 1. Why are the various flavors of pg_get_viewdef() marked 'r'? Surely reading the catalogs is a thing parallel children are allowed to do. If there is a good reason for pg_get_viewdef() to be 'r', why doesn't the same reason apply to all the other ruleutils functions? 2. Why are the various thingy-to-xml functions marked 'r'? Again it can't be because they read catalogs or data. I can imagine that the reason for restricting cursor_to_xml is that the cursor might execute parallel-unsafe operations, but then why isn't it marked 'u'? 3. Isn't pg_import_system_collations() unsafe, for the same reason as binary_upgrade_create_empty_extension()? regards, tom lane
On Tue, Mar 27, 2018 at 12:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Querying for other functions marked 'r' leaves me with some other related > doubts: > > 1. Why are the various flavors of pg_get_viewdef() marked 'r'? Surely > reading the catalogs is a thing parallel children are allowed to do. > If there is a good reason for pg_get_viewdef() to be 'r', why doesn't > the same reason apply to all the other ruleutils functions? > > 2. Why are the various thingy-to-xml functions marked 'r'? Again it > can't be because they read catalogs or data. I can imagine that the > reason for restricting cursor_to_xml is that the cursor might execute > parallel-unsafe operations, but then why isn't it marked 'u'? > > 3. Isn't pg_import_system_collations() unsafe, for the same reason > as binary_upgrade_create_empty_extension()? Yeah. I hacked something up in Python to analyse the C call graph and look for non-PARALLEL SAFE functions written in C that can reach AssignTransactionId. Attached, for interest. Not a great approach because current_schema, fetch_search_path, SPI_XXX and a couple of others all lead there creating many possibly false positives (though who knows). If I filter those out I'm left with the ones already mentioned (pg_import_system_collations, binary_upgrade_create_empty_extension) plus two others: 1. unique_key_recheck, not user callable anyway. 2. brin_summarize_range is marked 's'. Seems wrong. -- Thomas Munro http://www.enterprisedb.com
Attachment
On Tue, Mar 27, 2018 at 3:30 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > I hacked something up in Python # otool -tvV | \ In case anyone is interested in trying that, it should be "otool -tvV [path to postgres executable compiled with -O0]" (meaning disassemble it). I was removing my home directory from the path before sending and apparently got carried away... -- Thomas Munro http://www.enterprisedb.com
On Mon, Mar 26, 2018 at 7:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > While the minimal patch would be fine for now, what I'm worried about > is preventing future mistakes. It seems highly likely that the reason > binary_upgrade_create_empty_extension is marked 'r' is that somebody > copied-and-pasted that from another binary_upgrade_foo function without > thinking very hard. Marking them all 'u' would help to forestall future > mistakes of the same sort, while leaving them as 'r' doesn't seem to buy > us anything much (beyond a smaller catalog patch today). I'm not sure that deliberately mismarking things in the hopes that it will make blind cut-and-paste a sensible approach is a very good strategy. > Querying for other functions marked 'r' leaves me with some other related > doubts: > > 1. Why are the various flavors of pg_get_viewdef() marked 'r'? Surely > reading the catalogs is a thing parallel children are allowed to do. > If there is a good reason for pg_get_viewdef() to be 'r', why doesn't > the same reason apply to all the other ruleutils functions? The only problem with running those in a worker (that I know about) is that any locks they acquire wouldn't be retained. But that is technically a difference in semantics. > 2. Why are the various thingy-to-xml functions marked 'r'? Again it > can't be because they read catalogs or data. I can imagine that the > reason for restricting cursor_to_xml is that the cursor might execute > parallel-unsafe operations, but then why isn't it marked 'u'? Same reason as above -- they acquire a lock on a named object and that lock won't be retained if it happens in a worker. Regarding cursor_to_xml, I don't think the problem you mention exists, because the query the cursor runs is independently subject to the parallel restrictions. > 3. Isn't pg_import_system_collations() unsafe, for the same reason > as binary_upgrade_create_empty_extension()? Yes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Mar 26, 2018 at 7:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> While the minimal patch would be fine for now, what I'm worried about >> is preventing future mistakes. It seems highly likely that the reason >> binary_upgrade_create_empty_extension is marked 'r' is that somebody >> copied-and-pasted that from another binary_upgrade_foo function without >> thinking very hard. Marking them all 'u' would help to forestall future >> mistakes of the same sort, while leaving them as 'r' doesn't seem to buy >> us anything much (beyond a smaller catalog patch today). > I'm not sure that deliberately mismarking things in the hopes that it > will make blind cut-and-paste a sensible approach is a very good > strategy. So, your proposal is to do nothing and just hope we don't make the same mistake again in future? >> 1. Why are the various flavors of pg_get_viewdef() marked 'r'? Surely >> reading the catalogs is a thing parallel children are allowed to do. >> If there is a good reason for pg_get_viewdef() to be 'r', why doesn't >> the same reason apply to all the other ruleutils functions? > The only problem with running those in a worker (that I know about) is > that any locks they acquire wouldn't be retained. But that is > technically a difference in semantics. Meh. It's not documented that pg_get_viewdef takes any locks, and I'm sure most users would be astonished to learn that, and this argument surely doesn't explain why pg_get_viewdef is restricted while pg_get_ruledef is marked safe. > Regarding cursor_to_xml, I don't think the problem you mention exists, > because the query the cursor runs is independently subject to the > parallel restrictions. Well, "independently" is exactly the point. If the cursor query contains some parallel-unsafe (not just parallel-restricted) operations, and we allow cursor_to_xml to be parallel-restricted, don't we end up risking executing parallel-unsafe operations while doing parallelism? Or to be concrete: regression=# create sequence s1; CREATE SEQUENCE regression=# begin; BEGIN regression=# set force_parallel_mode to 1; SET regression=# declare c cursor for select nextval('s1'); DECLARE CURSOR regression=# select cursor_to_xml('c',1,true,true,''); ERROR: cannot execute nextval() during a parallel operation I think this behavior is a bug. regards, tom lane
On Tue, Mar 27, 2018 at 11:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > So, your proposal is to do nothing and just hope we don't make the same > mistake again in future? That wasn't really my proposal, but if it were, it would be at least as good as your proposal of making changing things in an unprincipled fashion and hoping we get the right result. :-) I think what we need to do is write either a README or some SGML documentation explaining the reasoning behind existing markings with an eye toward helping future patch authors/committers get this right. I'm willing to do that work, or at least a first draft of it, although not today. > Meh. It's not documented that pg_get_viewdef takes any locks, and > I'm sure most users would be astonished to learn that, and this > argument surely doesn't explain why pg_get_viewdef is restricted > while pg_get_ruledef is marked safe. I'm personally of the opinion that we'd be smarter to make some things that currently hold locks until commit release them immediately, which would then make it reasonable to mark such things parallel-safe. I don't think it makes much sense to keep the locks only sometimes and pretend like the difference doesn't matter, though. That's telling people "it's very important to hold these locks until commit, except when it's not!" which doesn't seem like it can be right. >> Regarding cursor_to_xml, I don't think the problem you mention exists, >> because the query the cursor runs is independently subject to the >> parallel restrictions. > > Well, "independently" is exactly the point. If the cursor query contains > some parallel-unsafe (not just parallel-restricted) operations, and we > allow cursor_to_xml to be parallel-restricted, don't we end up risking > executing parallel-unsafe operations while doing parallelism? > > Or to be concrete: > > regression=# create sequence s1; > CREATE SEQUENCE > regression=# begin; > BEGIN > regression=# set force_parallel_mode to 1; > SET > regression=# declare c cursor for select nextval('s1'); > DECLARE CURSOR > regression=# select cursor_to_xml('c',1,true,true,''); > ERROR: cannot execute nextval() during a parallel operation > > I think this behavior is a bug. I agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Mar 28, 2018 at 6:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Mar 27, 2018 at 11:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> regression=# create sequence s1; >> CREATE SEQUENCE >> regression=# begin; >> BEGIN >> regression=# set force_parallel_mode to 1; >> SET >> regression=# declare c cursor for select nextval('s1'); >> DECLARE CURSOR >> regression=# select cursor_to_xml('c',1,true,true,''); >> ERROR: cannot execute nextval() during a parallel operation >> >> I think this behavior is a bug. > > I agree. Presumably also cursor_to_xmlschema. I also found some more suspicious brin and gin modifying functions. So I think the list of functions that needs to be marked 'u' so far is: * binary_upgrade_create_empty_extension * pg_import_system_collations * brin_summarize_range * brin_summarize_new_values * brin_desummarize_range * gin_clean_pending_list * cursor_to_xml * cursor_to_xmlschema Has anyone got anything else? -- Thomas Munro http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes: > Presumably also cursor_to_xmlschema. I also found some more > suspicious brin and gin modifying functions. So I think the list of > functions that needs to be marked 'u' so far is: > * binary_upgrade_create_empty_extension > * pg_import_system_collations > * brin_summarize_range > * brin_summarize_new_values > * brin_desummarize_range > * gin_clean_pending_list > * cursor_to_xml > * cursor_to_xmlschema > Has anyone got anything else? There are some tsquery functions that execute user-supplied queries, and so need to be 'u' on the same grounds as the cursor_to_xml cases. I haven't checked to be sure if they are already so, but was planning to do so before considering this matter closed. regards, tom lane
I wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: >> Presumably also cursor_to_xmlschema. I also found some more >> suspicious brin and gin modifying functions. So I think the list of >> functions that needs to be marked 'u' so far is: >> * binary_upgrade_create_empty_extension >> * pg_import_system_collations >> * brin_summarize_range >> * brin_summarize_new_values >> * brin_desummarize_range >> * gin_clean_pending_list >> * cursor_to_xml >> * cursor_to_xmlschema >> Has anyone got anything else? > There are some tsquery functions that execute user-supplied queries, > and so need to be 'u' on the same grounds as the cursor_to_xml cases. > I haven't checked to be sure if they are already so, but was planning > to do so before considering this matter closed. So I looked, and I find that tsquery_rewrite_query, ts_stat1, and ts_stat2 all execute SQL strings of unknown properties, yet they're blithely marked parallel safe. So is contrib/xml2's xpath_table(). The trouble spot there is just the user-supplied WHERE clause, but that's still a hole big enough to cause problems. This one's particularly irritating because it'd require a new extension update script to fix. I'm not sufficiently excited about it to do that, considering that we've been trying to deprecate that module for years. Anyway, that looks like 11 functions that there's no question we need to relabel. I'll go do that. I'm still not very happy about the inconsistent marking of the ruleutils.c functions, but it seems like we don't have consensus on what to change there. regards, tom lane
I wrote: > Anyway, that looks like 11 functions that there's no question we > need to relabel. I'll go do that. Oh, while I'm looking at this, I notice that cursor_to_xml[schema] is not merely mismarked as to its parallel safety, but its volatility: it's marked stable, which is frickin' insane considering it has side effects on the state of the cursor. query_to_xml* are likewise marked stable, which seems also foolish since we have no idea whether the source query has side-effects. All of this breakage is pretty old. I guess the best we can do is fix it without a catversion bump in the back branches. regards, tom lane