Thread: Parallel safety of binary_upgrade_create_empty_extension

Parallel safety of binary_upgrade_create_empty_extension

From

Thomas Munro

Date:

27 March 2018, 01:45:53

Hi,

My colleague Richard Yen came across this situation:

pg_restore: [archiver (db)] could not execute query: ERROR:  cannot
assign XIDs during a parallel operation
    Command was: -- For binary upgrade, create an empty extension and
insert objects into it
DROP EXTENSION IF EXISTS "btree_gin";
SELECT pg_catalog.binary_upgrade_create_empty_extension('btree_gin',
'public', true, '1.0', NULL, NULL, ARRAY[]::pg_catalog.text[]);

It turned out that the target cluster was running with
force_parallel_mode = on.  Here's a single character patch to mark
that function PARALLEL UNSAFE.  Obviously that'll affect only newly
initdb'd clusters after this patch, but that's what people have in a
pg_upgrade scenario.

This goes back to d89f06f0482 so I think it should probably be
back-patched to 9.6 and 10.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

0001-Mark-binary_upgrade_create_empty_extension-PARALLEL-.patch

Re: Parallel safety of binary_upgrade_create_empty_extension

From

Tom Lane

Date:

27 March 2018, 01:53:32

Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Here's a single character patch to mark
> that function PARALLEL UNSAFE.

Ugh.  Clearly a bug.

> Obviously that'll affect only newly
> initdb'd clusters after this patch, but that's what people have in a
> pg_upgrade scenario.

We're a bit fortunate on that.  I wonder if there's some less
trial-and-error way of discovering that functions are or are not
parallel safe.

> This goes back to d89f06f0482 so I think it should probably be
> back-patched to 9.6 and 10.

Roger, will do.

            regards, tom lane

Re: Parallel safety of binary_upgrade_create_empty_extension

From

Tom Lane

Date:

27 March 2018, 01:58:51

... BTW:

# select proname, proparallel from pg_proc where proname like 'binary_upg%';
                  proname                   | proparallel 
--------------------------------------------+-------------
 binary_upgrade_create_empty_extension      | r
 binary_upgrade_set_next_array_pg_type_oid  | r
 binary_upgrade_set_next_heap_pg_class_oid  | r
 binary_upgrade_set_next_index_pg_class_oid | r
 binary_upgrade_set_next_pg_authid_oid      | r
 binary_upgrade_set_next_pg_enum_oid        | r
 binary_upgrade_set_next_pg_type_oid        | r
 binary_upgrade_set_next_toast_pg_class_oid | r
 binary_upgrade_set_next_toast_pg_type_oid  | r
 binary_upgrade_set_record_init_privs       | r
(10 rows)

I wonder whether we shouldn't mark *all* of these parallel-unsafe.
I'm not exactly convinced that 'restricted' is sufficient for the
others, and even if it is, there's certainly little if any upside
for letting them be executed in parallel-enabled mode.

            regards, tom lane

Re: Parallel safety of binary_upgrade_create_empty_extension

From

Thomas Munro

Date:

27 March 2018, 02:04:27

On Tue, Mar 27, 2018 at 11:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> # select proname, proparallel from pg_proc where proname like 'binary_upg%';
>                   proname                   | proparallel
> --------------------------------------------+-------------
>  binary_upgrade_create_empty_extension      | r
>  binary_upgrade_set_next_array_pg_type_oid  | r
>  binary_upgrade_set_next_heap_pg_class_oid  | r
>  binary_upgrade_set_next_index_pg_class_oid | r
>  binary_upgrade_set_next_pg_authid_oid      | r
>  binary_upgrade_set_next_pg_enum_oid        | r
>  binary_upgrade_set_next_pg_type_oid        | r
>  binary_upgrade_set_next_toast_pg_class_oid | r
>  binary_upgrade_set_next_toast_pg_type_oid  | r
>  binary_upgrade_set_record_init_privs       | r
> (10 rows)
>
> I wonder whether we shouldn't mark *all* of these parallel-unsafe.
> I'm not exactly convinced that 'restricted' is sufficient for the
> others, and even if it is, there's certainly little if any upside
> for letting them be executed in parallel-enabled mode.

Yeah, I almost sent a patch to do exactly that, but then I realised
the others all just set a global variable so they're technically fine
as 'r'.  I have no strong preference either way; these functions will
only actually be run in parallel in the weird situation of
force_parallel_mode = on.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Parallel safety of binary_upgrade_create_empty_extension

From

Tom Lane

Date:

27 March 2018, 02:23:30

Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Tue, Mar 27, 2018 at 11:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I wonder whether we shouldn't mark *all* of these parallel-unsafe.
>> I'm not exactly convinced that 'restricted' is sufficient for the
>> others, and even if it is, there's certainly little if any upside
>> for letting them be executed in parallel-enabled mode.

> Yeah, I almost sent a patch to do exactly that, but then I realised
> the others all just set a global variable so they're technically fine
> as 'r'.

Yeah, so I see after looking closer.

> I have no strong preference either way; these functions will
> only actually be run in parallel in the weird situation of
> force_parallel_mode = on.

While the minimal patch would be fine for now, what I'm worried about
is preventing future mistakes.  It seems highly likely that the reason
binary_upgrade_create_empty_extension is marked 'r' is that somebody
copied-and-pasted that from another binary_upgrade_foo function without
thinking very hard.  Marking them all 'u' would help to forestall future
mistakes of the same sort, while leaving them as 'r' doesn't seem to buy
us anything much (beyond a smaller catalog patch today).

Querying for other functions marked 'r' leaves me with some other related
doubts:

1. Why are the various flavors of pg_get_viewdef() marked 'r'?  Surely
reading the catalogs is a thing parallel children are allowed to do.
If there is a good reason for pg_get_viewdef() to be 'r', why doesn't
the same reason apply to all the other ruleutils functions?

2. Why are the various thingy-to-xml functions marked 'r'?  Again it
can't be because they read catalogs or data.  I can imagine that the
reason for restricting cursor_to_xml is that the cursor might execute
parallel-unsafe operations, but then why isn't it marked 'u'?

3. Isn't pg_import_system_collations() unsafe, for the same reason
as binary_upgrade_create_empty_extension()?

            regards, tom lane

Re: Parallel safety of binary_upgrade_create_empty_extension

From

Thomas Munro

Date:

27 March 2018, 05:30:21

On Tue, Mar 27, 2018 at 12:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Querying for other functions marked 'r' leaves me with some other related
> doubts:
>
> 1. Why are the various flavors of pg_get_viewdef() marked 'r'?  Surely
> reading the catalogs is a thing parallel children are allowed to do.
> If there is a good reason for pg_get_viewdef() to be 'r', why doesn't
> the same reason apply to all the other ruleutils functions?
>
> 2. Why are the various thingy-to-xml functions marked 'r'?  Again it
> can't be because they read catalogs or data.  I can imagine that the
> reason for restricting cursor_to_xml is that the cursor might execute
> parallel-unsafe operations, but then why isn't it marked 'u'?
>
> 3. Isn't pg_import_system_collations() unsafe, for the same reason
> as binary_upgrade_create_empty_extension()?

Yeah.  I hacked something up in Python to analyse the C call graph and
look for non-PARALLEL SAFE functions written in C that can reach
AssignTransactionId.  Attached, for interest.  Not a great approach
because current_schema, fetch_search_path, SPI_XXX and a couple of
others all lead there creating many possibly false positives (though
who knows).  If I filter those out I'm left with the ones already
mentioned (pg_import_system_collations,
binary_upgrade_create_empty_extension) plus two others:

1.  unique_key_recheck, not user callable anyway.
2.  brin_summarize_range is marked 's'.  Seems wrong.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

find_unsafe_parallel_functions.py

Re: Parallel safety of binary_upgrade_create_empty_extension

From

Thomas Munro

Date:

27 March 2018, 05:39:02

On Tue, Mar 27, 2018 at 3:30 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> I hacked something up in Python

#   otool -tvV | \

In case anyone is interested in trying that, it should be "otool -tvV
[path to postgres executable compiled with -O0]" (meaning disassemble
it).  I was removing my home directory from the path before sending
and apparently got carried away...

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Parallel safety of binary_upgrade_create_empty_extension

From

Robert Haas

Date:

27 March 2018, 14:27:31

On Mon, Mar 26, 2018 at 7:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> While the minimal patch would be fine for now, what I'm worried about
> is preventing future mistakes.  It seems highly likely that the reason
> binary_upgrade_create_empty_extension is marked 'r' is that somebody
> copied-and-pasted that from another binary_upgrade_foo function without
> thinking very hard.  Marking them all 'u' would help to forestall future
> mistakes of the same sort, while leaving them as 'r' doesn't seem to buy
> us anything much (beyond a smaller catalog patch today).

I'm not sure that deliberately mismarking things in the hopes that it
will make blind cut-and-paste a sensible approach is a very good
strategy.

> Querying for other functions marked 'r' leaves me with some other related
> doubts:
>
> 1. Why are the various flavors of pg_get_viewdef() marked 'r'?  Surely
> reading the catalogs is a thing parallel children are allowed to do.
> If there is a good reason for pg_get_viewdef() to be 'r', why doesn't
> the same reason apply to all the other ruleutils functions?

The only problem with running those in a worker (that I know about) is
that any locks they acquire wouldn't be retained.  But that is
technically a difference in semantics.

> 2. Why are the various thingy-to-xml functions marked 'r'?  Again it
> can't be because they read catalogs or data.  I can imagine that the
> reason for restricting cursor_to_xml is that the cursor might execute
> parallel-unsafe operations, but then why isn't it marked 'u'?

Same reason as above -- they acquire a lock on a named object and that
lock won't be retained if it happens in a worker.

Regarding cursor_to_xml, I don't think the problem you mention exists,
because the query the cursor runs is independently subject to the
parallel restrictions.

> 3. Isn't pg_import_system_collations() unsafe, for the same reason
> as binary_upgrade_create_empty_extension()?

Yes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Parallel safety of binary_upgrade_create_empty_extension

From

Tom Lane

Date:

27 March 2018, 18:17:17

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Mar 26, 2018 at 7:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> While the minimal patch would be fine for now, what I'm worried about
>> is preventing future mistakes.  It seems highly likely that the reason
>> binary_upgrade_create_empty_extension is marked 'r' is that somebody
>> copied-and-pasted that from another binary_upgrade_foo function without
>> thinking very hard.  Marking them all 'u' would help to forestall future
>> mistakes of the same sort, while leaving them as 'r' doesn't seem to buy
>> us anything much (beyond a smaller catalog patch today).

> I'm not sure that deliberately mismarking things in the hopes that it
> will make blind cut-and-paste a sensible approach is a very good
> strategy.

So, your proposal is to do nothing and just hope we don't make the same
mistake again in future?

>> 1. Why are the various flavors of pg_get_viewdef() marked 'r'?  Surely
>> reading the catalogs is a thing parallel children are allowed to do.
>> If there is a good reason for pg_get_viewdef() to be 'r', why doesn't
>> the same reason apply to all the other ruleutils functions?

> The only problem with running those in a worker (that I know about) is
> that any locks they acquire wouldn't be retained.  But that is
> technically a difference in semantics.

Meh.  It's not documented that pg_get_viewdef takes any locks, and
I'm sure most users would be astonished to learn that, and this
argument surely doesn't explain why pg_get_viewdef is restricted
while pg_get_ruledef is marked safe.

> Regarding cursor_to_xml, I don't think the problem you mention exists,
> because the query the cursor runs is independently subject to the
> parallel restrictions.

Well, "independently" is exactly the point.  If the cursor query contains
some parallel-unsafe (not just parallel-restricted) operations, and we
allow cursor_to_xml to be parallel-restricted, don't we end up risking
executing parallel-unsafe operations while doing parallelism?

Or to be concrete:

regression=# create sequence s1;
CREATE SEQUENCE
regression=# begin;
BEGIN
regression=# set force_parallel_mode to 1;
SET
regression=# declare c cursor for select nextval('s1');
DECLARE CURSOR
regression=# select cursor_to_xml('c',1,true,true,'');
ERROR:  cannot execute nextval() during a parallel operation

I think this behavior is a bug.

            regards, tom lane

Re: Parallel safety of binary_upgrade_create_empty_extension

From

Robert Haas

Date:

27 March 2018, 20:18:36

On Tue, Mar 27, 2018 at 11:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So, your proposal is to do nothing and just hope we don't make the same
> mistake again in future?

That wasn't really my proposal, but if it were, it would be at least
as good as your proposal of making changing things in an unprincipled
fashion and hoping we get the right result.  :-)

I think what we need to do is write either a README or some SGML
documentation explaining the reasoning behind existing markings with
an eye toward helping future patch authors/committers get this right.
I'm willing to do that work, or at least a first draft of it, although
not today.

> Meh.  It's not documented that pg_get_viewdef takes any locks, and
> I'm sure most users would be astonished to learn that, and this
> argument surely doesn't explain why pg_get_viewdef is restricted
> while pg_get_ruledef is marked safe.

I'm personally of the opinion that we'd be smarter to make some things
that currently hold locks until commit release them immediately, which
would then make it reasonable to mark such things parallel-safe.  I
don't think it makes much sense to keep the locks only sometimes and
pretend like the difference doesn't matter, though.  That's telling
people "it's very important to hold these locks until commit, except
when it's not!" which doesn't seem like it can be right.

>> Regarding cursor_to_xml, I don't think the problem you mention exists,
>> because the query the cursor runs is independently subject to the
>> parallel restrictions.
>
> Well, "independently" is exactly the point.  If the cursor query contains
> some parallel-unsafe (not just parallel-restricted) operations, and we
> allow cursor_to_xml to be parallel-restricted, don't we end up risking
> executing parallel-unsafe operations while doing parallelism?
>
> Or to be concrete:
>
> regression=# create sequence s1;
> CREATE SEQUENCE
> regression=# begin;
> BEGIN
> regression=# set force_parallel_mode to 1;
> SET
> regression=# declare c cursor for select nextval('s1');
> DECLARE CURSOR
> regression=# select cursor_to_xml('c',1,true,true,'');
> ERROR:  cannot execute nextval() during a parallel operation
>
> I think this behavior is a bug.

I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Parallel safety of binary_upgrade_create_empty_extension

From

Thomas Munro

Date:

29 March 2018, 12:32:36

On Wed, Mar 28, 2018 at 6:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 27, 2018 at 11:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> regression=# create sequence s1;
>> CREATE SEQUENCE
>> regression=# begin;
>> BEGIN
>> regression=# set force_parallel_mode to 1;
>> SET
>> regression=# declare c cursor for select nextval('s1');
>> DECLARE CURSOR
>> regression=# select cursor_to_xml('c',1,true,true,'');
>> ERROR:  cannot execute nextval() during a parallel operation
>>
>> I think this behavior is a bug.
>
> I agree.

Presumably also cursor_to_xmlschema.  I also found some more
suspicious brin and gin modifying functions.  So I think the list of
functions that needs to be marked 'u' so far is:

* binary_upgrade_create_empty_extension
* pg_import_system_collations
* brin_summarize_range
* brin_summarize_new_values
* brin_desummarize_range
* gin_clean_pending_list
* cursor_to_xml
* cursor_to_xmlschema

Has anyone got anything else?

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Parallel safety of binary_upgrade_create_empty_extension

From

Tom Lane

Date:

29 March 2018, 16:26:50

Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Presumably also cursor_to_xmlschema.  I also found some more
> suspicious brin and gin modifying functions.  So I think the list of
> functions that needs to be marked 'u' so far is:

> * binary_upgrade_create_empty_extension
> * pg_import_system_collations
> * brin_summarize_range
> * brin_summarize_new_values
> * brin_desummarize_range
> * gin_clean_pending_list
> * cursor_to_xml
> * cursor_to_xmlschema

> Has anyone got anything else?

There are some tsquery functions that execute user-supplied queries,
and so need to be 'u' on the same grounds as the cursor_to_xml cases.
I haven't checked to be sure if they are already so, but was planning
to do so before considering this matter closed.

            regards, tom lane

Re: Parallel safety of binary_upgrade_create_empty_extension

From

Tom Lane

Date:

31 March 2018, 00:09:15

I wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> Presumably also cursor_to_xmlschema.  I also found some more
>> suspicious brin and gin modifying functions.  So I think the list of
>> functions that needs to be marked 'u' so far is:

>> * binary_upgrade_create_empty_extension
>> * pg_import_system_collations
>> * brin_summarize_range
>> * brin_summarize_new_values
>> * brin_desummarize_range
>> * gin_clean_pending_list
>> * cursor_to_xml
>> * cursor_to_xmlschema

>> Has anyone got anything else?

> There are some tsquery functions that execute user-supplied queries,
> and so need to be 'u' on the same grounds as the cursor_to_xml cases.
> I haven't checked to be sure if they are already so, but was planning
> to do so before considering this matter closed.

So I looked, and I find that tsquery_rewrite_query, ts_stat1, and
ts_stat2 all execute SQL strings of unknown properties, yet they're
blithely marked parallel safe.

So is contrib/xml2's xpath_table().  The trouble spot there is just
the user-supplied WHERE clause, but that's still a hole big enough
to cause problems.  This one's particularly irritating because it'd
require a new extension update script to fix.  I'm not sufficiently
excited about it to do that, considering that we've been trying to
deprecate that module for years.

Anyway, that looks like 11 functions that there's no question we
need to relabel.  I'll go do that.

I'm still not very happy about the inconsistent marking of the
ruleutils.c functions, but it seems like we don't have consensus
on what to change there.

            regards, tom lane

Re: Parallel safety of binary_upgrade_create_empty_extension

From

Tom Lane

Date:

31 March 2018, 00:26:21

I wrote:
> Anyway, that looks like 11 functions that there's no question we
> need to relabel.  I'll go do that.

Oh, while I'm looking at this, I notice that cursor_to_xml[schema]
is not merely mismarked as to its parallel safety, but its volatility:
it's marked stable, which is frickin' insane considering it has side
effects on the state of the cursor.

query_to_xml* are likewise marked stable, which seems also foolish
since we have no idea whether the source query has side-effects.

All of this breakage is pretty old.  I guess the best we can do is
fix it without a catversion bump in the back branches.

            regards, tom lane