Thread: Autovaccuum vs temp tables crash

Autovaccuum vs temp tables crash

From
Magnus Hagander
Date:
The following sequence causes autovacuum to crash on master:

CREATE TEMP TABLE foo(a int);
DROP TABLE foo;
DROP SCHEMA pg_temp_3;
CREATE TEMP TABLE bar(a int);
DROP TABLE bar;

it coredumps with:

#1  0x00007f96b5daa42a in __GI_abort () at abort.c:89
#2  0x000055d997915ba3 in ExceptionalCondition (conditionName=conditionName@entry=0x55d997b2a05f "!(strvalue != ((void *)0))", 
    errorType=errorType@entry=0x55d997965ebd "FailedAssertion", fileName=fileName@entry=0x55d997b2a054 "snprintf.c", 
    lineNumber=lineNumber@entry=442) at assert.c:54
#3  0x000055d99795e4ec in dopr (target=target@entry=0x7ffc04f479e0, format=0x55d997abfa2d ".%s\"", 
    format@entry=0x55d997abfa00 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=<optimized out>) at snprintf.c:442
#4  0x000055d99795e6ce in pg_vsnprintf (str=<optimized out>, count=<optimized out>, count@entry=1024, 
    fmt=fmt@entry=0x55d997abfa00 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=args@entry=0x7ffc04f47a88)
    at snprintf.c:195
#5  0x000055d997963c21 in pvsnprintf (buf=<optimized out>, len=len@entry=1024, 
    fmt=fmt@entry=0x55d997abfa00 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=args@entry=0x7ffc04f47a88)
    at psprintf.c:110
#6  0x000055d9976deef8 in appendStringInfoVA (str=str@entry=0x7ffc04f47a70, 
    fmt=fmt@entry=0x55d997abfa00 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=args@entry=0x7ffc04f47a88)
    at stringinfo.c:136
#7  0x000055d997919b60 in errmsg (fmt=fmt@entry=0x55d997abfa00 "autovacuum: dropping orphan temp table \"%s.%s.%s\"") at elog.c:794
#8  0x000055d9974dd551 in do_autovacuum () at autovacuum.c:2255
 

We are certainly not supposed to go DROP SCHEMA on the temp namespaces, but we are also not supposed to coredump on it (if we were, we should prevent people from DROP SCHEMA it and we don't).

In fact, we probably *should* prevent the dropping of the temp schema? But that's independent from fixing this one.

The reason for the crash is 6d842be6c11, where Tom added an assert for passing null into %s. But I don't think we can blame that patch for the problem -- it's passing the NULL there in the first place that's the problem.

AFAICT the actual drop works fine, it's just the logging that crashes. So maybe we should just add a check and make it log something like "<dropped>" if pg_namespace_name() returns null?

Re: Autovaccuum vs temp tables crash

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> The reason for the crash is 6d842be6c11, where Tom added an assert for
> passing null into %s. But I don't think we can blame that patch for the
> problem -- it's passing the NULL there in the first place that's the
> problem.

Indeed; this crash existed on some platforms all along (which means
we'd better back-patch the fix).

> AFAICT the actual drop works fine, it's just the logging that crashes. So
> maybe we should just add a check and make it log something like "<dropped>"
> if pg_namespace_name() returns null?

+1 ... maybe "(dropped)", because we tend to use parens for this sort
of thing, I think.

            regards, tom lane


Re: Autovaccuum vs temp tables crash

From
Robert Haas
Date:
On Fri, Feb 22, 2019 at 3:43 AM Magnus Hagander <magnus@hagander.net> wrote:
> We are certainly not supposed to go DROP SCHEMA on the temp namespaces, ...

Actually, I think that's supposed to work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Autovaccuum vs temp tables crash

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Feb 22, 2019 at 3:43 AM Magnus Hagander <magnus@hagander.net> wrote:
>> We are certainly not supposed to go DROP SCHEMA on the temp namespaces, ...

> Actually, I think that's supposed to work.

If it's in active use by any session (including your own), that's not going
to have nice consequences; the owning session will have the OID in
static storage, and it will be unhappy when that OID becomes dangling.

            regards, tom lane


Re: Autovaccuum vs temp tables crash

From
Robert Haas
Date:
On Fri, Feb 22, 2019 at 12:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Fri, Feb 22, 2019 at 3:43 AM Magnus Hagander <magnus@hagander.net> wrote:
> >> We are certainly not supposed to go DROP SCHEMA on the temp namespaces, ...
>
> > Actually, I think that's supposed to work.
>
> If it's in active use by any session (including your own), that's not going
> to have nice consequences; the owning session will have the OID in
> static storage, and it will be unhappy when that OID becomes dangling.

Maybe that's something we should fix?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Autovaccuum vs temp tables crash

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Feb 22, 2019 at 12:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Fri, Feb 22, 2019 at 3:43 AM Magnus Hagander <magnus@hagander.net> wrote:
>>>> We are certainly not supposed to go DROP SCHEMA on the temp namespaces, ...

>>> Actually, I think that's supposed to work.

>> If it's in active use by any session (including your own), that's not going
>> to have nice consequences; the owning session will have the OID in
>> static storage, and it will be unhappy when that OID becomes dangling.

> Maybe that's something we should fix?

Why?  It would likely be a significant amount of effort and added overhead,
to accomplish no obviously-useful goal.

Note that all the temp schemas are made as owned by the bootstrap
superuser, so there is no real argument to be made that people might
be expecting they should be able to delete them.

            regards, tom lane


Re: Autovaccuum vs temp tables crash

From
Robert Haas
Date:
On Fri, Feb 22, 2019 at 1:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Why?  It would likely be a significant amount of effort and added overhead,
> to accomplish no obviously-useful goal.
>
> Note that all the temp schemas are made as owned by the bootstrap
> superuser, so there is no real argument to be made that people might
> be expecting they should be able to delete them.

Hmm, well maybe you're right.  Just seems like an odd wart.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Autovaccuum vs temp tables crash

From
Michael Paquier
Date:
On Fri, Feb 22, 2019 at 09:45:42AM -0500, Tom Lane wrote:
> +1 ... maybe "(dropped)", because we tend to use parens for this sort
> of thing, I think.

+1.  Using "dropped" sounds good to me in this context.  Perhaps we
could have something more fancy like what's used for dropped columns?
It would be nice to get a reference to a schema, like say "dropped
temporary schema".
--
Michael

Attachment

Re: Autovaccuum vs temp tables crash

From
Magnus Hagander
Date:


On Fri, Feb 22, 2019 at 7:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Feb 22, 2019 at 1:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Why?  It would likely be a significant amount of effort and added overhead,
> to accomplish no obviously-useful goal.
>
> Note that all the temp schemas are made as owned by the bootstrap
> superuser, so there is no real argument to be made that people might
> be expecting they should be able to delete them.

Hmm, well maybe you're right.  Just seems like an odd wart.

Well, the way it works now is you can drop them. But if you then create another temp table in the same session, it will get an oid of the already dropped schema in the relnamespace column.

That just seems plain broken.

I think we need to either prevent dropping of temp namespaces *or* we need to create a new entry in pg_namespace in this particular case.

I wonder if other "fun" things could happen if you go rename the namespace, haven't tried that yet...
 
--

Re: Autovaccuum vs temp tables crash

From
Michael Paquier
Date:
On Sat, Feb 23, 2019 at 02:48:58PM +0100, Magnus Hagander wrote:
> I think we need to either prevent dropping of temp namespaces *or* we need
> to create a new entry in pg_namespace in this particular case.

Perhaps I am missing something, but it would be just more simple to
now allow users to restrict that?

> I wonder if other "fun" things could happen if you go rename the namespace,
> haven't tried that yet...

In this case the OID remains the same, still there are some cases
where we rely on the namespace name, and one is CLUSTER.
objectaddress.c uses as well get_namespace_name_or_temp(), which would
be messed up, so it would be better to prevent a temp namespace to be
renamed.  Could ALTER SCHEMA OWNER TO also be a problem?
--
Michael

Attachment

Re: Autovaccuum vs temp tables crash

From
Magnus Hagander
Date:


On Sat, Feb 23, 2019 at 12:29 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Feb 22, 2019 at 09:45:42AM -0500, Tom Lane wrote:
> +1 ... maybe "(dropped)", because we tend to use parens for this sort
> of thing, I think.

+1.  Using "dropped" sounds good to me in this context.  Perhaps we
could have something more fancy like what's used for dropped columns?
It would be nice to get a reference to a schema, like say "dropped
temporary schema".

I think that's unnecessary given the context:

2019-02-23 15:43:56.375 CET [17250] LOG:  autovacuum: dropping orphan temp table "postgres.(dropped).bar"

That said, it just moves the crash. Turns out the problem goes a lot deeper than just the logging, it's basically the cleanup of orphaned temp tables that's completely broken if you drop the namespace.

TRAP: FailedAssertion("!(relation->rd_backend != (-1))", File: "relcache.c", Line: 1085)
2019-02-23 15:43:56.378 CET [17146] LOG:  server process (PID 17250) was terminated by signal 6: Aborted

#2  0x0000563e0fe4bc83 in ExceptionalCondition (conditionName=conditionName@entry=0x563e10035fb0 "!(relation->rd_backend != (-1))", 
    errorType=errorType@entry=0x563e0fe9bf9d "FailedAssertion", fileName=fileName@entry=0x563e100357dc "relcache.c", 
    lineNumber=lineNumber@entry=1085) at assert.c:54
#3  0x0000563e0fe40e18 in RelationBuildDesc (targetRelId=24580, insertIt=insertIt@entry=true) at relcache.c:1085
#4  0x0000563e0fe41a86 in RelationIdGetRelation (relationId=<optimized out>, relationId@entry=24580) at relcache.c:1894
#5  0x0000563e0fa24b4c in relation_open (relationId=relationId@entry=24580, lockmode=lockmode@entry=8) at relation.c:59
#6  0x0000563e0fadcfea in heap_drop_with_catalog (relid=24580) at heap.c:1856
#7  0x0000563e0fad9145 in doDeletion (flags=21, object=<optimized out>) at dependency.c:1329
#8  deleteOneObject (flags=21, depRel=0x7ffd80db4808, object=<optimized out>) at dependency.c:1231
#9  deleteObjectsInList (targetObjects=targetObjects@entry=0x563e10640110, depRel=depRel@entry=0x7ffd80db4808, flags=flags@entry=21)
    at dependency.c:271
#10 0x0000563e0fad91f0 in performDeletion (object=object@entry=0x7ffd80db4944, behavior=behavior@entry=DROP_CASCADE, 
    flags=flags@entry=21) at dependency.c:352
#11 0x0000563e0fa13532 in do_autovacuum () at autovacuum.c:2269


So basically I think it's the wrong approach to try to fix this error message. We need to fix the underlying problem. Which is the ability to drop the temp table schemas and then create temp tables which ahve no schemas.

We could try to recreate the namespace if dropped. But a quick fix around that just moved coredumps around to a lot of other dependent places.

I think we're better off just peventing the explicit drop of a temp schema. See attached?

We'd also want to block things like:
ALTER SCHEMA pg_temp_3 RENAME TO foobar;
DROP SCHEMA foobar;

Are there any more things beyond RENAME we need to block?

-- 
Attachment

Re: Autovaccuum vs temp tables crash

From
Magnus Hagander
Date:


On Sat, Feb 23, 2019 at 4:18 PM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Feb 23, 2019 at 02:48:58PM +0100, Magnus Hagander wrote:
> I think we need to either prevent dropping of temp namespaces *or* we need
> to create a new entry in pg_namespace in this particular case.

Perhaps I am missing something, but it would be just more simple to
now allow users to restrict that?

I can't parse what you are saying here. Now allow users to restrict what?


> I wonder if other "fun" things could happen if you go rename the namespace,
> haven't tried that yet...

In this case the OID remains the same, still there are some cases
where we rely on the namespace name, and one is CLUSTER.
objectaddress.c uses as well get_namespace_name_or_temp(), which would
be messed up, so it would be better to prevent a temp namespace to be
renamed.  Could ALTER SCHEMA OWNER TO also be a problem?

Or possibly altering permissions on it? 

--

Re: Autovaccuum vs temp tables crash

From
Magnus Hagander
Date:


On Sat, Feb 23, 2019 at 4:28 PM Magnus Hagander <magnus@hagander.net> wrote:


On Sat, Feb 23, 2019 at 12:29 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Feb 22, 2019 at 09:45:42AM -0500, Tom Lane wrote:
> +1 ... maybe "(dropped)", because we tend to use parens for this sort
> of thing, I think.

+1.  Using "dropped" sounds good to me in this context.  Perhaps we
could have something more fancy like what's used for dropped columns?
It would be nice to get a reference to a schema, like say "dropped
temporary schema".

I think that's unnecessary given the context:

2019-02-23 15:43:56.375 CET [17250] LOG:  autovacuum: dropping orphan temp table "postgres.(dropped).bar"

That said, it just moves the crash. Turns out the problem goes a lot deeper than just the logging, it's basically the cleanup of orphaned temp tables that's completely broken if you drop the namespace.

TRAP: FailedAssertion("!(relation->rd_backend != (-1))", File: "relcache.c", Line: 1085)
2019-02-23 15:43:56.378 CET [17146] LOG:  server process (PID 17250) was terminated by signal 6: Aborted

#2  0x0000563e0fe4bc83 in ExceptionalCondition (conditionName=conditionName@entry=0x563e10035fb0 "!(relation->rd_backend != (-1))", 
    errorType=errorType@entry=0x563e0fe9bf9d "FailedAssertion", fileName=fileName@entry=0x563e100357dc "relcache.c", 
    lineNumber=lineNumber@entry=1085) at assert.c:54
#3  0x0000563e0fe40e18 in RelationBuildDesc (targetRelId=24580, insertIt=insertIt@entry=true) at relcache.c:1085
#4  0x0000563e0fe41a86 in RelationIdGetRelation (relationId=<optimized out>, relationId@entry=24580) at relcache.c:1894
#5  0x0000563e0fa24b4c in relation_open (relationId=relationId@entry=24580, lockmode=lockmode@entry=8) at relation.c:59
#6  0x0000563e0fadcfea in heap_drop_with_catalog (relid=24580) at heap.c:1856
#7  0x0000563e0fad9145 in doDeletion (flags=21, object=<optimized out>) at dependency.c:1329
#8  deleteOneObject (flags=21, depRel=0x7ffd80db4808, object=<optimized out>) at dependency.c:1231
#9  deleteObjectsInList (targetObjects=targetObjects@entry=0x563e10640110, depRel=depRel@entry=0x7ffd80db4808, flags=flags@entry=21)
    at dependency.c:271
#10 0x0000563e0fad91f0 in performDeletion (object=object@entry=0x7ffd80db4944, behavior=behavior@entry=DROP_CASCADE, 
    flags=flags@entry=21) at dependency.c:352
#11 0x0000563e0fa13532 in do_autovacuum () at autovacuum.c:2269


So basically I think it's the wrong approach to try to fix this error message. We need to fix the underlying problem. Which is the ability to drop the temp table schemas and then create temp tables which ahve no schemas.

We could try to recreate the namespace if dropped. But a quick fix around that just moved coredumps around to a lot of other dependent places.

I think we're better off just peventing the explicit drop of a temp schema. See attached?

We'd also want to block things like:
ALTER SCHEMA pg_temp_3 RENAME TO foobar;
DROP SCHEMA foobar;

Are there any more things beyond RENAME we need to block?


Ooh, RENAME has problems entirely unrelated to that:

ALTER SCHEMA pg_catalog RENAME TO foobar;

Yeah, that breaks things...

Renaming schemas only check that the *target* name is not reserved. Surely it should also check that the *source* name is not reserved? (Unless allowSystemTableMods)?

--

Re: Autovaccuum vs temp tables crash

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Fri, Feb 22, 2019 at 7:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Feb 22, 2019 at 1:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Note that all the temp schemas are made as owned by the bootstrap
>>> superuser, so there is no real argument to be made that people might
>>> be expecting they should be able to delete them.

>> Hmm, well maybe you're right.  Just seems like an odd wart.

> Well, the way it works now is you can drop them. But if you then create
> another temp table in the same session, it will get an oid of the already
> dropped schema in the relnamespace column.

Only if you're superuser.

> That just seems plain broken.

There are a *lot* of ways that a superuser can break things.  I'm not
real sure that this one is special enough that we need a defense
against it.

However, if someone held a gun to my head and said fix it, I'd be inclined
to do so by having temp-namespace creation insert a "pin" dependency into
pg_depend.  Arguably, the only reason we don't create all the temp
namespaces during bootstrap is because we aren't sure how many we'd need
--- but if we did do that, they'd presumably end up pinned.

> I wonder if other "fun" things could happen if you go rename the namespace,
> haven't tried that yet...

I put that one on exactly a par with renaming all the "=" operators.
Yes, the system will let a superuser do it, and no, it's not a good idea.

            regards, tom lane


Re: Autovaccuum vs temp tables crash

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> I think we're better off just peventing the explicit drop of a temp schema.
> See attached?

I think this is a poor implementation of a bad idea.  Would you like a
list of the ways a superuser can break the system?  We could start with
"DELETE FROM pg_proc;".

            regards, tom lane


Re: Autovaccuum vs temp tables crash

From
Magnus Hagander
Date:


On Sat, Feb 23, 2019 at 5:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> I think we're better off just peventing the explicit drop of a temp schema.
> See attached?

I think this is a poor implementation of a bad idea.  Would you like a
list of the ways a superuser can break the system?  We could start with
"DELETE FROM pg_proc;".

Yeah, true.

That said, I'm not sure there is much point in fixing the original problem either. It comes down to a "don't do that", as the system just keeps crashing even if we fix that one. Trying to fix every possible place that breaks if there are tables with invalid data in pg_class like that is not likely to work either. 

--

Re: Autovaccuum vs temp tables crash

From
Michael Paquier
Date:
On Sat, Feb 23, 2019 at 04:29:24PM +0100, Magnus Hagander wrote:
> On Sat, Feb 23, 2019 at 4:18 PM Michael Paquier <michael@paquier.xyz> wrote:
>> Perhaps I am missing something, but it would be just more simple to
>> now allow users to restrict that?
>>
>
> I can't parse what you are saying here. Now allow users to restrict
> what?

Second try after some sleep:
"Perhaps I am missing something, but it would be just more simple to
now allow users to drop it?"
--
Michael

Attachment

Re: Autovaccuum vs temp tables crash

From
Alvaro Herrera
Date:
On 2019-Feb-23, Tom Lane wrote:

> However, if someone held a gun to my head and said fix it, I'd be inclined
> to do so by having temp-namespace creation insert a "pin" dependency into
> pg_depend.  Arguably, the only reason we don't create all the temp
> namespaces during bootstrap is because we aren't sure how many we'd need
> --- but if we did do that, they'd presumably end up pinned.

Is there a problem if we start with very high max_backends and this pins
a few thousands schemas that are later no longer needed?  There's no
decent way to drop them ... (I'm not sure it matters all that much,
except for bloat in pg_namespace.)

How about hardcoding a pin for any schema that's within the current
max_backends?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Autovaccuum vs temp tables crash

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Feb-23, Tom Lane wrote:
>> However, if someone held a gun to my head and said fix it, I'd be inclined
>> to do so by having temp-namespace creation insert a "pin" dependency into
>> pg_depend.  Arguably, the only reason we don't create all the temp
>> namespaces during bootstrap is because we aren't sure how many we'd need
>> --- but if we did do that, they'd presumably end up pinned.

> Is there a problem if we start with very high max_backends and this pins
> a few thousands schemas that are later no longer needed?  There's no
> decent way to drop them ... (I'm not sure it matters all that much,
> except for bloat in pg_namespace.)

> How about hardcoding a pin for any schema that's within the current
> max_backends?

I remain skeptical that there's a problem here that so badly needs
fixed as to justify half-baked hacks in the dependency system.  We'd
be more likely to create problems than fix them.

The existing state of affairs is that a superuser who really needs to drop
a temp schema can do so, if she's careful that it's not active.  Pinning
things would break that, or at least add an additional roadblock.  If it's
some sort of virtual pin rather than a regular pg_depend entry, then it
*would* be impossible to get around (mumble ... DELETE FROM pg_namespace
... mumble).  As against that, what problem are we fixing by preventing
superusers from doing that?  A careless superuser can screw things up
arbitrarily badly in any case, so I'm not that fussed about the hazard
that the namespace isn't idle.

            regards, tom lane


Re: Autovaccuum vs temp tables crash

From
Michael Paquier
Date:
On Tue, Feb 26, 2019 at 07:21:40PM -0500, Tom Lane wrote:
> The existing state of affairs is that a superuser who really needs to drop
> a temp schema can do so, if she's careful that it's not active.  Pinning
> things would break that, or at least add an additional roadblock.  If it's
> some sort of virtual pin rather than a regular pg_depend entry, then it
> *would* be impossible to get around (mumble ... DELETE FROM pg_namespace
> ... mumble).  As against that, what problem are we fixing by preventing
> superusers from doing that?  A careless superuser can screw things up
> arbitrarily badly in any case, so I'm not that fussed about the hazard
> that the namespace isn't idle.

And when you try to do chirugy on a corrupted cluster, it can be on
the contrary very useful to be able to work with objects and
manipulate them more freely as a superuser.
--
Michael

Attachment