Thread: [bug fix] Suppress "autovacuum: found orphan temp table" message

[bug fix] Suppress "autovacuum: found orphan temp table" message

From

"MauMau"

Date:

18 July 2014, 14:35:01

Hello,

My customer reported a problem that the following message is output too
often.

LOG:  autovacuum: found orphan temp table "pg_temp_838"."some_table" in
database "some_db"
LOG:  autovacuum: found orphan temp table "pg_temp_902"."some_table" in
database "some_db"

They are using PostgreSQL 9.2.8 on a large Linux machine which has 32 cores,
hundreds of GB of RAM, and powerful storage (striped flash memory).  They
run a heavy read/write workload.

These messages appear in the server log once per second.  This is because
autovacuum_naptime is set to 1s.  I'm not sure if the setting is reasonable,
but they are running a write-intensive workload on a powerful machine.  So,
the frequent fine-grained autovacuum may make sense to prevent the table
bloat.

They asked questions like this:

1. Why and when are these messages are output?  Do we have to do something?
2. Won't they use up disk space?
3. Doesn't the output processing of these messages or its cause affect
performance?  We happen to be facing a performance problem, the cause of
which we haven't found yet.

I could answer the question 1, because I found some discussions in the
community ML, the oldest of which was in 2008.  But I haven't answered the
rest yet.

Their concern is plausible.  If users don't have to see this message, I
think we should not worry them with a flood of messages.

BTW, showing a hint message that directs the user to drop the schema is not
user-friendly.  First, the plenty of messages may have a bad impact on disk
space and performance until the user notices the message.    Second, it's
not necessarily safe to manually drop the pg_temp_n schema, because a new
session may have just reused that schema.  Third, if dropping the schema is
the solution, the user would say "then, let the database server do it."

So, I propose a simple fix to change the LOG level to DEBUG1.  I don't know
which of DEBUG1-DEBUG5 is appropriate, and any level is OK.  Could you
include this in 9.2.9?

Regards
MauMau

Attachment

suppress_orphan_temp_table_message.patch

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

Andres Freund

Date:

18 July 2014, 15:13:55

Hi,

On 2014-07-18 23:38:09 +0900, MauMau wrote:
> My customer reported a problem that the following message is output too
> often.
> 
> LOG:  autovacuum: found orphan temp table "pg_temp_838"."some_table" in
> database "some_db"
> LOG:  autovacuum: found orphan temp table "pg_temp_902"."some_table" in
> database "some_db"

So they had server crashes of some form before - otherwise they
shouldn't see this because during ordinary shutdown the schema will have
been dropped. C.f. RemoveTempRelationsCallback().

> 1. Why and when are these messages are output?  Do we have to do
> something?

Yes, you should investigate how the situation came to be.

> 2. Won't they use up disk space?

Yes.

> 3. Doesn't the output processing of these messages or its cause affect
> performance?  We happen to be facing a performance problem, the cause of
> which we haven't found yet.

Meh. If that's the bottleneck you've bigger problems.

> So, I propose a simple fix to change the LOG level to DEBUG1.  I don't know
> which of DEBUG1-DEBUG5 is appropriate, and any level is OK.  Could you
> include this in 9.2.9?

Surely that's the wrong end to tackle this from. Hiding actual problems
is a seriously bad idea.

It'd be nice if we had infrastructure to do this at startup, but we
don't...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

"MauMau"

Date:

19 July 2014, 06:57:05

From: "Andres Freund" <andres@2ndquadrant.com>
> On 2014-07-18 23:38:09 +0900, MauMau wrote:
>> LOG:  autovacuum: found orphan temp table "pg_temp_838"."some_table" in
>> database "some_db"
>> LOG:  autovacuum: found orphan temp table "pg_temp_902"."some_table" in
>> database "some_db"
>
> So they had server crashes of some form before - otherwise they
> shouldn't see this because during ordinary shutdown the schema will have
> been dropped. C.f. RemoveTempRelationsCallback().

Yes, they are using streaming replication, and experienced failover.

>> 1. Why and when are these messages are output?  Do we have to do
>> something?
>
> Yes, you should investigate how the situation came to be.

Yes, as mentioned before, I know the reason thanks to the past mails of this 
community.  The situation is like this:

1. The applications were using temporary tables.  The rows for temporary 
tables were created in pg_namespace (one row for pg_temp_n) and pg_class. 
Those rows were replicated to the standby.   The data files for the 
temporary tables were not replicated.
2. The server crashed the standby was promoted to the primary.
3. The new primary performed recovery, but the rows for temporary tables in 
the system catalog were left.
4. The applications resumed processing.  However, the workload got lighter, 
so the zonbie pg_temp_n entries were not recycled.
5. autovacuum workers found the zonbie temporary table entries in the system 
catalog, repeatedly emitting lots of messages.

>> 3. Doesn't the output processing of these messages or its cause affect
>> performance?  We happen to be facing a performance problem, the cause of
>> which we haven't found yet.
>
> Meh. If that's the bottleneck you've bigger problems.

I guess the performance problem they are facing is not due to this message 
output, but I don't have evidence.  Anyway, I think worrying users with lots 
of messages is evil itself.

>> So, I propose a simple fix to change the LOG level to DEBUG1.  I don't 
>> know
>> which of DEBUG1-DEBUG5 is appropriate, and any level is OK.  Could you
>> include this in 9.2.9?
>
> Surely that's the wrong end to tackle this from. Hiding actual problems
> is a seriously bad idea.

No, there is no serious problem in the user operation in this situation. 
Server crash cannot be avoided, and must be anticipated.  The problem is 
that PostgreSQL makes users worried about lots of (probably) unnecessary 
messages.

> It'd be nice if we had infrastructure to do this at startup, but we
> don't...

Yes, ideally so.  It is the responsibility of the database server to clean 
up the zombie metadata (catalog entries).  But I understand there's not such 
infrastracture now.  If it's not (easily) possible, the best and only thing 
is to not make users concerned.  Is there any reason to output the message 
in the viewpoint of users, not the viewpoint of developers?

The problem is pressing.  The customer is trying to use PostgreSQL for very 
mission-critical system, and I wish PostgreSQL will get high reputation. 
Could you include this in 9.2.9?

Regards
MauMau

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

"MauMau"

Date:

22 July 2014, 01:05:58

> From: "Andres Freund" <andres@2ndquadrant.com>
>> On 2014-07-18 23:38:09 +0900, MauMau wrote:
>>> So, I propose a simple fix to change the LOG level to DEBUG1.  I don't 
>>> know
>>> which of DEBUG1-DEBUG5 is appropriate, and any level is OK.  Could you
>>> include this in 9.2.9?
>>
>> Surely that's the wrong end to tackle this from. Hiding actual problems
>> is a seriously bad idea.
>
> No, there is no serious problem in the user operation in this situation. 
> Server crash cannot be avoided, and must be anticipated.  The problem is 
> that PostgreSQL makes users worried about lots of (probably) unnecessary 
> messages.

Is there any problem if we don't output the message?  According the past 
discussion in this community, the user can just ignore this message.  If 
there is no problem, and the customer asks, I'd like to propose it as one 
temporary measure.

Regards
MauMau

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

Andres Freund

Date:

22 July 2014, 06:31:24

On 2014-07-22 10:09:04 +0900, MauMau wrote:
> >From: "Andres Freund" <andres@2ndquadrant.com>
> >>On 2014-07-18 23:38:09 +0900, MauMau wrote:
> >>>So, I propose a simple fix to change the LOG level to DEBUG1.  I don't
> >>>know
> >>>which of DEBUG1-DEBUG5 is appropriate, and any level is OK.  Could you
> >>>include this in 9.2.9?
> >>
> >>Surely that's the wrong end to tackle this from. Hiding actual problems
> >>is a seriously bad idea.
> >
> >No, there is no serious problem in the user operation in this situation.
> >Server crash cannot be avoided, and must be anticipated.  The problem is
> >that PostgreSQL makes users worried about lots of (probably) unnecessary
> >messages.
> 
> Is there any problem if we don't output the message?

Yes. The user won't know that possibly gigabytes worth of diskspace
aren't freed.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

"MauMau"

Date:

22 July 2014, 08:02:07

From: "Andres Freund" <andres@2ndquadrant.com>
> On 2014-07-22 10:09:04 +0900, MauMau wrote:
>> Is there any problem if we don't output the message?
>
> Yes. The user won't know that possibly gigabytes worth of diskspace
> aren't freed.

RemovePgTempFiles() frees the disk space by removing temp relation files at 
server start.  In addition, the temp relation files are not replicated to 
the standby server of streaming replication (this is the customer's case). 
So, those messages seem no more than just the noise.

With this, are those messages really necessary?

Regards
MauMau

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

Andres Freund

Date:

22 July 2014, 09:39:37

On 2014-07-22 17:05:22 +0900, MauMau wrote:
> From: "Andres Freund" <andres@2ndquadrant.com>
> >On 2014-07-22 10:09:04 +0900, MauMau wrote:
> >>Is there any problem if we don't output the message?
> >
> >Yes. The user won't know that possibly gigabytes worth of diskspace
> >aren't freed.
> 
> RemovePgTempFiles() frees the disk space by removing temp relation files at
> server start.

But it's not called during a crash restart.

> In addition, the temp relation files are not replicated to
> the standby server of streaming replication (this is the customer's
> case).

So?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

"MauMau"

Date:

22 July 2014, 10:10:56

From: "Andres Freund" <andres@2ndquadrant.com>
> On 2014-07-22 17:05:22 +0900, MauMau wrote:
>> RemovePgTempFiles() frees the disk space by removing temp relation files 
>> at
>> server start.
>
> But it's not called during a crash restart.

Yes, the comment of the function says:
* NOTE: we could, but don't, call this during a post-backend-crash restart* cycle.  The argument for not doing it is
thatsomeone might want to 

examine* the temp files for debugging purposes.  This does however mean that* OpenTemporaryFile had better allow for
collisionwith an existing temp* file name.

But this is true if restart_after_crash = on in postgresql.conf, because the 
crash restart only occurs in that case.  However, in HA cluster, whether it 
is shared-disk or replication, restart_after_crash is set to off, isn't it?

Moreover, as the comment says, the behavior of keeping leftover temp files 
is for debugging by developers.  It's not helpful for users, isn't it?  I 
thought messages of DEBUG level is more appropriate, because the behavior is 
for debugging purposes.

>> In addition, the temp relation files are not replicated to
>> the standby server of streaming replication (this is the customer's
>> case).
>
> So?

Yes, so nobody can convince serious customers that the current behavior 
makes real sense.

Could you please reconsider this?

Regards
MauMau

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

Andres Freund

Date:

22 July 2014, 10:24:07

On 2014-07-22 19:13:56 +0900, MauMau wrote:
> From: "Andres Freund" <andres@2ndquadrant.com>
> >On 2014-07-22 17:05:22 +0900, MauMau wrote:
> >>RemovePgTempFiles() frees the disk space by removing temp relation files
> >>at
> >>server start.
> >
> >But it's not called during a crash restart.
> 
> Yes, the comment of the function says:
> 
> * NOTE: we could, but don't, call this during a post-backend-crash restart
> * cycle.  The argument for not doing it is that someone might want to
> examine
> * the temp files for debugging purposes.  This does however mean that
> * OpenTemporaryFile had better allow for collision with an existing temp
> * file name.
> 
> But this is true if restart_after_crash = on in postgresql.conf, because the
> crash restart only occurs in that case.  However, in HA cluster, whether it
> is shared-disk or replication, restart_after_crash is set to off, isn't it?

In almost all setups I've seen it's set to on, even in HA scenarios.

> Moreover, as the comment says, the behavior of keeping leftover temp files
> is for debugging by developers.  It's not helpful for users, isn't it?  I
> thought messages of DEBUG level is more appropriate, because the behavior is
> for debugging purposes.

GRR. That doesn't change the fact that there'll be files left over after
a crash restart.

> Yes, so nobody can convince serious customers that the current behavior
> makes real sense.

I think you're making lots of noise over a trivial log message.

> Could you please reconsider this?

No. Just removing a warning isn't the way to solve this. If you want to
improve things you'll actually need to improve things not just stick
your head into the sand.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

"MauMau"

Date:

22 July 2014, 13:14:48

From: "Andres Freund" <andres@2ndquadrant.com>
> On 2014-07-22 19:13:56 +0900, MauMau wrote:
>> But this is true if restart_after_crash = on in postgresql.conf, because 
>> the
>> crash restart only occurs in that case.  However, in HA cluster, whether 
>> it
>> is shared-disk or replication, restart_after_crash is set to off, isn't 
>> it?
>
> In almost all setups I've seen it's set to on, even in HA scenarios.

I'm afraid that's because people don't notice the existence or purpose of 
this parameter.  The 9.1 release note says:

Add restart_after_crash setting which disables automatic server restart 
after a backend crash (Robert Haas)
This allows external cluster management software to control whether the 
database server restarts or not.

Reading this, I guess the parameter was introduced, and should be used, for 
HA environments controlled by the clusterware.  Restarting the database 
server on the same machine may fail, or the restarted server may fail again, 
due to the broken hardware components, so I guess it was considered better 
to let the clusterware determine what to do.

>> Moreover, as the comment says, the behavior of keeping leftover temp 
>> files
>> is for debugging by developers.  It's not helpful for users, isn't it?  I
>> thought messages of DEBUG level is more appropriate, because the behavior 
>> is
>> for debugging purposes.
>
> GRR. That doesn't change the fact that there'll be files left over after
> a crash restart.

Yes... that's a source of headache.  But please understand that there's a 
problem -- trying to leave temp relations just for debugging is causing a 
flood of messages, which the customer is actually concerned about.

> I think you're making lots of noise over a trivial log message.

Maybe so, and I hope so.  I may be too nervous about what the customer will 
ask and/or request next.  If they request something similar to what I 
proposed here, let me consult you again.

>> Could you please reconsider this?
>
> No. Just removing a warning isn't the way to solve this. If you want to
> improve things you'll actually need to improve things not just stick
> your head into the sand.

I have a few ideas below, but none of them seems better than the original 
proposal.  What do you think?

1. startup process deletes the catalog entries and data files of leftover 
temp relations at the end of recovery.
This is probably difficult, impossible or undesirable, because the startup 
process cannot access system catalogs.  Even if it's possible, it is against 
the developers' desire to leave temp relation files for debugging.

2. autovacuum launcher deletes the catalog entries and data files of 
leftover temp relations during its initialization.
This may be possible, but it is against the developers' desire to leave temp 
relation files for debugging.

3. Emit the "orphan temp relation" message only when the associated data 
file actually exists.
autovacuum workers check if the temp relation file is left over with stat(). 
If not, delete the catalog entry in pg_class silently.
This sounds reasonable because the purpose of the message is to notify users 
of potential disk space shortage.  In the streaming replication case, no 
data files should exist on the promoted new primary, so no messages should 
be emitted.
However, in the shared-disk HA cluster case, the temp relation files are 
left over on the shared disk, so this fix doesn't improve anything.

4. Emit the "orphan temp relation" message only when restart_after_crash is 
on.
i.e. ereport(restart_after_crash ? LOG : DEBUG1, ...

Regards
MauMau

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

Robert Haas

Date:

22 July 2014, 13:39:23

On Tue, Jul 22, 2014 at 6:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Yes, so nobody can convince serious customers that the current behavior
>> makes real sense.
>
> I think you're making lots of noise over a trivial log message.
>
>> Could you please reconsider this?
>
> No. Just removing a warning isn't the way to solve this. If you want to
> improve things you'll actually need to improve things not just stick
> your head into the sand.

I've studied this area of the code before, and I would actually
proposed to fix this in the opposite way - namely, adopt the logic
currently used for the wraparound case, which drops the temp table,
even if wraparound is not an issue.  The current code seems to be
laboring under the impression that there's some benefit to keeping the
temporary relation around, which, as far as I can see, there isn't.
Now, you could perhaps argue that it's useful for forensics, but that
presumes that the situation where a backend fails to crash without
cleaning up its temporary relations is exciting enough to merit an
investigation, which I believe to be false.
RemoveTempRelationsCallback just does this:
               AbortOutOfAnyTransaction();               StartTransactionCommand();
RemoveTempRelations(myTempNamespace);              CommitTransactionCommand();

RemoveTempRelations uses:

deleteWhatDependsOn(&object, false);

These are pretty high-level operations, and there are all kinds of
reasons they could fail.  Many of those reasons do indeed involve the
system being messed up in some way, but it's likely that the user will
know about that for other reasons anyway - e.g. if the cleanup fails
because the disk where the files are located has gone read-only at the
operating system level, things are going to go downhill in a hurry.
When the user restarts, they expect recovery - or other automatic
cleanup mechanisms - to put things right.  And normally they do: the
first backend to get the same backend ID as any orphaned temp schema
will clear it out anyway on first use - completely silently - but if
it so happens that a crashed backend had a very high backend ID and
that temp table usage is relatively uncommon, then the user gets log
spam from now and until eternity because of a problem that, in their
mind, is already fixed.  Even better, they will typically be
completely unable to connect the log spam back to the event that
triggered it, and will have no idea how to put it right.

So while I disagree with MauMau's proposed fix, I agree with him that
this error message is useless log spam.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

Andres Freund

Date:

22 July 2014, 14:09:25

On 2014-07-22 09:39:13 -0400, Robert Haas wrote:
> On Tue, Jul 22, 2014 at 6:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> Yes, so nobody can convince serious customers that the current behavior
> >> makes real sense.
> >
> > I think you're making lots of noise over a trivial log message.
> >
> >> Could you please reconsider this?
> >
> > No. Just removing a warning isn't the way to solve this. If you want to
> > improve things you'll actually need to improve things not just stick
> > your head into the sand.
> 
> I've studied this area of the code before, and I would actually
> proposed to fix this in the opposite way - namely, adopt the logic
> currently used for the wraparound case, which drops the temp table,
> even if wraparound is not an issue.

I'm absolutely not opposed to fixing this for real. I doubt it's
backpatching material, but that's something to judge when we know what
to do.

> The current code seems to be
> laboring under the impression that there's some benefit to keeping the
> temporary relation around, which, as far as I can see, there isn't.
> Now, you could perhaps argue that it's useful for forensics, but that
> presumes that the situation where a backend fails to crash without
> cleaning up its temporary relations is exciting enough to merit an
> investigation, which I believe to be false.

I think the picture here changed with the introduction of the
restart_after_crash GUC - before it it was pretty hard to investigate
anything that involved crashes. So I'm ok with changing things
around. But I think it's more complex than just doing the if
(wraparound) in do_autovacuum().

a) There very well could be a backend reconnecting to that  backendId. Then we potentially might try to remove the temp
schema from two backends - I'm not sure that's always going to end up going  well. There's already a race window, but
it'spretty darn unlikely to  hit it right now because the wraparound case pretty much implies that  nothing has used
thatbackendid slot for a while.  I guess we could do something like:

  LockDatabaseObject(tempschema);  if (SearchSysCacheExists1)     /* bailout */  performDeletion(...);

b) I think at the very least we also need to call RemovePgTempFiles()  during crash restart.

> RemoveTempRelationsCallback just does this:
> 
>                 AbortOutOfAnyTransaction();
>                 StartTransactionCommand();
>                 RemoveTempRelations(myTempNamespace);
>                 CommitTransactionCommand();
> 
> RemoveTempRelations uses:
> 
> deleteWhatDependsOn(&object, false);

> These are pretty high-level operations, and there are all kinds of
> reasons they could fail.

One could argue, without being very convincing from my pov, that that's
a reason not to always do it from autovacuum because it'll prevent
vacuum from progressing past the borked temporary relation.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

Andres Freund

Date:

22 July 2014, 14:12:23

On 2014-07-22 22:18:03 +0900, MauMau wrote:
> From: "Andres Freund" <andres@2ndquadrant.com>
> >On 2014-07-22 19:13:56 +0900, MauMau wrote:
> >>But this is true if restart_after_crash = on in postgresql.conf, because
> >>the
> >>crash restart only occurs in that case.  However, in HA cluster, whether
> >>it
> >>is shared-disk or replication, restart_after_crash is set to off, isn't
> >>it?
> >
> >In almost all setups I've seen it's set to on, even in HA scenarios.
> 
> I'm afraid that's because people don't notice the existence or purpose of
> this parameter.  The 9.1 release note says:

I think it's also because it's simply a good idea to keep it on in many
production scenarios. Failing over 'just' because something caused a
crash restart is often too expensive.

> >No. Just removing a warning isn't the way to solve this. If you want to
> >improve things you'll actually need to improve things not just stick
> >your head into the sand.

> I have a few ideas below, but none of them seems better than the original
> proposal.  What do you think?

> 1. startup process deletes the catalog entries and data files of leftover
> temp relations at the end of recovery.

> This is probably difficult, impossible or undesirable, because the startup
> process cannot access system catalogs.  Even if it's possible, it is against
> the developers' desire to leave temp relation files for debugging.
> 
> 2. autovacuum launcher deletes the catalog entries and data files of
> leftover temp relations during its initialization.
> This may be possible, but it is against the developers' desire to leave temp
> relation files for debugging.

I think that desire is pretty much antiquated and doesn't really count
for much.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

Tom Lane

Date:

22 July 2014, 14:17:39

Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jul 22, 2014 at 6:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> No. Just removing a warning isn't the way to solve this. If you want to
>> improve things you'll actually need to improve things not just stick
>> your head into the sand.

> I've studied this area of the code before, and I would actually
> proposed to fix this in the opposite way - namely, adopt the logic
> currently used for the wraparound case, which drops the temp table,
> even if wraparound is not an issue.  The current code seems to be
> laboring under the impression that there's some benefit to keeping the
> temporary relation around, which, as far as I can see, there isn't.

FWIW, I agree with Andres on this.  The right response is to drop the temp
table not complain about log spam.  Or even more to the point, investigate
why it's there in the first place; perhaps there's an actual fixable bug
somewhere in there.  But deciding that orphaned temp tables are normal
is *not* an improvement IMO.
        regards, tom lane

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

Andres Freund

Date:

22 July 2014, 14:26:02

On 2014-07-22 10:17:15 -0400, Tom Lane wrote:
> Or even more to the point, investigate why it's there in the first
> place; perhaps there's an actual fixable bug somewhere in there.

I think MauMau's scenario of a failover to another database explains
their existance - there's no step that'd remove them after promoting a
standby.

So there indeed is a need to have a sensible mechanism for removing them
at some point. But it should be about removing, not ignoring them.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

Tom Lane

Date:

22 July 2014, 14:54:21

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-07-22 10:17:15 -0400, Tom Lane wrote:
>> Or even more to the point, investigate why it's there in the first
>> place; perhaps there's an actual fixable bug somewhere in there.

> I think MauMau's scenario of a failover to another database explains
> their existance - there's no step that'd remove them after promoting a
> standby.

> So there indeed is a need to have a sensible mechanism for removing them
> at some point. But it should be about removing, not ignoring them.

Agreed.  Note that RemovePgTempFiles, as such, only reclaims disk space.
It does not clean out the pg_class entries, which means that just running
that at standby promotion would do nothing to get rid of autovacuum's
whining.
        regards, tom lane

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

Andres Freund

Date:

22 July 2014, 15:13:04

On 2014-07-23 00:13:26 +0900, MauMau wrote:
> Hello, Robert-san, Andres-san, Tom-san,
> 
> From: "Andres Freund" <andres@2ndquadrant.com>
> >a) There very well could be a backend reconnecting to that
> >  backendId. Then we potentially might try to remove the temp schema
> >  from two backends - I'm not sure that's always going to end up going
> >  well. There's already a race window, but it's pretty darn unlikely to
> >  hit it right now because the wraparound case pretty much implies that
> >  nothing has used that backendid slot for a while.
> >  I guess we could do something like:
> >
> >  LockDatabaseObject(tempschema);
> >  if (SearchSysCacheExists1)
> >     /* bailout */
> >  performDeletion(...);
> >
> >b) I think at the very least we also need to call RemovePgTempFiles()
> >  during crash restart.
> 
> Thank you for showing the direction.  I'll investigate the code.  But that
> will be tomorrow as it's already past midnight.
> 
> Could it be included in 9.2.9 if I could submit the patch tomorrow? (I'm not
> confident I can finish it...)  I'd really appreciate it if you could create
> the fix, if tomorrow will be late.

9.2.9 is already stamped, so no. But even without that, I don't think
that this is a change that should be rushed into the backbranches. The
risk/benefit ratio just isn't on the size of doing things hastily.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From

"MauMau"

Date:

22 July 2014, 15:18:06

Hello, Robert-san, Andres-san, Tom-san,

From: "Andres Freund" <andres@2ndquadrant.com>
> a) There very well could be a backend reconnecting to that
>   backendId. Then we potentially might try to remove the temp schema
>   from two backends - I'm not sure that's always going to end up going
>   well. There's already a race window, but it's pretty darn unlikely to
>   hit it right now because the wraparound case pretty much implies that
>   nothing has used that backendid slot for a while.
>   I guess we could do something like:
>
>   LockDatabaseObject(tempschema);
>   if (SearchSysCacheExists1)
>      /* bailout */
>   performDeletion(...);
>
> b) I think at the very least we also need to call RemovePgTempFiles()
>   during crash restart.

Thank you for showing the direction.  I'll investigate the code.  But that 
will be tomorrow as it's already past midnight.

Could it be included in 9.2.9 if I could submit the patch tomorrow? (I'm not 
confident I can finish it...)  I'd really appreciate it if you could create 
the fix, if tomorrow will be late.

Regards
MauMau