Re: [bug fix] Suppress "autovacuum: found orphan temp table" message - Mailing list pgsql-hackers

From MauMau
Subject Re: [bug fix] Suppress "autovacuum: found orphan temp table" message
Date
Msg-id 1C2948EA6273403C901A8C4EF4E3488B@maumau
In response to Re: [bug fix] Suppress "autovacuum: found orphan temp table" message  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: [bug fix] Suppress "autovacuum: found orphan temp table" message
List pgsql-hackers
From: "Andres Freund" <andres@2ndquadrant.com>
> On 2014-07-22 19:13:56 +0900, MauMau wrote:
>> But this is true if restart_after_crash = on in postgresql.conf, because the
>> crash restart only occurs in that case.  However, in HA cluster, whether it
>> is shared-disk or replication, restart_after_crash is set to off, isn't it?
>
> In almost all setups I've seen it's set to on, even in HA scenarios.

I'm afraid that's because people don't notice the existence or purpose of
this parameter.  The 9.1 release notes say:

Add restart_after_crash setting which disables automatic server restart 
after a backend crash (Robert Haas)
This allows external cluster management software to control whether the 
database server restarts or not.

Reading this, I guess the parameter was introduced for, and should be used
in, HA environments controlled by clusterware.  Restarting the database
server on the same machine may fail, or the restarted server may fail again,
because of broken hardware components, so I guess it was considered better
to let the clusterware decide what to do.


>> Moreover, as the comment says, the behavior of keeping leftover temp files
>> is for debugging by developers.  It's not helpful for users, isn't it?  I
>> thought messages of DEBUG level is more appropriate, because the behavior is
>> for debugging purposes.
>
> GRR. That doesn't change the fact that there'll be files left over after
> a crash restart.

Yes... that's a headache.  But please understand that there is a real
problem: leaving temp relations behind just for debugging causes a flood of
messages, and the customer is actually concerned about them.

> I think you're making lots of noise over a trivial log message.

Maybe so, and I hope so.  I may be too nervous about what the customer will 
ask and/or request next.  If they request something similar to what I 
proposed here, let me consult you again.


>> Could you please reconsider this?
>
> No. Just removing a warning isn't the way to solve this. If you want to
> improve things you'll actually need to improve things not just stick
> your head into the sand.


I have a few ideas below, but none of them seems better than the original 
proposal.  What do you think?

1. The startup process deletes the catalog entries and data files of
leftover temp relations at the end of recovery.
This is probably difficult, impossible, or undesirable, because the startup
process cannot access the system catalogs.  Even if it is possible, it goes
against the developers' desire to keep temp relation files for debugging.

2. The autovacuum launcher deletes the catalog entries and data files of
leftover temp relations during its initialization.
This may be possible, but it goes against the developers' desire to keep
temp relation files for debugging.

3. Emit the "orphan temp relation" message only when the associated data
file actually exists.
Autovacuum workers check with stat() whether the temp relation's data file
still exists.  If it does not, they silently delete the catalog entry in
pg_class.
This sounds reasonable because the purpose of the message is to notify users
of a potential disk space shortage.  In the streaming replication case, no
data files should exist on the promoted new primary, so no messages should
be emitted.
However, in the shared-disk HA cluster case, the temp relation files are
left over on the shared disk, so this fix doesn't improve anything.

4. Emit the "orphan temp relation" message only when restart_after_crash is 
on.
i.e. ereport(restart_after_crash ? LOG : DEBUG1, ...
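
To make ideas 3 and 4 a little more concrete, here is a rough standalone
sketch.  It is not PostgreSQL source and not a patch: the type and function
names (SketchLevel, OrphanAction, orphan_action, orphan_message_level) are
hypothetical, and the two level constants only stand in for the real LOG and
DEBUG1 elog levels.  The only real call it relies on is POSIX stat().

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical stand-ins for the server's DEBUG1 and LOG message levels. */
typedef enum { SKETCH_DEBUG1, SKETCH_LOG } SketchLevel;

/* What a worker would do with one orphan temp relation (idea 3). */
typedef enum {
    ORPHAN_DROP_SILENTLY,   /* no data file: just remove the catalog entry */
    ORPHAN_REPORT           /* data file exists (or unknown): emit message */
} OrphanAction;

/*
 * Idea 3: report the orphan only when its data file is still on disk.
 * relpath is assumed to be the path of the relation's data file.
 */
OrphanAction
orphan_action(const char *relpath)
{
    struct stat st;

    if (stat(relpath, &st) == 0)
        return ORPHAN_REPORT;           /* file is there, disk space is used */
    if (errno == ENOENT)
        return ORPHAN_DROP_SILENTLY;    /* nothing left over on disk */
    return ORPHAN_REPORT;               /* cannot tell, so stay noisy */
}

/*
 * Idea 4: choose the message level from restart_after_crash, mirroring
 * the "restart_after_crash ? LOG : DEBUG1" expression above.
 */
SketchLevel
orphan_message_level(bool restart_after_crash)
{
    return restart_after_crash ? SKETCH_LOG : SKETCH_DEBUG1;
}
```

In this sketch any stat() failure other than ENOENT still leads to the
message, on the assumption that staying noisy is safer than silently dropping
a catalog entry whose file might exist.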


Regards
MauMau



