Thread: [bug fix] Suppress "autovacuum: found orphan temp table" message
Hello, My customer reported a problem that the following message is output too often. LOG: autovacuum: found orphan temp table "pg_temp_838"."some_table" in database "some_db" LOG: autovacuum: found orphan temp table "pg_temp_902"."some_table" in database "some_db" They are using PostgreSQL 9.2.8 on a large Linux machine which has 32 cores, hundreds of GB of RAM, and powerful storage (striped flash memory). They run a heavy read/write workload. These messages appear in the server log once per second. This is because autovacuum_naptime is set to 1s. I'm not sure if the setting is reasonable, but they are running a write-intensive workload on a powerful machine. So, the frequent fine-grained autovacuum may make sense to prevent the table bloat. They asked questions like this: 1. Why and when are these messages are output? Do we have to do something? 2. Won't they use up disk space? 3. Doesn't the output processing of these messages or its cause affect performance? We happen to be facing a performance problem, the cause of which we haven't found yet. I could answer the question 1, because I found some discussions in the community ML, the oldest of which was in 2008. But I haven't answered the rest yet. Their concern is plausible. If users don't have to see this message, I think we should not worry them with a flood of messages. BTW, showing a hint message that directs the user to drop the schema is not user-friendly. First, the plenty of messages may have a bad impact on disk space and performance until the user notices the message. Second, it's not necessarily safe to manually drop the pg_temp_n schema, because a new session may have just reused that schema. Third, if dropping the schema is the solution, the user would say "then, let the database server do it." So, I propose a simple fix to change the LOG level to DEBUG1. I don't know which of DEBUG1-DEBUG5 is appropriate, and any level is OK. Could you include this in 9.2.9? Regards MauMau
Attachment
Hi, On 2014-07-18 23:38:09 +0900, MauMau wrote: > My customer reported a problem that the following message is output too > often. > > LOG: autovacuum: found orphan temp table "pg_temp_838"."some_table" in > database "some_db" > LOG: autovacuum: found orphan temp table "pg_temp_902"."some_table" in > database "some_db" So they had server crashes of some form before - otherwise they shouldn't see this because during ordinary shutdown the schema will have been dropped. C.f. RemoveTempRelationsCallback(). > 1. Why and when are these messages are output? Do we have to do > something? Yes, you should investigate how the situation came to be. > 2. Won't they use up disk space? Yes. > 3. Doesn't the output processing of these messages or its cause affect > performance? We happen to be facing a performance problem, the cause of > which we haven't found yet. Meh. If that's the bottleneck you've bigger problems. > So, I propose a simple fix to change the LOG level to DEBUG1. I don't know > which of DEBUG1-DEBUG5 is appropriate, and any level is OK. Could you > include this in 9.2.9? Surely that's the wrong end to tackle this from. Hiding actual problems is a seriously bad idea. It'd be nice if we had infrastructure to do this at startup, but we don't... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
From: "Andres Freund" <andres@2ndquadrant.com> > On 2014-07-18 23:38:09 +0900, MauMau wrote: >> LOG: autovacuum: found orphan temp table "pg_temp_838"."some_table" in >> database "some_db" >> LOG: autovacuum: found orphan temp table "pg_temp_902"."some_table" in >> database "some_db" > > So they had server crashes of some form before - otherwise they > shouldn't see this because during ordinary shutdown the schema will have > been dropped. C.f. RemoveTempRelationsCallback(). Yes, they are using streaming replication, and experienced failover. >> 1. Why and when are these messages are output? Do we have to do >> something? > > Yes, you should investigate how the situation came to be. Yes, as mentioned before, I know the reason thanks to the past mails of this community. The situation is like this: 1. The applications were using temporary tables. The rows for temporary tables were created in pg_namespace (one row for pg_temp_n) and pg_class. Those rows were replicated to the standby. The data files for the temporary tables were not replicated. 2. The server crashed the standby was promoted to the primary. 3. The new primary performed recovery, but the rows for temporary tables in the system catalog were left. 4. The applications resumed processing. However, the workload got lighter, so the zonbie pg_temp_n entries were not recycled. 5. autovacuum workers found the zonbie temporary table entries in the system catalog, repeatedly emitting lots of messages. >> 3. Doesn't the output processing of these messages or its cause affect >> performance? We happen to be facing a performance problem, the cause of >> which we haven't found yet. > > Meh. If that's the bottleneck you've bigger problems. I guess the performance problem they are facing is not due to this message output, but I don't have evidence. Anyway, I think worrying users with lots of messages is evil itself. >> So, I propose a simple fix to change the LOG level to DEBUG1. I don't >> know >> which of DEBUG1-DEBUG5 is appropriate, and any level is OK. Could you >> include this in 9.2.9? > > Surely that's the wrong end to tackle this from. Hiding actual problems > is a seriously bad idea. No, there is no serious problem in the user operation in this situation. Server crash cannot be avoided, and must be anticipated. The problem is that PostgreSQL makes users worried about lots of (probably) unnecessary messages. > It'd be nice if we had infrastructure to do this at startup, but we > don't... Yes, ideally so. It is the responsibility of the database server to clean up the zombie metadata (catalog entries). But I understand there's not such infrastracture now. If it's not (easily) possible, the best and only thing is to not make users concerned. Is there any reason to output the message in the viewpoint of users, not the viewpoint of developers? The problem is pressing. The customer is trying to use PostgreSQL for very mission-critical system, and I wish PostgreSQL will get high reputation. Could you include this in 9.2.9? Regards MauMau
> From: "Andres Freund" <andres@2ndquadrant.com> >> On 2014-07-18 23:38:09 +0900, MauMau wrote: >>> So, I propose a simple fix to change the LOG level to DEBUG1. I don't >>> know >>> which of DEBUG1-DEBUG5 is appropriate, and any level is OK. Could you >>> include this in 9.2.9? >> >> Surely that's the wrong end to tackle this from. Hiding actual problems >> is a seriously bad idea. > > No, there is no serious problem in the user operation in this situation. > Server crash cannot be avoided, and must be anticipated. The problem is > that PostgreSQL makes users worried about lots of (probably) unnecessary > messages. Is there any problem if we don't output the message? According the past discussion in this community, the user can just ignore this message. If there is no problem, and the customer asks, I'd like to propose it as one temporary measure. Regards MauMau
On 2014-07-22 10:09:04 +0900, MauMau wrote: > >From: "Andres Freund" <andres@2ndquadrant.com> > >>On 2014-07-18 23:38:09 +0900, MauMau wrote: > >>>So, I propose a simple fix to change the LOG level to DEBUG1. I don't > >>>know > >>>which of DEBUG1-DEBUG5 is appropriate, and any level is OK. Could you > >>>include this in 9.2.9? > >> > >>Surely that's the wrong end to tackle this from. Hiding actual problems > >>is a seriously bad idea. > > > >No, there is no serious problem in the user operation in this situation. > >Server crash cannot be avoided, and must be anticipated. The problem is > >that PostgreSQL makes users worried about lots of (probably) unnecessary > >messages. > > Is there any problem if we don't output the message? Yes. The user won't know that possibly gigabytes worth of diskspace aren't freed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
From: "Andres Freund" <andres@2ndquadrant.com> > On 2014-07-22 10:09:04 +0900, MauMau wrote: >> Is there any problem if we don't output the message? > > Yes. The user won't know that possibly gigabytes worth of diskspace > aren't freed. RemovePgTempFiles() frees the disk space by removing temp relation files at server start. In addition, the temp relation files are not replicated to the standby server of streaming replication (this is the customer's case). So, those messages seem no more than just the noise. With this, are those messages really necessary? Regards MauMau
On 2014-07-22 17:05:22 +0900, MauMau wrote: > From: "Andres Freund" <andres@2ndquadrant.com> > >On 2014-07-22 10:09:04 +0900, MauMau wrote: > >>Is there any problem if we don't output the message? > > > >Yes. The user won't know that possibly gigabytes worth of diskspace > >aren't freed. > > RemovePgTempFiles() frees the disk space by removing temp relation files at > server start. But it's not called during a crash restart. > In addition, the temp relation files are not replicated to > the standby server of streaming replication (this is the customer's > case). So? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
From: "Andres Freund" <andres@2ndquadrant.com> > On 2014-07-22 17:05:22 +0900, MauMau wrote: >> RemovePgTempFiles() frees the disk space by removing temp relation files >> at >> server start. > > But it's not called during a crash restart. Yes, the comment of the function says: * NOTE: we could, but don't, call this during a post-backend-crash restart* cycle. The argument for not doing it is thatsomeone might want to examine* the temp files for debugging purposes. This does however mean that* OpenTemporaryFile had better allow for collisionwith an existing temp* file name. But this is true if restart_after_crash = on in postgresql.conf, because the crash restart only occurs in that case. However, in HA cluster, whether it is shared-disk or replication, restart_after_crash is set to off, isn't it? Moreover, as the comment says, the behavior of keeping leftover temp files is for debugging by developers. It's not helpful for users, isn't it? I thought messages of DEBUG level is more appropriate, because the behavior is for debugging purposes. >> In addition, the temp relation files are not replicated to >> the standby server of streaming replication (this is the customer's >> case). > > So? Yes, so nobody can convince serious customers that the current behavior makes real sense. Could you please reconsider this? Regards MauMau
On 2014-07-22 19:13:56 +0900, MauMau wrote: > From: "Andres Freund" <andres@2ndquadrant.com> > >On 2014-07-22 17:05:22 +0900, MauMau wrote: > >>RemovePgTempFiles() frees the disk space by removing temp relation files > >>at > >>server start. > > > >But it's not called during a crash restart. > > Yes, the comment of the function says: > > * NOTE: we could, but don't, call this during a post-backend-crash restart > * cycle. The argument for not doing it is that someone might want to > examine > * the temp files for debugging purposes. This does however mean that > * OpenTemporaryFile had better allow for collision with an existing temp > * file name. > > But this is true if restart_after_crash = on in postgresql.conf, because the > crash restart only occurs in that case. However, in HA cluster, whether it > is shared-disk or replication, restart_after_crash is set to off, isn't it? In almost all setups I've seen it's set to on, even in HA scenarios. > Moreover, as the comment says, the behavior of keeping leftover temp files > is for debugging by developers. It's not helpful for users, isn't it? I > thought messages of DEBUG level is more appropriate, because the behavior is > for debugging purposes. GRR. That doesn't change the fact that there'll be files left over after a crash restart. > Yes, so nobody can convince serious customers that the current behavior > makes real sense. I think you're making lots of noise over a trivial log message. > Could you please reconsider this? No. Just removing a warning isn't the way to solve this. If you want to improve things you'll actually need to improve things not just stick your head into the sand. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
From: "Andres Freund" <andres@2ndquadrant.com> > On 2014-07-22 19:13:56 +0900, MauMau wrote: >> But this is true if restart_after_crash = on in postgresql.conf, because >> the >> crash restart only occurs in that case. However, in HA cluster, whether >> it >> is shared-disk or replication, restart_after_crash is set to off, isn't >> it? > > In almost all setups I've seen it's set to on, even in HA scenarios. I'm afraid that's because people don't notice the existence or purpose of this parameter. The 9.1 release note says: Add restart_after_crash setting which disables automatic server restart after a backend crash (Robert Haas) This allows external cluster management software to control whether the database server restarts or not. Reading this, I guess the parameter was introduced, and should be used, for HA environments controlled by the clusterware. Restarting the database server on the same machine may fail, or the restarted server may fail again, due to the broken hardware components, so I guess it was considered better to let the clusterware determine what to do. >> Moreover, as the comment says, the behavior of keeping leftover temp >> files >> is for debugging by developers. It's not helpful for users, isn't it? I >> thought messages of DEBUG level is more appropriate, because the behavior >> is >> for debugging purposes. > > GRR. That doesn't change the fact that there'll be files left over after > a crash restart. Yes... that's a source of headache. But please understand that there's a problem -- trying to leave temp relations just for debugging is causing a flood of messages, which the customer is actually concerned about. > I think you're making lots of noise over a trivial log message. Maybe so, and I hope so. I may be too nervous about what the customer will ask and/or request next. If they request something similar to what I proposed here, let me consult you again. >> Could you please reconsider this? > > No. Just removing a warning isn't the way to solve this. If you want to > improve things you'll actually need to improve things not just stick > your head into the sand. I have a few ideas below, but none of them seems better than the original proposal. What do you think? 1. startup process deletes the catalog entries and data files of leftover temp relations at the end of recovery. This is probably difficult, impossible or undesirable, because the startup process cannot access system catalogs. Even if it's possible, it is against the developers' desire to leave temp relation files for debugging. 2. autovacuum launcher deletes the catalog entries and data files of leftover temp relations during its initialization. This may be possible, but it is against the developers' desire to leave temp relation files for debugging. 3. Emit the "orphan temp relation" message only when the associated data file actually exists. autovacuum workers check if the temp relation file is left over with stat(). If not, delete the catalog entry in pg_class silently. This sounds reasonable because the purpose of the message is to notify users of potential disk space shortage. In the streaming replication case, no data files should exist on the promoted new primary, so no messages should be emitted. However, in the shared-disk HA cluster case, the temp relation files are left over on the shared disk, so this fix doesn't improve anything. 4. Emit the "orphan temp relation" message only when restart_after_crash is on. i.e. ereport(restart_after_crash ? LOG : DEBUG1, ... Regards MauMau
On Tue, Jul 22, 2014 at 6:23 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> Yes, so nobody can convince serious customers that the current behavior >> makes real sense. > > I think you're making lots of noise over a trivial log message. > >> Could you please reconsider this? > > No. Just removing a warning isn't the way to solve this. If you want to > improve things you'll actually need to improve things not just stick > your head into the sand. I've studied this area of the code before, and I would actually proposed to fix this in the opposite way - namely, adopt the logic currently used for the wraparound case, which drops the temp table, even if wraparound is not an issue. The current code seems to be laboring under the impression that there's some benefit to keeping the temporary relation around, which, as far as I can see, there isn't. Now, you could perhaps argue that it's useful for forensics, but that presumes that the situation where a backend fails to crash without cleaning up its temporary relations is exciting enough to merit an investigation, which I believe to be false. RemoveTempRelationsCallback just does this: AbortOutOfAnyTransaction(); StartTransactionCommand(); RemoveTempRelations(myTempNamespace); CommitTransactionCommand(); RemoveTempRelations uses: deleteWhatDependsOn(&object, false); These are pretty high-level operations, and there are all kinds of reasons they could fail. Many of those reasons do indeed involve the system being messed up in some way, but it's likely that the user will know about that for other reasons anyway - e.g. if the cleanup fails because the disk where the files are located has gone read-only at the operating system level, things are going to go downhill in a hurry. When the user restarts, they expect recovery - or other automatic cleanup mechanisms - to put things right. And normally they do: the first backend to get the same backend ID as any orphaned temp schema will clear it out anyway on first use - completely silently - but if it so happens that a crashed backend had a very high backend ID and that temp table usage is relatively uncommon, then the user gets log spam from now and until eternity because of a problem that, in their mind, is already fixed. Even better, they will typically be completely unable to connect the log spam back to the event that triggered it, and will have no idea how to put it right. So while I disagree with MauMau's proposed fix, I agree with him that this error message is useless log spam. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-07-22 09:39:13 -0400, Robert Haas wrote: > On Tue, Jul 22, 2014 at 6:23 AM, Andres Freund <andres@2ndquadrant.com> wrote: > >> Yes, so nobody can convince serious customers that the current behavior > >> makes real sense. > > > > I think you're making lots of noise over a trivial log message. > > > >> Could you please reconsider this? > > > > No. Just removing a warning isn't the way to solve this. If you want to > > improve things you'll actually need to improve things not just stick > > your head into the sand. > > I've studied this area of the code before, and I would actually > proposed to fix this in the opposite way - namely, adopt the logic > currently used for the wraparound case, which drops the temp table, > even if wraparound is not an issue. I'm absolutely not opposed to fixing this for real. I doubt it's backpatching material, but that's something to judge when we know what to do. > The current code seems to be > laboring under the impression that there's some benefit to keeping the > temporary relation around, which, as far as I can see, there isn't. > Now, you could perhaps argue that it's useful for forensics, but that > presumes that the situation where a backend fails to crash without > cleaning up its temporary relations is exciting enough to merit an > investigation, which I believe to be false. I think the picture here changed with the introduction of the restart_after_crash GUC - before it it was pretty hard to investigate anything that involved crashes. So I'm ok with changing things around. But I think it's more complex than just doing the if (wraparound) in do_autovacuum(). a) There very well could be a backend reconnecting to that backendId. Then we potentially might try to remove the temp schema from two backends - I'm not sure that's always going to end up going well. There's already a race window, but it'spretty darn unlikely to hit it right now because the wraparound case pretty much implies that nothing has used thatbackendid slot for a while. I guess we could do something like: LockDatabaseObject(tempschema); if (SearchSysCacheExists1) /* bailout */ performDeletion(...); b) I think at the very least we also need to call RemovePgTempFiles() during crash restart. > RemoveTempRelationsCallback just does this: > > AbortOutOfAnyTransaction(); > StartTransactionCommand(); > RemoveTempRelations(myTempNamespace); > CommitTransactionCommand(); > > RemoveTempRelations uses: > > deleteWhatDependsOn(&object, false); > These are pretty high-level operations, and there are all kinds of > reasons they could fail. One could argue, without being very convincing from my pov, that that's a reason not to always do it from autovacuum because it'll prevent vacuum from progressing past the borked temporary relation. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-07-22 22:18:03 +0900, MauMau wrote: > From: "Andres Freund" <andres@2ndquadrant.com> > >On 2014-07-22 19:13:56 +0900, MauMau wrote: > >>But this is true if restart_after_crash = on in postgresql.conf, because > >>the > >>crash restart only occurs in that case. However, in HA cluster, whether > >>it > >>is shared-disk or replication, restart_after_crash is set to off, isn't > >>it? > > > >In almost all setups I've seen it's set to on, even in HA scenarios. > > I'm afraid that's because people don't notice the existence or purpose of > this parameter. The 9.1 release note says: I think it's also because it's simply a good idea to keep it on in many production scenarios. Failing over 'just' because something caused a crash restart is often too expensive. > >No. Just removing a warning isn't the way to solve this. If you want to > >improve things you'll actually need to improve things not just stick > >your head into the sand. > I have a few ideas below, but none of them seems better than the original > proposal. What do you think? > 1. startup process deletes the catalog entries and data files of leftover > temp relations at the end of recovery. > This is probably difficult, impossible or undesirable, because the startup > process cannot access system catalogs. Even if it's possible, it is against > the developers' desire to leave temp relation files for debugging. > > 2. autovacuum launcher deletes the catalog entries and data files of > leftover temp relations during its initialization. > This may be possible, but it is against the developers' desire to leave temp > relation files for debugging. I think that desire is pretty much antiquated and doesn't really count for much. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jul 22, 2014 at 6:23 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> No. Just removing a warning isn't the way to solve this. If you want to >> improve things you'll actually need to improve things not just stick >> your head into the sand. > I've studied this area of the code before, and I would actually > proposed to fix this in the opposite way - namely, adopt the logic > currently used for the wraparound case, which drops the temp table, > even if wraparound is not an issue. The current code seems to be > laboring under the impression that there's some benefit to keeping the > temporary relation around, which, as far as I can see, there isn't. FWIW, I agree with Andres on this. The right response is to drop the temp table not complain about log spam. Or even more to the point, investigate why it's there in the first place; perhaps there's an actual fixable bug somewhere in there. But deciding that orphaned temp tables are normal is *not* an improvement IMO. regards, tom lane
On 2014-07-22 10:17:15 -0400, Tom Lane wrote: > Or even more to the point, investigate why it's there in the first > place; perhaps there's an actual fixable bug somewhere in there. I think MauMau's scenario of a failover to another database explains their existance - there's no step that'd remove them after promoting a standby. So there indeed is a need to have a sensible mechanism for removing them at some point. But it should be about removing, not ignoring them. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-07-22 10:17:15 -0400, Tom Lane wrote: >> Or even more to the point, investigate why it's there in the first >> place; perhaps there's an actual fixable bug somewhere in there. > I think MauMau's scenario of a failover to another database explains > their existance - there's no step that'd remove them after promoting a > standby. > So there indeed is a need to have a sensible mechanism for removing them > at some point. But it should be about removing, not ignoring them. Agreed. Note that RemovePgTempFiles, as such, only reclaims disk space. It does not clean out the pg_class entries, which means that just running that at standby promotion would do nothing to get rid of autovacuum's whining. regards, tom lane
On 2014-07-23 00:13:26 +0900, MauMau wrote: > Hello, Robert-san, Andres-san, Tom-san, > > From: "Andres Freund" <andres@2ndquadrant.com> > >a) There very well could be a backend reconnecting to that > > backendId. Then we potentially might try to remove the temp schema > > from two backends - I'm not sure that's always going to end up going > > well. There's already a race window, but it's pretty darn unlikely to > > hit it right now because the wraparound case pretty much implies that > > nothing has used that backendid slot for a while. > > I guess we could do something like: > > > > LockDatabaseObject(tempschema); > > if (SearchSysCacheExists1) > > /* bailout */ > > performDeletion(...); > > > >b) I think at the very least we also need to call RemovePgTempFiles() > > during crash restart. > > Thank you for showing the direction. I'll investigate the code. But that > will be tomorrow as it's already past midnight. > > Could it be included in 9.2.9 if I could submit the patch tomorrow? (I'm not > confident I can finish it...) I'd really appreciate it if you could create > the fix, if tomorrow will be late. 9.2.9 is already stamped, so no. But even without that, I don't think that this is a change that should be rushed into the backbranches. The risk/benefit ratio just isn't on the size of doing things hastily. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hello, Robert-san, Andres-san, Tom-san, From: "Andres Freund" <andres@2ndquadrant.com> > a) There very well could be a backend reconnecting to that > backendId. Then we potentially might try to remove the temp schema > from two backends - I'm not sure that's always going to end up going > well. There's already a race window, but it's pretty darn unlikely to > hit it right now because the wraparound case pretty much implies that > nothing has used that backendid slot for a while. > I guess we could do something like: > > LockDatabaseObject(tempschema); > if (SearchSysCacheExists1) > /* bailout */ > performDeletion(...); > > b) I think at the very least we also need to call RemovePgTempFiles() > during crash restart. Thank you for showing the direction. I'll investigate the code. But that will be tomorrow as it's already past midnight. Could it be included in 9.2.9 if I could submit the patch tomorrow? (I'm not confident I can finish it...) I'd really appreciate it if you could create the fix, if tomorrow will be late. Regards MauMau