Thread: Interesting glitch in autovacuum
I observed a curious bug in autovac just now. Since plain vacuum avoids calling GetTransactionSnapshot, an autovac worker that happens not to analyze any tables will never call GetTransactionSnapshot at all. This means it will arrive at vac_update_datfrozenxid with RecentGlobalXmin never having been changed from its boot value of FirstNormalTransactionId, which means that it will fail to update the database's datfrozenxid ... or, if the current value of datfrozenxid is past 2 billion, that it will improperly advance datfrozenxid to sometime in the future. Once you get into this state in a reasonably idle database such as template1, autovac is completely dead in the water: if it thinks template1 needs to be vacuumed for wraparound, then every subsequent worker will be launched at template1, every one will fail to advance its datfrozenxid, rinse and repeat. Even before that happens, the DB's datfrozenxid will prevent clog truncation, which might explain some of the recent complaints. I've only directly tested this in HEAD, but I suspect the problem goes back a ways. On reflection I'm not even sure that this is strictly an autovacuum bug. It can be cast more generically as "RecentGlobalXmin getting used without ever having been set", and it sure looks to me like the HOT patch may have introduced a few risks of that sort. I'm thinking that maybe an appropriate fix is to insert a GetTransactionSnapshot call at the beginning of InitPostgres' transaction, thus ensuring that every backend has some vaguely sane value for RecentGlobalXmin before it tries to do any database access. Another thought is that even with that, an autovac worker is likely to reach vac_update_datfrozenxid with a RecentGlobalXmin value that was computed at the start of its run, and is thus rather old. I wonder why vac_update_datfrozenxid is using the variable at all rather than doing GetOldestXmin? It's not like that function is so performance-critical that it needs to avoid calling GetOldestXmin. Lastly, now that we have the PROC_IN_VACUUM test in GetSnapshotData, is it actually necessary for lazy vacuum to avoid setting a snapshot? It seems like it might be a good idea for it to do so in order to keep its RecentGlobalXmin reasonably current. I've only looked at this in HEAD, but I am thinking that we have a real problem here in both HEAD and 8.3. I'm less sure how bad things are in the older branches. Comments? regards, tom lane
Tom Lane wrote: > I observed a curious bug in autovac just now. Since plain vacuum avoids > calling GetTransactionSnapshot, an autovac worker that happens not to > analyze any tables will never call GetTransactionSnapshot at all. > This means it will arrive at vac_update_datfrozenxid with > RecentGlobalXmin never having been changed from its boot value of > FirstNormalTransactionId, which means that it will fail to update the > database's datfrozenxid ... or, if the current value of datfrozenxid > is past 2 billion, that it will improperly advance datfrozenxid to > sometime in the future. Ouch :-( > I've only directly tested this in HEAD, but I suspect the problem goes > back a ways. Well, this logic was introduced in 8.2; I'm not sure if there's a problem in 8.1, but I don't think so. > On reflection I'm not even sure that this is strictly an autovacuum > bug. It can be cast more generically as "RecentGlobalXmin getting > used without ever having been set", and it sure looks to me like the > HOT patch may have introduced a few risks of that sort. Agreed. Maybe we should boot RecentGlobalXmin with InvalidOid, and ensure where it's going to be used that it's not that. > I'm thinking that maybe an appropriate fix is to insert a > GetTransactionSnapshot call at the beginning of InitPostgres' > transaction, thus ensuring that every backend has some vaguely sane > value for RecentGlobalXmin before it tries to do any database access. AFAIR there's an "initial transaction" in InitPostgres or something like that. Since it goes away quickly, it'd be a good place to ensure the snapshot does not last much longer. > Another thought is that even with that, an autovac worker is likely > to reach vac_update_datfrozenxid with a RecentGlobalXmin value that > was computed at the start of its run, and is thus rather old. > I wonder why vac_update_datfrozenxid is using the variable at all > rather than doing GetOldestXmin? It's not like that function is > so performance-critical that it needs to avoid calling GetOldestXmin. The function is called only once per autovacuum iteration, and once in manually-invoked vacuum, so certainly it's not performance-critical. > Lastly, now that we have the PROC_IN_VACUUM test in GetSnapshotData, > is it actually necessary for lazy vacuum to avoid setting a snapshot? > It seems like it might be a good idea for it to do so in order to > keep its RecentGlobalXmin reasonably current. Hmm, I think I'd rather be inclined to get a snapshot just when it's going to finish. That way, RecentGlobalXmin will be up to date even if the > I've only looked at this in HEAD, but I am thinking that we have > a real problem here in both HEAD and 8.3. I'm less sure how bad > things are in the older branches. 8.2 does contain the vac_update_datfrozenxid problem at the very least. Older versions do not have that logic, so they are probably safe. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> I observed a curious bug in autovac just now. ... > Maybe we should boot RecentGlobalXmin with InvalidOid, and ensure where > it's going to be used that it's not that. Good idea --- an Assert at the use sites should be sufficient. >> Lastly, now that we have the PROC_IN_VACUUM test in GetSnapshotData, >> is it actually necessary for lazy vacuum to avoid setting a snapshot? >> It seems like it might be a good idea for it to do so in order to >> keep its RecentGlobalXmin reasonably current. > Hmm, I think I'd rather be inclined to get a snapshot just when it's > going to finish. I'm worried about keeping RecentGlobalXmin up to date during the vacuums, not only at the end, because it will be used for HOT page pruning during the vacuums. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Tom Lane wrote: > >> Lastly, now that we have the PROC_IN_VACUUM test in GetSnapshotData, > >> is it actually necessary for lazy vacuum to avoid setting a snapshot? > >> It seems like it might be a good idea for it to do so in order to > >> keep its RecentGlobalXmin reasonably current. > > > Hmm, I think I'd rather be inclined to get a snapshot just when it's > > going to finish. > > I'm worried about keeping RecentGlobalXmin up to date during the > vacuums, not only at the end, because it will be used for HOT page > pruning during the vacuums. Oh, I see. I didn't know we were doing HOT pruning during vacuum; does it make sense? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> I'm worried about keeping RecentGlobalXmin up to date during the >> vacuums, not only at the end, because it will be used for HOT page >> pruning during the vacuums. > Oh, I see. I didn't know we were doing HOT pruning during vacuum; does > it make sense? Sorry, I got a bit confused there. The vacuum's intentional pruning will use its own OldestXmin variable, which is known current at the start of the vacuum (and I think there were even proposals to advance it more frequently than that). However, a vacuum could require some incidental system catalog fetches, which I think could result in prune operations based on RecentGlobalXmin on the catalog pages (cf index_getnext). So it's probably not too terribly important ... as long as an autovac worker doesn't live long enough that its RecentGlobalXmin threatens to wrap around ... but I'm still interested in it as a code cleanup measure. Skipping the transaction snapshot fetch was a performance kluge, and if we don't need it any more I'd like to get rid of that distinction between full and lazy vacuum behavior. Anyway I think we are on the same page about the rest of the issues. Did you want to work on fixing them, or shall I? regards, tom lane
Alvaro Herrera wrote: > I didn't know we were doing HOT pruning during vacuum; does > it make sense? Removing HOT-updated, dead tuples is a bit complicated. It needs to be done one HOT-chain at a time, and the root line pointer needs to be updated to the next live tuple in the chain. lazy_scan_heap() calls the regular pruning function heap_page_prune() to deal with those before doing the normal scan of line pointers. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2008-09-10 at 11:52 -0400, Tom Lane wrote: > I'm thinking that maybe an appropriate fix is to insert a > GetTransactionSnapshot call at the beginning of InitPostgres' > transaction, thus ensuring that every backend has some vaguely sane > value for RecentGlobalXmin before it tries to do any database access. Can't we just set RecentGlobalXmin without getting a Snapshot? There's no need for a snapshot at that point is there? Just lock ProcArrayLock, read GlobalXmin, drop lock. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes: > On Wed, 2008-09-10 at 11:52 -0400, Tom Lane wrote: >> I'm thinking that maybe an appropriate fix is to insert a >> GetTransactionSnapshot call at the beginning of InitPostgres' >> transaction, thus ensuring that every backend has some vaguely sane >> value for RecentGlobalXmin before it tries to do any database access. > Can't we just set RecentGlobalXmin without getting a Snapshot? Well, certainly building an MVCC snapshot is more than we (appear to) need, but I'm of the opinion that what we have got here is an unexpected way in which these specialized transactions are unlike all others. I think the safest fix is to make them more like normal transactions, rather than invent still other ways to do things. If there were a serious performance argument against that, then yeah, but I don't see one. Backend startup isn't cheap in any case, and neither is a VACUUM, so the incremental cost involved here seems negligible. > There's > no need for a snapshot at that point is there? Just lock ProcArrayLock, > read GlobalXmin, drop lock. There's no "GlobalXmin" variable. We'd have to use GetOldestXmin(); which is cheaper than GetSnapshotData, but not hugely so. regards, tom lane
Tom Lane wrote: > Sorry, I got a bit confused there. The vacuum's intentional pruning > will use its own OldestXmin variable, which is known current at the > start of the vacuum (and I think there were even proposals to advance > it more frequently than that). However, a vacuum could require some > incidental system catalog fetches, which I think could result in > prune operations based on RecentGlobalXmin on the catalog pages > (cf index_getnext). Hmm, right, and what Heikki said too. > Anyway I think we are on the same page about the rest of the issues. > Did you want to work on fixing them, or shall I? Is this more or less what you had in mind? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachment
On Wed, 2008-09-10 at 16:11 -0400, Tom Lane wrote: > If there were a serious performance argument against that, then yeah, > but I don't see one. Maybe! Just finishing other patch then I'll be starting Hot Standby discussions, so we'll see. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> Anyway I think we are on the same page about the rest of the issues. >> Did you want to work on fixing them, or shall I? > Is this more or less what you had in mind? Yeah, this looks like exactly what I had in mind for HEAD. I'm not too sure about the back branches though. I think we could apply all of it to 8.3, but further back is going to require a separate investigation for each branch. Will you take that on? regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Tom Lane wrote: > >> Anyway I think we are on the same page about the rest of the issues. > >> Did you want to work on fixing them, or shall I? > > > Is this more or less what you had in mind? > > Yeah, this looks like exactly what I had in mind for HEAD. I'm not too > sure about the back branches though. I think we could apply all of it > to 8.3, but further back is going to require a separate investigation > for each branch. Will you take that on? Sure. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
I wrote: > Yeah, this looks like exactly what I had in mind for HEAD. I'm not too > sure about the back branches though. I think we could apply all of it > to 8.3, but further back is going to require a separate investigation > for each branch. Will you take that on? BTW, I did a quick look at all the uses of RecentGlobalXmin in the back branches, and I think we might be all right before 8.2. The older branches do in fact init RecentGlobalXmin to InvalidTransactionId, and the only thing they use it for is "is this tuple dead to everyone" tests. Since InvalidTransactionId compares older than anything else, the only consequence of not having set it is overly-conservative decisions not to mark tuples killed. So unless you see a problem I missed, I think we only have an issue-worth-fixing in 8.2 and 8.3. regards, tom lane
Tom Lane wrote: > BTW, I did a quick look at all the uses of RecentGlobalXmin in the back > branches, and I think we might be all right before 8.2. The older > branches do in fact init RecentGlobalXmin to InvalidTransactionId, > and the only thing they use it for is "is this tuple dead to everyone" > tests. Since InvalidTransactionId compares older than anything else, > the only consequence of not having set it is overly-conservative > decisions not to mark tuples killed. So unless you see a problem I > missed, I think we only have an issue-worth-fixing in 8.2 and 8.3. Actually, 8.2 initializes it to InvalidTransactionId too, so it doesn't seem like there's a problem there either. Since the problem stems from using it as initializer for the Min() calculation, there's no actual bug on 8.2 either. The bug only appeared on 8.3 when the initializer was changed. And given that there's no HOT in 8.2, then there's no danger of misusing it in page pruning either. Still, I produced a patch to 8.2, but given that I'm not sure if it's worth applying. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Attachment
On Wed, Sep 10, 2008 at 9:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > On reflection I'm not even sure that this is strictly an autovacuum > bug. It can be cast more generically as "RecentGlobalXmin getting > used without ever having been set", and it sure looks to me like the > HOT patch may have introduced a few risks of that sort. > ISTM that HOT may be safe here because even if RecentGlobalXmin is not set and has the boot time value of FirstNormalTransactionId, the heap_page_prune_opt() would just return without doing any real work. This is OK except in VACUUM where we anyways use OldestXmin. So I don't see a real problem here. But its good to fix the larger problem as per the suggestion. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > On Wed, Sep 10, 2008 at 9:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> On reflection I'm not even sure that this is strictly an autovacuum >> bug. It can be cast more generically as "RecentGlobalXmin getting >> used without ever having been set", and it sure looks to me like the >> HOT patch may have introduced a few risks of that sort. > ISTM that HOT may be safe here because even if RecentGlobalXmin is not > set and has the boot time value of FirstNormalTransactionId, the > heap_page_prune_opt() would just return without doing any real work. Until the XID counter advances past the 2billion mark. Then the uninitialized RecentGlobalXmin would be "in the future" and would result in mistaken decisions *to* prune, not to not prune. In short this is a risk-of-data-loss bug. Once the patch is in I think we ought to start thinking about a set of update releases... regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > Actually, 8.2 initializes it to InvalidTransactionId too, so it doesn't > seem like there's a problem there either. Since the problem stems from > using it as initializer for the Min() calculation, there's no actual bug > on 8.2 either. The bug only appeared on 8.3 when the initializer was > changed. And given that there's no HOT in 8.2, then there's no danger > of misusing it in page pruning either. I concur that 8.2 has no problem except in vac_update_datfrozenxid, but I think it is an actual bug there. If newFrozenXid starts out as InvalidTransactionId, it'll stay that way because InvalidTransactionId sorts as older than anything else. The result will be that the routine fails to advance datfrozenxid, which leads to exactly the type of autovacuum misbehavior that I saw in HEAD. (Actually there's another problem in an assert-enabled build: it'll fail at the Assert(TransactionIdIsNormal(newFrozenXid)) ...) regards, tom lane