Thread: could not truncate directory "pg_subtrans": apparent wraparound
I noticed this today on my 9.4.2 server running on FreeBSD 10.1: Jun 5 18:59:40 slocum postgres[986]: [3957-1] LOG: could not truncate directory "pg_subtrans": apparent wraparound Looking at a post from 2010, Tom Lane suggest this information was useful: [root@slocum:/usr/local/pgsql/data/pg_subtrans] # ls -l total 1 -rw------- 1 pgsql pgsql 8192 Jun 5 19:04 0032 This not not a high throughput server. — Dan Langille http://langille.org/
On Sat, Jun 6, 2015 at 9:45 AM, Dan Langille <dan@langille.org> wrote: > I noticed this today on my 9.4.2 server running on FreeBSD 10.1: > > Jun 5 18:59:40 slocum postgres[986]: [3957-1] LOG: could not truncate directory "pg_subtrans": apparent wraparound > > Looking at a post from 2010, Tom Lane suggest this information was useful: > > [root@slocum:/usr/local/pgsql/data/pg_subtrans] # ls -l > total 1 > -rw------- 1 pgsql pgsql 8192 Jun 5 19:04 0032 > > This not not a high throughput server. It looks like subtrantransactions may have a fencepost error similar to multixacts, described here: http://www.postgresql.org/message-id/CAEepm=0DqAtnM=23OQ44BbnwvN3g6+DXx+s5g4JRBP-VY8gEwQ@mail.gmail.com I will try to repro this. -- Thomas Munro http://www.enterprisedb.com
On Sat, Jun 6, 2015 at 10:12 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Sat, Jun 6, 2015 at 9:45 AM, Dan Langille <dan@langille.org> wrote: >> I noticed this today on my 9.4.2 server running on FreeBSD 10.1: >> >> Jun 5 18:59:40 slocum postgres[986]: [3957-1] LOG: could not truncate directory "pg_subtrans": apparent wraparound >> >> Looking at a post from 2010, Tom Lane suggest this information was useful: >> >> [root@slocum:/usr/local/pgsql/data/pg_subtrans] # ls -l >> total 1 >> -rw------- 1 pgsql pgsql 8192 Jun 5 19:04 0032 >> >> This not not a high throughput server. > > It looks like subtrantransactions may have a fencepost error similar > to multixacts, described here: > > http://www.postgresql.org/message-id/CAEepm=0DqAtnM=23OQ44BbnwvN3g6+DXx+s5g4JRBP-VY8gEwQ@mail.gmail.com > > I will try to repro this. My idea was that if I could get oldestXact == next XID in TruncateSUBSTRANS, then TransactionIdToPage(oldestXact) for a value of oldestXact that happens to be immediately after a page boundary (so that xid % 2048 == 0) might give page number that is >= latest_page_number, causing SimpleLruTruncate to print that message. But I can't figure out how to get next XID == oldest XID, because vacuumdb --freeze --all consumes xids itself, so in my first attempt at this, next XID is always 3 ahead of the oldest XID when a checkpoint is run. -- Thomas Munro http://www.enterprisedb.com
Thomas Munro wrote: > My idea was that if I could get oldestXact == next XID in > TruncateSUBSTRANS, then TransactionIdToPage(oldestXact) for a value of > oldestXact that happens to be immediately after a page boundary (so > that xid % 2048 == 0) might give page number that is >= > latest_page_number, causing SimpleLruTruncate to print that message. > But I can't figure out how to get next XID == oldest XID, because > vacuumdb --freeze --all consumes xids itself, so in my first attempt > at this, next XID is always 3 ahead of the oldest XID when a > checkpoint is run. vacuumdb starts by querying pg_database, which eats one XID. Vacuum itself only uses one XID when vac_truncate_clog() is called. This is called from vac_update_datfrozenxid(), which always happen at the end of each user-invoked VACUUM (so three times for vacuumdb if you have three databases); autovacuum does it also at the end of each run. Maybe you can get autovacuum to quit before doing it. OTOH, if the values in the pg_database entry do not change, vac_truncate_clog is not called, and thus vacuum would finish without consuming an XID. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jun 6, 2015 at 1:25 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Thomas Munro wrote: > >> My idea was that if I could get oldestXact == next XID in >> TruncateSUBSTRANS, then TransactionIdToPage(oldestXact) for a value of >> oldestXact that happens to be immediately after a page boundary (so >> that xid % 2048 == 0) might give page number that is >= >> latest_page_number, causing SimpleLruTruncate to print that message. >> But I can't figure out how to get next XID == oldest XID, because >> vacuumdb --freeze --all consumes xids itself, so in my first attempt >> at this, next XID is always 3 ahead of the oldest XID when a >> checkpoint is run. > > vacuumdb starts by querying pg_database, which eats one XID. > > Vacuum itself only uses one XID when vac_truncate_clog() is called. > This is called from vac_update_datfrozenxid(), which always happen at > the end of each user-invoked VACUUM (so three times for vacuumdb if you > have three databases); autovacuum does it also at the end of each run. > Maybe you can get autovacuum to quit before doing it. > > OTOH, if the values in the pg_database entry do not change, > vac_truncate_clog is not called, and thus vacuum would finish without > consuming an XID. I have manage to reproduce it a few times but haven't quite found the right synchronisation hacks to make it reliable so I'm not posting a repro script yet. I think it's a scary sounding message but very rare and entirely harmless (unless you really have wrapped around...). The fix is probably something like: if oldest XID == next XID, then just don't call SimpleLruTruncate (truncation is deferred until the next checkpoint), or perhaps (if we can confirm this doesn't cause problems for dirty pages or that there can't be any dirty pages before cutoff page because of the preceding flush (as I suspect)) we could use cutoffPage = TransactionIdToPage(oldextXact - 1) if oldest == next, or maybe even always. -- Thomas Munro http://www.enterprisedb.com
On Sat, Jun 6, 2015 at 4:51 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Sat, Jun 6, 2015 at 1:25 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> Thomas Munro wrote: >> >>> My idea was that if I could get oldestXact == next XID in >>> TruncateSUBSTRANS, then TransactionIdToPage(oldestXact) for a value of >>> oldestXact that happens to be immediately after a page boundary (so >>> that xid % 2048 == 0) might give page number that is >= >>> latest_page_number, causing SimpleLruTruncate to print that message. >>> But I can't figure out how to get next XID == oldest XID, because >>> vacuumdb --freeze --all consumes xids itself, so in my first attempt >>> at this, next XID is always 3 ahead of the oldest XID when a >>> checkpoint is run. >> >> vacuumdb starts by querying pg_database, which eats one XID. >> >> Vacuum itself only uses one XID when vac_truncate_clog() is called. >> This is called from vac_update_datfrozenxid(), which always happen at >> the end of each user-invoked VACUUM (so three times for vacuumdb if you >> have three databases); autovacuum does it also at the end of each run. >> Maybe you can get autovacuum to quit before doing it. >> >> OTOH, if the values in the pg_database entry do not change, >> vac_truncate_clog is not called, and thus vacuum would finish without >> consuming an XID. > > I have manage to reproduce it a few times but haven't quite found the > right synchronisation hacks to make it reliable so I'm not posting a > repro script yet. > > I think it's a scary sounding message but very rare and entirely > harmless (unless you really have wrapped around...). The fix is > probably something like: if oldest XID == next XID, then just don't > call SimpleLruTruncate (truncation is deferred until the next > checkpoint), or perhaps (if we can confirm this doesn't cause problems > for dirty pages or that there can't be any dirty pages before cutoff > page because of the preceding flush (as I suspect)) we could use > cutoffPage = TransactionIdToPage(oldextXact - 1) if oldest == next, or > maybe even always. Here's a repro script and a suggested patch. (What I said about dirty pages in parentheses above was nonsense, I was confusing this with something else.) -- Thomas Munro http://www.enterprisedb.com
Attachment
On Mon, Jun 8, 2015 at 12:29 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Here's a repro script and a suggested patch. Argh... I realised immediately after sending this that subtransaction truncation doesn't even use the oldest XID computed by vacuum, it uses GetOldestXmin (the "oldest transaction that was running when any current transaction was started"). So here is an even simpler repro, though the fix is the same (with different comments). -- Thomas Munro http://www.enterprisedb.com
Attachment
<div dir="ltr">If there's anything I can try on my servers to help diagnose the issues, please let me know. If desired,I can arrange access for debugging.<br /></div><div class="gmail_extra"><br /><div class="gmail_quote">On Sat, Jun6, 2015 at 12:51 AM, Thomas Munro <span dir="ltr"><<a href="mailto:thomas.munro@enterprisedb.com" target="_blank">thomas.munro@enterprisedb.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0 00 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Sat, Jun 6, 2015 at 1:25 PM, Alvaro Herrera <<ahref="mailto:alvherre@2ndquadrant.com">alvherre@2ndquadrant.com</a>> wrote:<br /> > Thomas Munro wrote:<br />><br /> >> My idea was that if I could get oldestXact == next XID in<br /> >> TruncateSUBSTRANS, then TransactionIdToPage(oldestXact)for a value of<br /> >> oldestXact that happens to be immediately after a page boundary(so<br /> >> that xid % 2048 == 0) might give page number that is >=<br /> >> latest_page_number,causing SimpleLruTruncate to print that message.<br /> >> But I can't figure out how to get nextXID == oldest XID, because<br /> >> vacuumdb --freeze --all consumes xids itself, so in my first attempt<br />>> at this, next XID is always 3 ahead of the oldest XID when a<br /> >> checkpoint is run.<br /> ><br />> vacuumdb starts by querying pg_database, which eats one XID.<br /> ><br /> > Vacuum itself only uses one XIDwhen vac_truncate_clog() is called.<br /> > This is called from vac_update_datfrozenxid(), which always happen at<br/> > the end of each user-invoked VACUUM (so three times for vacuumdb if you<br /> > have three databases); autovacuumdoes it also at the end of each run.<br /> > Maybe you can get autovacuum to quit before doing it.<br /> ><br/> > OTOH, if the values in the pg_database entry do not change,<br /> > vac_truncate_clog is not called, andthus vacuum would finish without<br /> > consuming an XID.<br /><br /></span>I have manage to reproduce it a few timesbut haven't quite found the<br /> right synchronisation hacks to make it reliable so I'm not posting a<br /> repro scriptyet.<br /><br /> I think it's a scary sounding message but very rare and entirely<br /> harmless (unless you reallyhave wrapped around...). The fix is<br /> probably something like: if oldest XID == next XID, then just don't<br />call SimpleLruTruncate (truncation is deferred until the next<br /> checkpoint), or perhaps (if we can confirm this doesn'tcause problems<br /> for dirty pages or that there can't be any dirty pages before cutoff<br /> page because of thepreceding flush (as I suspect)) we could use<br /> cutoffPage = TransactionIdToPage(oldextXact - 1) if oldest == next,or<br /> maybe even always.<br /><div class="HOEnZb"><div class="h5"><br /> --<br /> Thomas Munro<br /><a href="http://www.enterprisedb.com"target="_blank">http://www.enterprisedb.com</a><br /></div></div></blockquote></div><br/></div>
Hi Since the multixact equivalent of this problem[1] fell through the cracks on the multixact mega-thread, here is an updated patch that addresses this problem for both pg_subtrans and pg_multixact/offsets using the same approach: always step back one multixact/xid (rather than doing so only if oldest == next, which seemed like an unnecessary complication, and a bit futile since the result of such a test is only an instantaneous snapshot). I've added this to the commitfest[2]. I am also attaching a new set of repro scripts including a pair to test the case where next multixact/xid == first valid ID (the scripts with 'wraparound' in the name, which use dirty pg_resetxlog tricks to get into that situation). In my previous patch I naively subtracted one, which didn't work for those (even rarer!) cases. The new patch steps over the special ID values. This is a low priority bug: it just produces low probability bogus (but perhaps alarming) LOG messages and skips truncation during checkpoints on low activity systems. There have been occasional reports of these pg_subtrans messages going back as far as 2007 (and Alvaro was barking up the correct tree[3] back in 2010), so I figured it was worth following up. I also took a look at the pg_clog and pg_commit_ts truncation functions. You could argue that they have the same problem in theory (they pass a page number derived from the oldest xid to SimpleLruTruncate, and maybe there is a way for that to be an xid that hasn't been issued yet), but in practice I don't think it's a reachable condition. They use the frozen xid that is updated by vacuuming, but vacuum itself advances the next xid counter in the process. Is there a path though the vacuum code that ever exposes frozen xid == next xid? In contrast, for pg_subtrans we use GetOldestXmin(), which is equal to the next xid if there are no running transactions, and for pg_multixact we use the oldest multixact, which can be equal to the next multixact ID after a wraparound vacuum because vacuum itself doesn't always consume multixacts. [1] http://www.postgresql.org/message-id/CAEepm=0DqAtnM=23OQ44BbnwvN3g6+DXx+s5g4JRBP-VY8gEwQ@mail.gmail.com [2] https://commitfest.postgresql.org/5/265/ [3] http://www.postgresql.org/message-id/1274373980-sup-3414@alvh.no-ip.org -- Thomas Munro http://www.enterprisedb.com
Attachment
On 06/13/2015 05:02 AM, Thomas Munro wrote: > Since the multixact equivalent of this problem[1] fell through the > cracks on the multixact mega-thread, here is an updated patch that > addresses this problem for both pg_subtrans and pg_multixact/offsets > using the same approach: always step back one multixact/xid (rather > than doing so only if oldest == next, which seemed like an unnecessary > complication, and a bit futile since the result of such a test is only > an instantaneous snapshot). I've added this to the commitfest[2]. I > am also attaching a new set of repro scripts including a pair to test > the case where next multixact/xid == first valid ID (the scripts with > 'wraparound' in the name, which use dirty pg_resetxlog tricks to get > into that situation). In my previous patch I naively subtracted one, > which didn't work for those (even rarer!) cases. The new patch steps > over the special ID values. Thanks, great repro scripts! > I also took a look at the pg_clog and pg_commit_ts truncation > functions. You could argue that they have the same problem in theory > (they pass a page number derived from the oldest xid to > SimpleLruTruncate, and maybe there is a way for that to be an xid that > hasn't been issued yet), but in practice I don't think it's a > reachable condition. They use the frozen xid that is updated by > vacuuming, but vacuum itself advances the next xid counter in the > process. Is there a path though the vacuum code that ever exposes > frozen xid == next xid? In contrast, for pg_subtrans we use > GetOldestXmin(), which is equal to the next xid if there are no > running transactions, and for pg_multixact we use the oldest > multixact, which can be equal to the next multixact ID after a > wraparound vacuum because vacuum itself doesn't always consume > multixacts. Yeah, I think pg_clog and pg_commit_ts are safe, for now. It's rather accidental, though. There is one more instance of this bug though: in 9.2 and below, pg_multixact members are truncated similarly. Attached are corresponding repro-scripts for that case. Committed with the additional fix for that. - Heikki