Re: could not truncate directory "pg_subtrans": apparent wraparound - Mailing list pgsql-hackers

From Dan Langille
Subject Re: could not truncate directory "pg_subtrans": apparent wraparound
Date
Msg-id CAPG9OKf=8cXfjsu-2g=mk46yvwRXT11dkDn16GsSjVKODLqCnw@mail.gmail.com
In response to Re: could not truncate directory "pg_subtrans": apparent wraparound  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
<div dir="ltr">If there's anything I can try on my servers to help diagnose the issues, please let me know.  If
desired, I can arrange access for debugging.<br /></div><div class="gmail_extra"><br /><div class="gmail_quote">On Sat,
Jun 6, 2015 at 12:51 AM, Thomas Munro <span dir="ltr">&lt;<a href="mailto:thomas.munro@enterprisedb.com"
target="_blank">thomas.munro@enterprisedb.com</a>&gt;</span> wrote:<br /><blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Sat, Jun 6, 2015 at 1:25 PM, Alvaro Herrera
&lt;<a href="mailto:alvherre@2ndquadrant.com">alvherre@2ndquadrant.com</a>&gt; wrote:<br /> > Thomas Munro wrote:<br
/>><br /> >> My idea was that if I could get oldestXact == next XID in<br /> >> TruncateSUBTRANS, then
TransactionIdToPage(oldestXact) for a value of<br /> >> oldestXact that happens to be immediately after a page
boundary (so<br /> >> that xid % 2048 == 0) might give a page number that is >=<br /> >>
latest_page_number, causing SimpleLruTruncate to print that message.<br /> >> But I can't figure out how to get
nextXID == oldest XID, because<br /> >> vacuumdb --freeze --all consumes xids itself, so in my first attempt<br
/>>> at this, next XID is always 3 ahead of the oldest XID when a<br /> >> checkpoint is run.<br /> ><br
/>> vacuumdb starts by querying pg_database, which eats one XID.<br /> ><br /> > Vacuum itself only uses one
XID when vac_truncate_clog() is called.<br /> > This is called from vac_update_datfrozenxid(), which always happens
at<br/> > the end of each user-invoked VACUUM (so three times for vacuumdb if you<br /> > have three databases);
autovacuum does it also at the end of each run.<br /> > Maybe you can get autovacuum to quit before doing it.<br />
><br/> > OTOH, if the values in the pg_database entry do not change,<br /> > vac_truncate_clog is not called,
and thus vacuum would finish without<br /> > consuming an XID.<br /><br /></span>I have managed to reproduce it a few
times but haven't quite found the<br /> right synchronisation hacks to make it reliable so I'm not posting a<br /> repro
script yet.<br /><br /> I think it's a scary-sounding message but very rare and entirely<br /> harmless (unless you
really have wrapped around...).  The fix is<br /> probably something like: if oldest XID == next XID, then just don't<br
/> call SimpleLruTruncate (truncation is deferred until the next<br /> checkpoint), or perhaps (if we can confirm this
doesn't cause problems<br /> for dirty pages or that there can't be any dirty pages before cutoff<br /> page because of
the preceding flush (as I suspect)) we could use<br /> cutoffPage = TransactionIdToPage(oldestXact - 1) if oldest ==
next, or<br /> maybe even always.<br /><div class="HOEnZb"><div class="h5"><br /> --<br /> Thomas Munro<br /><a
href="http://www.enterprisedb.com" target="_blank">http://www.enterprisedb.com</a><br
/></div></div></blockquote></div><br/></div> 
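
The page arithmetic discussed in the quoted message can be illustrated with a minimal Python sketch. It is not PostgreSQL source code. It assumes 2048 XIDs per pg_subtrans page (taken from the "xid % 2048 == 0" remark), ignores XID wraparound entirely, and assumes the latest page is the one holding the most recently assigned XID. The names xid_to_page, cutoff_page and latest_page are hypothetical stand-ins chosen for this illustration.

```python
# Assumption from the thread: one pg_subtrans page covers 2048 XIDs.
XACTS_PER_PAGE = 2048


def xid_to_page(xid: int) -> int:
    """Simplified stand-in for TransactionIdToPage (no wraparound handling)."""
    return xid // XACTS_PER_PAGE


def cutoff_page(oldest_xact: int, next_xid: int, use_previous_xid: bool = False) -> int:
    """Page that truncation would treat as its cutoff.

    With use_previous_xid=True, apply the variant floated in the thread:
    when oldest == next, compute the page from oldest_xact - 1 instead.
    """
    if use_previous_xid and oldest_xact == next_xid:
        return xid_to_page(oldest_xact - 1)
    return xid_to_page(oldest_xact)


# Scenario from the thread: oldest XID == next XID, exactly on a page boundary.
next_xid = 2 * XACTS_PER_PAGE          # 4096, so next_xid % 2048 == 0
oldest_xact = next_xid

# Assumption: the newest existing page is the one containing the last XID
# actually assigned, i.e. next_xid - 1.
latest_page = xid_to_page(next_xid - 1)

print("latest page:", latest_page)
print("cutoff page (as is):", cutoff_page(oldest_xact, next_xid))
print("cutoff page (oldest - 1):", cutoff_page(oldest_xact, next_xid, use_previous_xid=True))
```

Under these assumptions the unmodified cutoff page (2) lies beyond the latest page (1), which is the situation the message describes as triggering the "apparent wraparound" report, while the oldest - 1 variant keeps the cutoff on the latest page.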
