Thread: could not truncate directory "pg_subtrans": apparent wraparound

could not truncate directory "pg_subtrans": apparent wraparound

From

Dan Langille

Date:

05 June 2015, 21:45:56

I noticed this today on my 9.4.2 server running on FreeBSD 10.1:

Jun  5 18:59:40 slocum postgres[986]: [3957-1] LOG:  could not truncate directory "pg_subtrans": apparent wraparound

Looking at a post from 2010, Tom Lane suggest this information was useful:

[root@slocum:/usr/local/pgsql/data/pg_subtrans] # ls -l
total 1
-rw-------  1 pgsql  pgsql  8192 Jun  5 19:04 0032

This not not a high throughput server.

—
Dan Langille
http://langille.org/

Re: could not truncate directory "pg_subtrans": apparent wraparound

From

Thomas Munro

Date:

05 June 2015, 22:13:00

On Sat, Jun 6, 2015 at 9:45 AM, Dan Langille <dan@langille.org> wrote:
> I noticed this today on my 9.4.2 server running on FreeBSD 10.1:
>
> Jun  5 18:59:40 slocum postgres[986]: [3957-1] LOG:  could not truncate directory "pg_subtrans": apparent wraparound
>
> Looking at a post from 2010, Tom Lane suggest this information was useful:
>
> [root@slocum:/usr/local/pgsql/data/pg_subtrans] # ls -l
> total 1
> -rw-------  1 pgsql  pgsql  8192 Jun  5 19:04 0032
>
> This not not a high throughput server.

It looks like subtrantransactions may have a fencepost error similar
to multixacts, described here:

http://www.postgresql.org/message-id/CAEepm=0DqAtnM=23OQ44BbnwvN3g6+DXx+s5g4JRBP-VY8gEwQ@mail.gmail.com

I will try to repro this.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: could not truncate directory "pg_subtrans": apparent wraparound

From

Thomas Munro

Date:

05 June 2015, 23:03:11

On Sat, Jun 6, 2015 at 10:12 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Sat, Jun 6, 2015 at 9:45 AM, Dan Langille <dan@langille.org> wrote:
>> I noticed this today on my 9.4.2 server running on FreeBSD 10.1:
>>
>> Jun  5 18:59:40 slocum postgres[986]: [3957-1] LOG:  could not truncate directory "pg_subtrans": apparent
wraparound
>>
>> Looking at a post from 2010, Tom Lane suggest this information was useful:
>>
>> [root@slocum:/usr/local/pgsql/data/pg_subtrans] # ls -l
>> total 1
>> -rw-------  1 pgsql  pgsql  8192 Jun  5 19:04 0032
>>
>> This not not a high throughput server.
>
> It looks like subtrantransactions may have a fencepost error similar
> to multixacts, described here:
>
> http://www.postgresql.org/message-id/CAEepm=0DqAtnM=23OQ44BbnwvN3g6+DXx+s5g4JRBP-VY8gEwQ@mail.gmail.com
>
> I will try to repro this.

My idea was that if I could get oldestXact == next XID in
TruncateSUBSTRANS, then TransactionIdToPage(oldestXact) for a value of
oldestXact that happens to be immediately after a page boundary (so
that xid % 2048 == 0) might give page number that is >=
latest_page_number, causing SimpleLruTruncate to print that message.
But I can't figure out how to get next XID == oldest XID, because
vacuumdb --freeze --all consumes xids itself, so in my first attempt
at this, next XID is always 3 ahead of the oldest XID when a
checkpoint is run.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: could not truncate directory "pg_subtrans": apparent wraparound

From

Alvaro Herrera

Date:

06 June 2015, 01:26:17

Thomas Munro wrote:

> My idea was that if I could get oldestXact == next XID in
> TruncateSUBSTRANS, then TransactionIdToPage(oldestXact) for a value of
> oldestXact that happens to be immediately after a page boundary (so
> that xid % 2048 == 0) might give page number that is >=
> latest_page_number, causing SimpleLruTruncate to print that message.
> But I can't figure out how to get next XID == oldest XID, because
> vacuumdb --freeze --all consumes xids itself, so in my first attempt
> at this, next XID is always 3 ahead of the oldest XID when a
> checkpoint is run.

vacuumdb starts by querying pg_database, which eats one XID.

Vacuum itself only uses one XID when vac_truncate_clog() is called.
This is called from vac_update_datfrozenxid(), which always happen at
the end of each user-invoked VACUUM (so three times for vacuumdb if you
have three databases); autovacuum does it also at the end of each run.
Maybe you can get autovacuum to quit before doing it.

OTOH, if the values in the pg_database entry do not change,
vac_truncate_clog is not called, and thus vacuum would finish without
consuming an XID.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: could not truncate directory "pg_subtrans": apparent wraparound

From

Thomas Munro

Date:

06 June 2015, 04:51:10

On Sat, Jun 6, 2015 at 1:25 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Thomas Munro wrote:
>
>> My idea was that if I could get oldestXact == next XID in
>> TruncateSUBSTRANS, then TransactionIdToPage(oldestXact) for a value of
>> oldestXact that happens to be immediately after a page boundary (so
>> that xid % 2048 == 0) might give page number that is >=
>> latest_page_number, causing SimpleLruTruncate to print that message.
>> But I can't figure out how to get next XID == oldest XID, because
>> vacuumdb --freeze --all consumes xids itself, so in my first attempt
>> at this, next XID is always 3 ahead of the oldest XID when a
>> checkpoint is run.
>
> vacuumdb starts by querying pg_database, which eats one XID.
>
> Vacuum itself only uses one XID when vac_truncate_clog() is called.
> This is called from vac_update_datfrozenxid(), which always happen at
> the end of each user-invoked VACUUM (so three times for vacuumdb if you
> have three databases); autovacuum does it also at the end of each run.
> Maybe you can get autovacuum to quit before doing it.
>
> OTOH, if the values in the pg_database entry do not change,
> vac_truncate_clog is not called, and thus vacuum would finish without
> consuming an XID.

I have manage to reproduce it a few times but haven't quite found the
right synchronisation hacks to make it reliable so I'm not posting a
repro script yet.

I think it's a scary sounding message but very rare and entirely
harmless (unless you really have wrapped around...).  The fix is
probably something like: if oldest XID == next XID, then just don't
call SimpleLruTruncate (truncation is deferred until the next
checkpoint), or perhaps (if we can confirm this doesn't cause problems
for dirty pages or that there can't be any dirty pages before cutoff
page because of the preceding flush (as I suspect)) we could use
cutoffPage = TransactionIdToPage(oldextXact - 1) if oldest == next, or
maybe even always.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: could not truncate directory "pg_subtrans": apparent wraparound

From

Thomas Munro

Date:

08 June 2015, 00:29:59

On Sat, Jun 6, 2015 at 4:51 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Sat, Jun 6, 2015 at 1:25 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Thomas Munro wrote:
>>
>>> My idea was that if I could get oldestXact == next XID in
>>> TruncateSUBSTRANS, then TransactionIdToPage(oldestXact) for a value of
>>> oldestXact that happens to be immediately after a page boundary (so
>>> that xid % 2048 == 0) might give page number that is >=
>>> latest_page_number, causing SimpleLruTruncate to print that message.
>>> But I can't figure out how to get next XID == oldest XID, because
>>> vacuumdb --freeze --all consumes xids itself, so in my first attempt
>>> at this, next XID is always 3 ahead of the oldest XID when a
>>> checkpoint is run.
>>
>> vacuumdb starts by querying pg_database, which eats one XID.
>>
>> Vacuum itself only uses one XID when vac_truncate_clog() is called.
>> This is called from vac_update_datfrozenxid(), which always happen at
>> the end of each user-invoked VACUUM (so three times for vacuumdb if you
>> have three databases); autovacuum does it also at the end of each run.
>> Maybe you can get autovacuum to quit before doing it.
>>
>> OTOH, if the values in the pg_database entry do not change,
>> vac_truncate_clog is not called, and thus vacuum would finish without
>> consuming an XID.
>
> I have manage to reproduce it a few times but haven't quite found the
> right synchronisation hacks to make it reliable so I'm not posting a
> repro script yet.
>
> I think it's a scary sounding message but very rare and entirely
> harmless (unless you really have wrapped around...).  The fix is
> probably something like: if oldest XID == next XID, then just don't
> call SimpleLruTruncate (truncation is deferred until the next
> checkpoint), or perhaps (if we can confirm this doesn't cause problems
> for dirty pages or that there can't be any dirty pages before cutoff
> page because of the preceding flush (as I suspect)) we could use
> cutoffPage = TransactionIdToPage(oldextXact - 1) if oldest == next, or
> maybe even always.

Here's a repro script and a suggested patch.  (What I said about dirty
pages in parentheses above was nonsense, I was confusing this with
something else.)

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: could not truncate directory "pg_subtrans": apparent wraparound

From

Thomas Munro

Date:

08 June 2015, 01:05:39

On Mon, Jun 8, 2015 at 12:29 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Here's a repro script and a suggested patch.

Argh... I realised immediately after sending this that subtransaction
truncation doesn't even use the oldest XID computed by vacuum, it uses
GetOldestXmin (the "oldest transaction that was running when any
current transaction was started").  So here is an even simpler repro,
though the fix is the same (with different comments).

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: could not truncate directory "pg_subtrans": apparent wraparound

From

Dan Langille

Date:

08 June 2015, 20:23:57

<div dir="ltr">If there's anything I can try on my servers to help diagnose the issues, please let me know.  If
desired,I can arrange access for debugging.<br /></div><div class="gmail_extra"><br /><div class="gmail_quote">On Sat,
Jun6, 2015 at 12:51 AM, Thomas Munro <span dir="ltr"><<a href="mailto:thomas.munro@enterprisedb.com"
target="_blank">thomas.munro@enterprisedb.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0
00 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Sat, Jun 6, 2015 at 1:25 PM, Alvaro Herrera
<<ahref="mailto:alvherre@2ndquadrant.com">alvherre@2ndquadrant.com</a>> wrote:<br /> > Thomas Munro wrote:<br
/>><br /> >> My idea was that if I could get oldestXact == next XID in<br /> >> TruncateSUBSTRANS, then
TransactionIdToPage(oldestXact)for a value of<br /> >> oldestXact that happens to be immediately after a page
boundary(so<br /> >> that xid % 2048 == 0) might give page number that is >=<br /> >>
latest_page_number,causing SimpleLruTruncate to print that message.<br /> >> But I can't figure out how to get
nextXID == oldest XID, because<br /> >> vacuumdb --freeze --all consumes xids itself, so in my first attempt<br
/>>> at this, next XID is always 3 ahead of the oldest XID when a<br /> >> checkpoint is run.<br /> ><br
/>> vacuumdb starts by querying pg_database, which eats one XID.<br /> ><br /> > Vacuum itself only uses one
XIDwhen vac_truncate_clog() is called.<br /> > This is called from vac_update_datfrozenxid(), which always happen
at<br/> > the end of each user-invoked VACUUM (so three times for vacuumdb if you<br /> > have three databases);
autovacuumdoes it also at the end of each run.<br /> > Maybe you can get autovacuum to quit before doing it.<br />
><br/> > OTOH, if the values in the pg_database entry do not change,<br /> > vac_truncate_clog is not called,
andthus vacuum would finish without<br /> > consuming an XID.<br /><br /></span>I have manage to reproduce it a few
timesbut haven't quite found the<br /> right synchronisation hacks to make it reliable so I'm not posting a<br /> repro
scriptyet.<br /><br /> I think it's a scary sounding message but very rare and entirely<br /> harmless (unless you
reallyhave wrapped around...).  The fix is<br /> probably something like: if oldest XID == next XID, then just don't<br
/>call SimpleLruTruncate (truncation is deferred until the next<br /> checkpoint), or perhaps (if we can confirm this
doesn'tcause problems<br /> for dirty pages or that there can't be any dirty pages before cutoff<br /> page because of
thepreceding flush (as I suspect)) we could use<br /> cutoffPage = TransactionIdToPage(oldextXact - 1) if oldest ==
next,or<br /> maybe even always.<br /><div class="HOEnZb"><div class="h5"><br /> --<br /> Thomas Munro<br /><a
href="http://www.enterprisedb.com"target="_blank">http://www.enterprisedb.com</a><br
/></div></div></blockquote></div><br/></div>

Re: could not truncate directory "pg_subtrans": apparent wraparound

From

Thomas Munro

Date:

13 June 2015, 02:02:52

Since the multixact equivalent of this problem[1] fell through the
cracks on the multixact mega-thread, here is an updated patch that
addresses this problem for both pg_subtrans and pg_multixact/offsets
using the same approach: always step back one multixact/xid (rather
than doing so only if oldest == next, which seemed like an unnecessary
complication, and a bit futile since the result of such a test is only
an instantaneous snapshot). I've added this to the commitfest[2]. I
am also attaching a new set of repro scripts including a pair to test
the case where next multixact/xid == first valid ID (the scripts with
'wraparound' in the name, which use dirty pg_resetxlog tricks to get
into that situation). In my previous patch I naively subtracted one,
which didn't work for those (even rarer!) cases. The new patch steps
over the special ID values.

This is a low priority bug: it just produces low probability bogus
(but perhaps alarming) LOG messages and skips truncation during
checkpoints on low activity systems. There have been occasional
reports of these pg_subtrans messages going back as far as 2007 (and
Alvaro was barking up the correct tree[3] back in 2010), so I figured
it was worth following up.

I also took a look at the pg_clog and pg_commit_ts truncation
functions. You could argue that they have the same problem in theory
(they pass a page number derived from the oldest xid to
SimpleLruTruncate, and maybe there is a way for that to be an xid that
hasn't been issued yet), but in practice I don't think it's a
reachable condition. They use the frozen xid that is updated by
vacuuming, but vacuum itself advances the next xid counter in the
process. Is there a path though the vacuum code that ever exposes
frozen xid == next xid? In contrast, for pg_subtrans we use
GetOldestXmin(), which is equal to the next xid if there are no
running transactions, and for pg_multixact we use the oldest
multixact, which can be equal to the next multixact ID after a
wraparound vacuum because vacuum itself doesn't always consume
multixacts.

[1] http://www.postgresql.org/message-id/CAEepm=0DqAtnM=23OQ44BbnwvN3g6+DXx+s5g4JRBP-VY8gEwQ@mail.gmail.com
[2] https://commitfest.postgresql.org/5/265/
[3] http://www.postgresql.org/message-id/1274373980-sup-3414@alvh.no-ip.org

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: could not truncate directory "pg_subtrans": apparent wraparound

From

Heikki Linnakangas

Date:

22 July 2015, 22:45:12

On 06/13/2015 05:02 AM, Thomas Munro wrote:
> Since the multixact equivalent of this problem[1] fell through the
> cracks on the multixact mega-thread, here is an updated patch that
> addresses this problem for both pg_subtrans and pg_multixact/offsets
> using the same approach: always step back one multixact/xid (rather
> than doing so only if oldest == next, which seemed like an unnecessary
> complication, and a bit futile since the result of such a test is only
> an instantaneous snapshot).  I've added this to the commitfest[2].  I
> am also attaching a new set of repro scripts including a pair to test
> the case where next multixact/xid == first valid ID (the scripts with
> 'wraparound' in the name, which use dirty pg_resetxlog tricks to get
> into that situation).  In my previous patch I naively subtracted one,
> which didn't work for those (even rarer!) cases.  The new patch steps
> over the special ID values.

Thanks, great repro scripts!

> I also took a look at the pg_clog and pg_commit_ts truncation
> functions.  You could argue that they have the same problem in theory
> (they pass a page number derived from the oldest xid to
> SimpleLruTruncate, and maybe there is a way for that to be an xid that
> hasn't been issued yet), but in practice I don't think it's a
> reachable condition.  They use the frozen xid that is updated by
> vacuuming, but vacuum itself advances the next xid counter in the
> process.  Is there a path though the vacuum code that ever exposes
> frozen xid == next xid?  In contrast, for pg_subtrans we use
> GetOldestXmin(), which is equal to the next xid if there are no
> running transactions, and for pg_multixact we use the oldest
> multixact, which can be equal to the next multixact ID after a
> wraparound vacuum because vacuum itself doesn't always consume
> multixacts.

Yeah, I think pg_clog and pg_commit_ts are safe, for now. It's rather
accidental, though.

There is one more instance of this bug though: in 9.2 and below,
pg_multixact members are truncated similarly. Attached are corresponding
repro-scripts for that case.

Committed with the additional fix for that.

- Heikki