Re: could not truncate directory "pg_subtrans": apparent wraparound - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: could not truncate directory "pg_subtrans": apparent wraparound
Msg-id CAEepm=3YwqYB0O-5hXefaNGyu2NKuCduC4833K5nfTvYLNjs8Q@mail.gmail.com
In response to Re: could not truncate directory "pg_subtrans": apparent wraparound  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: could not truncate directory "pg_subtrans": apparent wraparound  (Thomas Munro <thomas.munro@enterprisedb.com>)
Re: could not truncate directory "pg_subtrans": apparent wraparound  (Dan Langille <info1@dvl-software.com>)
List pgsql-hackers
On Sat, Jun 6, 2015 at 1:25 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Thomas Munro wrote:
>
>> My idea was that if I could get oldestXact == next XID in
>> TruncateSUBTRANS, then TransactionIdToPage(oldestXact) for a value of
>> oldestXact that happens to be immediately after a page boundary (so
>> that xid % 2048 == 0) might give a page number that is >=
>> latest_page_number, causing SimpleLruTruncate to print that message.
>> But I can't figure out how to get next XID == oldest XID, because
>> vacuumdb --freeze --all consumes xids itself, so in my first attempt
>> at this, next XID is always 3 ahead of the oldest XID when a
>> checkpoint is run.
>
> vacuumdb starts by querying pg_database, which eats one XID.
>
> Vacuum itself only uses one XID when vac_truncate_clog() is called.
> This is called from vac_update_datfrozenxid(), which always happens at
> the end of each user-invoked VACUUM (so three times for vacuumdb if you
> have three databases); autovacuum does it also at the end of each run.
> Maybe you can get autovacuum to quit before doing it.
>
> OTOH, if the values in the pg_database entry do not change,
> vac_truncate_clog is not called, and thus vacuum would finish without
> consuming an XID.

I have managed to reproduce it a few times but haven't quite found the
right synchronisation hacks to make it reliable so I'm not posting a
repro script yet.

I think it's a scary-sounding message, but it is very rare and entirely
harmless (unless you really have wrapped around...).  The fix is
probably something like this: if oldest XID == next XID, just don't
call SimpleLruTruncate, so truncation is deferred until the next
checkpoint.  Alternatively, we could use
cutoffPage = TransactionIdToPage(oldestXact - 1) when oldest == next,
or maybe even always.  That would first require confirming that it
doesn't cause problems for dirty pages, or that there can't be any
dirty pages before the cutoff page because of the preceding flush (as
I suspect).

-- 
Thomas Munro
http://www.enterprisedb.com


