Re: Recovery inconsistencies, standby much larger than primary - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Recovery inconsistencies, standby much larger than primary
Date	February 15, 2014 03:30:54
Msg-id	31058.1392435045@sss.pgh.pa.us
In response to	Re: Recovery inconsistencies, standby much larger than primary (Andres Freund <andres@2ndquadrant.com>)
Responses	Re: Recovery inconsistencies, standby much larger than primary
List	pgsql-hackers

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-02-14 20:46:01 +0000, Greg Stark wrote:
>> Going over this I think this is still a potential issue:
>> On 31 Jan 2014 15:56, "Andres Freund" <andres@2ndquadrant.com> wrote:
>>> I am not sure that explains the issue, but I think the redo action for
>>> truncation is not safe across crashes.  A XLOG_SMGR_TRUNCATE will just
>>> do a smgrtruncate() (and then mdtruncate) which will iterate over the
>>> segments starting at 0 till mdnblocks()/segment_size and *truncate* but
>>> not delete individual segment files that are not needed anymore, right?
>>> If we crash in the midst of that a new mdtruncate() will be issued, but
>>> it will get a shorter value back from mdnblocks().

>> I'm not too familiar with md.c but my reading of the code is that we
>> truncate the files in reverse order?

> That's what I had assumed as well, but it doesn't look that way:

No, it's truncating forward, from segment 0 upward.

We could probably fix things so it truncated backwards; it'd be a tad
tedious because the list structure isn't organized that way, but we
could do it.  Not sure if that's good enough though.  If you don't
want to assume the filesystem metadata is coherent after a crash,
we might have nonzero-size segments after zero-size ones, even if
the truncate calls had been issued in the right order.
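
To make the ordering problem concrete, here is a minimal in-memory sketch. It is not PostgreSQL code: SEG_BLOCKS, reachable_segs(), target_len() and the two truncate functions are hypothetical names, a relation is modeled as an array of segment lengths, and a crash is simulated by capping the number of truncations with max_steps. It also assumes that truncations persist in the order they were issued, which is exactly the assumption questioned above.

```c
#include <assert.h>
#include <stddef.h>

#define SEG_BLOCKS 4            /* hypothetical tiny segment size, in blocks */

/*
 * Number of segments a chain walk can see: it stops after the first
 * segment that is not completely full.
 */
static size_t
reachable_segs(const size_t *len, size_t nsegs)
{
    size_t n = 0;

    assert(len != NULL);
    while (n < nsegs)
    {
        if (len[n++] < SEG_BLOCKS)
            break;
    }
    return n;
}

/* Length segment segno should have once the relation is nblocks long. */
static size_t
target_len(size_t segno, size_t nblocks)
{
    size_t start = segno * SEG_BLOCKS;

    if (nblocks <= start)
        return 0;
    if (nblocks - start >= SEG_BLOCKS)
        return SEG_BLOCKS;
    return nblocks - start;
}

/* Truncate reachable segments from 0 upward; stop after max_steps cuts. */
static void
truncate_forward(size_t *len, size_t nsegs, size_t nblocks, size_t max_steps)
{
    size_t reach = reachable_segs(len, nsegs);
    size_t steps = 0;

    for (size_t i = 0; i < reach && steps < max_steps; i++)
    {
        if (len[i] > target_len(i, nblocks))
        {
            len[i] = target_len(i, nblocks);
            steps++;
        }
    }
}

/* Same, but from the last reachable segment downward. */
static void
truncate_backward(size_t *len, size_t nsegs, size_t nblocks, size_t max_steps)
{
    size_t steps = 0;

    for (size_t i = reachable_segs(len, nsegs); i-- > 0 && steps < max_steps;)
    {
        if (len[i] > target_len(i, nblocks))
        {
            len[i] = target_len(i, nblocks);
            steps++;
        }
    }
}
```

In this model, truncating three full segments to zero blocks in forward order and crashing after the first cut leaves segment 0 empty, so a replay sees only one reachable segment and never touches the other two. In backward order the replay still sees all three and finishes the job.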

Another possibility is to keep opening and truncating files until
we don't find the next segment in sequence, looking directly at the
filesystem not at the mdfd chain.  I don't think this would be
appropriate in normal operation, but we could do it if InRecovery
(and maybe even only if we don't think the database is consistent?)

			regards, tom lane
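
The "keep opening and truncating until the next segment is missing" idea could look roughly like the following POSIX sketch. It is an illustration only, not md.c: truncate_by_probing() and its parameters are hypothetical, it assumes segments are named base_path, base_path.1, base_path.2 and so on, and it leaves out everything a real implementation would need (fsync, the mdfd chain, error reporting).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper.  Shrink the relation whose first segment file is
 * base_path to nblocks blocks by probing segment files directly on disk
 * until one is missing, rather than trusting a previously computed
 * block count.  Returns the number of segment files found, or -1 on error.
 */
static int
truncate_by_probing(const char *base_path, long nblocks,
                    long seg_blocks, long block_size)
{
    assert(base_path != NULL);
    assert(nblocks >= 0 && seg_blocks > 0 && block_size > 0);

    for (int segno = 0;; segno++)
    {
        char        path[1024];
        struct stat st;
        off_t       want;
        long        start = (long) segno * seg_blocks;
        int         n;

        /* Segment 0 has no suffix; later segments are "<base>.<segno>". */
        if (segno == 0)
            n = snprintf(path, sizeof(path), "%s", base_path);
        else
            n = snprintf(path, sizeof(path), "%s.%d", base_path, segno);
        if (n < 0 || (size_t) n >= sizeof(path))
            return -1;

        /* The first missing file ends the sequence. */
        if (stat(path, &st) != 0)
            return (errno == ENOENT) ? segno : -1;

        if (nblocks <= start)
            want = 0;                               /* wholly past the end */
        else if (nblocks - start >= seg_blocks)
            want = (off_t) seg_blocks * block_size; /* wholly kept */
        else
            want = (off_t) (nblocks - start) * block_size;

        /* Only ever shrink a file, never extend it. */
        if (st.st_size > want && truncate(path, want) != 0)
            return -1;
    }
}
```

Because the loop stops only at a missing file, an empty segment 0 left behind by an interrupted earlier truncation does not hide the later segments. As suggested above, a caller would presumably use something like this only during recovery, not in normal operation.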
