Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)
Date
Msg-id CA+TgmoYC8yY7WDWJ5cankEEOMAT=v7aZPXLT9Z-HCRxPATebhQ@mail.gmail.com
In response to Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)  (Benedikt Grundmann <bgrundmann@janestreet.com>)
Responses Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)  (David Powers <dpowers@janestreet.com>)
List pgsql-hackers
On Tue, May 28, 2013 at 10:53 AM, Benedikt Grundmann
<bgrundmann@janestreet.com> wrote:
> Today we have seen
>
> 2013-05-28 04:11:12.300 EDT,,,30600,,51a41946.7788,1,,2013-05-27 22:41:10
> EDT,,0,ERROR,XX000,"xlog flush request 1E95/AFB2DB10 is not satisfied ---
> flushed only to 1E7E/21CB79A0",,,,,"writing block 9 of relation
> base/16416/293974676",,,,""
> 2013-05-28 04:11:13.316 EDT,,,30600,,51a41946.7788,2,,2013-05-27 22:41:10
> EDT,,0,ERROR,XX000,"xlog flush request 1E95/AFB2DB10 is not satisfied ---
> flushed only to 1E7E/21CB79A0",,,,,"writing block 9 of relation
> base/16416/293974676",,,,""
>
> while taking the backup of the primary.  We have been running for a few days
> like that and today is the first day where we see these problems again.  So
> it's not entirely deterministic / we don't know yet what we have to do to
> reproduce.
>
> So this makes Robert's theory more likely.  However we have also been using this
> method (LVM + rsync with hardlinks from primary) for years without these
> problems.  So the big question is what changed?

Well... I don't know.  But my guess is there's something wrong with
the way you're using hardlinks.  Remember, a hardlink means two
directory entries pointing to the same file on disk.  So if the file
gets modified through either name after the fact, the other name is
going to see the changes.  The "xlog flush request ... is not
satisfied" errors could happen if, for example, the backup is pointing
to a file, and the primary is pointing to the same file, and the
primary modifies the file after the backup is taken (thus modifying
the backup after the fact).
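
To make the hardlink point concrete, here is a minimal sketch in Python.
It is not taken from anyone's backup scripts: the file names
"primary_relfile" and "backup_relfile" and the function hardlink_demo are
made up for illustration, and it assumes a filesystem that supports hard
links.  It only shows the general behavior: an in-place write through one
name is visible through the other name, because both refer to the same
inode.

```python
import os
import tempfile


def hardlink_demo():
    """Write through one hard link and read the result through the other."""
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical names standing in for a data file and its "backup copy".
        primary = os.path.join(d, "primary_relfile")
        backup = os.path.join(d, "backup_relfile")

        with open(primary, "wb") as f:
            f.write(b"page written before backup")

        # A hard link is a second directory entry for the same inode,
        # not a copy of the data.
        os.link(primary, backup)
        same_inode = os.stat(primary).st_ino == os.stat(backup).st_ino

        # Modify the file in place through the "primary" name,
        # after the "backup" link already exists.
        with open(primary, "r+b") as f:
            f.seek(0)
            f.write(b"PAGE")

        # The "backup" name sees the modification.
        with open(backup, "rb") as f:
            seen = f.read()

        return same_inode, seen


if __name__ == "__main__":
    print(hardlink_demo())
```

A copy made by writing a new file would not behave this way; only names
that share an inode do.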

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


