Re: Hot Backup with rsync fails at pg_clog if under load - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: Hot Backup with rsync fails at pg_clog if under load
Msg-id 6C3D7EDA-573E-46DC-9047-5FEB92876DA8@phlo.org
In response to Re: Hot Backup with rsync fails at pg_clog if under load  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Hot Backup with rsync fails at pg_clog if under load
List pgsql-hackers
On Oct 25, 2011, at 14:51, Simon Riggs wrote:
> On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug <fgp@phlo.org> wrote:
>
>> What I don't understand is how this affects the CLOG. How does oldestActiveXID
>> factor into CLOG initialization?
>
> It is an entirely different error.

Ah, OK. I assumed you believed that fixing the wrong oldestActiveXID
computation would solve both the SUBTRANS-related *and* the CLOG-related
errors, since you said "We are starting recovery at the right place but
we are initialising the clog and subtrans incorrectly" at the start of
the mail.

> Chris' clog error was caused by a file read error. The file was
> opened, we did a seek within the file and then the call to read()
> failed to return a complete page from the file.
>
> The xid shown is 22811359, which is the nextxid in the control file.
>
> So StartupClog() must have failed trying to read the clog page from disk.

Yep.
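To make the failing read concrete: per Simon's analysis, StartupCLOG() reads the clog page holding nextXid. Below is a minimal Python sketch of how an xid maps to a pg_clog segment file and page offset. It assumes default build constants (BLCKSZ = 8192, 2 status bits per transaction, 32 pages per SLRU segment) and mirrors the arithmetic of the macros in clog.c/slru.c rather than quoting them; the function name clog_location is hypothetical.

```python
from typing import Tuple

# Assumed defaults, mirroring clog.c / slru.h (not read from a real build).
BLCKSZ = 8192
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 8 // CLOG_BITS_PER_XACT       # 4 xacts per byte
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE  # 32768 xacts per page
SLRU_PAGES_PER_SEGMENT = 32


def clog_location(xid: int) -> Tuple[str, int, int, int]:
    """Hypothetical helper: return (segment file name, byte offset of the
    page within that file, byte index of the xid within the page, bit
    shift of its 2-bit status within that byte)."""
    pageno = xid // CLOG_XACTS_PER_PAGE
    segno = pageno // SLRU_PAGES_PER_SEGMENT
    rpageno = pageno % SLRU_PAGES_PER_SEGMENT
    # SLRU segment files are named with at least 4 uppercase hex digits.
    filename = "%04X" % segno
    file_offset = rpageno * BLCKSZ
    xact_in_page = xid % CLOG_XACTS_PER_PAGE
    byteno = xact_in_page // CLOG_XACTS_PER_BYTE
    bshift = (xact_in_page % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT
    return filename, file_offset, byteno, bshift


print(clog_location(22811359))  # ('0015', 196608, 1207, 6)
```

Under these assumptions, nextXid 22811359 lives on page 696, i.e. at offset 196608 of pg_clog/0015, so the failed read() would have returned fewer than 8192 bytes from that position.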

> That isn't a Hot Standby problem, a recovery problem nor is it certain
> it's a PostgreSQL problem.

It's very likely that it's a PostgreSQL problem, though. It's probably
not a pilot error since it happens even for backups taken with pg_basebackup(),
so the only explanation other than a PostgreSQL bug is broken hardware or
a pretty serious kernel/filesystem bug.

> OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
> files during recovery, so maybe we can think of a way to make recovery
> cope with a SLRU_READ_FAILED error gracefully also. Any ideas?

As long as we don't understand how the CLOG-related errors happen in
the first place, I think it's a bad idea to silence them.
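The distinction Simon draws, where a missing segment file is tolerated during recovery but a short read is reported as SLRU_READ_FAILED, can be modelled roughly as follows. This is a simplified Python sketch of that behaviour, not the actual SlruPhysicalReadPage() code: the names read_slru_page and SlruReadFailed are hypothetical, and the same default constants as above are assumed.

```python
import os

# Assumed defaults (BLCKSZ and pages per SLRU segment), not from a real build.
BLCKSZ = 8192
SLRU_PAGES_PER_SEGMENT = 32


class SlruReadFailed(Exception):
    """Hypothetical stand-in for the SLRU_READ_FAILED error cause."""


def read_slru_page(directory: str, pageno: int, in_recovery: bool) -> bytes:
    segno = pageno // SLRU_PAGES_PER_SEGMENT
    rpageno = pageno % SLRU_PAGES_PER_SEGMENT
    path = os.path.join(directory, "%04X" % segno)
    try:
        f = open(path, "rb")
    except FileNotFoundError:
        if not in_recovery:
            raise  # outside recovery a missing segment is an open failure
        # During recovery a missing segment is read as an all-zero page.
        return bytes(BLCKSZ)
    with f:
        f.seek(rpageno * BLCKSZ)
        data = f.read(BLCKSZ)
    if len(data) != BLCKSZ:
        # A short read (e.g. a truncated segment) is an error even in recovery.
        raise SlruReadFailed(
            f"{path}: wanted {BLCKSZ} bytes at offset "
            f"{rpageno * BLCKSZ}, got {len(data)}"
        )
    return data
```

In this model, making recovery "cope gracefully" with SLRU_READ_FAILED would mean zero-filling in the short-read branch too, which would hide a truncated or partially copied segment rather than explain it.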

best regards,
Florian Pflug


