Re: Hot Backup with rsync fails at pg_clog if under load - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Hot Backup with rsync fails at pg_clog if under load
Date
Msg-id CA+TgmoY8hcGnQCkEzUWhtKLpge5jFSZVzTpdFru2Q7CwQ_uFrA@mail.gmail.com
In response to Re: Hot Backup with rsync fails at pg_clog if under load  (Euler Taveira de Oliveira <euler@timbira.com>)
List pgsql-hackers
2011/9/22 Euler Taveira de Oliveira <euler@timbira.com>:
> On 22-09-2011 11:24, Linas Virbalas wrote:
>>
>> In order to check more cases, I have changed the procedure to force an
>> immediate checkpoint, i.e. pg_start_backup('backup_under_load', true).
>> With the same load generator running, pg_start_backup returned almost
>> instantaneously compared to how long it took previously.
>>
>> Most importantly, after doing this change, I cannot reproduce the pg_clog
>> error message anymore. In other words, with immediate checkpoint hot
>> backup succeeds under this load!
>>
> Interesting. I remembered someone reporting this same problem but it was not
> reproducible by some of us.

So maybe there's some action that has to happen between the time the
redo pointer is set and the time the checkpoint is WAL-logged to
tickle the bug.  Like... CLOG extension, maybe?

*grep grep grep*

OK, so ExtendCLOG() just zeroes the page in memory, writes the WAL
record, and calls it good.  All the interesting stuff is done while
holding CLogControlLock.  So, at checkpoint time, we'd better make
sure to flush those pages out to disk before writing the checkpoint
record.  Otherwise, the redo pointer might advance past the
CLOG-extension record before the corresponding page hits the disk.
That's the job of CheckPointCLOG(), which is called from
CheckPointGuts(), which is called from CreateCheckPoint() just
after setting the redo pointer.  Now, there is some funny business
with the locking here as we're writing the dirty pages
(CheckPointCLOG() calls SimpleLruFlush()).  We release and reacquire
the control lock many times.  But I don't see how that can cause a
problem, because it's all being done after the redo pointer has
already been set.  We could end up having buffers get dirtied again
after they are flushed, but that shouldn't matter either as long as
each buffer is written out at least once.  And if the write fails we
throw an error.  So I don't see any holes there.
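
To make the ordering argument above concrete, here is a toy model in
Python.  It is not PostgreSQL code: the class ToyClog and its methods
are hypothetical names invented for illustration, the WAL is a plain
list, and "pages" are just integers.  It only sketches why flushing
after the redo pointer is set looks safe, and what would go wrong if
the flush happened before it.

```python
class ToyClog:
    """Toy model of CLOG extension and checkpointing (illustration only)."""

    def __init__(self):
        self.wal = []        # WAL records: ("extend", page_no) or ("checkpoint", redo)
        self.memory = set()  # pages zeroed in memory but not yet written out
        self.disk = set()    # pages that exist on disk
        self.redo = 0        # redo pointer of the last completed checkpoint

    def extend(self, page_no):
        # Mirrors the description above: zero the page in memory,
        # write the WAL record, and call it good.
        self.memory.add(page_no)
        self.wal.append(("extend", page_no))

    def flush(self):
        # Write every dirty page out to disk.
        self.disk |= self.memory
        self.memory.clear()

    def checkpoint_safe(self, during=lambda: None):
        redo = len(self.wal)   # 1. set the redo pointer
        during()               #    concurrent activity while the checkpoint runs
        self.flush()           # 2. flush dirty pages
        self.wal.append(("checkpoint", redo))  # 3. log the checkpoint
        self.redo = redo

    def checkpoint_unsafe(self, during=lambda: None):
        self.flush()           # flush BEFORE the redo pointer is set (wrong order)
        during()
        redo = len(self.wal)
        self.wal.append(("checkpoint", redo))
        self.redo = redo

    def crash_and_recover(self):
        # Lose everything in memory, then replay WAL from the redo pointer.
        self.memory.clear()
        pages = set(self.disk)
        for kind, arg in self.wal[self.redo:]:
            if kind == "extend":
                pages.add(arg)
        return pages


if __name__ == "__main__":
    safe = ToyClog()
    safe.extend(0)
    safe.checkpoint_safe(during=lambda: safe.extend(1))
    print("safe order recovers:", sorted(safe.crash_and_recover()))

    unsafe = ToyClog()
    unsafe.extend(0)
    unsafe.checkpoint_unsafe(during=lambda: unsafe.extend(1))
    print("unsafe order recovers:", sorted(unsafe.crash_and_recover()))
```

In the safe order, an extension that lands before the redo pointer is
still dirty when the flush runs, and one that lands after it is
replayed from WAL, so both pages survive the simulated crash.  In the
unsafe order, page 1 is extended after the flush but before the redo
pointer is taken, so it is neither on disk nor replayed.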

Anybody else have an idea?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
