Re: Hot standby, recovery infra - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Hot standby, recovery infra
Date	February 5, 2009 05:46:19
Msg-id	498AB55D.50408@enterprisedb.com
In response to	Re: Hot standby, recovery infra (Simon Riggs <simon@2ndQuadrant.com>)
Responses	Re: Hot standby, recovery infra
List	pgsql-hackers

Simon Riggs wrote:
> On Thu, 2009-02-05 at 10:31 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> On Thu, 2009-02-05 at 09:28 +0200, Heikki Linnakangas wrote:
>>>> I got rid of minSafeStartPoint, advancing minRecoveryPoint instead. And 
>>>> it's advanced in XLogFlush instead of XLogFileRead. I'll post an updated 
>>>> patch soon.
>>> Why do you think XLogFlush is called less frequently than XLogFileRead?
>> It's not, but we only need to update the control file when we're 
>> "flushing" an LSN that's greater than current minRecoveryPoint. And when 
>> we do update minRecoveryPoint, we can update it to the LSN of the last 
>> record we've read from the archive.
> 
> So we might end up flushing more often *and* we will be doing it
> potentially in the code path of other users.

For example, imagine a database that fits completely in shared buffers. 
If we update at every XLogFileRead, we have to fsync the control file 
once per 16MB WAL segment. If we update in XLogFlush the way I 
described, we only need to update when we flush a page from the buffer 
cache, which will only happen at restartpoints. That's far fewer updates.
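
To make that rule concrete, here is a minimal sketch in C. It is not the actual patch: Lsn, RecoveryState and flush_during_recovery are hypothetical names, and the control file write is only counted, not performed. It shows the two points above: the update is skipped when the flushed LSN is already covered, and when an update is needed it jumps to the last record read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position (PostgreSQL uses XLogRecPtr). */
typedef uint64_t Lsn;

typedef struct RecoveryState
{
    Lsn      min_recovery_point;   /* value last written to the control file */
    Lsn      last_read_lsn;        /* end of the last WAL record read so far */
    unsigned control_file_writes;  /* simulated control file fsyncs */
} RecoveryState;

/*
 * Called when a page carrying page_lsn is about to be written out during
 * recovery. Returns true if the control file had to be updated.
 */
bool
flush_during_recovery(RecoveryState *st, Lsn page_lsn)
{
    /* A page cannot carry an LSN beyond the WAL we have read. */
    assert(page_lsn <= st->last_read_lsn);

    /* Already covered by minRecoveryPoint: no control file write needed. */
    if (page_lsn <= st->min_recovery_point)
        return false;

    /* Advance all the way to the last record read, not just to page_lsn. */
    st->min_recovery_point = st->last_read_lsn;
    st->control_file_writes++;
    return true;
}
```

Because each update moves minRecoveryPoint to the newest record read, flushes of pages with older LSNs fall through the early return without touching the control file.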

Expanding that example to a database that doesn't fit in cache, we're 
still evicting the pages in the buffer cache that have been untouched 
for the longest. Such pages will have an old LSN, too, so we shouldn't 
need to update very often.

I'm sure you can come up with an example of where we end up fsyncing 
more often, but it doesn't seem like the common case to me.

> This change seems speculative and also against what has previously been
> agreed with Tom. If he chooses not to comment on your changes, that's up
> to him, but I don't think you should remove things quietly that have
> been put there through the community process, as if they caused
> problems. I feel like I'm in the middle here. 

I'd like to have the extra protection that this approach gives. If we 
let safeStartPoint be ahead of the actual WAL we've replayed, we have 
to just assume we're fine if we reach end of WAL before reaching that 
point. That assumption falls down if, e.g., recovery is stopped and you 
go and remove the last few WAL segments from the archive before 
restarting it, or signal pg_standby to trigger failover too early. 
Tracking the real safe starting point and always enforcing it protects 
you from that.
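
As an illustration of what "enforcing it" means, a minimal sketch in C follows. The names (Lsn, RecoveryResult, check_end_of_wal) are hypothetical and not taken from the patch; the only point is that reaching end of WAL before the recorded minimum recovery point is treated as an error rather than assumed to be fine.

```c
#include <stdint.h>

/* Hypothetical stand-in for a WAL position (PostgreSQL uses XLogRecPtr). */
typedef uint64_t Lsn;

typedef enum
{
    RECOVERY_OK,
    RECOVERY_INCONSISTENT
} RecoveryResult;

/*
 * Called when replay runs out of WAL. If replay stopped short of the
 * minimum recovery point, data pages newer than the replayed WAL may
 * already be on disk, so the database cannot be considered consistent.
 */
RecoveryResult
check_end_of_wal(Lsn end_of_replayed_wal, Lsn min_recovery_point)
{
    if (end_of_replayed_wal < min_recovery_point)
        return RECOVERY_INCONSISTENT;
    return RECOVERY_OK;
}
```

With this check, removing the last few segments from the archive or triggering failover too early shows up as RECOVERY_INCONSISTENT instead of a silently corrupt startup.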

(we did discuss this a week ago: 
http://archives.postgresql.org/message-id/4981F6E0.2040503@enterprisedb.com)

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
