Thread: Hot Standby and handling max_standby_delay

Hot Standby and handling max_standby_delay

From
Simon Riggs
Date:
We need to calculate a more accurate time since WAL arrived to make
max_standby_delay sensible in all cases. Difficult to know exactly when
to record new timestamps for received WAL. So, proposal is...

if (Base time is earlier than WAL record time)
    standby_delay = WAL record time - Base time
else
    standby_delay = now() - Base time

When standby_mode = off we record new base time when a new WAL file
arrives.

When standby_mode = on we record new base time each time we do
XLogWalRcvFlush(). We also record a new base time on first entry to the
main for loop in XLogRecv(), i.e. each time we start writing a new burst
of streamed WAL data.

So in either case, when we are waiting for new input we reset the timer
as soon as new WAL is received. The resolution/accuracy of standby_delay
will be no more than the time taken to replay a single file. This
shouldn't matter, since sane settings of max_standby_delay are either 0
or a number like 5-20 (seconds).

Which means if we are busy we don't record many new times, whereas if we
are working in sporadic bursts we keep up with the latest time of
receipt. This also works when we are performing an archive_recovery for
an old backup.

Startup process will access base time each time it begins to wait and
calculate current standby_delay before comparing against
max_standby_delay.
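
To make the calculation concrete, here is a minimal sketch in Python rather than PostgreSQL source. The function and argument names are illustrative assumptions, and treating the limit as exceeded once the delay reaches it is also an assumption:

```python
from datetime import datetime, timedelta


def standby_delay(base_time, wal_record_time, now):
    """Delay as described in the proposal.

    base_time is when new WAL was last noted as received;
    wal_record_time is the timestamp carried by the WAL record.
    """
    if base_time < wal_record_time:
        return wal_record_time - base_time
    return now - base_time


def exceeds_max_standby_delay(base_time, wal_record_time, now, max_standby_delay):
    # Assumption: the limit counts as exceeded once the delay reaches it.
    return standby_delay(base_time, wal_record_time, now) >= max_standby_delay
```

The startup process would evaluate something like exceeds_max_standby_delay() each time it begins to wait.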

Comments?

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and handling max_standby_delay

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> We need to calculate a more accurate time since WAL arrived to make
> max_standby_delay sensible in all cases. Difficult to know exactly when
> to record new timestamps for received WAL. So, proposal is...
> 
> if (Base time is earlier than WAL record time)
>     standby_delay = WAL record time - Base time
> else
>     standby_delay = now() - Base time
> 
> When standby_mode = off we record new base time when a new WAL file
> arrives.
> 
> When standby_mode = on we record new base time each time we do
> XLogWalRcvFlush(). We also record a new base time on first entry to the
> main for loop in XLogRecv(), i.e. each time we start writing a new burst
> of streamed WAL data.
> 
> So in either case, when we are waiting for new input we reset the timer
> as soon as new WAL is received. The resolution/accuracy of standby_delay
> will be no more than the time taken to replay a single file. This
> shouldn't matter, since sane settings of max_standby_delay are either 0
> or a number like 5-20 (seconds).

That would change the meaning of max_standby_delay. Currently, it's the
delay between *generating* and applying a WAL record, your proposal
would change it to mean delay between receiving and applying it. That
seems a lot less useful to me.

With the current definition, I would feel pretty comfortable setting it
to say 15 minutes, knowing that if the standby falls behind for any
reason, as soon as the connection is re-established or
archiving/restoring fixed, it will catch up quickly, blowing away any
read-only queries if required. With your new definition, the standby
would in the worst case pause for 15 minutes at every WAL file.
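
A rough back-of-the-envelope sketch of that worst case, in Python. The function names are hypothetical, and it assumes the timer restarts for every WAL file under the new definition:

```python
def worst_case_wait_receive_based(wal_files_behind, limit_seconds):
    """Timer restarts per WAL file, so every file may wait the full limit."""
    return wal_files_behind * limit_seconds


def worst_case_wait_generation_based(lag_seconds, limit_seconds):
    """Only the part of the limit not already used up by the lag remains."""
    return max(0, limit_seconds - lag_seconds)
```

With a 15 minute limit (900 seconds) and 10 files to catch up on, the first function gives 9000 seconds of possible waiting. With the current definition, a standby that is already 20 minutes behind has no allowance left and cancels conflicting queries at once.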

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby and handling max_standby_delay

From
Simon Riggs
Date:
On Fri, 2010-01-15 at 20:50 +0200, Heikki Linnakangas wrote:

> > So in either case, when we are waiting for new input we reset the timer
> > as soon as new WAL is received. The resolution/accuracy of standby_delay
> > will be no more than the time taken to replay a single file. This
> > shouldn't matter, since sane settings of max_standby_delay are either 0
> > or a number like 5-20 (seconds).
> 
> That would change the meaning of max_standby_delay. Currently, it's the
> delay between *generating* and applying a WAL record, your proposal
> would change it to mean delay between receiving and applying it. That
> seems a lot less useful to me.

Remember that this proposal is about responding to your comments. You
showed that the time difference between generating and applying a WAL
record lacked useful meaning in cases where the generation was not
smooth and continuous. So, taking your earlier refutation as still
observing a problem, I definitely do redefine the meaning of
max_standby_delay. As you say "standby delay" means the difference
between receive and apply.

The bottom line here is: are you willing to dismiss your earlier
observation of difficulties? I don't think you can...

> With the current definition, I would feel pretty comfortable setting it
> to say 15 minutes, knowing that if the standby falls behind for any
> reason, as soon as the connection is re-established or
> archiving/restoring fixed, it will catch up quickly, blowing away any
> read-only queries if required. With your new definition, the standby
> would in the worst case pause for 15 minutes at every WAL file.

Yes, it does. And I know you're thinking along those lines because we
are concurrently discussing how to handle re-connection after updates.

The alternative is this: after being disconnected for 15 minutes we
reconnect. For the next X minutes the standby will be almost unusable
for queries while we catch up again.

---

So, I'm left with thinking that both of these ways are right, in
different circumstances and with different priorities.

If your priority is High Availability, then you are willing to give up
the capability for long-ish queries when that conflicts with the role of
HA server. (delay = apply - generate). If your priority is a Reporting
Server, then you are willing to give up HA capability in return for
relatively uninterrupted querying (delay = apply - receive).
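
As a sketch of how the same WAL record looks under each priority (Python, times in seconds; the "priority" argument and its values are hypothetical, not an existing setting):

```python
def delay_for_priority(generated_at, received_at, applying_at, priority):
    """Return the standby delay under the chosen definition."""
    if priority == "ha":
        # delay = apply - generate
        return applying_at - generated_at
    if priority == "reporting":
        # delay = apply - receive
        return applying_at - received_at
    raise ValueError(f"unknown priority {priority!r}")
```

For a record generated at t=0, received at t=900 after a 15 minute disconnection, and about to be applied at t=905, the "ha" definition reports 905 seconds while the "reporting" definition reports 5 seconds.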

Do we agree the two goals are mutually exclusive? If so, I think we need
another parameter to express those configuration goals.

Also, I think we need some ways to explicitly block recovery to allow
queries to run, and some ways to explicitly block queries so recovery
can run.

Perhaps we need a way to block new queries on a regular basis, so that
recovery gets a chance to run. Kind of time-slicing algorithm, like OS.
That way we could assign a relative priority to each.
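
Purely as an illustration of the time-slicing idea, assuming a fixed cycle length and two relative weights (all names here are hypothetical):

```python
def time_slices(cycle_seconds, recovery_weight, query_weight):
    """Split one cycle between recovery and queries by relative weight."""
    total = recovery_weight + query_weight
    recovery_slice = cycle_seconds * recovery_weight / total
    return recovery_slice, cycle_seconds - recovery_slice
```

With a 10 second cycle and weights of 1 for recovery and 3 for queries, recovery would get 2.5 seconds and queries 7.5 seconds of each cycle.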

Hmmm.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and handling max_standby_delay

From
Dimitri Fontaine
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Fri, 2010-01-15 at 20:50 +0200, Heikki Linnakangas wrote:
> Yes, it does. And I know you're thinking along those lines because we
> are concurrently discussing how to handle re-connection after updates.

With my State Machine proposal, we could only apply max_standby_delay if
in sync state, and cancel query unconditionally otherwise.

> The alternative is this: after being disconnected for 15 minutes we
> reconnect. For the next X minutes the standby will be almost unusable
> for queries while we catch up again.

That's it. And it could justify another GUC: do we want to give
priority to catching up to get back in sync, or to running queries? That
would affect when we apply max_standby_delay; when set to prefer
running queries, it would apply in any state as soon as we accept connections.

Regards,
-- 
dim


Re: Hot Standby and handling max_standby_delay

From
Simon Riggs
Date:
On Sat, 2010-01-16 at 14:08 +0100, Dimitri Fontaine wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Fri, 2010-01-15 at 20:50 +0200, Heikki Linnakangas wrote:
> > Yes, it does. And I know you're thinking along those lines because we
> > are concurrently discussing how to handle re-connection after updates.
> 
> With my State Machine proposal, we could only apply max_standby_delay if
> in sync state, and cancel query unconditionally otherwise.
> 
> > The alternative is this: after being disconnected for 15 minutes we
> > reconnect. For the next X minutes the standby will be almost unusable
> > for queries while we catch up again.
> 
> That's it. And it could be the cause of another GUC, do we want to give
> priority to catching-up to get back in sync, or to running queries. That
> would affect to when we apply max_standby_delay, and when set to prefer
> running queries it'd apply in any state as soon as we accept connections.

Agreed.

I'm wondering if it wouldn't just be easier to put in a plugin for
recovery conflict handling, so the user can decide what to do
themselves. That seems like a better plan than chewing through these
issues now. 

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and handling max_standby_delay

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> I'm wondering if it wouldn't just be easier to put in a plugin for
> recovery conflict handling, so the user can decide what to do
> themselves. That seems like a better plan than chewing through these
> issues now. 

Making it a plugin doesn't solve anything.  This is not the kind of
thing where people can come up with some random policy and it will
work well.  Anyone competent to invent a better policy would be quite
capable of modifying the source to suit themselves.
        regards, tom lane


Re: Hot Standby and handling max_standby_delay

From
Simon Riggs
Date:
On Fri, 2010-01-15 at 20:50 +0200, Heikki Linnakangas wrote:

> That would change the meaning of max_standby_delay. Currently, it's the
> delay between *generating* and applying a WAL record, your proposal
> would change it to mean delay between receiving and applying it. That
> seems a lot less useful to me.

It would be good if there was a keepalive WAL record with a timestamp on
it generated every N seconds while in streaming mode.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and handling max_standby_delay

From
Simon Riggs
Date:
On Sat, 2010-01-16 at 11:37 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > I'm wondering if it wouldn't just be easier to put in a plugin for
> > recovery conflict handling, so the user can decide what to do
> > themselves. That seems like a better plan than chewing through these
> > issues now. 
> 
> Making it a plugin doesn't solve anything.  This is not the kind of
> thing where people can come up with some random policy and it will
> work well.  Anyone competent to invent a better policy would be quite
> capable of modifying the source to suit themselves.

Agreed, with some regrets.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and handling max_standby_delay

From
"Joshua D. Drake"
Date:
On Sat, 2010-01-16 at 18:20 +0000, Simon Riggs wrote:
> On Sat, 2010-01-16 at 11:37 -0500, Tom Lane wrote:
> > Simon Riggs <simon@2ndQuadrant.com> writes:
> > > I'm wondering if it wouldn't just be easier to put in a plugin for
> > > recovery conflict handling, so the user can decide what to do
> > > themselves. That seems like a better plan than chewing through these
> > > issues now.
> >
> > Making it a plugin doesn't solve anything.  This is not the kind of
> > thing where people can come up with some random policy and it will
> > work well.  Anyone competent to invent a better policy would be quite
> > capable of modifying the source to suit themselves.
>
> Agreed, with some regrets.

Although I agree in principle, I have to say that a plugin might make
sense. Yes... the person is good enough to modify the code, but should
they? A plugin allows them to make those decisions without running a
custom code base for core.

Just a thought.

Joshua D. Drake


>
> --
>  Simon Riggs           www.2ndQuadrant.com
>
>


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use of Mr. or Sir.



Re: Hot Standby and handling max_standby_delay

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Fri, 2010-01-15 at 20:50 +0200, Heikki Linnakangas wrote:
> 
>>> So in either case, when we are waiting for new input we reset the timer
>>> as soon as new WAL is received. The resolution/accuracy of standby_delay
>>> will be no more than the time taken to replay a single file. This
>>> shouldn't matter, since sane settings of max_standby_delay are either 0
>>> or a number like 5-20 (seconds).
>> That would change the meaning of max_standby_delay. Currently, it's the
>> delay between *generating* and applying a WAL record, your proposal
>> would change it to mean delay between receiving and applying it. That
>> seems a lot less useful to me.
> 
> Remember that this proposal is about responding to your comments. You
> showed that the time difference between generating and applying a WAL
> record lacked useful meaning in cases where the generation was not
> smooth and continuous. 

Yeah, I remember that. What the DBA cares about is the time between a
commit record being generated in the master, and the same record being
applied in the standby. That's easy to explain and tune for, and that's
what max_standby_delay should be. Let's not redefine it into something
less useful just because there's corner cases where we can't calculate
it easily.

The standby would really need to know the timestamp of the *next* commit
record in the WAL, that is, the one following the record that's being
applied. We don't want to peek ahead, and we might not even have received
the next commit record yet even if it's already been generated in the
master, so we approximate the timestamp of the next commit record with the
timestamp of the previous commit record.

> It would be good if there was a keepalive WAL record with a timestamp on
> it generated every N seconds while in streaming mode.

Yeah, that would help. In streaming replication we could also send such
timestamp as a separate message, not within WAL.
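
A small sketch of why that helps, in Python with times as plain seconds. The class and method names are hypothetical, and clock skew between master and standby is ignored:

```python
class ReplayClock:
    """Tracks the newest master-side timestamp the standby has seen."""

    def __init__(self):
        self.reference_time = None

    def on_commit_record(self, commit_time):
        self._advance(commit_time)

    def on_keepalive(self, master_time):
        # Hypothetical keepalive carrying the master's current time.
        self._advance(master_time)

    def _advance(self, timestamp):
        # The reference time only ever moves forward.
        if self.reference_time is None or timestamp > self.reference_time:
            self.reference_time = timestamp

    def estimated_delay(self, now):
        if self.reference_time is None:
            return 0.0
        return now - self.reference_time
```

If the last commit was at t=0 and the master then sits idle, the estimate at t=600 is 600 seconds even though the standby has nothing left to apply. A keepalive stamped t=599 brings the estimate back to 1 second.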

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby and handling max_standby_delay

From
Simon Riggs
Date:
On Sun, 2010-01-17 at 22:57 +0200, Heikki Linnakangas wrote:
> 
> > It would be good if there was a keepalive WAL record with a
> timestamp on it generated every N seconds while in streaming mode.
> 
> Yeah, that would help. In streaming replication we could also send
> such timestamp as a separate message, not within WAL.

Is that something you're working on?

Do we need a new record type for that, is there a handy record type to
bounce from?

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and handling max_standby_delay

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Sun, 2010-01-17 at 22:57 +0200, Heikki Linnakangas wrote:
>>> It would be good if there was a keepalive WAL record with a
>> timestamp on it generated every N seconds while in streaming mode.
>>
>> Yeah, that would help. In streaming replication we could also send
>> such timestamp as a separate message, not within WAL.
> 
> Is that something you're working on?

No.

> Do we need a new record type for that, is there a handy record type to
> bounce from?

After starting streaming, slices of WAL are sent as CopyData messages.
The CopyData payload begins with an XLogRecPtr, followed by the WAL
data. That payload format needs to be extended with a 'message type'
field, and a new message type for the timestamps needs to be added.
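
A sketch of what such an extended payload could look like, using Python's struct module. The one-byte type codes, the field order and the 64-bit microsecond timestamp are assumptions for illustration, not a settled wire format; XLogRecPtr is modelled as two 32-bit fields:

```python
import struct

# Hypothetical one-byte message type codes.
MSG_WAL_DATA = b"w"
MSG_TIMESTAMP = b"t"

# Network byte order: type byte, then XLogRecPtr as (xlogid, xrecoff).
_WAL_HEADER = struct.Struct("!cII")
# Network byte order: type byte, then a signed 64-bit timestamp.
_TS_MESSAGE = struct.Struct("!cq")


def encode_wal_data(xlogid, xrecoff, wal_bytes):
    return _WAL_HEADER.pack(MSG_WAL_DATA, xlogid, xrecoff) + wal_bytes


def encode_timestamp(micros):
    return _TS_MESSAGE.pack(MSG_TIMESTAMP, micros)


def decode(payload):
    """Dispatch on the leading message type byte."""
    msg_type = payload[:1]
    if msg_type == MSG_WAL_DATA:
        _, xlogid, xrecoff = _WAL_HEADER.unpack_from(payload)
        return ("wal", (xlogid, xrecoff), payload[_WAL_HEADER.size:])
    if msg_type == MSG_TIMESTAMP:
        _, micros = _TS_MESSAGE.unpack(payload)
        return ("timestamp", micros)
    raise ValueError(f"unknown message type {msg_type!r}")
```

The receiver can reject a type byte it does not recognise, and further message types can be added later without changing how existing ones are parsed.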

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby and handling max_standby_delay

From
Simon Riggs
Date:
On Mon, 2010-01-18 at 08:28 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Sun, 2010-01-17 at 22:57 +0200, Heikki Linnakangas wrote:
> >>> It would be good if there was a keepalive WAL record with a
> >> timestamp on it generated every N seconds while in streaming mode.
> >>
> >> Yeah, that would help. In streaming replication we could also send
> >> such timestamp as a separate message, not within WAL.
> > 
> > Is that something you're working on?
> 
> No.

How accurate is this now? With regard to remaining items of work.
http://wiki.postgresql.org/wiki/Streaming_Replication

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and handling max_standby_delay

From
Simon Riggs
Date:
On Mon, 2010-01-18 at 08:28 +0200, Heikki Linnakangas wrote:

> > Do we need a new record type for that, is there a handy record type to
> > bounce from?
> 
> After starting streaming, slices of WAL are sent as CopyData messages.
> The CopyData payload begins with an XLogRecPtr, followed by the WAL
> data. That payload format needs to be extended with a 'message type'
> field and a new message type for the timestamps need to be added.

It wouldn't be a good use of all of our time for me to work on this. I
have zero unallocated time remaining and you'd still need to review what
I'd written, in any case.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and handling max_standby_delay

From
Fujii Masao
Date:
On Mon, Jan 18, 2010 at 5:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> How accurate is this now? With regard to remaining items of work.
> http://wiki.postgresql.org/wiki/Streaming_Replication

Not accurate. I'll correct that and provide the link from
"v8.5 Open Items page" to that.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Hot Standby and handling max_standby_delay

From
Fujii Masao
Date:
On Mon, Jan 18, 2010 at 6:35 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Jan 18, 2010 at 5:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> How accurate is this now? With regard to remaining items of work.
>> http://wiki.postgresql.org/wiki/Streaming_Replication
>
> Not accurate. I'll correct that and provide the link from
> "v8.5 Open Items page" to that.

I listed the TODO items that need to be addressed for v8.5.
http://wiki.postgresql.org/wiki/Streaming_Replication#v8.5

If you find any other TODO items, please add them to the list.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Hot Standby and handling max_standby_delay

From
Simon Riggs
Date:
On Mon, 2010-01-18 at 20:18 +0900, Fujii Masao wrote:
> On Mon, Jan 18, 2010 at 6:35 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > On Mon, Jan 18, 2010 at 5:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> How accurate is this now? With regard to remaining items of work.
> >> http://wiki.postgresql.org/wiki/Streaming_Replication
> >
> > Not accurate. I'll correct that and provide the link from
> > "v8.5 Open Items page" to that.
> 
> I listed the TODO items that need to be addressed for v8.5.
> http://wiki.postgresql.org/wiki/Streaming_Replication#v8.5
> 
> If you find any other TODO items, please add them to the list.

What were the blockers that prevented sync rep from being included? I
must have missed the discussion on that part.

-- Simon Riggs           www.2ndQuadrant.com



Re: Hot Standby and handling max_standby_delay

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Mon, Jan 18, 2010 at 6:35 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Mon, Jan 18, 2010 at 5:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> How accurate is this now? With regard to remaining items of work.
>>> http://wiki.postgresql.org/wiki/Streaming_Replication
>> Not accurate. I'll correct that and provide the link from
>> "v8.5 Open Items page" to that.
> 
> I listed the TODO items that need to be addressed for v8.5.
> http://wiki.postgresql.org/wiki/Streaming_Replication#v8.5
> 
> If you find any other TODO items, please add them to the list.

Thanks! That's very useful.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby and handling max_standby_delay

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> What were the blockers that prevented sync rep from being included? I
> must have missed the discussion on that part.

For one, figuring out how to send back the notifications about WAL
applied in standby, and all the IPC required for that.

Streaming replication is a complex enough patch in just asynchronous
mode. Including synchronous mode would certainly have meant missing 8.5,
we just don't have the resources to review all at once. Even if we did,
splitting the project into smaller increments is a good idea anyway.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby and handling max_standby_delay

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Simon Riggs wrote:
>> Do we need a new record type for that, is there a handy record type to
>> bounce from?

> After starting streaming, slices of WAL are sent as CopyData messages.
> The CopyData payload begins with an XLogRecPtr, followed by the WAL
> data. That payload format needs to be extended with a 'message type'
> field and a new message type for the timestamps need to be added.

Whether or not anyone bothers with the timestamp message, I think adding
a message type header is a Must Fix item.  A protocol with no provision
for extension is certainly going to bite us in the rear before long.
        regards, tom lane


Re: Hot Standby and handling max_standby_delay

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Simon Riggs wrote:
>>> Do we need a new record type for that, is there a handy record type to
>>> bounce from?
> 
>> After starting streaming, slices of WAL are sent as CopyData messages.
>> The CopyData payload begins with an XLogRecPtr, followed by the WAL
>> data. That payload format needs to be extended with a 'message type'
>> field and a new message type for the timestamps need to be added.
> 
> Whether or not anyone bothers with the timestamp message, I think adding
> a message type header is a Must Fix item.  A protocol with no provision
> for extension is certainly going to bite us in the rear before long.

Agreed a message type header is a good idea, although we don't expect
streaming replication and the protocol to work across different major
versions anyway.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Hot Standby and handling max_standby_delay

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> Whether or not anyone bothers with the timestamp message, I think adding
>> a message type header is a Must Fix item.  A protocol with no provision
>> for extension is certainly going to bite us in the rear before long.

> Agreed a message type header is a good idea, although we don't expect
> streaming replication and the protocol to work across different major
> versions anyway.

Speaking of which, just where is the defense that makes sure that
walsender and walreceiver are compatible?  We should be checking not
only version, but all of the configuration variables that are embedded
in pg_control.
        regards, tom lane


Re: Hot Standby and handling max_standby_delay

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Speaking of which, just where is the defense that makes sure that
> walsender and walreceiver are compatible?  We should be checking not
> only version, but all of the configuration variables that are embedded
> in pg_control.

That happens at startup when pg_control is read, before streaming
starts. Remember that you need to start with a base backup.

We also check that the system_identifier in the standby matches that in
the primary, when the connection is established. That protects you from
starting streaming from wrong base backup.
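
In outline, the connection-time check amounts to something like the following Python sketch. The names are hypothetical and this is not the walreceiver code:

```python
class SystemIdentifierMismatch(Exception):
    """Raised when the standby was not built from this primary's base backup."""


def check_system_identifier(primary_sysid, standby_sysid):
    # Refuse to stream when the identifiers differ.
    if primary_sysid != standby_sysid:
        raise SystemIdentifierMismatch(
            f"primary has {primary_sysid}, standby has {standby_sysid}"
        )
```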

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com