Thread: failover vs. read only queries

failover vs. read only queries

From
Fujii Masao
Date:
Hi,

When the trigger file is created while the recovery keeps
waiting for the release of the lock by read only queries,
it might take a very long time for the standby to become
the master. The recovery cannot go ahead until those read
only queries have gone away. This would increase the downtime
at the failover, and degrade the high availability.

To fix the problem, when the trigger file is found, I think
that we should cancel all the running read only queries
immediately (or forcibly use -1 as the max_standby_delay
since that point) and make the recovery go ahead. If some
people prefer queries over failover even when they create the
trigger file, we can make the trigger behavior selectable in
response to the content of the trigger file like pg_standby
does.

This problem looks like a bug, so I'd like to fix that for
9.0. But the amount of code change might not be small.
Thought?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: failover vs. read only queries

From
Tatsuo Ishii
Date:
> When the trigger file is created while the recovery keeps
> waiting for the release of the lock by read only queries,
> it might take a very long time for the standby to become
> the master. The recovery cannot go ahead until those read
> only queries have gone away. This would increase the downtime
> at the failover, and degrade the high availability.
> 
> To fix the problem, when the trigger file is found, I think
> that we should cancel all the running read only queries
> immediately (or forcibly use -1 as the max_standby_delay
> since that point) and make the recovery go ahead. If some
> people prefer queries over failover even when they create the
> trigger file, we can make the trigger behavior selectable in
> response to the content of the trigger file like pg_standby
> does.
> 
> This problem looks like a bug, so I'd like to fix that for
> 9.0. But the amount of code change might not be small.
> Thought?

+1. Down time of HA system is really important for HA users.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


Re: failover vs. read only queries

From
Takahiro Itagaki
Date:
Fujii Masao <masao.fujii@gmail.com> wrote:

> To fix the problem, when the trigger file is found, I think
> that we should cancel all the running read only queries
> immediately (or forcibly use -1 as the max_standby_delay
> since that point) and make the recovery go ahead.

Hmmm, does the following sequence work as your expect instead of the chanage?
It requires text-file manipulation in 1, but seems to be more flexible.
 1. Reset max_standby_delay = 0 in postgresql.conf 2. pg_ctl reload 3. Create a trigger file

BTW, I hope we will have "pg_ctl failover --timeout=N" in 9.1
instead of the trigger file based management.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: failover vs. read only queries

From
Fujii Masao
Date:
On Wed, Jun 9, 2010 at 5:47 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> To fix the problem, when the trigger file is found, I think
> that we should cancel all the running read only queries
> immediately (or forcibly use -1 as the max_standby_delay
> since that point) and make the recovery go ahead.

Oops! I made an error. I meant 0 instead of -1, as the max_standby_delay.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: failover vs. read only queries

From
Fujii Masao
Date:
On Wed, Jun 9, 2010 at 6:13 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
>> To fix the problem, when the trigger file is found, I think
>> that we should cancel all the running read only queries
>> immediately (or forcibly use -1 as the max_standby_delay
>> since that point) and make the recovery go ahead.
>
> Hmmm, does the following sequence work as your expect instead of the chanage?
> It requires text-file manipulation in 1, but seems to be more flexible.
>
>  1. Reset max_standby_delay = 0 in postgresql.conf
>  2. pg_ctl reload
>  3. Create a trigger file

As far as I read the HS code, SIGHUP is not checked while a recovery
is waiting for queries :(  So pg_ctl reload would have no effect on
the conflicting queries.

Independently from the problem I raised, I think that we should call
HandleStartupProcInterrupts() in that sleep loop.

> BTW, I hope we will have "pg_ctl failover --timeout=N" in 9.1
> instead of the trigger file based management.

Please feel free to try that ;)

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: failover vs. read only queries

From
Tom Lane
Date:
Fujii Masao <masao.fujii@gmail.com> writes:
> When the trigger file is created while the recovery keeps
> waiting for the release of the lock by read only queries,
> it might take a very long time for the standby to become
> the master. The recovery cannot go ahead until those read
> only queries have gone away. This would increase the downtime
> at the failover, and degrade the high availability.

> To fix the problem, when the trigger file is found, I think
> that we should cancel all the running read only queries
> immediately (or forcibly use -1 as the max_standby_delay
> since that point) and make the recovery go ahead. If some
> people prefer queries over failover even when they create the
> trigger file, we can make the trigger behavior selectable in
> response to the content of the trigger file like pg_standby
> does.

> This problem looks like a bug, so I'd like to fix that for
> 9.0. But the amount of code change might not be small.
> Thought?

-1.  This looks like 9.1 material to me, and besides I'm not even
convinced that what you propose is a good solution.
        regards, tom lane


Re: failover vs. read only queries

From
Josh Berkus
Date:
> To fix the problem, when the trigger file is found, I think
> that we should cancel all the running read only queries
> immediately (or forcibly use -1 as the max_standby_delay
> since that point) and make the recovery go ahead. If some
> people prefer queries over failover even when they create the
> trigger file, we can make the trigger behavior selectable in
> response to the content of the trigger file like pg_standby
> does.

Well, the question is: are there users who would prefer not to have
slave queries cancelled and are willing to wait for failover?  If so,
behavior of failover should really be slaved to max_standby_delay.  If
not, there should be new behavior (i.e. "when the trigger file is found,
cancel all running queries").   One could argue that there are no users
of the first case.

The fact that failover current does *not* terminate existing queries and
transactions was regarded as a feature by the audience, rather than a
bug, when I did demos of HS/SR.  Of course, they might not have been
thinking of the delay for writes.

If there were an easy way to make the trigger file cancel all running
queries, apply remaining logs and come up, then I'd vote for that for
9.0.  I think it's the more desired behavior by most users.  However,
I'm opposed to any complex solutions which might delay 9.0 release.

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: failover vs. read only queries

From
Robert Haas
Date:
On Wed, Jun 9, 2010 at 3:22 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>> To fix the problem, when the trigger file is found, I think
>> that we should cancel all the running read only queries
>> immediately (or forcibly use -1 as the max_standby_delay
>> since that point) and make the recovery go ahead. If some
>> people prefer queries over failover even when they create the
>> trigger file, we can make the trigger behavior selectable in
>> response to the content of the trigger file like pg_standby
>> does.
>
> Well, the question is: are there users who would prefer not to have
> slave queries cancelled and are willing to wait for failover?  If so,
> behavior of failover should really be slaved to max_standby_delay.  If
> not, there should be new behavior (i.e. "when the trigger file is found,
> cancel all running queries").   One could argue that there are no users
> of the first case.
>
> The fact that failover current does *not* terminate existing queries and
> transactions was regarded as a feature by the audience, rather than a
> bug, when I did demos of HS/SR.  Of course, they might not have been
> thinking of the delay for writes.
>
> If there were an easy way to make the trigger file cancel all running
> queries, apply remaining logs and come up, then I'd vote for that for
> 9.0.  I think it's the more desired behavior by most users.  However,
> I'm opposed to any complex solutions which might delay 9.0 release.

One complication here is that, at least as I understand it, Tom is
planning to overhaul max_standby_delay.  So it might be premature to
try to figure out how this should work until the dust settles.  But my
intuition is similar to yours, overall.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: failover vs. read only queries

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> The fact that failover current does *not* terminate existing queries and
> transactions was regarded as a feature by the audience, rather than a
> bug, when I did demos of HS/SR.  Of course, they might not have been
> thinking of the delay for writes.

> If there were an easy way to make the trigger file cancel all running
> queries, apply remaining logs and come up, then I'd vote for that for
> 9.0.  I think it's the more desired behavior by most users.  However,
> I'm opposed to any complex solutions which might delay 9.0 release.

My feeling about it is that if you want fast failover you should not
have your failover target server configured as hot standby at all, let
alone hot standby with a long max_standby_delay.  Such a slave could be
very far behind on applying WAL when the crunch comes, and no amount of
query killing will save you from that.  Put your long-running standby
queries on a different slave instead.

We should consider whether we can improve the situation in 9.1, but it
is not a must-fix for 9.0; especially when the correct behavior isn't
immediately obvious.
        regards, tom lane


Re: failover vs. read only queries

From
Simon Riggs
Date:
On Wed, 2010-06-09 at 12:22 -0700, Josh Berkus wrote:
> > To fix the problem, when the trigger file is found, I think
> > that we should cancel all the running read only queries
> > immediately (or forcibly use -1 as the max_standby_delay
> > since that point) and make the recovery go ahead. If some
> > people prefer queries over failover even when they create the
> > trigger file, we can make the trigger behavior selectable in
> > response to the content of the trigger file like pg_standby
> > does.
> 
> Well, the question is: are there users who would prefer not to have
> slave queries cancelled and are willing to wait for failover?  If so,
> behavior of failover should really be slaved to max_standby_delay.  If
> not, there should be new behavior (i.e. "when the trigger file is found,
> cancel all running queries").   One could argue that there are no users
> of the first case.
> 
> The fact that failover current does *not* terminate existing queries and
> transactions was regarded as a feature by the audience, rather than a
> bug, when I did demos of HS/SR.  Of course, they might not have been
> thinking of the delay for writes.

+1

Just to add: there is only a delay in triggering *if* the standby is
waiting on a query at or after triggering. If there is a wait, it is
never more than max_standby_delay, which is what the user said they
would be happy to accept.

> If there were an easy way to make the trigger file cancel all running
> queries, apply remaining logs and come up, then I'd vote for that for
> 9.0.  I think it's the more desired behavior by most users.  However,
> I'm opposed to any complex solutions which might delay 9.0 release.

In 8.4 you could specify "fast" failover or "smart" failover. In 9.0,
AFAICS we have only implemented "smart" failover, which means it will
continue until the end of the WAL stream before triggering. So under
heavy streaming load or with considerable lag the trigger won't cause
failover for some time. So there is less function in 9.0 than was
available in 8.4. If that removal was intended, it wasn't discussed.

-- Simon Riggs           www.2ndQuadrant.com



Re: failover vs. read only queries

From
Takahiro Itagaki
Date:
Fujii Masao <masao.fujii@gmail.com> wrote:

> > 1. Reset max_standby_delay = 0 in postgresql.conf
> > 2. pg_ctl reload
> > 3. Create a trigger file
> 
> As far as I read the HS code, SIGHUP is not checked while a recovery
> is waiting for queries :(  So pg_ctl reload would have no effect on
> the conflicting queries.
> 
> Independently from the problem I raised, I think that we should call
> HandleStartupProcInterrupts() in that sleep loop.

Hmmm, if reload doesn't work, can we write a query like below?
 SELECT pg_terminate_backend(pid)   FROM pg_locks  WHERE conflicted-with-recovery-process;

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: failover vs. read only queries

From
Tatsuo Ishii
Date:
> The fact that failover current does *not* terminate existing queries and
> transactions was regarded as a feature by the audience, rather than a
> bug, when I did demos of HS/SR.  Of course, they might not have been
> thinking of the delay for writes.

Probably you would hear different respose from serious users who are
willing to have usable HA systems. I have number of customers who are
using our HA systems (they use several technologies such as commercial
HA solutions, pgpool-II and Slony-I). The one of top 3 questions I got
when we propose them our HA solution is, "how long will it take to
do failover when the master DB crashes?"
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


Re: failover vs. read only queries

From
Mark Kirkwood
Date:
On 10/06/10 14:07, Tatsuo Ishii wrote:
>
> The one of top 3 questions I got
> when we propose them our HA solution is, "how long will it take to
> do failover when the master DB crashes?"
>
>    

Same here +1



Re: failover vs. read only queries

From
Fujii Masao
Date:
On Thu, Jun 10, 2010 at 5:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> The fact that failover current does *not* terminate existing queries and
>> transactions was regarded as a feature by the audience, rather than a
>> bug, when I did demos of HS/SR.  Of course, they might not have been
>> thinking of the delay for writes.
>
>> If there were an easy way to make the trigger file cancel all running
>> queries, apply remaining logs and come up, then I'd vote for that for
>> 9.0.  I think it's the more desired behavior by most users.  However,
>> I'm opposed to any complex solutions which might delay 9.0 release.
>
> My feeling about it is that if you want fast failover you should not
> have your failover target server configured as hot standby at all, let
> alone hot standby with a long max_standby_delay.  Such a slave could be
> very far behind on applying WAL when the crunch comes, and no amount of
> query killing will save you from that.  Put your long-running standby
> queries on a different slave instead.
>
> We should consider whether we can improve the situation in 9.1, but it
> is not a must-fix for 9.0; especially when the correct behavior isn't
> immediately obvious.

OK. Let's revisit in 9.1.

I attached the proposal patch for 9.1. The patch treats max_standby_delay
as zero (i.e., cancels all the conflicting queries immediately), ever since
the trigger file is created. So we can cause a recovery to end without
waiting for any lock held by queries, and minimize the failover time.
OTOH, queries which don't conflict with a recovery survive the failover.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: failover vs. read only queries

From
Fujii Masao
Date:
On Thu, Jun 10, 2010 at 9:58 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
>
> Fujii Masao <masao.fujii@gmail.com> wrote:
>
>> > 1. Reset max_standby_delay = 0 in postgresql.conf
>> > 2. pg_ctl reload
>> > 3. Create a trigger file
>>
>> As far as I read the HS code, SIGHUP is not checked while a recovery
>> is waiting for queries :(  So pg_ctl reload would have no effect on
>> the conflicting queries.
>>
>> Independently from the problem I raised, I think that we should call
>> HandleStartupProcInterrupts() in that sleep loop.
>
> Hmmm, if reload doesn't work, can we write a query like below?
>
>  SELECT pg_terminate_backend(pid)
>    FROM pg_locks
>   WHERE conflicted-with-recovery-process;

I'm not sure that, but as you suggested, we can minimize the failover
time by using the following operation even in 9.0.
   1. Reset max_standby_delay = 0 in postgresql.conf   2. pg_ctl reload   3. Cancel all the queries or all the
conflictingones   4. Create a trigger file 

For now, I'll use the above when building the HA system using 9.0
and a clusterware.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: failover vs. read only queries

From
Josh Berkus
Date:
On 06/09/2010 07:36 PM, Mark Kirkwood wrote:
> On 10/06/10 14:07, Tatsuo Ishii wrote:
>>
>> The one of top 3 questions I got
>> when we propose them our HA solution is, "how long will it take to
>> do failover when the master DB crashes?"
>>
>
> Same here +1

In that case, wouldn't they set max_standby_delay to 0?  In which case 
the failover problem goes away, no?

--                                   -- Josh Berkus                                     PostgreSQL Experts Inc.
                           http://www.pgexperts.com
 


Re: failover vs. read only queries

From
Fujii Masao
Date:
On Fri, Jun 11, 2010 at 1:48 AM, Josh Berkus <josh@agliodbs.com> wrote:
> On 06/09/2010 07:36 PM, Mark Kirkwood wrote:
>>
>> On 10/06/10 14:07, Tatsuo Ishii wrote:
>>>
>>> The one of top 3 questions I got
>>> when we propose them our HA solution is, "how long will it take to
>>> do failover when the master DB crashes?"
>>>
>>
>> Same here +1
>
> In that case, wouldn't they set max_standby_delay to 0?  In which case the
> failover problem goes away, no?

Yes, but I guess they'd also like to run read only queries on the standby.
Setting max_standby_delay to 0 would prevent them from doing that because
the conflict with the replay of the VACUUM or HOT record would often happen.
vacuum_defer_cleanup_age would be helpful for that case, but it seems to be
hard to tune that.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: failover vs. read only queries

From
Bruce Momjian
Date:
Fujii Masao wrote:
> On Thu, Jun 10, 2010 at 5:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Josh Berkus <josh@agliodbs.com> writes:
> >> The fact that failover current does *not* terminate existing queries and
> >> transactions was regarded as a feature by the audience, rather than a
> >> bug, when I did demos of HS/SR. ?Of course, they might not have been
> >> thinking of the delay for writes.
> >
> >> If there were an easy way to make the trigger file cancel all running
> >> queries, apply remaining logs and come up, then I'd vote for that for
> >> 9.0. ?I think it's the more desired behavior by most users. ?However,
> >> I'm opposed to any complex solutions which might delay 9.0 release.
> >
> > My feeling about it is that if you want fast failover you should not
> > have your failover target server configured as hot standby at all, let
> > alone hot standby with a long max_standby_delay. ?Such a slave could be
> > very far behind on applying WAL when the crunch comes, and no amount of
> > query killing will save you from that. ?Put your long-running standby
> > queries on a different slave instead.
> >
> > We should consider whether we can improve the situation in 9.1, but it
> > is not a must-fix for 9.0; especially when the correct behavior isn't
> > immediately obvious.
> 
> OK. Let's revisit in 9.1.
> 
> I attached the proposal patch for 9.1. The patch treats max_standby_delay
> as zero (i.e., cancels all the conflicting queries immediately), ever since
> the trigger file is created. So we can cause a recovery to end without
> waiting for any lock held by queries, and minimize the failover time.
> OTOH, queries which don't conflict with a recovery survive the failover.

Should this be added to the first 9.1 commitfest?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + None of us is going to be here forever. +


Re: failover vs. read only queries

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Fujii Masao wrote:
>> On Thu, Jun 10, 2010 at 5:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> My feeling about it is that if you want fast failover you should not
>>> have your failover target server configured as hot standby at all, let
>>> alone hot standby with a long max_standby_delay.  Such a slave could be
>>> very far behind on applying WAL when the crunch comes, and no amount of
>>> query killing will save you from that.  Put your long-running standby
>>> queries on a different slave instead.
>>> 
>>> We should consider whether we can improve the situation in 9.1, but it
>>> is not a must-fix for 9.0; especially when the correct behavior isn't
>>> immediately obvious.

>> OK. Let's revisit in 9.1.
>> 
>> I attached the proposal patch for 9.1. The patch treats max_standby_delay
>> as zero (i.e., cancels all the conflicting queries immediately), ever since
>> the trigger file is created. So we can cause a recovery to end without
>> waiting for any lock held by queries, and minimize the failover time.
>> OTOH, queries which don't conflict with a recovery survive the failover.

> Should this be added to the first 9.1 commitfest?

Not sure ... it seems like proof of concept for a pretty dubious
concept.  If you want a slave to be ready for fast failover then you
should not be letting it get far behind the master in the first place.
I think there's some missing piece here, but I'm not quite sure what
to propose.
        regards, tom lane