Thread: HS/SR and smart shutdown

HS/SR and smart shutdown

From
Josh Berkus
Date:
I've been working on my demo, and I'm discovering that due to the
connection from the walsender and walreceiver, "smart" shutdown from
pg_ctl doesn't work if replication is active.

This seems worth fixing; if we don't fix it, we should at least document it.

Comments?

--Josh


Re: HS/SR and smart shutdown

From
Fujii Masao
Date:
On Thu, Jan 21, 2010 at 8:04 AM, Josh Berkus <josh@agliodbs.com> wrote:
> I've been working on my demo, and I'm discovering that due to the
> connection from the walsender and walreceiver, "smart" shutdown from
> pg_ctl doesn't work if replication is active.
>
> This seems worth fixing; if we don't fix it, we should at least document it.
>
> Comments?

Thanks for the report.

Which servers (primary or standby) did you try a "smart" shutdown on?

If it's "primary", could you show me a reproducible test case? At least
on my box, a "smart" shutdown on the primary works fine.

If it's "standby", it's a previously-existing behavior that a "smart"
shutdown doesn't work immediately during recovery. After a recovery
has been completed, it would work. Of course, I agree that such a
behavior should be documented.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: HS/SR and smart shutdown

From
Josh Berkus
Date:
> If it's "standby", it's a previously-existing behavior that a "smart"
> shutdown doesn't work immediately during recovery. After a recovery
> has been completed, it would work. Of course, I agree that such a
> behavior should be documented.

Well, as long as streaming rep is running, you can't do a smart shutdown
... smart shutdown seems to treat the walreceiver as a client
connection.  At the very least, this should be in the documentation.

--Josh Berkus



Re: HS/SR and smart shutdown

From
Robert Haas
Date:
On Wed, Jan 20, 2010 at 8:44 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> If it's "standby", it's a previously-existing behavior that a "smart"
>> shutdown doesn't work immediately during recovery. After a recovery
>> has been completed, it would work. Of course, I agree that such a
>> behavior should be documented.
>
> Well, as long as streaming rep is running, you can't do a smart shutdown
> ... smart shutdown seems to treat the walreceiver as a client
> connection.  At the very least, this should be in the documentation.

How hard is it to fix?

...Robert


Re: HS/SR and smart shutdown

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jan 20, 2010 at 8:44 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Well, as long as streaming rep is running, you can't do a smart shutdown
>> ... smart shutdown seems to treat the walreceiver as a client
>> connection.  At the very least, this should be in the documentation.

> How hard is it to fix?

I think the first question is do we *want* to fix it, or is it
appropriate behavior?

If the master shuts down, will the slaves try to fail over to become
masters?  When the master restarts, will the slaves automatically
reconnect?  If these questions have the wrong answers, shutting down the
master isn't something to be done lightly, and automatically
disconnecting slaves would be a real bad idea.
        regards, tom lane


Re: HS/SR and smart shutdown

From
Robert Haas
Date:
On Wed, Jan 20, 2010 at 8:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Jan 20, 2010 at 8:44 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> Well, as long as streaming rep is running, you can't do a smart shutdown
>>> ... smart shutdown seems to treat the walreceiver as a client
>>> connection.  At the very least, this should be in the documentation.
>
>> How hard is it to fix?
>
> I think the first question is do we *want* to fix it, or is it
> appropriate behavior?
>
> If the master shuts down, will the slaves try to fail over to become
> masters?  When the master restarts, will the slaves automatically
> reconnect?  If these questions have the wrong answers, shutting down the
> master isn't something to be done lightly, and automatically
> disconnecting slaves would be a real bad idea.

I thought the scenario in question was that someone wanted to manually
shut down the slave.  Am I misunderstanding?

...Robert


Re: HS/SR and smart shutdown

From
Mark Kirkwood
Date:
Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Jan 20, 2010 at 8:44 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> Well, as long as streaming rep is running, you can't do a smart shutdown
>>> ... smart shutdown seems to treat the walreceiver as a client
>>> connection.  At the very least, this should be in the documentation.
>
>> How hard is it to fix?
>
> I think the first question is do we *want* to fix it, or is it
> appropriate behavior?
>
> If the master shuts down, will the slaves try to fail over to become
> masters?  When the master restarts, will the slaves automatically
> reconnect?  If these questions have the wrong answers, shutting down the
> master isn't something to be done lightly, and automatically
> disconnecting slaves would be a real bad idea.
Right - surely people who have been using pg_standby etc have discovered 
this behaviour, so documenting it is fine I would think.

regards

Mark


Re: HS/SR and smart shutdown

From
Fujii Masao
Date:
On Thu, Jan 21, 2010 at 10:44 AM, Josh Berkus <josh@agliodbs.com> wrote:
>
>> If it's "standby", it's a previously-existing behavior that a "smart"
>> shutdown doesn't work immediately during recovery. After a recovery
>> has been completed, it would work. Of course, I agree that such a
>> behavior should be documented.
>
> Well, as long as streaming rep is running, you can't do a smart shutdown
> ... smart shutdown seems to treat the walreceiver as a client
> connection.

Even if SR is not running, as long as the startup process is running,
we can't do a smart shutdown. It's not peculiar to SR.

> At the very least, this should be in the documentation.

Agreed. Something like "smart shutdown is not allowed during recovery"
should be in the following section.
http://developer.postgresql.org/pgdocs/postgres/server-shutdown.html

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: HS/SR and smart shutdown

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Thu, Jan 21, 2010 at 10:44 AM, Josh Berkus <josh@agliodbs.com> wrote:
>>> If it's "standby", it's a previously-existing behavior that a "smart"
>>> shutdown doesn't work immediately during recovery. After a recovery
>>> has been completed, it would work. Of course, I agree that such a
>>> behavior should be documented.
>> Well, as long as streaming rep is running, you can't do a smart shutdown
>> ... smart shutdown seems to treat the walreceiver as a client
>> connection.
> 
> Even if SR is not running, as long as the startup process is running,
> we can't do a smart shutdown. It's not peculiar to SR.

Right, that's the way a standby server (= one still in recovery) has
always behaved. It has made sense in the past: it's not in the spirit of
smart shutdown to kill the WAL replay immediately. "smart" means wait
for recovery to finish, then shutdown.

It's a good question if that still makes sense with Hot Standby. Perhaps
we should redefine smart shutdown in standby mode to shut down as soon
as all read-only connections have died.

>>  At the very least, this should be in the documentation.
> 
> Agreed. Something like "smart shutdown is not allowed during recovery"
> should be in the following section.
> http://developer.postgresql.org/pgdocs/postgres/server-shutdown.html

It's allowed, it just doesn't do what you might expect.


In the master, smart shutdown shuts down as soon as all regular backends
are gone. It doesn't wait for the standby connections to die. In fact
they're not killed until after the shutdown checkpoint is written, so
that it gets sent to the standbys too. I think we're good there.
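The following is a minimal Python sketch of the ordering described above for a smart shutdown on the master. It only records the steps, in order, as strings. The function name and step labels are hypothetical and are not taken from the PostgreSQL source.

```python
from typing import List

def primary_smart_shutdown_steps(backends: List[str], walsenders: List[str]) -> List[str]:
    """Return, in order, the steps of a smart shutdown on the master (toy model)."""
    steps: List[str] = ["stop accepting new connections"]
    # Smart mode waits only for regular backends; walsenders are not counted.
    steps += [f"wait for {b} to exit" for b in backends]
    # The shutdown checkpoint is written while walsenders are still connected...
    steps.append("write shutdown checkpoint")
    # ...so that it can be streamed to every standby before they are terminated.
    steps += [f"stream checkpoint via {w}" for w in walsenders]
    steps += [f"terminate {w}" for w in walsenders]
    steps.append("postmaster exits")
    return steps
```

The key property is that walsender termination comes after the checkpoint step. Regular backends come before it.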

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: HS/SR and smart shutdown

From
Greg Smith
Date:
Heikki Linnakangas wrote:
> It's a good question if that still makes sense with Hot Standby. Perhaps
> we should redefine smart shutdown in standby mode to shut down as soon
> as all read-only connections have died.
>   

I've advocated in the past that an escalating shutdown procedure would
be helpful to have available in general.  Start kicking off clients with
smart, continue to fast if any are left, and if any still remain after
that (I have seen COPY clients that ignore fast), disconnect them and go
to immediate to kill them completely.  Once you've started the server
on the road to shutdown, even with smart, you've basically committed to
going all the way down by whatever means are available anyway, so why
not make that more automated and easier?

If something like that were available, I could see inserting a step in 
the middle there specifically aimed at resolving this issue.  Maybe it's 
just a change to the beginning of fast shutdown, or to the end of smart 
as I think you're suggesting.  Perhaps you only get it if you do one of 
these escalating shutdowns I'm proposing, making that the preferred way 
to handle HS servers.
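Greg's escalation can be approximated from outside the server by calling `pg_ctl stop` repeatedly with increasingly aggressive `-m` modes. What follows is a minimal Python sketch under a few assumptions: `pg_ctl` is on `PATH`, and a non-zero exit status means the server did not stop within the `-t` timeout. Note that the status is also non-zero if no server was running. The function names and the timeout value are illustrative choices, not something proposed in the thread.

```python
import subprocess
from typing import Callable, Sequence

MODES = ("smart", "fast", "immediate")

def pg_ctl_stop(datadir: str, mode: str, timeout_s: int) -> bool:
    """Run `pg_ctl stop` in one mode; True if it reported a successful stop."""
    cmd = ["pg_ctl", "stop", "-D", datadir, "-m", mode, "-w", "-t", str(timeout_s)]
    return subprocess.run(cmd).returncode == 0

def escalating_shutdown(stop: Callable[[str], bool],
                        modes: Sequence[str] = MODES) -> str:
    """Try each mode in order; return the first one that brought the server down."""
    for mode in modes:
        if stop(mode):
            return mode
    raise RuntimeError("server did not stop even in the most aggressive mode")
```

For example, `escalating_shutdown(lambda mode: pg_ctl_stop("/path/to/data", mode, 60))` returns the mode that finally worked. Passing the stop action in as a function keeps the escalation logic separate from how each mode is requested.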

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com



Re: HS/SR and smart shutdown

From
Fujii Masao
Date:
On Thu, Jan 21, 2010 at 4:27 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> It's a good question if that still makes sense with Hot Standby. Perhaps
> we should redefine smart shutdown in standby mode to shut down as soon
> as all read-only connections have died.

Okay. Let's work out the details.

I think the startup process and the walreceiver should wait
for all read-only backends to exit in the smart shutdown case,
because those backends might be waiting for some WAL record to be
replayed before their queries can proceed. Is this OK? Or should we
kill the startup process and the walreceiver first?

If my guess is right, we would need to add a new PMState to cancel
recovery and replication after all read-only connections have died.
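Here is a minimal Python sketch of the proposed ordering for smart shutdown in a hot standby. The state names (`HOT_STANDBY`, `WAIT_READONLY`, `STOP_RECOVERY`, `SHUTDOWN`) are hypothetical. They are much coarser than the real postmaster states. The only point is that recovery is stopped after the last read-only backend has exited, not before.

```python
from enum import Enum, auto

class PMState(Enum):
    """Hypothetical, heavily simplified postmaster states for a hot standby."""
    HOT_STANDBY = auto()    # WAL replay running, read-only queries allowed
    WAIT_READONLY = auto()  # smart shutdown requested; draining read-only backends
    STOP_RECOVERY = auto()  # startup process and walreceiver told to exit
    SHUTDOWN = auto()       # nothing left running

def next_state(state: PMState, smart_requested: bool,
               readonly_backends: int, recovery_procs: int) -> PMState:
    """Advance one step; recovery_procs counts startup + walreceiver still alive."""
    if state is PMState.HOT_STANDBY and smart_requested:
        # From here on, the (modelled) server would refuse new connections.
        return PMState.WAIT_READONLY
    if state is PMState.WAIT_READONLY and readonly_backends == 0:
        # Only now is replay stopped: no remaining query can be waiting on it.
        return PMState.STOP_RECOVERY
    if state is PMState.STOP_RECOVERY and recovery_procs == 0:
        return PMState.SHUTDOWN
    return state
```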

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: HS/SR and smart shutdown

From
Josh Berkus
Date:
Fujii,

> I think the startup process and the walreceiver should wait
> for all read-only backends to exit in the smart shutdown case,
> because those backends might be waiting for some WAL record to be
> replayed before their queries can proceed. Is this OK? Or should we
> kill the startup process and the walreceiver first?
> 
> If my guess is right, we would need to add a new PMState to cancel
> recovery and replication after all read-only connections have died.

How could existing read queries on the slave be waiting on a WAL record? I don't follow this.

--Josh Berkus



Re: HS/SR and smart shutdown

From
Heikki Linnakangas
Date:
Josh Berkus wrote:
>> I think the startup process and the walreceiver should wait
>> for all read-only backends to exit in the smart shutdown case,
>> because those backends might be waiting for some WAL record to be
>> replayed before their queries can proceed. Is this OK? Or should we
>> kill the startup process and the walreceiver first?
>>
>> If my guess is right, we would need to add a new PMState to cancel
>> recovery and replication after all read-only connections have died.
> 
> How could existing read queries on the slave be waiting on a WAL record?

Imagine that you do this in the master:

begin;
DROP TABLE foo;
<a lot of other stuff>
commit;

When the DROP is replayed in the standby, the startup process acquires a
lock on table foo, on behalf of the transaction that it's replaying. If
you run "SELECT * FROM foo" in the standby after that, it will block
until the startup process replays the COMMIT record and releases the lock.
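The toy Python model below shows why killing the startup process first could strand a read-only query. The lock taken when the DROP is replayed is released only when the COMMIT is replayed. The class and record names are made up for illustration and do not reflect PostgreSQL's lock manager.

```python
from typing import Dict

class StandbyLocks:
    """Toy model of locks the startup process holds while replaying WAL."""

    def __init__(self) -> None:
        self.held: Dict[str, int] = {}  # relation name -> xid holding the lock

    def replay(self, xid: int, record: str, relation: str = "") -> None:
        if record == "DROP":
            # Lock taken on behalf of the transaction being replayed.
            self.held[relation] = xid
        elif record == "COMMIT":
            # Commit releases every lock that transaction held.
            self.held = {r: x for r, x in self.held.items() if x != xid}
        # Any other record type is ignored in this model.

    def select_blocks(self, relation: str) -> bool:
        """Would a read-only SELECT on this relation have to wait?"""
        return relation in self.held
```

If replay stopped before the COMMIT record, `select_blocks("foo")` would stay true forever. The proposed ordering avoids exactly that situation. In the real case, the SELECT would no longer block once the COMMIT is replayed, but it would then fail because foo has been dropped.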

This is similar to the deadlock situation in hot standby that was
discussed on the other thread, "Re: pgsql: In HS, Startup process sets
SIGALRM when waiting for buffer pin."

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: HS/SR and smart shutdown

From
Simon Riggs
Date:
On Thu, 2010-01-21 at 09:27 +0200, Heikki Linnakangas wrote:

> Right, that's the way a standby server (= one still in recovery) has
> always behaved. It has made sense in the past: it's not in the spirit
> of smart shutdown to kill the WAL replay immediately. "smart" means
> wait for recovery to finish, then shutdown.
> 
> It's a good question if that still makes sense with Hot Standby.
> Perhaps we should redefine smart shutdown in standby mode to shut down
> as soon as all read-only connections have died.

It's clear that "smart" shutdown doesn't work while something is active.
Recovery is "active" and so we shouldn't shut down. It makes sense, it
works like this already, let's leave it. Document it if needed.

-- Simon Riggs           www.2ndQuadrant.com



Re: HS/SR and smart shutdown

From
Josh Berkus
Date:
>> It's a good question if that still makes sense with Hot Standby.
>> Perhaps we should redefine smart shutdown in standby mode to shut down
>> as soon as all read-only connections have died.
> 
> It's clear that "smart" shutdown doesn't work while something is active.
> Recovery is "active" and so we shouldn't shut down. It makes sense, it
> works like this already, let's leave it. Document it if needed.

I don't think it's clear, or intuitive for users.  In SR, recovery is
*never* done, so smart shutdown never completes (even if the master is
shut down, when I tested it).  This is particularly an important issue
when you consider that some/many service and init scripts only use smart
shutdown ... so we'll get a lot of "bug reports" of "postgresql does not
shut down".

HOWEVER, I do believe this is an issue we could live with for 9.0 if
it's going to lead to a whole lot of additional debugging of SR.  But if
it's an easy fix, it'll avoid a lot of complaints on pgsql-general.

--Josh Berkus


Re: HS/SR and smart shutdown

From
Robert Haas
Date:
On Fri, Jan 29, 2010 at 7:01 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> It's a good question if that still makes sense with Hot Standby.
>>> Perhaps we should redefine smart shutdown in standby mode to shut down
>>> as soon as all read-only connections have died.
>> It's clear that "smart" shutdown doesn't work while something is active.
>> Recovery is "active" and so we shouldn't shut down. It makes sense, it
>> works like this already, let's leave it. Document it if needed.
> I don't think it's clear, or intuitive for users.  In SR, recovery is
> *never* done, so smart shutdown never completes (even if the master is
> shut down, when I tested it).  This is particularly an important issue
> when you consider that some/many service and init scripts only use smart
> shutdown ... so we'll get a lot of "bug reports" of "postgresql does not
> shut down".

Absolutely agreed.  The existing smart shutdown behavior makes sense
from a certain point of view, but it doesn't seem very... what's the
word I'm looking for?... smart.

> HOWEVER, I do believe this is an issue we could live with for 9.0 if
> it's going to lead to a whole lot of additional debugging of SR.  But if
> it's an easy fix, it'll avoid a lot of complaints on pgsql-general.

Also agreed.

...Robert


Re: HS/SR and smart shutdown

From
Fujii Masao
Date:
On Sat, Jan 30, 2010 at 9:01 AM, Josh Berkus <josh@agliodbs.com> wrote:
> I don't think it's clear, or intuitive for users.  In SR, recovery is
> *never* done, so smart shutdown never completes (even if the master is
> shut down, when I tested it).

If you specify the trigger_file parameter in recovery.conf, creating the
trigger file completes recovery. So the existing smart shutdown waits for
that file to be created. I agree that this behavior is somewhat confusing
for users.
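This minimal Python sketch shows what ends recovery here. The trigger path is a hypothetical example and must match `trigger_file` in the standby's recovery.conf. Creating the file promotes the standby to a primary; it is not a way to simply stop it.

```python
from pathlib import Path

# Hypothetical example path. It has to match the standby's recovery.conf, e.g.:
#   standby_mode = 'on'
#   trigger_file = '/tmp/pgsql.trigger'
TRIGGER_FILE = "/tmp/pgsql.trigger"

def recovery_would_end(trigger_file: str = TRIGGER_FILE) -> bool:
    """Toy check: recovery, and with it a pending smart shutdown, can only
    finish once the trigger file exists."""
    return Path(trigger_file).exists()

def create_trigger(trigger_file: str = TRIGGER_FILE) -> None:
    """Create the trigger file. On a real standby this ends recovery by
    promoting the server to a primary; it does not just stop it."""
    Path(trigger_file).touch()
```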

> HOWEVER, I do believe this is an issue we could live with for 9.0 if
> it's going to lead to a whole lot of additional debugging of SR.  But if
> it's an easy fix, it'll avoid a lot of complaints on pgsql-general.

I think that the latter statement is right.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: HS/SR and smart shutdown

From
Magnus Hagander
Date:
On Sat, Jan 30, 2010 at 01:05, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 29, 2010 at 7:01 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>>> It's a good question if that still makes sense with Hot Standby.
>>>> Perhaps we should redefine smart shutdown in standby mode to shut down
>>>> as soon as all read-only connections have died.
>>> It's clear that "smart" shutdown doesn't work while something is active.
>>> Recovery is "active" and so we shouldn't shut down. It makes sense, it
>>> works like this already, let's leave it. Document it if needed.
>> I don't think it's clear, or intuitive for users.  In SR, recovery is
>> *never* done, so smart shutdown never completes (even if the master is
>> shut down, when I tested it).  This is particularly an important issue
>> when you consider that some/many service and init scripts only use smart
>> shutdown ... so we'll get a lot of "bug reports" of "postgresql does not
>> shut down".
>
> Absolutely agreed.  The existing smart shutdown behavior makes sense
> from a certain point of view, but it doesn't seem very... what's the
> word I'm looking for?... smart.

Yeah.
How about we change it so it's not the default anymore?

The fact is that for most applications, it's just broken. Consider any
application that uses connection pooling, which happens to be what we
recommend people to do. A smart shutdown will never shut that server
down. But it will make it not accept new connections. Which is
probably the worst possible behavior in most cases.


-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


Re: HS/SR and smart shutdown

From
Fujii Masao
Date:
On Sat, Jan 30, 2010 at 12:54 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> HOWEVER, I do believe this is an issue we could live with for 9.0 if
>> it's going to lead to a whole lot of additional debugging of SR.  But if
>> it's an easy fix, it'll avoid a lot of complaints on pgsql-general.
>
> I think that the latter statement is right.

Although we haven't reached consensus on smart shutdown during
recovery yet, I wrote a patch that changes its behavior to
shut down the server (including the startup process and the
walreceiver) as soon as all read-only connections have died.
The code is also available in the 'replication' branch in
my git repository.

And, let's discuss whether something like the attached patch
is required for v9.0 or not.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: HS/SR and smart shutdown

From
Fujii Masao
Date:
On Mon, Feb 1, 2010 at 11:49 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Sat, Jan 30, 2010 at 12:54 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> HOWEVER, I do believe this is an issue we could live with for 9.0 if
>>> it's going to lead to a whole lot of additional debugging of SR.  But if
>>> it's an easy fix, it'll avoid a lot of complaints on pgsql-general.
>>
>> I think that the latter statement is right.
>
> Although we haven't reached consensus on smart shutdown during
> recovery yet, I wrote a patch that changes its behavior to
> shut down the server (including the startup process and the
> walreceiver) as soon as all read-only connections have died.
> The code is also available in the 'replication' branch in
> my git repository.
>
> And, let's discuss whether something like the attached patch
> is required for v9.0 or not.

There have been no posts about this for over a month. Can I remove this
from the SR TODO items for 9.0? Thoughts? Objections?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: HS/SR and smart shutdown

From
Greg Stark
Date:
On Thu, Mar 4, 2010 at 12:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> There have been no posts about this for over a month. Can I remove this
> from the SR TODO items for 9.0? Thoughts? Objections?
>

Does smart shutdown still fail to shut down a slave?

-- 
greg


Re: HS/SR and smart shutdown

From
Fujii Masao
Date:
On Thu, Mar 4, 2010 at 11:55 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Thu, Mar 4, 2010 at 12:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> There have been no posts about this for over a month. Can I remove this
>> from the SR TODO items for 9.0? Thoughts? Objections?
>>
>
> Does smart shutdown still fail to shut down a slave?

Yes. More precisely, smart shutdown during recovery does not complete
until recovery ends.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: HS/SR and smart shutdown

From
Robert Haas
Date:
On Thu, Mar 4, 2010 at 10:17 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Mar 4, 2010 at 11:55 PM, Greg Stark <gsstark@mit.edu> wrote:
>> On Thu, Mar 4, 2010 at 12:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> There have been no posts about this for over a month. Can I remove this
>>> from the SR TODO items for 9.0? Thoughts? Objections?
>>>
>>
>> Does smart shutdown still fail to shut down a slave?
>
> Yes. More precisely, smart shutdown during recovery does not complete
> until recovery ends.

Well, I don't think we should let smart shutdown just never terminate
when standby_mode = on.  That's really a minefield for the unwary.  I
think we either need to make it work, or somehow give the user an
error that says "try a different shutdown mode".

...Robert


Re: HS/SR and smart shutdown

From
Greg Stark
Date:
On Thu, Mar 4, 2010 at 3:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Mar 4, 2010 at 10:17 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>
>> Yes. More precisely, smart shutdown during recovery does not complete
>> until recovery ends.
>
> Well, I don't think we should let smart shutdown just never terminate
> when standby_mode = on.  That's really a minefield for the unwary.  I
> think we either need to make it work, or somehow give the user an
> error that says "try a different shutdown mode".

It also seems dangerous to let someone think they have a standby
database ready to go and the minute they need it -- it shuts down....


--
greg


Re: HS/SR and smart shutdown

From
Robert Haas
Date:
On Thu, Mar 4, 2010 at 12:39 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Thu, Mar 4, 2010 at 3:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Mar 4, 2010 at 10:17 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>
>>> Yes. More precisely, smart shutdown during recovery does not complete
>>> until recovery ends.
>>
>> Well, I don't think we should let smart shutdown just never terminate
>> when standby_mode = on.  That's really a minefield for the unwary.  I
>> think we either need to make it work, or somehow give the user an
>> error that says "try a different shutdown mode".
>
> It also seems dangerous to let someone think they have a standby
> database ready to go and the minute they need it -- it shuts down....

LOL.

Yeah, that would not be cool.

...Robert