Re: Sync Rep and shutdown Re: Sync Rep v19 - Mailing list pgsql-hackers

From Yeb Havinga
Subject Re: Sync Rep and shutdown Re: Sync Rep v19
Date
Msg-id 4D77908C.7010200@gmail.com
In response to Re: Sync Rep and shutdown Re: Sync Rep v19  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Sync Rep and shutdown Re: Sync Rep v19
List pgsql-hackers
On 2011-03-09 15:10, Simon Riggs wrote:
> On Wed, 2011-03-09 at 16:38 +0900, Fujii Masao wrote:
>> On Wed, Mar 9, 2011 at 2:14 PM, Jaime Casanova<jaime@2ndquadrant.com>  wrote:
>>> On Tue, Mar 8, 2011 at 11:58 AM, Robert Haas<robertmhaas@gmail.com>  wrote:
>>>> The fast shutdown handling seems fine, but why not just handle smart
>>>> shutdown the same way?
>>> currently, smart shutdown means no new connections, wait until
>>> existing ones close normally. for consistency, it should behave the
>>> same for sync rep.
>> Agreed. I think that user who wants to request smart shutdown expects all
>> the existing connections to basically be closed normally by the client. So it
>> doesn't seem to be good idea to forcibly close the connection and prevent
>> the COMMIT from being returned in smart shutdown case. But I'm all ears
>> for better suggestions.
>>
>> Anyway, we got the consensus about how fast shutdown should work with
>> sync rep. So I created the patch. Please feel free to comment and commit
>> the patch first ;)
> We're just about to publish Alpha4 with this feature in.
>
> If we release waiters too early we will cause effective data loss, that
> part is agreed. We've also accepted that there are few ways to release
> the waiters.
>
> I want to release the first version as "safe" and then relax from there
> after feedback.
This is not safe, yet it is possible in the first version:

1) issue stop on master when no sync standby is connected:
mgrid@mg73:~$ pg_ctl -D /data stop
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down

2) start the standby that failed
mgrid@mg72:~$ pg_ctl -D /data start
pg_ctl: another server might be running; trying to start server anyway
LOG:  00000: database system was interrupted while in recovery at log time 2011-03-09 15:22:31 CET
HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
LOG:  00000: entering standby mode
LOG:  00000: redo starts at 57/1A000078
LOG:  00000: consistent recovery state reached at 57/1A0000A0
FATAL:  XX000: could not connect to the primary server: FATAL:  the database system is shutting down

LOCATION:  libpqrcv_connect, libpqwalreceiver.c:102
server starting
mgrid@mg72:~$ FATAL:  XX000: could not connect to the primary server: FATAL:  the database system is shutting down

A safe solution would be to prevent smart shutdown on the master if it 
is in sync mode and there are no sync standbys connected.
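
To make the proposed rule concrete, here is a minimal sketch of the check in Python. It is an illustration only, not PostgreSQL code: MasterState, its fields, and smart_shutdown_allowed are hypothetical names invented for this example, and the real server would have to derive the same information from its own configuration and walsender state.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    """Hypothetical snapshot of the master; not a PostgreSQL structure."""
    sync_rep_enabled: bool        # master is configured for synchronous replication
    connected_sync_standbys: int  # sync standbys currently connected


def smart_shutdown_allowed(state: MasterState) -> bool:
    """Return False when a smart shutdown could hang on sync rep waiters.

    With sync rep enabled and no sync standby connected, nothing can
    acknowledge the waiting commits, and a standby that starts later is
    refused because the master is already shutting down.
    """
    if not state.sync_rep_enabled:
        return True
    return state.connected_sync_standbys > 0
```

Under these assumptions, the situation in step 1 above corresponds to MasterState(sync_rep_enabled=True, connected_sync_standbys=0), for which the check returns False and the shutdown request would be refused instead of hanging.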

The current situation is definitely unsafe because it forces people who 
are in this state to do a fast shutdown. But that fails as well, so 
they are left with only immediate.

mgrid@mg73:~$ pg_ctl -D /data stop -m fast
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down
mgrid@mg73:~$ pg_ctl -D /data stop -m immediate
waiting for server to shut down.... done
server stopped

regards,
Yeb Havinga


