Re: Sync Rep v17 - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Sync Rep v17
Date
Msg-id 1298918388.12992.1714.camel@ebony
In response to Re: Sync Rep v17  (Yeb Havinga <yebhavinga@gmail.com>)
List pgsql-hackers
On Mon, 2011-02-28 at 10:31 +0100, Yeb Havinga wrote:

> 1) no automatic switch to the other synchronous standby
> - start master server, add synchronous standby 1
> - change allow_standalone_primary to off
> - add second synchronous standby
> - wait until pg_stat_replication shows both standbys are in STREAMING state
> - stop standby 1
> what happens is that the master stalls, whereas I expected that it
> would have switched to standby 2 to acknowledge commits.
> 
> The following thing was pilot error, but since I was test-piloting a new 
> plane, I still think it might be useful feedback. In my opinion, any
> number and order of pg_ctl stops and starts on both the master and 
> standby servers, as long as they are not with -m immediate, should never 
> cause the state I reached.

The behaviour of "allow_standalone_primary = off" is pretty much
untested and does seem to have various gotchas in there.
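To make the expectation in item 1 concrete, here is a toy Python model of the behaviour Yeb expected: a commit waits for an acknowledgement from the first connected standby in STREAMING state, and falls back to the next one in list order when that standby goes away. This is a hypothetical sketch, not the patch's code; the Standby class, the function names and the list-order priority rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Standby:
    # Hypothetical model of one standby as seen from the master.
    name: str
    state: str              # e.g. "STREAMING", as shown in pg_stat_replication
    connected: bool = True


def pick_sync_standby(standbys: List[Standby]) -> Optional[Standby]:
    # Assumed rule: the first connected, streaming standby in list order
    # is the one that acknowledges commits.
    for s in standbys:
        if s.connected and s.state == "STREAMING":
            return s
    return None


def commit_can_proceed(standbys: List[Standby], allow_standalone_primary: bool) -> bool:
    # A commit can complete if some standby can acknowledge it,
    # or if the master is allowed to run without any sync standby.
    return allow_standalone_primary or pick_sync_standby(standbys) is not None


# Yeb's scenario: two streaming standbys, then standby 1 is stopped.
standbys = [Standby("standby1", "STREAMING"), Standby("standby2", "STREAMING")]
standbys[0].connected = False
chosen = pick_sync_standby(standbys)
print(chosen.name if chosen else None)                               # standby2
print(commit_can_proceed(standbys, allow_standalone_primary=False))  # True
```

In this model the master keeps committing via standby 2; the stall reported above is the gap between that expectation and what the patch actually did.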

> 2) reaching some sort of shutdown deadlock state
> - start master server, add synchronous standby
> - change allow_standalone_primary to off
> then I did all sorts of test things, everything still ok. Then I wanted 
> to shut down everything, and maybe because of some symmetry (stack-like)
> I did the following because I didn't think it through
> - pg_ctl stop on standby (didn't actually wait until it was done, but
> immediately ran the next command in another terminal)
> - pg_ctl stop on master
> Oh wait.. the master needs to sync transactions
> - start standby again, but now: FATAL:  the database system is shutting down
> 
> There is no clean way to get out of this situation.
> allow_standalone_primary might be tricky in the face of shutdowns. Maybe
> the master should be prevented from entering the shutting down phase when
> allow_standalone_primary = off and no sync standby is connected; that would
> allow the sync standby to attach again.

The behaviour of "allow_standalone_primary = off" is not something I'm
worried about personally, and I've argued all along that it sounds pretty
silly to me. If someone wants to spend some time defining how it
*should* work, that might help matters. I'm inclined to remove it before
commit if it can't work cleanly, to be re-added at a later date if it
makes sense.
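The deadlock in item 2, and the fix Yeb proposes, can be illustrated with a second toy model. It treats the master as either running or shutting down, rejects standby connections once shutdown has started (as in the FATAL message above), and assumes shutdown cannot finish while commits are still waiting for a sync acknowledgement. The Primary class, the pending_commits counter and the defer_shutdown_without_standby flag are hypothetical; the flag stands in for Yeb's suggestion and is not a real setting.

```python
from dataclasses import dataclass


@dataclass
class Primary:
    allow_standalone_primary: bool
    sync_standby_connected: bool = False
    pending_commits: int = 0           # commits still waiting for a sync ack
    shutting_down: bool = False
    # Hypothetical guard modelling the proposal; not a real GUC.
    defer_shutdown_without_standby: bool = False

    def request_shutdown(self) -> None:
        must_wait = (self.pending_commits > 0
                     and not self.sync_standby_connected
                     and not self.allow_standalone_primary)
        if must_wait and self.defer_shutdown_without_standby:
            return  # stay out of the shutdown phase so a standby can reattach
        self.shutting_down = True

    def standby_connect(self) -> bool:
        if self.shutting_down:
            return False  # like "FATAL:  the database system is shutting down"
        self.sync_standby_connected = True
        self.pending_commits = 0  # the reattached standby acknowledges waiting commits
        return True

    def can_finish_shutdown(self) -> bool:
        # Shutdown completes once no commit is left waiting for an acknowledgement.
        return self.shutting_down and (self.pending_commits == 0
                                       or self.allow_standalone_primary)


# Observed sequence: standby already stopped, master stopped, standby restarted.
stuck = Primary(allow_standalone_primary=False, pending_commits=1)
stuck.request_shutdown()
print(stuck.standby_connect(), stuck.can_finish_shutdown())   # False False

# With the proposed guard, the standby can reattach and shutdown then completes.
guarded = Primary(allow_standalone_primary=False, pending_commits=1,
                  defer_shutdown_without_standby=True)
guarded.request_shutdown()
print(guarded.shutting_down, guarded.standby_connect())       # False True
guarded.request_shutdown()
print(guarded.can_finish_shutdown())                          # True
```

The catch is that a master with such a guard could refuse to shut down indefinitely if no standby ever returns, so the exact semantics would still need to be defined.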

> 
> 3) PANIC on standby server
> At some point a standby suddenly disconnected after I started a new
> pgbench run on an existing master/standby pair, with the following error
> in the logfile.
> 
> LOCATION:  libpqrcv_connect, libpqwalreceiver.c:171
> PANIC:  XX000: heap_update_redo: failed to add tuple
> CONTEXT:  xlog redo hot_update: rel 1663/16411/16424; tid 305453/15; new 305453/102
> LOCATION:  heap_xlog_update, heapam.c:4724
> LOG:  00000: startup process (PID 32597) was terminated by signal 6: Aborted
> 
> This might be due to pilot error as well; I did several tests over the
> weekend, and after this error I was more careful to start from a clean
> backup after any immediate shutdown, and haven't seen
> similar errors since.

Good. There are no changes in the patch for that section of code.

> 4) The performance of syncrep seems to be quite an improvement over
> the previous syncrep patches: I've seen tps figures of O(650) where the
> others were more like O(20). The O(650) tps is limited by the speed of
> the standby server I used; several times the master would halt only
> because of heavy disk activity at the standby. A warning in the docs
> might be appropriate: be sure to use good IO hardware for your synchronous
> replicas! With that bottleneck gone, I suspect the current syncrep
> version can go beyond 1000 tps over 1 Gbit.

Good, thanks.
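As a rough check on the estimate in item 4, the sketch below works out how much WAL per transaction a 1 Gbit link could carry at 1000 tps, and the throughput ceiling implied by synchronous commit latency for a fixed number of clients (by Little's law, throughput is at most the number of clients divided by the per-commit latency). The client count and latency values are hypothetical inputs, not measurements from this test, and protocol overhead is ignored.

```python
def wal_budget_per_tx(link_bits_per_sec: int, tps: int) -> int:
    # Bytes of WAL each transaction may generate before the link saturates
    # (ignores TCP and replication protocol overhead).
    return link_bits_per_sec // 8 // tps


def latency_bound_tps(clients: int, commit_latency_ms: int) -> float:
    # Each client waits for its commit to be acknowledged by the standby,
    # so N clients complete at most N commits per commit latency.
    return clients * 1000 / commit_latency_ms


print(wal_budget_per_tx(1_000_000_000, 1000))   # 125000
print(latency_bound_tps(16, 20))                # 800.0 (hypothetical inputs)
```

Unless each transaction writes a lot of WAL, the link itself is unlikely to be the limit; the latency term, which includes the standby's disk flush, fits the observation above that the standby's disk was the bottleneck.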

-- 
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


