Re: Sync Rep v17 - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Sync Rep v17 |
Date | |
Msg-id | 1298918388.12992.1714.camel@ebony |
In response to | Re: Sync Rep v17 (Yeb Havinga <yebhavinga@gmail.com>) |
List | pgsql-hackers |
On Mon, 2011-02-28 at 10:31 +0100, Yeb Havinga wrote:

> 1) no automatic switch to other synchronous standby
> - start master server, add synchronous standby 1
> - change allow_standalone_primary to off
> - add second synchronous standby
> - wait until pg_stat_replication shows both standby's are in STREAMING state
> - stop standby 1
> what happens is that the master stalls, where I expected that it
> would've switched to standby 2 acknowledge commits.
>
> The following thing was pilot error, but since I was test-piloting a new
> plane, I still think it might be usual feedback. In my opinion, any
> number and order of pg_ctl stops and starts on both the master and
> standby servers, as long as they are not with -m immediate, should never
> cause the state I reached.

The behaviour of "allow_synchronous_standby = off" is pretty much
untested and does seem to have various gotchas in there.

> 2) reaching some sort of shutdown deadlock state
> - start master server, add synchronous standby
> - change allow_standalone_primary to off
> then I did all sorts of test things, everything still ok. Then I wanted
> to shutdown everything, and maybe because of some symmetry (stack like)
> I did the following because I didn't think it through
> - pg_ctl stop on standby (didn't actualy wait until done, but
> immediately in other terminal)
> - pg_ctl stop on master
> O wait.. master needs to sync transactions
> - start standby again. but now: FATAL: the database system is shutting down
>
> There is no clean way to get out of this situation.
> allow_standalone_primary in the face of shutdowns might be tricky. Maybe
> shutdown must be prohibited to enter the shutting down phase in
> allow_standalone_primary = off together with no sync standby, that would
> allow for the sync standby to attach again.

The behaviour of "allow_synchronous_standby = off" is not something I'm
worried about personally and I've argued all along it sounds pretty
silly to me. If someone wants to spend some time defining how it
*should* work that might help matters. I'm inclined to remove it before
commit if it can't work cleanly, to be re-added at a later date if it
makes sense.

>
> 3) PANIC on standby server
> At some point a standby suddenly disconnected after I started a new
> pgbench run on a existing master/standby pair, with the following error
> in the logfile.
>
> LOCATION: libpqrcv_connect, libpqwalreceiver.c:171
> PANIC: XX000: heap_update_redo: failed to add tuple
> CONTEXT: xlog redo hot_update: rel 1663/16411/16424; tid 305453/15; new
> 305453/102
> LOCATION: heap_xlog_update, heapam.c:4724
> LOG: 00000: startup process (PID 32597) was terminated by signal 6: Aborted
>
> This might be due to pilot error as well; I did a several tests over the
> weekend and after this error I was more alert on remembering immediate
> shutdowns/starting with a clean backup after that, and didn't see
> similar errors since.

Good. There are no changes in the patch for that section of code.

> 4) The performance of the syncrep seems to be quite an improvement over
> the previous syncrep patches, I've seen tps-ses of O(650) where the
> others were more like O(20). The O(650) tps is limited by the speed of
> the standby server I used-at several times the master would halt only
> because of heavy disk activity at the standby. A warning in the docs
> might be right: be sure to use good IO hardware for your synchronous
> replicas! With that bottleneck gone, I suspect the current syncrep
> version can go beyond 1000tps over 1 Gbit.

Good, thanks.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
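Editorial illustration (not part of the original message): the shutdown sequence reported in item 2 can be sketched as a toy state model. This is only an assumption-laden sketch, not PostgreSQL code. The class `ToyPrimary`, its fields, and the flag `defer_shutdown_without_standby` are hypothetical names; the flag stands in for the rule Yeb suggests, where the primary refuses to enter the shutting-down phase while `allow_standalone_primary = off` and no sync standby is attached.

```python
from dataclasses import dataclass


@dataclass
class ToyPrimary:
    """Toy model of a primary server. All names here are hypothetical."""
    allow_standalone_primary: bool = False
    standby_attached: bool = True
    shutting_down: bool = False
    # Hypothetical switch modelling the rule suggested in item 2.
    defer_shutdown_without_standby: bool = False

    def request_shutdown(self) -> bool:
        """Return True if the primary enters the shutting-down phase."""
        must_wait = (not self.allow_standalone_primary
                     and not self.standby_attached)
        if self.defer_shutdown_without_standby and must_wait:
            return False  # stay up so that a standby can attach again
        self.shutting_down = True
        return True

    def attach_standby(self) -> bool:
        """Return True if a standby connection is accepted."""
        if self.shutting_down:
            # Mirrors "FATAL: the database system is shutting down".
            return False
        self.standby_attached = True
        return True

    def is_stuck(self) -> bool:
        """Shutting down and waiting on a standby that cannot connect."""
        return (self.shutting_down
                and not self.allow_standalone_primary
                and not self.standby_attached)


def reproduce(defer: bool) -> bool:
    """Replay the reported steps; return True if the model ends up stuck."""
    primary = ToyPrimary(defer_shutdown_without_standby=defer)
    primary.standby_attached = False  # pg_ctl stop on standby
    primary.request_shutdown()        # pg_ctl stop on master
    primary.attach_standby()          # start standby again
    return primary.is_stuck()
```

In this model, `reproduce(False)` ends in the stuck state described in the report, while `reproduce(True)` does not, because the shutdown request is refused until a standby is attached again. Whether the real server should behave that way is exactly the open design question raised in the thread.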