Re: upgrade to repmgr3 - Mailing list pgsql-general
From | Martín Marqués |
---|---|
Subject | Re: upgrade to repmgr3 |
Date | |
Msg-id | 600001f5-60fe-9a29-12b6-b8dd5d67309c@2ndquadrant.com |
In response to | Re: upgrade to repmgr3 (Pekka Rinne <tsierkkis@gmail.com>) |
Responses | Re: upgrade to repmgr3 |
List | pgsql-general |
Hi,

On 08/08/16 at 05:57, Pekka Rinne wrote:
>
> Meanwhile I did some more testing with my environment using repmgr3 and
> noticed an issue with promoting standby node. Here is roughly what I did.
>
> 1. Install repmgr3.1.2 RPM to all nodes as upgrade to previous 2.0.2.
> 2. I took repmgr release upgrade SQL-scripts from github 3.1 stable and
> ran those on master (all three scripts in order).
> 3. on master I stopped postgresql service
> 4. on standby I said standby promote which does some things and then
> hangs forever.
>
> This standby promote was working fine before repmgr upgrade.
>
> There is a COMMIT command visible with ps:
>
> 3324 ? Ss 0:00 postgres: repmgr repmgr <new master IP>(43666)
> COMMIT waiting for 2/4E000548

You mean it doesn't finish executing and give back the prompt?

Do you by chance have synchronous replication set? That ps output alone
doesn't say much, but being stuck on COMMIT normally means the commit is
waiting for a synchronous standby that has not confirmed it.

> [2016-08-08 10:29:03] [DEBUG] get_pg_setting(): returned value is
> "/var/lib/pgsql/data"
> [2016-08-08 10:29:03] [NOTICE] promoting server using '/usr/bin/pg_ctl
> -D /var/lib/pgsql/data promote'
> server promoting

Here, it runs the promote command.

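As a rough illustration of the synchronous replication question above, here is a small Python sketch of when a COMMIT would be expected to wait indefinitely. It is a simplified model, not PostgreSQL or repmgr code. The function name `commit_would_block` and its arguments are hypothetical. Only the setting names synchronous_commit and synchronous_standby_names, and the `*` wildcard, come from PostgreSQL. The standby names are assumed to be already parsed into a list and are compared exactly, which is a simplification.

```python
# Simplified model (an assumption, not PostgreSQL source code) of when a
# COMMIT waits for a synchronous standby that never answers.

# Values of synchronous_commit that make a commit wait for a standby.
WAITING_LEVELS = {"on", "remote_write", "remote_apply"}


def commit_would_block(synchronous_commit, sync_standby_names, connected_standbys):
    """Return True if a COMMIT would wait with no standby able to confirm it.

    synchronous_commit  -- value of the synchronous_commit setting
    sync_standby_names  -- names from synchronous_standby_names, as a list
    connected_standbys  -- application_name of each currently connected standby
    """
    # 'off' and 'local' never wait for a standby.
    if synchronous_commit not in WAITING_LEVELS:
        return False
    # An empty synchronous_standby_names disables synchronous replication.
    if not sync_standby_names:
        return False
    # '*' matches any standby, so any connected standby can confirm.
    if "*" in sync_standby_names:
        return len(connected_standbys) == 0
    # Otherwise at least one listed standby has to be connected.
    return not any(name in connected_standbys for name in sync_standby_names)
```

In the scenario from the report, the freshly promoted node has no standby attached yet, so a call such as `commit_would_block("on", ["node1"], [])` describes the suspected hang, with `node1` being a made-up standby name.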
> [2016-08-08 10:29:03] [INFO] reconnecting to promoted server
> [2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<new master IP>
> user=repmgr dbname=repmgr fallback_application_name='repmgr''
> [2016-08-08 10:29:03] [DEBUG] is_standby(): SELECT
> pg_catalog.pg_is_in_recovery()
> [2016-08-08 10:29:05] [DEBUG] is_standby(): SELECT
> pg_catalog.pg_is_in_recovery()
> [2016-08-08 10:29:05] [DEBUG] setting node 2 as master and marking
> existing master as failed

At this point, the promoted standby is a primary server (master) and
repmgr will then update the nodes table to reflect that:

> [2016-08-08 10:29:05] [DEBUG] begin_transaction()
> [2016-08-08 10:29:05] [DEBUG] commit_transaction()

If this commit is the one you are seeing stuck (in the ps output above),
then it's likely that you have synchronous_commit set to on together with
synchronous_standby_names listing 1 standby which is not available.

What settings are you using for those 2 parameters?

> The system is left in a strange state after this. If I start postgresql
> again in old master node and issue cluster show it lists both nodes as
> masters.

That's not a surprise. This is called a split brain, something repmgr
doesn't fully take care of (we rely on other tools to do the fencing or
STONITH).

Regards,

--
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
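
The split-brain situation described above can be sketched as a simple check. The log shows repmgr calling `pg_catalog.pg_is_in_recovery()`, which returns false on a node that is acting as a primary. If more than one node reports false, the cluster has more than one master. The dictionary layout and the function names below are hypothetical and only illustrate the idea. This is not how repmgr itself stores or reports node roles, and it does no fencing.

```python
# Minimal sketch (hypothetical helper names) of spotting a split brain
# from the result of pg_is_in_recovery() collected on each node.


def find_primaries(recovery_status):
    """Return the sorted names of nodes that report they are not in recovery.

    recovery_status maps a node name to the boolean returned by
    SELECT pg_catalog.pg_is_in_recovery() on that node.
    """
    return sorted(name for name, in_recovery in recovery_status.items()
                  if not in_recovery)


def is_split_brain(recovery_status):
    """True when more than one node is acting as a primary."""
    return len(find_primaries(recovery_status)) > 1
```

For the state in the report, where the old master was started again after the promotion, both nodes would report false, so `is_split_brain({"old_master": False, "node2": False})` would be true. The node names here are made up for the example.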