Re: upgrade to repmgr3 - Mailing list pgsql-general

From Martín Marqués
Subject Re: upgrade to repmgr3
Msg-id 600001f5-60fe-9a29-12b6-b8dd5d67309c@2ndquadrant.com
In response to Re: upgrade to repmgr3  (Pekka Rinne <tsierkkis@gmail.com>)
Responses Re: upgrade to repmgr3
List pgsql-general
Hi,

On 08/08/16 at 05:57, Pekka Rinne wrote:
>
> Meanwhile I did some more testing with my environment using repmgr3 and
> noticed an issue with promoting standby node. Here is roughly what I did.
>
> 1. Install repmgr3.1.2 RPM to all nodes as upgrade to previous 2.0.2.
> 2. I took repmgr release upgrade SQL-scripts from github 3.1 stable and
> ran those on master (all three scripts in order).
> 3. on master I stopped postgresql service
> 4. on standby I said standby promote which does some things and then
> hangs forever.
>
> This standby promote was working fine before repmgr upgrade.
>
> There is a COMMIT command visible with ps:
>
> 3324 ?        Ss     0:00 postgres: repmgr repmgr <new master IP>(43666)
> COMMIT waiting for 2/4E000548

You mean the command never returns and you don't get the prompt back?

Do you by chance have synchronous replication configured? That ps output
alone doesn't say much, but a COMMIT stuck waiting normally means the
server is waiting for a synchronous standby to confirm the commit.

> [2016-08-08 10:29:03] [DEBUG] get_pg_setting(): returned value is
> "/var/lib/pgsql/data"
> [2016-08-08 10:29:03] [NOTICE] promoting server using '/usr/bin/pg_ctl
> -D /var/lib/pgsql/data promote'
> server promoting

Here, it runs the promote command.

> [2016-08-08 10:29:03] [INFO] reconnecting to promoted server
> [2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<new master IP>
> user=repmgr dbname=repmgr fallback_application_name='repmgr''
> [2016-08-08 10:29:03] [DEBUG] is_standby(): SELECT
> pg_catalog.pg_is_in_recovery()
> [2016-08-08 10:29:05] [DEBUG] is_standby(): SELECT
> pg_catalog.pg_is_in_recovery()
> [2016-08-08 10:29:05] [DEBUG] setting node 2 as master and marking
> existing master as failed

At this point, the promoted standby is a primary server (master) and
repmgr will then update the nodes table to reflect that:

> [2016-08-08 10:29:05] [DEBUG] begin_transaction()
> [2016-08-08 10:29:05] [DEBUG] commit_transaction()

If this commit is the one you see stuck (in the ps output above), then
it's likely that you have synchronous_commit set to on together with a
synchronous_standby_names entry naming a standby that is not available.

What settings are you using for those 2 parameters?
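
To make the reasoning about those two parameters concrete, here is a
minimal sketch in Python. It is a simplified model, not PostgreSQL's
actual logic: it assumes a plain comma-separated synchronous_standby_names
value, and the function names are made up for illustration. The inputs
would come from SHOW synchronous_commit, SHOW synchronous_standby_names
and the application_name column of pg_stat_replication on the promoted
node.

```python
# Simplified model (an assumption, not PostgreSQL source code): a COMMIT waits
# for a standby only when synchronous_commit asks for remote confirmation AND
# synchronous_standby_names lists at least one name.
REMOTE_WAIT_LEVELS = {"on", "remote_write", "remote_apply"}


def parse_standby_names(value):
    """Split a plain comma-separated synchronous_standby_names value."""
    return [name.strip().strip('"') for name in value.split(",") if name.strip()]


def commit_would_block(synchronous_commit, synchronous_standby_names,
                       connected_standbys):
    """Return True if a COMMIT would wait with no matching standby connected.

    connected_standbys: application_name values of currently connected
    standbys (hypothetical input, e.g. read from pg_stat_replication).
    """
    if synchronous_commit.lower() not in REMOTE_WAIT_LEVELS:
        return False  # 'off' and 'local' never wait for a standby
    wanted = parse_standby_names(synchronous_standby_names)
    if not wanted:
        return False  # empty list means asynchronous replication
    connected = {name.lower() for name in connected_standbys}
    for name in wanted:
        if name == "*" and connected:
            return False  # '*' matches any connected standby
        if name.lower() in connected:
            return False
    return True
```

For example, commit_would_block("on", "node1", []) is True, which matches
the situation where the only named standby (here the hypothetical
"node1") is the old master that was just stopped.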

> The system is left in a strange state after this. If I start postgresql
> again in old master node and issue cluster show it lists both nodes as
> masters.

That's not a surprise. This is called split brain, something repmgr
doesn't fully take care of (we rely on other tools to do the fencing or
STONITH).
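
As a rough illustration of how the two-masters state can be spotted from
outside repmgr, the sketch below takes the result of
SELECT pg_catalog.pg_is_in_recovery() from each node (false on a
primary) and reports when more than one node claims to be a primary. The
node names and function names are hypothetical, and this only detects
the condition; it does no fencing.

```python
def find_primaries(in_recovery_by_node):
    """Return the sorted names of nodes that are NOT in recovery.

    in_recovery_by_node: dict mapping a node name to the boolean returned
    by pg_is_in_recovery() on that node (hypothetical input).
    """
    return sorted(node for node, in_recovery in in_recovery_by_node.items()
                  if not in_recovery)


def is_split_brain(in_recovery_by_node):
    """True when more than one node reports itself as a primary."""
    return len(find_primaries(in_recovery_by_node)) > 1


# Hypothetical state after restarting the old master following the promote:
state = {"old_master": False, "promoted_standby": False}
if is_split_brain(state):
    print("more than one primary:", ", ".join(find_primaries(state)))
```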

Regards,

--
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

