Re: upgrade to repmgr3 - Mailing list pgsql-general

From Pekka Rinne
Subject Re: upgrade to repmgr3
Date
Msg-id CACAG2VWWkuZr24LvwWrJEhYS4S72U397+eVn6m8Q2NKr8_OqAQ@mail.gmail.com
Whole thread Raw
In response to Re: upgrade to repmgr3  (Ian Barwick <ian@2ndquadrant.com>)
Responses Re: upgrade to repmgr3  (Martín Marqués <martin@2ndquadrant.com>)
List pgsql-general
hello

Thanks for your comments. They are very helpful. If you have any draft level documentation available of the upgrade procedure I would be very interested in seeing it and maybe trying it out as well.

Meanwhile I did some more testing with my environment using repmgr3 and noticed an issue with promoting standby node. Here is roughly what I did.

1. Install repmgr3.1.2 RPM to all nodes as upgrade to previous 2.0.2.
2. I took repmgr release upgrade SQL-scripts from github 3.1 stable and ran those on master (all three scripts in order).
3. on master I stopped postgresql service
4. on standby I said standby promote which does some things and then hangs forever.

This standby promote was working fine before repmgr upgrade.

There is a COMMIT command visible with ps:

3324 ?        Ss     0:00 postgres: repmgr repmgr <new master IP>(43666) COMMIT waiting for 2/4E000548

What I see in the console is here:

[2016-08-08 10:29:03] [NOTICE] using configuration file "/var/lib/pgsql/repmgr/repmgr.conf"
[2016-08-08 10:29:03] [INFO] connecting to standby database
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<new master IP> user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [INFO] connected to standby, checking its state
[2016-08-08 10:29:03] [DEBUG] is_standby(): SELECT pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:03] [INFO] retrieving node list for cluster 'sensorlog'
[2016-08-08 10:29:03] [DEBUG] get_master_connection():
  SELECT id, conninfo,          CASE WHEN type = 'master' THEN 1 ELSE 2 END AS type_priority    FROM "repmgr_sensorlog".repl_nodes    WHERE cluster = 'sensorlog'      AND type != 'witness' ORDER BY active DESC, type_priority, priority, id
[2016-08-08 10:29:03] [INFO] checking role of cluster node '1'
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<old master IP> user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [ERROR] connection to database failed: could not connect to server: Connection refused
        Is the server running on host "<old master IP>" and accepting
        TCP/IP connections on port 5432?
[2016-08-08 10:29:03] [INFO] checking role of cluster node '2'
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<new master IP> user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [DEBUG] is_standby(): SELECT pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:03] [INFO] checking role of cluster node '3'
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<old master IP> user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [DEBUG] is_standby(): SELECT pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:03] [NOTICE] promoting standby
[2016-08-08 10:29:03] [DEBUG] get_pg_setting(): SELECT name, setting  FROM pg_settings WHERE name = 'data_directory'
[2016-08-08 10:29:03] [DEBUG] get_pg_setting(): returned value is "/var/lib/pgsql/data"
[2016-08-08 10:29:03] [NOTICE] promoting server using '/usr/bin/pg_ctl -D /var/lib/pgsql/data promote'
server promoting
[2016-08-08 10:29:03] [INFO] reconnecting to promoted server
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<new master IP> user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [DEBUG] is_standby(): SELECT pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:05] [DEBUG] is_standby(): SELECT pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:05] [DEBUG] setting node 2 as master and marking existing master as failed
[2016-08-08 10:29:05] [DEBUG] begin_transaction()
[2016-08-08 10:29:05] [DEBUG] commit_transaction()

The system is left in a strange state after this. If I start postgresql again in old master node and issue cluster show it lists both nodes as masters.

In this experiment I did not take slot based replication into use at all. Everything is left just as it was before except the repmgr upgrade.

br,
Pekka


2016-08-04 16:19 GMT+03:00 Ian Barwick <ian@2ndquadrant.com>:
Hi

On 08/04/2016 05:57 PM, Pekka Rinne wrote:
hi!

I have been using postgres 9.4 and repmgr2.0 combination and been doing
replication (hot standby). Now I'd like to start doing slot based
replication and have installed repmgr3 and exeuted the provided sql scripts
and also added use_replication_slots=1 into repmgr.conf.

The question is that what is the correct procedure to switch into using
slots (max_replication_slots) in this case as the system has been set up
already without them? Do I have to unregister and re-register all the
standbys? Hopefully re-clone could be avoided.

No reclone needed.

What I tried was that I configured max_replication_hosts=5, restarted
master, created some slots using select * from
pg_create_physical_replication_slot(<name>), configured one created slot
into recovery.conf in the slave. What I noticed was that replication seemed
to be still working after this but in repl_nodes table slot_name remained
empty. Then I did standby re-register with force and slot_name was filled
with repmgr_slot_2 value which is not the name I gave for the slot. I think
repmgr invented this name but in the pg_replication_slots table
repmgr_slot_2 does not exist. There is only the slot I created myself
(active=t). So I guess this approach is not quite right.

What if I just skip doing the re-register. Does is matter if slot_name
remains empty in repl_nodes?

This won't affect replication, however if you attempt any failover actions
using repmgr (e.g. following a new master), it will probably cause problems when
attempting to create a replication slot on the new master.

As a workaround you can manually update the slot name in the repl_nodes table
to match the one you've chosen. We'll update repmgr to better handle this
kind of situation. I don't think we've had this particular use-case before,
so I'll add some notes to the documentation on how best to handle it.


Regards

Ian Barwick

--
 Ian Barwick                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-general by date:

Previous
From: "Charles Clavadetscher"
Date:
Subject: Re: lower() silently fails for 9.5.3 on Windows, but works on Mac, Linux
Next
From: Tom Lane
Date:
Subject: Re: lower() silently fails for 9.5.3 on Windows, but works on Mac, Linux