Thread: DETAIL: pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1

Hello,

We are setting up replication and testing failover and failback scenarios. Failover succeeds. During failback, when we try to rejoin the original primary as the new standby, pg_rewind fails with the errors below. Could someone please take a look and advise?


DETAIL: pg_rewind command is "/opt/postgresql/pg/bin/pg_rewind -D '/pgresdata101/data' --source-server='host=10.29.97.241 port=5432 user=repmgr dbname=repmgr connect_timeout=2'"
DEBUG: executing:
  /opt/postgresql/pg/bin/pg_rewind -D '/pgresdata101/data' --source-server='host=10.29.97.241 port=5432 user=repmgr dbname=repmgr connect_timeout=2' 2>/tmp/repmgr_command.wgVGPS
DEBUG: result of command was 1 (256)
DEBUG: local_command(): output returned was:
pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
pg_rewind: error: could not open file "/pgresdata101/data/pg_wal/000000010000000000000008": No such file or directory
pg_rewind: error: could not find previous WAL record at 0/802B668
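
As a reading aid for the error above, here is a minimal sketch of how an LSN maps to a WAL segment file name. It assumes the default 16 MB WAL segment size (the thread does not state the cluster's segment size), and the helper names `parse_lsn` and `wal_file_name` are illustrative, not part of PostgreSQL or repmgr.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '0/802B668' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, lsn: str, segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file that holds the byte at the given LSN."""
    segments_per_xlog_id = 0x100000000 // segment_size
    segment_number = parse_lsn(lsn) // segment_size
    # File name = timeline, then the high and low parts of the segment number,
    # each printed as 8 upper-case hex digits.
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_xlog_id,
        segment_number % segments_per_xlog_id,
    )


if __name__ == "__main__":
    print(wal_file_name(1, "0/802B668"))  # record pg_rewind could not read
    print(wal_file_name(1, "0/9000000"))  # divergence point
```

Under that assumption, the record at 0/802B668 falls in segment 000000010000000000000008, which is exactly the file pg_rewind reports as missing, while the divergence point 0/9000000 is the first byte of the next segment.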

___________________________

DEVESH KUMAR

Database Admin I – India

M: +91 6366843695

devesh.kumar@cmegroup.com




Address: Tridib Building Block B 5th Floor

Bagmane Tech Park CV Raman Nagar,

Bengaluru,  560093,  IN
www.cmegroup.com

 



NOTICE: This message, and any attachments, are for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at https://www.cmegroup.com/tools-information/communications/e-communication-disclaimer.html If you are not the intended recipient, please delete this message. CME Group and its subsidiaries reserve the right to monitor all email communications that occur on CME Group information systems.
On Sat, 2024-04-27 at 00:36 +0530, Kumar, Devesh wrote:
> Currently we are working on setting up replication and testing failover scenarios
> and failback. During our testing, failover is getting successful. During Failback,
> when we are reverting the original primary instance as the new standby, we are
> getting pg_rewind errors. Kindly can someone check and let us know.
>
> pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
> pg_rewind: error: could not open file "/pgresdata101/data/pg_wal/000000010000000000000008": No such file or directory
> pg_rewind: error: could not find previous WAL record at 0/802B668

You should show the exact commands used for failover and failback.

Yours,
Laurenz Albe



Hello Laurenz

Thanks for the response. The details are below.

Primary repmgr.conf details:
[image.png: screenshot of the primary's repmgr.conf]

Secondary repmgr.conf details:
[image.png: screenshot of the secondary's repmgr.conf]

Failover steps:

We stopped the PostgreSQL service on the primary server. repmgrd automatically failed over to the standby and promoted it to be the new primary.

Cluster status after failover:
[image.png: screenshot of the cluster status]


Failback steps:

1. We executed a checkpoint on the new primary (originally the standby).
2. We ran the node rejoin command below with --dry-run, to check whether the original primary is eligible to rejoin:

repmgr node rejoin -f /opt/postgresql/15.6/bin/repmgr.conf -d 'host=10.29.97.241 port=5432 user=repmgr dbname=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.local.conf,pg_hba.conf -v --dry-run


NOTICE: rejoin target is node "d-dba-pg-rnh9" (ID: 2)
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 7360952088605465701
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/9000028
INFO: prerequisites for using pg_rewind are met
INFO: file "postgresql.conf" would be copied to "/tmp/repmgr-config-archive-d-dba-pg-0ptt/postgresql.conf"
WARNING: specified file "/pgresdata101/data/postgresql.local.conf" not found, skipping
INFO: file "pg_hba.conf" would be copied to "/tmp/repmgr-config-archive-d-dba-pg-0ptt/pg_hba.conf"
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is:
  /opt/postgresql/pg/bin/pg_rewind -D '/pgresdata101/data' --source-server='host=10.29.97.241 port=5432 user=repmgr dbname=repmgr connect_timeout=2'
INFO: prerequisites for executing NODE REJOIN are met
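
To make the DETAIL line in the dry-run output concrete, here is a rough sketch of the comparison it describes. It is a simplified reading of the message, not repmgr's actual code: the function names are hypothetical, and the fork point 0/9000000 is taken from the pg_rewind output reported in this thread.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '0/9000028' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def needs_rewind(local_lsn: str, fork_lsn: str) -> bool:
    """True when the local node has WAL past the point where the target's timeline forked off."""
    return parse_lsn(local_lsn) > parse_lsn(fork_lsn)


def bytes_past_fork(local_lsn: str, fork_lsn: str) -> int:
    """How many bytes of WAL the local node holds beyond the fork point."""
    return max(0, parse_lsn(local_lsn) - parse_lsn(fork_lsn))


if __name__ == "__main__":
    local = "0/9000028"  # current recovery point of the old primary (from the log)
    fork = "0/9000000"   # where timeline 2 forked off timeline 1 (from pg_rewind)
    print(needs_rewind(local, fork), bytes_past_fork(local, fork))
```

With the values from the log, the old primary is 0x28 (40) bytes past the fork point, which is why a plain reattach is not possible and pg_rewind is requested.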

3. We executed the node rejoin command:

repmgr node rejoin -f /opt/postgresql/15.6/bin/repmgr.conf -d 'host=10.29.97.241 port=5432 user=repmgr dbname=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.local.conf,pg_hba.conf -v
NOTICE: using provided configuration file "/opt/postgresql/15.6/bin/repmgr.conf"
DEBUG: server version number is: 150000
DEBUG: set_config():
  SET synchronous_commit TO 'local'
DEBUG: get_primary_node_id():
SELECT node_id                   FROM repmgr.nodes     WHERE type = 'primary'    AND active IS TRUE
DEBUG: get_node_record():
  SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 2
NOTICE: rejoin target is node "d-dba-pg-rnh9" (ID: 2)
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=10.29.97.241 port=5432 fallback_application_name=repmgr options=-csearch_path="
DEBUG: set_config():
  SET synchronous_commit TO 'local'
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
DEBUG: get_node_record():
  SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 1
DEBUG: local timeline: 1; rejoin target timeline: 2
DEBUG: get_timeline_history():
TIMELINE_HISTORY 2
DEBUG: local tli: 1; local_xlogpos: 0/9000028; follow_target_history->tli: 1; follow_target_history->end: 0/9000000
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/9000028
DEBUG: guc_set():
SELECT true FROM pg_catalog.pg_settings  WHERE name = 'full_page_writes' AND setting = 'off'
DEBUG: guc_set():
SELECT true FROM pg_catalog.pg_settings  WHERE name = 'wal_log_hints' AND setting = 'on'
INFO: prerequisites for using pg_rewind are met
DEBUG: using archive directory "/tmp/repmgr-config-archive-d-dba-pg-0ptt"
DEBUG: copying "postgresql.conf" to "/tmp/repmgr-config-archive-d-dba-pg-0ptt/postgresql.conf"
WARNING: specified file "/pgresdata101/data/postgresql.local.conf" not found, skipping
DEBUG: copying "pg_hba.conf" to "/tmp/repmgr-config-archive-d-dba-pg-0ptt/pg_hba.conf"
INFO: 2 files copied to "/tmp/repmgr-config-archive-d-dba-pg-0ptt"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/opt/postgresql/pg/bin/pg_rewind -D '/pgresdata101/data' --source-server='host=10.29.97.241 port=5432 user=repmgr dbname=repmgr connect_timeout=2'"
DEBUG: executing:
  /opt/postgresql/pg/bin/pg_rewind -D '/pgresdata101/data' --source-server='host=10.29.97.241 port=5432 user=repmgr dbname=repmgr connect_timeout=2' 2>/tmp/repmgr_command.wgVGPS
DEBUG: result of command was 1 (256)
DEBUG: local_command(): output returned was:
pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
pg_rewind: error: could not open file "/pgresdata101/data/pg_wal/000000010000000000000008": No such file or directory
pg_rewind: error: could not find previous WAL record at 0/802B668

ERROR: pg_rewind execution failed
DETAIL: pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
pg_rewind: error: could not open file "/pgresdata101/data/pg_wal/000000010000000000000008": No such file or directory
pg_rewind: error: could not find previous WAL record at 0/802B668
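
pg_rewind scans the target's own WAL backward from the divergence point to find the last checkpoint before it, so every segment covering that range must still be present in the target's pg_wal. The sketch below checks a directory for such segments before a rewind is attempted. It assumes the default 16 MB segment size and a single timeline, and the helper names are hypothetical; the demo uses a temporary directory rather than a real data directory.

```python
import os
import tempfile

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '0/802B668' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def segment_names(timeline, start_lsn, end_lsn, segment_size=WAL_SEGMENT_SIZE):
    """Names of all WAL segments on one timeline that cover start_lsn..end_lsn."""
    per_id = 0x100000000 // segment_size
    first = parse_lsn(start_lsn) // segment_size
    last = parse_lsn(end_lsn) // segment_size
    return ["%08X%08X%08X" % (timeline, n // per_id, n % per_id)
            for n in range(first, last + 1)]


def missing_segments(wal_dir, timeline, start_lsn, end_lsn):
    """Segments in the requested range that are not present in wal_dir."""
    present = set(os.listdir(wal_dir))
    return [name for name in segment_names(timeline, start_lsn, end_lsn)
            if name not in present]


if __name__ == "__main__":
    # Demo: a stand-in for pg_wal that only contains segment ...09.
    with tempfile.TemporaryDirectory() as wal_dir:
        open(os.path.join(wal_dir, "000000010000000000000009"), "w").close()
        print(missing_segments(wal_dir, 1, "0/802B668", "0/9000028"))
```

If a segment is reported missing and a WAL archive exists (none is shown in this thread), pg_rewind in PostgreSQL 13 and later has a `--restore-target-wal` (`-c`) option that fetches such files through the target's restore_command. Whether and how that option can be passed through `repmgr node rejoin` is not covered here.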

___________________________

DEVESH KUMAR


On Mon, Apr 29, 2024 at 3:37 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
You should show the exact commands used for failover and failback.

Yours,
Laurenz Albe

