Thread: DETAIL: pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
DETAIL: pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
From
"Kumar, Devesh"
Date:
Hello,
We are setting up replication and testing failover and failback scenarios. Failover succeeds in our tests. During failback, when we try to rejoin the original primary as the new standby, pg_rewind fails with the errors below. Could someone please take a look and let us know what is wrong?
DETAIL: pg_rewind command is "/opt/postgresql/pg/bin/pg_rewind -D '/pgresdata101/data' --source-server='host=10.29.97.241 port=5432 user=repmgr dbname=repmgr connect_timeout=2'"
DEBUG: executing:
/opt/postgresql/pg/bin/pg_rewind -D '/pgresdata101/data' --source-server='host=10.29.97.241 port=5432 user=repmgr dbname=repmgr connect_timeout=2' 2>/tmp/repmgr_command.wgVGPS
DEBUG: result of command was 1 (256)
DEBUG: local_command(): output returned was:
pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
pg_rewind: error: could not open file "/pgresdata101/data/pg_wal/000000010000000000000008": No such file or directory
pg_rewind: error: could not find previous WAL record at 0/802B668
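Editorial note: the file name in the "could not open file" error follows from the LSN in the last line. The sketch below shows that mapping; it is not part of the original post. It assumes the default 16 MB wal_segment_size, and the function name wal_file_name is made up for illustration.

```shell
#!/bin/sh
# Map a timeline ID and an LSN such as 0/802B668 to a WAL segment file name.
# Assumption: 16 MB segments unless a third argument gives another size.
wal_file_name() (
    tli=$1
    lsn=$2
    seg_size=${3:-16777216}

    hi=${lsn%%/*}   # upper 32 bits of the LSN, in hex
    lo=${lsn##*/}   # lower 32 bits of the LSN, in hex

    # Absolute byte position in the WAL stream, then the segment it falls in.
    pos=$(( (0x$hi << 32) + 0x$lo ))
    segno=$(( pos / seg_size ))
    # Number of segments per 4 GB of WAL (256 with 16 MB segments).
    segs_per_id=$(( 4294967296 / seg_size ))

    # File name = timeline, high part, low part, each as 8 hex digits.
    printf '%08X%08X%08X\n' "$tli" $(( segno / segs_per_id )) $(( segno % segs_per_id ))
)

wal_file_name 1 0/802B668   # prints 000000010000000000000008
wal_file_name 1 0/9000000   # prints 000000010000000000000009
```

So the record at 0/802B668 lives in segment 000000010000000000000008, which is the file pg_rewind could not open, while the divergence point 0/9000000 is the start of the next segment. On a running server, the built-in SQL function pg_walfile_name() performs the same mapping.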
___________________________
DEVESH KUMAR
Database Admin I – India
M: +91 6366843695
Address: Tridib Building Block B 5th Floor
Bagmane Tech Park CV Raman Nagar,
Bengaluru, 560093, IN
www.cmegroup.com
Re: DETAIL: pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
From
Laurenz Albe
Date:
On Sat, 2024-04-27 at 00:36 +0530, Kumar, Devesh wrote:
> Currently we are working on setting up replication and testing failover scenarios
> and failback. During our testing, failover is getting successful. During Failback,
> when we are reverting the original primary instance as the new standby, we are
> getting pg_rewind errors. Kindly can someone check and let us know.
>
> pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
> pg_rewind: error: could not open file "/pgresdata101/data/pg_wal/000000010000000000000008": No such file or directory
> pg_rewind: error: could not find previous WAL record at 0/802B668

You should show the exact commands used for failover and failback.

Yours,
Laurenz Albe
Re: DETAIL: pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
From
"Kumar, Devesh"
Date:
Hello Laurenz
Thanks for the response. I am putting the details as below:
Primary repmgr.conf Details
Secondary repmgr.conf Details
Failover steps:
We stopped the PostgreSQL service on the primary server; repmgrd then failed over automatically and promoted the standby to be the new primary.
See the below status after failover
Failback steps:
1. We executed a checkpoint on the new primary (originally the standby).
2. We ran the node rejoin command below with --dry-run to check whether the original primary is eligible to rejoin:
repmgr node rejoin -f /opt/postgresql/15.6/bin/repmgr.conf -d 'host=10.29.97.241 port=5432 user=repmgr dbname=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.local.conf,pg_hba.conf -v --dry-run
NOTICE: rejoin target is node "d-dba-pg-rnh9" (ID: 2)
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 7360952088605465701
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/9000028
INFO: prerequisites for using pg_rewind are met
INFO: file "postgresql.conf" would be copied to "/tmp/repmgr-config-archive-d-dba-pg-0ptt/postgresql.conf"
WARNING: specified file "/pgresdata101/data/postgresql.local.conf" not found, skipping
INFO: file "pg_hba.conf" would be copied to "/tmp/repmgr-config-archive-d-dba-pg-0ptt/pg_hba.conf"
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is:
/opt/postgresql/pg/bin/pg_rewind -D '/pgresdata101/data' --source-server='host=10.29.97.241 port=5432 user=repmgr dbname=repmgr connect_timeout=2'
INFO: prerequisites for executing NODE REJOIN are met
3. We executed the node rejoin command:
repmgr node rejoin -f /opt/postgresql/15.6/bin/repmgr.conf -d 'host=10.29.97.241 port=5432 user=repmgr dbname=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.local.conf,pg_hba.conf -v
NOTICE: using provided configuration file "/opt/postgresql/15.6/bin/repmgr.conf"
DEBUG: server version number is: 150000
DEBUG: set_config():
SET synchronous_commit TO 'local'
DEBUG: get_primary_node_id():
SELECT node_id FROM repmgr.nodes WHERE type = 'primary' AND active IS TRUE
DEBUG: get_node_record():
SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE n.node_id = 2
NOTICE: rejoin target is node "d-dba-pg-rnh9" (ID: 2)
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=10.29.97.241 port=5432 fallback_application_name=repmgr options=-csearch_path="
DEBUG: set_config():
SET synchronous_commit TO 'local'
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
DEBUG: get_node_record():
SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE n.node_id = 1
DEBUG: local timeline: 1; rejoin target timeline: 2
DEBUG: get_timeline_history():
TIMELINE_HISTORY 2
DEBUG: local tli: 1; local_xlogpos: 0/9000028; follow_target_history->tli: 1; follow_target_history->end: 0/9000000
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/9000028
DEBUG: guc_set():
SELECT true FROM pg_catalog.pg_settings WHERE name = 'full_page_writes' AND setting = 'off'
DEBUG: guc_set():
SELECT true FROM pg_catalog.pg_settings WHERE name = 'wal_log_hints' AND setting = 'on'
INFO: prerequisites for using pg_rewind are met
DEBUG: using archive directory "/tmp/repmgr-config-archive-d-dba-pg-0ptt"
DEBUG: copying "postgresql.conf" to "/tmp/repmgr-config-archive-d-dba-pg-0ptt/postgresql.conf"
WARNING: specified file "/pgresdata101/data/postgresql.local.conf" not found, skipping
DEBUG: copying "pg_hba.conf" to "/tmp/repmgr-config-archive-d-dba-pg-0ptt/pg_hba.conf"
INFO: 2 files copied to "/tmp/repmgr-config-archive-d-dba-pg-0ptt"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/opt/postgresql/pg/bin/pg_rewind -D '/pgresdata101/data' --source-server='host=10.29.97.241 port=5432 user=repmgr dbname=repmgr connect_timeout=2'"
DEBUG: executing:
/opt/postgresql/pg/bin/pg_rewind -D '/pgresdata101/data' --source-server='host=10.29.97.241 port=5432 user=repmgr dbname=repmgr connect_timeout=2' 2>/tmp/repmgr_command.wgVGPS
DEBUG: result of command was 1 (256)
DEBUG: local_command(): output returned was:
pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
pg_rewind: error: could not open file "/pgresdata101/data/pg_wal/000000010000000000000008": No such file or directory
pg_rewind: error: could not find previous WAL record at 0/802B668
ERROR: pg_rewind execution failed
DETAIL: pg_rewind: servers diverged at WAL location 0/9000000 on timeline 1
pg_rewind: error: could not open file "/pgresdata101/data/pg_wal/000000010000000000000008": No such file or directory
pg_rewind: error: could not find previous WAL record at 0/802B668
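Editorial note: the dry run reported that the prerequisites were met, yet the real run failed because a WAL segment was no longer in the old primary's pg_wal directory. A pre-flight check along the following lines might catch that earlier. This is only a sketch and not part of the original message: the function name missing_wal_segments is hypothetical, 16 MB segments are assumed, and the LSN range (0/802B668 up to the local position 0/9000028) is simply taken from the log above, so pg_rewind may in fact need WAL from even earlier.

```shell
#!/bin/sh
# Report which WAL segments covering START_LSN..END_LSN on one timeline are
# absent from a pg_wal directory. Assumption: 16 MB segments.
missing_wal_segments() (
    wal_dir=$1
    tli=$2
    start_lsn=$3
    end_lsn=$4
    seg_size=16777216
    segs_per_id=$(( 4294967296 / seg_size ))

    # Split each "hi/lo" hex LSN and convert it to a segment number.
    s_hi=${start_lsn%%/*}; s_lo=${start_lsn##*/}
    e_hi=${end_lsn%%/*};   e_lo=${end_lsn##*/}
    first=$(( ((0x$s_hi << 32) + 0x$s_lo) / seg_size ))
    last=$(( ((0x$e_hi << 32) + 0x$e_lo) / seg_size ))

    status=0
    segno=$first
    while [ "$segno" -le "$last" ]; do
        name=$(printf '%08X%08X%08X' "$tli" $(( segno / segs_per_id )) $(( segno % segs_per_id )))
        if [ ! -f "$wal_dir/$name" ]; then
            echo "missing: $name"
            status=1
        fi
        segno=$(( segno + 1 ))
    done
    # Exit status 0 means every segment in the range was found.
    exit "$status"
)

# Demo against a scratch directory that mimics the state seen in the log:
# segment ...09 is present, segment ...08 is not.
demo_dir=$(mktemp -d)
touch "$demo_dir/000000010000000000000009"
missing_wal_segments "$demo_dir" 1 0/802B668 0/9000028 \
    || echo "pg_rewind would not find all the WAL it needs"
rm -rf "$demo_dir"
```

Against the real cluster the first argument would be /pgresdata101/data/pg_wal. If a segment is reported missing and a WAL archive exists, the pg_rewind documentation describes two options: copy the file back into pg_wal by hand, or (PostgreSQL 13 and later) run pg_rewind with -c / --restore-target-wal so that it fetches the file through the target's restore_command. Whether an archive is configured here is not stated in the thread.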