BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done. - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done. |
Date | |
Msg-id | 18575-1b9b3b60e2a480de@postgresql.org Whole thread Raw |
Responses |
Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 18575 Logged by: Georgy Silkly Email address: g.shelkovy@arenadata.io PostgreSQL version: 16.3 Operating system: docker alpine Description: Sometimes pg_rewind mistakenly assumes that nothing needs to be done, which results in the replica having data that is not on the master. Steps to reproduce script ```bash #!/bin/sh -eux echo "Goto script's directory" cd "$(dirname "$0")" echo "Stop master if exists" test -f master/postmaster.pid && pg_ctl -w -D master stop -m fast || echo $? echo "Remove master directory" rm -rf master echo "Create master directory" mkdir -p master echo "Initialize master, data checksums are mandatory" pg_ctl initdb -D master -o "--data-checksums -N -A trust --wal-segsize 1" echo "Debug to see recycling WAL, keep only 5 WAL files (because segsize 1MB)" cat >>master/postgresql.conf <<EOF hot_standby = on logging_collector = on log_min_messages = debug wal_keep_size = 5MB EOF echo "Accept replication connections on master" cat >> master/pg_hba.conf <<EOF local replication all trust host replication all 127.0.0.1/32 trust host replication all ::1/128 trust EOF echo "Start master" pg_ctl -w -D master start -o "-p 5000" echo "Stop standby1 if exists" test -f standby1/postmaster.pid && pg_ctl -w -D standby1 stop -m fast || echo $? echo "Remove standby1 directory" rm -rf standby1 echo "Base backup is taken with xlog files included" pg_basebackup -D standby1 -p 5000 -X fetch --verbose --write-recovery-conf echo "Start standby1" pg_ctl -w -D standby1 start -o "-p 5001" echo "Stop standby2 if exists" test -f standby2/postmaster.pid && pg_ctl -w -D standby2 stop -m fast || echo $? echo "Remove standby2 directory" rm -rf standby2 echo "Base backup is taken with xlog files included" pg_basebackup -D standby2 -p 5000 -X fetch --verbose --write-recovery-conf echo "Start standby2" pg_ctl -w -D standby2 start -o "-p 5002" echo "Create table and insert lot of rows" psql -d postgres -a --no-psqlrc -p 5000 <<EOF create table a (a int, b int); insert into a select i, i from generate_series(1, 100000) i; EOF echo "Stop master to recycle WAL files" pg_ctl -w -D master stop -m fast echo "Stop standby1 to recycle WAL files" pg_ctl -w -D standby1 stop -m fast echo "Stop standby2 to recycle WAL files" pg_ctl -w -D standby2 stop -m fast echo "Start master" pg_ctl -w -D master start -o "-p 5000" echo "Start standby1" pg_ctl -w -D standby1 start -o "-p 5001" echo "Start standby2" pg_ctl -w -D standby2 start -o "-p 5002" echo "Now promote standby1" pg_ctl -w -D standby1 promote echo "Wait until standby1 is promoted" while ! pg_isready -p 5001; do sleep 1; done echo "Insert some rows" psql -d postgres -a --no-psqlrc -p 5000 <<EOF insert into a select i, i from generate_series(1, 340) i; EOF echo "Standby2 node need to be stopped as well." pg_ctl -w -D standby2 stop -m fast echo "Stop the master after standby promotion" pg_ctl -w -D master stop -m fast echo "Standby1 node need to be stopped as well." pg_ctl -w -D standby1 stop -m fast echo "Do rewind using standby1 pgdata as source" pg_rewind --progress --debug --source-pgdata=standby1 --target-pgdata=standby2 echo "Parameters for standby2 postgresql.conf" cat >> standby2/postgresql.conf <<EOF primary_conninfo = 'port=5001' EOF echo "Place standby2 signal file" touch standby2/standby.signal echo "Start standby1" pg_ctl -w -D standby1 start -o "-p 5001" echo "Start standby2" pg_ctl -w -D standby2 start -o "-p 5002" echo "Same query" psql -d postgres -a --no-psqlrc -p 5001 <<EOF select count(*) from a; EOF echo "Different results" psql -d postgres -a --no-psqlrc -p 5002 <<EOF select count(*) from a; EOF echo "Stop standby1" pg_ctl -w -D standby1 stop -m fast echo "Stop standby2" pg_ctl -w -D standby2 stop -m fast ``` output ```bash + echo 'Goto script'"'"'s directory' Goto script's directory + dirname /var/lib/postgresql/r.sh + cd /var/lib/postgresql + echo 'Stop master if exists' Stop master if exists + test -f master/postmaster.pid + echo 1 1 + echo 'Remove master directory' Remove master directory + rm -rf master + echo 'Create master directory' Create master directory + mkdir -p master + echo 'Initialize master, data checksums are mandatory' Initialize master, data checksums are mandatory + pg_ctl initdb -D master -o '--data-checksums -N -A trust --wal-segsize 1' The files belonging to this database system will be owned by user "postgres". This user must also own the server process. The database cluster will be initialized with locale "ru_RU.UTF-8". The default database encoding has accordingly been set to "UTF8". The default text search configuration will be set to "russian". Data page checksums are enabled. fixing permissions on existing directory master ... ok creating subdirectories ... ok selecting dynamic shared memory implementation ... posix selecting default max_connections ... 100 selecting default shared_buffers ... 128MB selecting default time zone ... Asia/Yekaterinburg creating configuration files ... ok running bootstrap script ... ok performing post-bootstrap initialization ... ok Sync to disk skipped. The data directory might become corrupt if the operating system crashes. Success. You can now start the database server using: /usr/local/bin/pg_ctl -D master -l logfile start + echo 'Debug to see recycling WAL, keep only 5 WAL files (because segsize 1MB)' Debug to see recycling WAL, keep only 5 WAL files (because segsize 1MB) + cat + echo 'Accept replication connections on master' Accept replication connections on master + cat + echo 'Start master' Start master + pg_ctl -w -D master start -o '-p 5000' waiting for server to start....2024-08-07 17:14:59.431 +05 [19618] DEBUG: registering background worker "logical replication launcher" 2024-08-07 17:14:59.431 +05 [19618] DEBUG: mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory 2024-08-07 17:14:59.434 +05 [19618] DEBUG: dynamic shared memory system will support 674 segments 2024-08-07 17:14:59.434 +05 [19618] DEBUG: created dynamic shared memory control segment 1581037720 (26976 bytes) 2024-08-07 17:14:59.434 +05 [19618] DEBUG: max_safe_fds = 986, usable_fds = 1000, already_open = 4 2024-08-07 17:14:59.434 +05 [19618] LOG: redirecting log output to logging collector process 2024-08-07 17:14:59.434 +05 [19618] HINT: Future log output will appear in directory "log". done server started + echo 'Stop standby1 if exists' Stop standby1 if exists + test -f standby1/postmaster.pid + echo 1 1 + echo 'Remove standby1 directory' Remove standby1 directory + rm -rf standby1 + echo 'Base backup is taken with xlog files included' Base backup is taken with xlog files included + pg_basebackup -D standby1 -p 5000 -X fetch --verbose --write-recovery-conf pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 0/800028 on timeline 1 pg_basebackup: write-ahead log end point: 0/800100 pg_basebackup: syncing data to disk ... pg_basebackup: renaming backup_manifest.tmp to backup_manifest pg_basebackup: base backup completed + echo 'Start standby1' Start standby1 + pg_ctl -w -D standby1 start -o '-p 5001' waiting for server to start....2024-08-07 17:15:00.282 +05 [19631] DEBUG: registering background worker "logical replication launcher" 2024-08-07 17:15:00.282 +05 [19631] DEBUG: mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory 2024-08-07 17:15:00.286 +05 [19631] DEBUG: dynamic shared memory system will support 674 segments 2024-08-07 17:15:00.286 +05 [19631] DEBUG: created dynamic shared memory control segment 1807324640 (26976 bytes) 2024-08-07 17:15:00.286 +05 [19631] DEBUG: max_safe_fds = 986, usable_fds = 1000, already_open = 4 2024-08-07 17:15:00.286 +05 [19631] LOG: redirecting log output to logging collector process 2024-08-07 17:15:00.286 +05 [19631] HINT: Future log output will appear in directory "log". done server started + echo 'Stop standby2 if exists' Stop standby2 if exists + test -f standby2/postmaster.pid + echo 1 1 + echo 'Remove standby2 directory' Remove standby2 directory + rm -rf standby2 + echo 'Base backup is taken with xlog files included' Base backup is taken with xlog files included + pg_basebackup -D standby2 -p 5000 -X fetch --verbose --write-recovery-conf pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 0/900028 on timeline 1 pg_basebackup: write-ahead log end point: 0/900100 pg_basebackup: syncing data to disk ... pg_basebackup: renaming backup_manifest.tmp to backup_manifest pg_basebackup: base backup completed + echo 'Start standby2' Start standby2 + pg_ctl -w -D standby2 start -o '-p 5002' waiting for server to start....2024-08-07 17:15:01.123 +05 [19643] DEBUG: registering background worker "logical replication launcher" 2024-08-07 17:15:01.123 +05 [19643] DEBUG: mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory 2024-08-07 17:15:01.126 +05 [19643] DEBUG: dynamic shared memory system will support 674 segments 2024-08-07 17:15:01.126 +05 [19643] DEBUG: created dynamic shared memory control segment 2259555868 (26976 bytes) 2024-08-07 17:15:01.126 +05 [19643] DEBUG: max_safe_fds = 986, usable_fds = 1000, already_open = 4 2024-08-07 17:15:01.126 +05 [19643] LOG: redirecting log output to logging collector process 2024-08-07 17:15:01.126 +05 [19643] HINT: Future log output will appear in directory "log". done server started + echo 'Create table and insert lot of rows' Create table and insert lot of rows + psql -d postgres -a --no-psqlrc -p 5000 create table a (a int, b int); CREATE TABLE insert into a select i, i from generate_series(1, 100000) i; INSERT 0 100000 + echo 'Stop master to recycle WAL files' Stop master to recycle WAL files + pg_ctl -w -D master stop -m fast waiting for server to shut down....2024-08-07 17:15:01.650 +05 [19619] DEBUG: logger shutting down done server stopped + echo 'Stop standby1 to recycle WAL files' Stop standby1 to recycle WAL files + pg_ctl -w -D standby1 stop -m fast waiting for server to shut down....2024-08-07 17:15:01.731 +05 [19632] DEBUG: logger shutting down done server stopped + echo 'Stop standby2 to recycle WAL files' Stop standby2 to recycle WAL files + pg_ctl -w -D standby2 stop -m fast waiting for server to shut down....2024-08-07 17:15:01.831 +05 [19644] DEBUG: logger shutting down done server stopped + echo 'Start master' Start master + pg_ctl -w -D master start -o '-p 5000' waiting for server to start....2024-08-07 17:15:01.872 +05 [19660] DEBUG: registering background worker "logical replication launcher" 2024-08-07 17:15:01.872 +05 [19660] DEBUG: mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory 2024-08-07 17:15:01.876 +05 [19660] DEBUG: dynamic shared memory system will support 674 segments 2024-08-07 17:15:01.876 +05 [19660] DEBUG: created dynamic shared memory control segment 2766949016 (26976 bytes) 2024-08-07 17:15:01.876 +05 [19660] DEBUG: max_safe_fds = 986, usable_fds = 1000, already_open = 4 2024-08-07 17:15:01.876 +05 [19660] LOG: redirecting log output to logging collector process 2024-08-07 17:15:01.876 +05 [19660] HINT: Future log output will appear in directory "log". done server started + echo 'Start standby1' Start standby1 + pg_ctl -w -D standby1 start -o '-p 5001' waiting for server to start....2024-08-07 17:15:01.979 +05 [19670] DEBUG: registering background worker "logical replication launcher" 2024-08-07 17:15:01.979 +05 [19670] DEBUG: mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory 2024-08-07 17:15:01.982 +05 [19670] DEBUG: dynamic shared memory system will support 674 segments 2024-08-07 17:15:01.982 +05 [19670] DEBUG: created dynamic shared memory control segment 1083876086 (26976 bytes) 2024-08-07 17:15:01.982 +05 [19670] DEBUG: max_safe_fds = 986, usable_fds = 1000, already_open = 4 2024-08-07 17:15:01.983 +05 [19670] LOG: redirecting log output to logging collector process 2024-08-07 17:15:01.983 +05 [19670] HINT: Future log output will appear in directory "log". done server started + echo 'Start standby2' Start standby2 + pg_ctl -w -D standby2 start -o '-p 5002' waiting for server to start....2024-08-07 17:15:02.085 +05 [19679] DEBUG: registering background worker "logical replication launcher" 2024-08-07 17:15:02.085 +05 [19679] DEBUG: mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory 2024-08-07 17:15:02.088 +05 [19679] DEBUG: dynamic shared memory system will support 674 segments 2024-08-07 17:15:02.088 +05 [19679] DEBUG: created dynamic shared memory control segment 4052049392 (26976 bytes) 2024-08-07 17:15:02.088 +05 [19679] DEBUG: max_safe_fds = 986, usable_fds = 1000, already_open = 4 2024-08-07 17:15:02.089 +05 [19679] LOG: redirecting log output to logging collector process 2024-08-07 17:15:02.089 +05 [19679] HINT: Future log output will appear in directory "log". done server started + echo 'Now promote standby1' Now promote standby1 + pg_ctl -w -D standby1 promote waiting for server to promote.... done server promoted + echo 'Wait until standby1 is promoted' Wait until standby1 is promoted + pg_isready -p 5001 /tmp:5001 - accepting connections + echo 'Insert some rows' Insert some rows + psql -d postgres -a --no-psqlrc -p 5000 insert into a select i, i from generate_series(1, 340) i; INSERT 0 340 + echo 'Standby2 node need to be stopped as well.' Standby2 node need to be stopped as well. + pg_ctl -w -D standby2 stop -m fast waiting for server to shut down....2024-08-07 17:15:02.411 +05 [19680] DEBUG: logger shutting down done server stopped + echo 'Stop the master after standby promotion' Stop the master after standby promotion + pg_ctl -w -D master stop -m fast waiting for server to shut down....2024-08-07 17:15:02.517 +05 [19661] DEBUG: logger shutting down done server stopped + echo 'Standby1 node need to be stopped as well.' Standby1 node need to be stopped as well. + pg_ctl -w -D standby1 stop -m fast waiting for server to shut down....2024-08-07 17:15:02.617 +05 [19671] DEBUG: logger shutting down done server stopped + echo 'Do rewind using standby1 pgdata as source' Do rewind using standby1 pgdata as source + pg_rewind --progress --debug '--source-pgdata=standby1' '--target-pgdata=standby2' pg_rewind: Source timeline history: pg_rewind: Target timeline history: pg_rewind: 1: 0/0 - 0/0 pg_rewind: servers diverged at WAL location 0/10392E8 on timeline 1 pg_rewind: no rewind required + echo 'Parameters for standby2 postgresql.conf' Parameters for standby2 postgresql.conf + cat + echo 'Place standby2 signal file' Place standby2 signal file + touch standby2/standby.signal + echo 'Start standby1' Start standby1 + pg_ctl -w -D standby1 start -o '-p 5001' waiting for server to start....2024-08-07 17:15:02.723 +05 [19702] DEBUG: registering background worker "logical replication launcher" 2024-08-07 17:15:02.723 +05 [19702] DEBUG: mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory 2024-08-07 17:15:02.726 +05 [19702] DEBUG: dynamic shared memory system will support 674 segments 2024-08-07 17:15:02.726 +05 [19702] DEBUG: created dynamic shared memory control segment 3627107606 (26976 bytes) 2024-08-07 17:15:02.726 +05 [19702] DEBUG: max_safe_fds = 986, usable_fds = 1000, already_open = 4 2024-08-07 17:15:02.726 +05 [19702] LOG: redirecting log output to logging collector process 2024-08-07 17:15:02.726 +05 [19702] HINT: Future log output will appear in directory "log". done server started + echo 'Start standby2' Start standby2 + pg_ctl -w -D standby2 start -o '-p 5002' waiting for server to start....2024-08-07 17:15:02.830 +05 [19712] DEBUG: registering background worker "logical replication launcher" 2024-08-07 17:15:02.830 +05 [19712] DEBUG: mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Out of memory 2024-08-07 17:15:02.833 +05 [19712] DEBUG: dynamic shared memory system will support 674 segments 2024-08-07 17:15:02.833 +05 [19712] DEBUG: created dynamic shared memory control segment 2291233354 (26976 bytes) 2024-08-07 17:15:02.833 +05 [19712] DEBUG: max_safe_fds = 986, usable_fds = 1000, already_open = 4 2024-08-07 17:15:02.833 +05 [19712] LOG: redirecting log output to logging collector process 2024-08-07 17:15:02.833 +05 [19712] HINT: Future log output will appear in directory "log". done server started + echo 'Same query' Same query + psql -d postgres -a --no-psqlrc -p 5001 select count(*) from a; count -------- 100000 (1 row) + echo 'Different results' Different results + psql -d postgres -a --no-psqlrc -p 5002 select count(*) from a; count -------- 100340 (1 row) + echo 'Stop standby1' Stop standby1 + pg_ctl -w -D standby1 stop -m fast waiting for server to shut down....2024-08-07 17:15:03.077 +05 [19703] DEBUG: logger shutting down done server stopped + echo 'Stop standby2' Stop standby2 + pg_ctl -w -D standby2 stop -m fast waiting for server to shut down....2024-08-07 17:15:03.117 +05 [19713] DEBUG: logger shutting down done server stopped ```
pgsql-bugs by date: