Thread: [pg_rewind] cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
Hi, I am getting this below error - after performing pg_rewind when i try to start new slave ( which earlier was my master) against PGv12 Beta1. " cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory 2019-05-27 18:55:47.387 IST [25500] LOG: entering standby mode cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory " Steps to reproduce - ============= 0)mkdir /tmp/archive_dir1 1)Master Setup -> ./initdb -D master , add these parameters in postgresql.conf file - " wal_level = hot_standby wal_log_hints = on max_wal_senders = 2 wal_keep_segments = 64 hot_standby = on archive_mode=on archive_command='cp %p /tmp//archive_dir1/%f' port=5432 " Start the server (./pg_ctl -D master start) Connect to psql terminal - create table/ insert few rows 2)Slave Setup -> ./pg_basebackup -PR -X stream -c fast -h 127.0.0.1 -U centos -p 5432 -D slave add these parameters in postgresql.conf file - " primary_conninfo = 'user=centos host=127.0.0.1 port=5432' promote_trigger_file = '/tmp/s1.txt' restore_command='cp %p /tmp/archive_dir1/%f' port=5433 " Start Slave (./pg_ctl -D slave start) 3)Touch trigger file (touch /tmp/s1.txt) -> - standby.signal is gone from standby directory and now able to insert rows on standby server. 4)stop master ( ./pg_ctl -D master stop) 5)Perform pg_rewind [centos@mail-arts bin]$ ./pg_rewind -D master/ --source-server="host=localhost port=5433 user=centos password=edb dbname=postgres" pg_rewind: servers diverged at WAL location 0/3003538 on timeline 1 pg_rewind: rewinding from last common checkpoint at 0/2000060 on timeline 1 pg_rewind: Done! 6)Create standby.signal file on master directory ( touch standby.signal) 7)Modify old master/postgresql.conf file - primary_conninfo = 'user=centos host=127.0.0.1 port=5433' promote_trigger_file = '/tmp/s1.txt' restore_command='cp %p /tmp/archive_dir1/%f' port=5432 8)Try to start the new slave/old master - [centos@mail-arts bin]$ ./pg_ctl -D m1/ start waiting for server to start....2019-05-27 18:55:47.237 IST [25499] LOG: starting PostgreSQL 12beta1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit 2019-05-27 18:55:47.237 IST [25499] LOG: listening on IPv6 address "::1", port 5432 2019-05-27 18:55:47.237 IST [25499] LOG: listening on IPv4 address "127.0.0.1", port 5432 2019-05-27 18:55:47.239 IST [25499] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2019-05-27 18:55:47.259 IST [25500] LOG: database system was interrupted while in recovery at log time 2019-05-27 18:53:45 IST 2019-05-27 18:55:47.259 IST [25500] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory 2019-05-27 18:55:47.387 IST [25500] LOG: entering standby mode cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory 2019-05-27 18:55:47.402 IST [25500] LOG: redo starts at 0/2000028 cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory 2019-05-27 18:55:47.410 IST [25500] LOG: invalid record length at 0/301E740: wanted 24, got 0 2019-05-27 18:55:47.413 IST [25509] FATAL: the database system is starting up 2019-05-27 18:55:47.413 IST [25508] FATAL: could not connect to the primary server: FATAL: the database system is starting up cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory 2019-05-27 18:55:47.424 IST [25513] FATAL: the database system is starting up 2019-05-27 18:55:47.425 IST [25512] FATAL: could not connect to the primary server: FATAL: the database system is starting up cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory .....cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory Is there anything i need to change/add to make it work ? Thanks. -- regards,tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
Re: [pg_rewind] cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
From
tushar
Date:
Hi, Is anyone able to reproduce this one ? Any pointer to solve this would be helpful. regards, On 05/27/2019 07:27 PM, tushar wrote: > Hi, > > I am getting this below error - after performing pg_rewind when i try > to start new slave ( which earlier was my master) against PGv12 Beta1. > " > cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory > 2019-05-27 18:55:47.387 IST [25500] LOG: entering standby mode > cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory > cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory > " > > Steps to reproduce - > ============= > 0)mkdir /tmp/archive_dir1 > 1)Master Setup -> ./initdb -D master , add these parameters in > postgresql.conf file - > " > wal_level = hot_standby > wal_log_hints = on > max_wal_senders = 2 > wal_keep_segments = 64 > hot_standby = on > archive_mode=on > archive_command='cp %p /tmp//archive_dir1/%f' > port=5432 > " > Start the server (./pg_ctl -D master start) > Connect to psql terminal - create table/ insert few rows > > 2)Slave Setup -> ./pg_basebackup -PR -X stream -c fast -h 127.0.0.1 > -U centos -p 5432 -D slave > > add these parameters in postgresql.conf file - > " > primary_conninfo = 'user=centos host=127.0.0.1 port=5432' > promote_trigger_file = '/tmp/s1.txt' > restore_command='cp %p /tmp/archive_dir1/%f' > port=5433 > " > Start Slave (./pg_ctl -D slave start) > > 3)Touch trigger file (touch /tmp/s1.txt) -> - standby.signal is gone > from standby directory and now able to insert rows on standby server. > 4)stop master ( ./pg_ctl -D master stop) > 5)Perform pg_rewind > [centos@mail-arts bin]$ ./pg_rewind -D master/ > --source-server="host=localhost port=5433 user=centos password=edb > dbname=postgres" > pg_rewind: servers diverged at WAL location 0/3003538 on timeline 1 > pg_rewind: rewinding from last common checkpoint at 0/2000060 on > timeline 1 > > pg_rewind: Done! > > 6)Create standby.signal file on master directory ( touch standby.signal) > > 7)Modify old master/postgresql.conf file - > primary_conninfo = 'user=centos host=127.0.0.1 port=5433' > promote_trigger_file = '/tmp/s1.txt' > restore_command='cp %p /tmp/archive_dir1/%f' > port=5432 > > 8)Try to start the new slave/old master - > > [centos@mail-arts bin]$ ./pg_ctl -D m1/ start > waiting for server to start....2019-05-27 18:55:47.237 IST [25499] > LOG: starting PostgreSQL 12beta1 on x86_64-pc-linux-gnu, compiled by > gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit > 2019-05-27 18:55:47.237 IST [25499] LOG: listening on IPv6 address > "::1", port 5432 > 2019-05-27 18:55:47.237 IST [25499] LOG: listening on IPv4 address > "127.0.0.1", port 5432 > 2019-05-27 18:55:47.239 IST [25499] LOG: listening on Unix socket > "/tmp/.s.PGSQL.5432" > 2019-05-27 18:55:47.259 IST [25500] LOG: database system was > interrupted while in recovery at log time 2019-05-27 18:53:45 IST > 2019-05-27 18:55:47.259 IST [25500] HINT: If this has occurred more > than once some data might be corrupted and you might need to choose an > earlier recovery target. > cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory > 2019-05-27 18:55:47.387 IST [25500] LOG: entering standby mode > cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory > cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory > cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory > 2019-05-27 18:55:47.402 IST [25500] LOG: redo starts at 0/2000028 > cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory > 2019-05-27 18:55:47.410 IST [25500] LOG: invalid record length at > 0/301E740: wanted 24, got 0 > 2019-05-27 18:55:47.413 IST [25509] FATAL: the database system is > starting up > 2019-05-27 18:55:47.413 IST [25508] FATAL: could not connect to the > primary server: FATAL: the database system is starting up > cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory > cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory > 2019-05-27 18:55:47.424 IST [25513] FATAL: the database system is > starting up > 2019-05-27 18:55:47.425 IST [25512] FATAL: could not connect to the > primary server: FATAL: the database system is starting up > cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory > .....cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory > > Is there anything i need to change/add to make it work ? > > Thanks. > -- regards,tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
Re: [pg_rewind] cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
From
Kuntal Ghosh
Date:
Hello, On Wed, Jun 5, 2019 at 11:55 AM tushar <tushar.ahuja@enterprisedb.com> wrote: I can see two different problems in this setup. > > 2)Slave Setup -> ./pg_basebackup -PR -X stream -c fast -h 127.0.0.1 > > -U centos -p 5432 -D slave > > restore_command='cp %p /tmp/archive_dir1/%f' > > " > > 7)Modify old master/postgresql.conf file - > > restore_command='cp %p /tmp/archive_dir1/%f' When we define a restore command, we tell the server to copy a file a WAL file from the archive. So, it should be restore_command='cp tmp/archive_dir1/%f %p' This is the reason you're getting this following error. > > cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory > > cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory > > 2019-05-27 18:55:47.424 IST [25513] FATAL: the database system is > > starting up > > 2019-05-27 18:55:47.425 IST [25512] FATAL: could not connect to the > > primary server: FATAL: the database system is starting up This case looks interesting. 1. Master is running on port 5432. 2. A standby is created using basebackup with -R option. So, the pg_basebackup appends the primary connection settings to postgresql.auto.conf so that the streaming replication can use the same settings later on. cat postgresql.auto.conf -> primary_conninfo = 'port=5432' 3. The standby is started in port 5433. 4. Standby is promoted and old master is stopped. 5. Using pg_rewind, the old master is synchronized with the promoted standby. As part of the process, it has copied the postgresql.auto.conf of promoted standby in the old master. 6. Now, the old master is configured as a standby but the postgresql.auto.conf still contains the following settings: cat postgresql.auto.conf -> primary_conninfo = 'port=5432' So, the old master tries to connect to the server on port 5432 and finds itself which is still in recovery. This can surely be fixed from the script. While configuring the old master as a standby server, clear/modify the settings in postgresql.auto.conf. But, it contradicts with the comment in the file which forbids the user from editing the file. Any thoughts? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
Re: [pg_rewind] cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
From
Robert Haas
Date:
On Mon, Jun 10, 2019 at 7:08 AM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > This can surely be fixed from the script. While configuring the old > master as a standby server, clear/modify the settings in > postgresql.auto.conf. But, it contradicts with the comment in the file > which forbids the user from editing the file. The user isn't really forbidden from editing the file. They can do so safely when the server is down. This whole thing looks like a nonissue to me. If you set it up wrong, it won't work. So don't do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [pg_rewind] cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
From
Kuntal Ghosh
Date:
On Mon, Jun 10, 2019 at 7:19 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Jun 10, 2019 at 7:08 AM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > > This can surely be fixed from the script. While configuring the old > > master as a standby server, clear/modify the settings in > > postgresql.auto.conf. But, it contradicts with the comment in the file > > which forbids the user from editing the file. > > The user isn't really forbidden from editing the file. They can do so > safely when the server is down. > > This whole thing looks like a nonissue to me. If you set it up wrong, > it won't work. So don't do that. > Yeah. Sounds fair. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
Re: [pg_rewind] cp: cannot stat ‘pg_wal/RECOVERYHISTORY’ No suchfile or directory
From
Michael Paquier
Date:
On Mon, Jun 10, 2019 at 07:56:12PM +0530, Kuntal Ghosh wrote: > On Mon, Jun 10, 2019 at 7:19 PM Robert Haas <robertmhaas@gmail.com> wrote: >> This whole thing looks like a nonissue to me. If you set it up wrong, >> it won't work. So don't do that. +1. -- Michael
Attachment
Re: [pg_rewind] cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
From
tushar
Date:
On 06/10/2019 04:37 PM, Kuntal Ghosh wrote: > When we define a restore command, we tell the server to copy a file a > WAL file from the archive. So, it should be > restore_command='cp tmp/archive_dir1/%f %p' > > This is the reason you're getting this following error. Ohh. Mea Culpa. Thanks for pointing out. -- regards,tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company