Thread: [pg_rewind] cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory

Hi,

I am getting this below error - after performing pg_rewind when i try to 
start new slave ( which earlier was my master) against PGv12 Beta1.
"
cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
2019-05-27 18:55:47.387 IST [25500] LOG:  entering standby mode
cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory
"

Steps to reproduce -
=============
0)mkdir /tmp/archive_dir1
1)Master Setup -> ./initdb -D master  , add these parameters in 
postgresql.conf file -
"
wal_level = hot_standby
wal_log_hints = on
max_wal_senders = 2
wal_keep_segments = 64
hot_standby = on
archive_mode=on
archive_command='cp %p /tmp//archive_dir1/%f'
port=5432
"
Start the server  (./pg_ctl -D master start)
Connect to psql terminal - create table/ insert few rows

2)Slave Setup  ->  ./pg_basebackup -PR -X stream -c fast -h 127.0.0.1 -U 
centos -p 5432 -D slave

add these parameters in postgresql.conf file -
"
primary_conninfo = 'user=centos host=127.0.0.1 port=5432'
promote_trigger_file = '/tmp/s1.txt'
restore_command='cp %p /tmp/archive_dir1/%f'
port=5433
"
Start Slave  (./pg_ctl -D slave start)

3)Touch trigger file (touch /tmp/s1.txt)  -> - standby.signal is gone 
from standby directory and now able to insert rows on standby server.
4)stop master ( ./pg_ctl -D master stop)
5)Perform pg_rewind
[centos@mail-arts bin]$ ./pg_rewind -D master/ 
--source-server="host=localhost port=5433 user=centos password=edb 
dbname=postgres"
pg_rewind: servers diverged at WAL location 0/3003538 on timeline 1
pg_rewind: rewinding from last common checkpoint at 0/2000060 on timeline 1

pg_rewind: Done!

6)Create standby.signal file on master directory ( touch standby.signal)

7)Modify old master/postgresql.conf file -
primary_conninfo = 'user=centos host=127.0.0.1 port=5433'
promote_trigger_file = '/tmp/s1.txt'
restore_command='cp %p /tmp/archive_dir1/%f'
port=5432

8)Try to start the new slave/old master -

[centos@mail-arts bin]$ ./pg_ctl -D m1/ start
waiting for server to start....2019-05-27 18:55:47.237 IST [25499] LOG:  
starting PostgreSQL 12beta1 on x86_64-pc-linux-gnu, compiled by gcc 
(GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
2019-05-27 18:55:47.237 IST [25499] LOG:  listening on IPv6 address 
"::1", port 5432
2019-05-27 18:55:47.237 IST [25499] LOG:  listening on IPv4 address 
"127.0.0.1", port 5432
2019-05-27 18:55:47.239 IST [25499] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.5432"
2019-05-27 18:55:47.259 IST [25500] LOG:  database system was 
interrupted while in recovery at log time 2019-05-27 18:53:45 IST
2019-05-27 18:55:47.259 IST [25500] HINT:  If this has occurred more 
than once some data might be corrupted and you might need to choose an 
earlier recovery target.
cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
2019-05-27 18:55:47.387 IST [25500] LOG:  entering standby mode
cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory
cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory
2019-05-27 18:55:47.402 IST [25500] LOG:  redo starts at 0/2000028
cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory
2019-05-27 18:55:47.410 IST [25500] LOG:  invalid record length at 
0/301E740: wanted 24, got 0
2019-05-27 18:55:47.413 IST [25509] FATAL:  the database system is 
starting up
2019-05-27 18:55:47.413 IST [25508] FATAL:  could not connect to the 
primary server: FATAL:  the database system is starting up
cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory
2019-05-27 18:55:47.424 IST [25513] FATAL:  the database system is 
starting up
2019-05-27 18:55:47.425 IST [25512] FATAL:  could not connect to the 
primary server: FATAL:  the database system is starting up
cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
.....cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory

Is there anything i need to change/add  to make it work ?

Thanks.

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Hi,

Is anyone able to reproduce this one ?
Any pointer to solve this would be helpful.

regards,

On 05/27/2019 07:27 PM, tushar wrote:
> Hi,
>
> I am getting this below error - after performing pg_rewind when i try 
> to start new slave ( which earlier was my master) against PGv12 Beta1.
> "
> cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
> 2019-05-27 18:55:47.387 IST [25500] LOG:  entering standby mode
> cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
> cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory
> "
>
> Steps to reproduce -
> =============
> 0)mkdir /tmp/archive_dir1
> 1)Master Setup -> ./initdb -D master  , add these parameters in 
> postgresql.conf file -
> "
> wal_level = hot_standby
> wal_log_hints = on
> max_wal_senders = 2
> wal_keep_segments = 64
> hot_standby = on
> archive_mode=on
> archive_command='cp %p /tmp//archive_dir1/%f'
> port=5432
> "
> Start the server  (./pg_ctl -D master start)
> Connect to psql terminal - create table/ insert few rows
>
> 2)Slave Setup  ->  ./pg_basebackup -PR -X stream -c fast -h 127.0.0.1 
> -U centos -p 5432 -D slave
>
> add these parameters in postgresql.conf file -
> "
> primary_conninfo = 'user=centos host=127.0.0.1 port=5432'
> promote_trigger_file = '/tmp/s1.txt'
> restore_command='cp %p /tmp/archive_dir1/%f'
> port=5433
> "
> Start Slave  (./pg_ctl -D slave start)
>
> 3)Touch trigger file (touch /tmp/s1.txt)  -> - standby.signal is gone 
> from standby directory and now able to insert rows on standby server.
> 4)stop master ( ./pg_ctl -D master stop)
> 5)Perform pg_rewind
> [centos@mail-arts bin]$ ./pg_rewind -D master/ 
> --source-server="host=localhost port=5433 user=centos password=edb 
> dbname=postgres"
> pg_rewind: servers diverged at WAL location 0/3003538 on timeline 1
> pg_rewind: rewinding from last common checkpoint at 0/2000060 on 
> timeline 1
>
> pg_rewind: Done!
>
> 6)Create standby.signal file on master directory ( touch standby.signal)
>
> 7)Modify old master/postgresql.conf file -
> primary_conninfo = 'user=centos host=127.0.0.1 port=5433'
> promote_trigger_file = '/tmp/s1.txt'
> restore_command='cp %p /tmp/archive_dir1/%f'
> port=5432
>
> 8)Try to start the new slave/old master -
>
> [centos@mail-arts bin]$ ./pg_ctl -D m1/ start
> waiting for server to start....2019-05-27 18:55:47.237 IST [25499] 
> LOG:  starting PostgreSQL 12beta1 on x86_64-pc-linux-gnu, compiled by 
> gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
> 2019-05-27 18:55:47.237 IST [25499] LOG:  listening on IPv6 address 
> "::1", port 5432
> 2019-05-27 18:55:47.237 IST [25499] LOG:  listening on IPv4 address 
> "127.0.0.1", port 5432
> 2019-05-27 18:55:47.239 IST [25499] LOG:  listening on Unix socket 
> "/tmp/.s.PGSQL.5432"
> 2019-05-27 18:55:47.259 IST [25500] LOG:  database system was 
> interrupted while in recovery at log time 2019-05-27 18:53:45 IST
> 2019-05-27 18:55:47.259 IST [25500] HINT:  If this has occurred more 
> than once some data might be corrupted and you might need to choose an 
> earlier recovery target.
> cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
> 2019-05-27 18:55:47.387 IST [25500] LOG:  entering standby mode
> cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
> cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory
> cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory
> 2019-05-27 18:55:47.402 IST [25500] LOG:  redo starts at 0/2000028
> cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory
> 2019-05-27 18:55:47.410 IST [25500] LOG:  invalid record length at 
> 0/301E740: wanted 24, got 0
> 2019-05-27 18:55:47.413 IST [25509] FATAL:  the database system is 
> starting up
> 2019-05-27 18:55:47.413 IST [25508] FATAL:  could not connect to the 
> primary server: FATAL:  the database system is starting up
> cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
> cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory
> 2019-05-27 18:55:47.424 IST [25513] FATAL:  the database system is 
> starting up
> 2019-05-27 18:55:47.425 IST [25512] FATAL:  could not connect to the 
> primary server: FATAL:  the database system is starting up
> cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
> .....cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory
>
> Is there anything i need to change/add  to make it work ?
>
> Thanks.
>

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Hello,

On Wed, Jun 5, 2019 at 11:55 AM tushar <tushar.ahuja@enterprisedb.com> wrote:
I can see two different problems in this setup.

> > 2)Slave Setup  ->  ./pg_basebackup -PR -X stream -c fast -h 127.0.0.1
> > -U centos -p 5432 -D slave
> > restore_command='cp %p /tmp/archive_dir1/%f'
> > "
> > 7)Modify old master/postgresql.conf file -
> > restore_command='cp %p /tmp/archive_dir1/%f'
When we define a restore command, we tell the server to copy a file a
WAL file from the archive. So, it should be
restore_command='cp tmp/archive_dir1/%f %p'

This is the reason you're getting this following error.
> > cp: cannot stat ‘pg_wal/RECOVERYHISTORY’: No such file or directory
> > cp: cannot stat ‘pg_wal/RECOVERYXLOG’: No such file or directory


> > 2019-05-27 18:55:47.424 IST [25513] FATAL:  the database system is
> > starting up
> > 2019-05-27 18:55:47.425 IST [25512] FATAL:  could not connect to the
> > primary server: FATAL:  the database system is starting up
This case looks interesting.

1. Master is running on port 5432.
2. A standby is created using basebackup with -R option. So, the
pg_basebackup appends the primary connection settings to
postgresql.auto.conf so that the streaming replication can use the
same settings later on.
cat postgresql.auto.conf -> primary_conninfo = 'port=5432'
3. The standby is started in port 5433.
4. Standby is promoted and old master is stopped.
5. Using pg_rewind, the old master is synchronized with the promoted
standby. As part of the process, it has copied the
postgresql.auto.conf of promoted standby in the old master.
6. Now, the old master is configured as a standby but the
postgresql.auto.conf still contains the following settings:
cat postgresql.auto.conf -> primary_conninfo = 'port=5432'
So, the old master tries to connect to the server on port 5432 and
finds itself which is still in recovery.

This can surely be fixed from the script. While configuring the old
master as a standby server, clear/modify the settings in
postgresql.auto.conf. But, it contradicts with the comment in the file
which forbids the user from editing the file.

Any thoughts?
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



On Mon, Jun 10, 2019 at 7:08 AM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
> This can surely be fixed from the script. While configuring the old
> master as a standby server, clear/modify the settings in
> postgresql.auto.conf. But, it contradicts with the comment in the file
> which forbids the user from editing the file.

The user isn't really forbidden from editing the file.  They can do so
safely when the server is down.

This whole thing looks like a nonissue to me.  If you set it up wrong,
it won't work.  So don't do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Mon, Jun 10, 2019 at 7:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jun 10, 2019 at 7:08 AM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
> > This can surely be fixed from the script. While configuring the old
> > master as a standby server, clear/modify the settings in
> > postgresql.auto.conf. But, it contradicts with the comment in the file
> > which forbids the user from editing the file.
>
> The user isn't really forbidden from editing the file.  They can do so
> safely when the server is down.
>
> This whole thing looks like a nonissue to me.  If you set it up wrong,
> it won't work.  So don't do that.
>
Yeah. Sounds fair.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



On Mon, Jun 10, 2019 at 07:56:12PM +0530, Kuntal Ghosh wrote:
> On Mon, Jun 10, 2019 at 7:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> This whole thing looks like a nonissue to me.  If you set it up wrong,
>> it won't work.  So don't do that.

+1.
--
Michael

Attachment
On 06/10/2019 04:37 PM, Kuntal Ghosh wrote:
> When we define a restore command, we tell the server to copy a file a
> WAL file from the archive. So, it should be
> restore_command='cp tmp/archive_dir1/%f %p'
>
> This is the reason you're getting this following error.

Ohh. Mea Culpa.

Thanks for pointing out.

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company