Thread: Automatic restore corruption problem

Automatic restore corruption problem

From

Matthieu Lejeune

Date:

10 July 2015, 12:59:29

Hi all,

I have a script for restoring a database every night to an other
postgresql database

root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
#/bin/bash
/etc/init.d/postgresql stop
mv /var/log/postgresql/postgresql-9.3-main.log
/var/log/postgresql/postgresql-9.3-main.log.old
cd /var/lib/postgresql/9.3/main
psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT
pg_start_backup('sync');"
rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/*
/var/lib/postgresql/9.3/main/
rm backup_label
chown -R postgres:postgres *
psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT
pg_stop_backup();"
/etc/init.d/postgresql start
chmod 777 /var/log/postgresql/postgresql-9.3-main.log
psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
root@p2prddnmdbc:~#


But during the day when the user are using the new database we got error
like this :

2015-06-25 16:20:58 CEST ERROR:  could not read block 257985 in file
"base/16386/14064061.1": read only 0 of 8192 bytes
2015-06-22 15:21:11 CEST ERROR:  could not read block 256801 in file
"base/16386/14064061.1": read only 0 of 8192 bytes

I have check the : filesystem on the vm, on the HW SAN,...

Any idea to fix this problem?

Thanks

Matthieu

Re: Automatic restore corruption problem

From

"michael@sqlexec.com"

Date:

10 July 2015, 13:30:35

Perhaps you did not start up the new database in restore mode pointing to the WAL logs that were generated during your rsync operation that were not captured.

Regards,
Michael

Matthieu Lejeune
Friday, July 10, 2015 8:59 AM
Hi all,

I have a script for restoring a database every night to an other postgresql database

root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
#/bin/bash
/etc/init.d/postgresql stop
mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
cd /var/lib/postgresql/9.3/main
psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
rm backup_label
chown -R postgres:postgres *
psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
/etc/init.d/postgresql start
chmod 777 /var/log/postgresql/postgresql-9.3-main.log
psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
root@p2prddnmdbc:~#

But during the day when the user are using the new database we got error like this :

2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes

I have check the : filesystem on the vm, on the HW SAN,...

Any idea to fix this problem?

Thanks

Matthieu

Re: Automatic restore corruption problem

From

Guillaume Lelarge

Date:

10 July 2015, 14:07:14

Hi,

Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?

Sure. Don't remove the backup_label file, and add the recovery.conf file.

--
Guillaume

Re: Automatic restore corruption problem

From

Keith

Date:

10 July 2015, 14:47:09

A recent, relevant post

http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html

On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:

Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?
Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume

Re: Automatic restore corruption problem

From

Matthieu Lejeune

Date:

12 July 2015, 08:18:26

Hi thank for your reply

My target is to give a database for buisness testing query and they are modify the database during the buisness day.

Now I got this error if I keep the file backup_label :

root@p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-12 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10 CEST 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record 2015-07-12 10:12:45 CEST FATAL: could not locate required checkpoint record 2015-07-12 10:12:45 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12 10:12:45 CEST LOG: startup process (PID 28492) exited with exit code 1 2015-07-12 10:12:45 CEST LOG: abo[FAIL startup due to startup process failure ... failed!
failed!

If I put the recovery.conf the database is waiting for the wal to relaunch the replication.

postgres 27817 0.7 0.9 631212 39892 ? S 09:55 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55 0:00 \_ postgres: startup process waiting for 0000000100000178000000B9
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
postgres@p2prddnmdbc:~$ psql energycomm
psql: FATAL: the database system is starting up

Have you got an idea to stop the replication process and start the database ?

Kind regards
Matthieu

Le 10/07/15 16:46, Keith a écrit :

A recent, relevant post

http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html

On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:
Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?
Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume

Re: Automatic restore corruption problem

From

Guillaume Lelarge

Date:

12 July 2015, 11:46:59

Hi,

2015-07-12 10:18 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:

Hi thank for your reply

My target is to give a database for buisness testing query and they are modify the database during the buisness day.

Now I got this error if I keep the file backup_label :

root@p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-12 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10 CEST 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record 2015-07-12 10:12:45 CEST FATAL: could not locate required checkpoint record 2015-07-12 10:12:45 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12 10:12:45 CEST LOG: startup process (PID 28492) exited with exit code 1 2015-07-12 10:12:45 CEST LOG: abo[FAIL startup due to startup process failure ... failed!
failed!

If I put the recovery.conf the database is waiting for the wal to relaunch the replication.

postgres 27817 0.7 0.9 631212 39892 ? S 09:55 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55 0:00 \_ postgres: startup process waiting for 0000000100000178000000B9
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
postgres@p2prddnmdbc:~$ psql energycomm
psql: FATAL: the database system is starting up

Have you got an idea to stop the replication process and start the database ?

What did you put in the recovery.conf file? (hint: standby_mode must be off)

Kind regards
Matthieu

Le 10/07/15 16:46, Keith a écrit :
A recent, relevant post

http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html

On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:
Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?
Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume

Guillaume.

http://blog.guillaume.lelarge.info

http://www.dalibo.com

Re: Automatic restore corruption problem

From

Matthieu Lejeune

Date:

14 July 2015, 06:09:07

Hi,

I had no recovery.conf on this server because I launch my replication every night I need a H-24 copy database.

This is my recovery.conf
root@p2prddnmdbc:/var/lib/postgresql# cat recovery.conf
standby_mode = 'off'
primary_conninfo = 'host=10.10.11.1 port=5432 user=replicator password=XXXXXX'
trigger_file = '/var/lib/postgresql/9.1/main/trigger'
restore_command = 'cp /mnt/p2prddnmdbm_pg_xlog/%f %p'
root@p2prddnmdbc:/var/lib/postgresql#

But with or without a recovery.conf file I can't start the database service :
root@p2prddnmdbc:/var/lib/postgresql# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-14 08:02:59 CEST LOG: database system was interrupted; last known up at 2015-07-13 23:33:46 CEST 2015-07-14 08:02:59 CEST LOG: invalid checkpoint record 2015-07-14 08:02:59 CEST FATAL: could not locate required checkpoint record 2015-07-14 08:02:59 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-14 08:02:59 CEST LOG: startup process (PID 24617) exited with exit co[FAIL2015-07-14 08:02:59 CEST LOG: aborting startup due to startup process failure ... failed!
failed!

Thanks
Matthieu

Le 12/07/15 13:46, Guillaume Lelarge a écrit :

Hi,

2015-07-12 10:18 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:
Hi thank for your reply

My target is to give a database for buisness testing query and they are modify the database during the buisness day.

Now I got this error if I keep the file backup_label :

root@p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-12 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10 CEST 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record 2015-07-12 10:12:45 CEST FATAL: could not locate required checkpoint record 2015-07-12 10:12:45 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12 10:12:45 CEST LOG: startup process (PID 28492) exited with exit code 1 2015-07-12 10:12:45 CEST LOG: abo[FAIL startup due to startup process failure ... failed!
failed!

If I put the recovery.conf the database is waiting for the wal to relaunch the replication.

postgres 27817 0.7 0.9 631212 39892 ? S 09:55 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55 0:00 \_ postgres: startup process waiting for 0000000100000178000000B9
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
postgres@p2prddnmdbc:~$ psql energycomm
psql: FATAL: the database system is starting up

Have you got an idea to stop the replication process and start the database ?

What did you put in the recovery.conf file? (hint: standby_mode must be off)

Kind regards
Matthieu

Le 10/07/15 16:46, Keith a écrit :
A recent, relevant post

http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html

On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:
Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?
Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com

Re: Automatic restore corruption problem

From

Guillaume Lelarge

Date:

14 July 2015, 06:51:06

2015-07-14 8:09 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:

Hi,

I had no recovery.conf on this server because I launch my replication every night I need a H-24 copy database.

On your first email, you said: "I have a script for restoring a database every night to an other postgresql database". A restore is not replication, even it's a PITR restore.

Still on your first email, you execute pg_start_backup() and pg_stop_backup() on p2prddnmdbm, and the rsync on 10.10.11.1. I suppose p2prddnmdbm and 10.10.11.1 are the same server?

This is my recovery.conf
root@p2prddnmdbc:/var/lib/postgresql# cat recovery.conf
standby_mode = 'off'
primary_conninfo = 'host=10.10.11.1 port=5432 user=replicator password=XXXXXX'
trigger_file = '/var/lib/postgresql/9.1/main/trigger'
restore_command = 'cp /mnt/p2prddnmdbm_pg_xlog/%f %p'
root@p2prddnmdbc:/var/lib/postgresql#

You don't need primary_conninfo if you only want to restore your database. Though that isn't your issue right now.

But with or without a recovery.conf file I can't start the database service :
root@p2prddnmdbc:/var/lib/postgresql# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-14 08:02:59 CEST LOG: database system was interrupted; last known up at 2015-07-13 23:33:46 CEST 2015-07-14 08:02:59 CEST LOG: invalid checkpoint record 2015-07-14 08:02:59 CEST FATAL: could not locate required checkpoint record 2015-07-14 08:02:59 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-14 08:02:59 CEST LOG: startup process (PID 24617) exited with exit co[FAIL2015-07-14 08:02:59 CEST LOG: aborting startup due to startup process failure ... failed!
failed!

In your previous email, you had the startup process waiting for a file. These logs can't be the good ones.

What you should probably do is tell us exactly what you want to do, and state what you do right now, and what logs you get, and what processes are on the server. That would help us to help you.

Thanks
Matthieu

Le 12/07/15 13:46, Guillaume Lelarge a écrit :
Hi,

2015-07-12 10:18 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:
Hi thank for your reply

My target is to give a database for buisness testing query and they are modify the database during the buisness day.

Now I got this error if I keep the file backup_label :

root@p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-12 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10 CEST 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record 2015-07-12 10:12:45 CEST FATAL: could not locate required checkpoint record 2015-07-12 10:12:45 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12 10:12:45 CEST LOG: startup process (PID 28492) exited with exit code 1 2015-07-12 10:12:45 CEST LOG: abo[FAIL startup due to startup process failure ... failed!
failed!

If I put the recovery.conf the database is waiting for the wal to relaunch the replication.

postgres 27817 0.7 0.9 631212 39892 ? S 09:55 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55 0:00 \_ postgres: startup process waiting for 0000000100000178000000B9
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
postgres@p2prddnmdbc:~$ psql energycomm
psql: FATAL: the database system is starting up

Have you got an idea to stop the replication process and start the database ?

What did you put in the recovery.conf file? (hint: standby_mode must be off)

Kind regards
Matthieu

Le 10/07/15 16:46, Keith a écrit :
A recent, relevant post

http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html

On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:
Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?
Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com

Guillaume.

http://blog.guillaume.lelarge.info

http://www.dalibo.com

Re: Automatic restore corruption problem

From

Matthieu Lejeune

Date:

14 July 2015, 10:33:06

Hi,

The master server : 10.10.11.1 is p2prddnmdbm
I got a real time slave with the recovery process : p2prddnmdbs

I have a third copy with a H-24 data : p2prddnmdbc
On this server I need to have the copy of the master at 23H00 every day.
So I'm using the script send before.
During the day the buisness modify the database on server p2prddnmdbc and that's why I use a rsync --delete every night to get a clean copy of the prod server.

So what's the best way to do the restore ?

Thanks

Matthieu

Le 14/07/15 08:50, Guillaume Lelarge a écrit :

2015-07-14 8:09 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:
Hi,

I had no recovery.conf on this server because I launch my replication every night I need a H-24 copy database.

On your first email, you said: "I have a script for restoring a database every night to an other postgresql database". A restore is not replication, even it's a PITR restore.

Still on your first email, you execute pg_start_backup() and pg_stop_backup() on p2prddnmdbm, and the rsync on 10.10.11.1. I suppose p2prddnmdbm and 10.10.11.1 are the same server?

This is my recovery.conf
root@p2prddnmdbc:/var/lib/postgresql# cat recovery.conf
standby_mode = 'off'
primary_conninfo = 'host=10.10.11.1 port=5432 user=replicator password=XXXXXX'
trigger_file = '/var/lib/postgresql/9.1/main/trigger'
restore_command = 'cp /mnt/p2prddnmdbm_pg_xlog/%f %p'
root@p2prddnmdbc:/var/lib/postgresql#

You don't need primary_conninfo if you only want to restore your database. Though that isn't your issue right now.

But with or without a recovery.conf file I can't start the database service :
root@p2prddnmdbc:/var/lib/postgresql# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-14 08:02:59 CEST LOG: database system was interrupted; last known up at 2015-07-13 23:33:46 CEST 2015-07-14 08:02:59 CEST LOG: invalid checkpoint record 2015-07-14 08:02:59 CEST FATAL: could not locate required checkpoint record 2015-07-14 08:02:59 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-14 08:02:59 CEST LOG: startup process (PID 24617) exited with exit co[FAIL2015-07-14 08:02:59 CEST LOG: aborting startup due to startup process failure ... failed!
failed!

In your previous email, you had the startup process waiting for a file. These logs can't be the good ones.

What you should probably do is tell us exactly what you want to do, and state what you do right now, and what logs you get, and what processes are on the server. That would help us to help you.

Thanks
Matthieu

Le 12/07/15 13:46, Guillaume Lelarge a écrit :
Hi,

2015-07-12 10:18 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:
Hi thank for your reply

My target is to give a database for buisness testing query and they are modify the database during the buisness day.

Now I got this error if I keep the file backup_label :

root@p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-12 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10 CEST 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record 2015-07-12 10:12:45 CEST FATAL: could not locate required checkpoint record 2015-07-12 10:12:45 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12 10:12:45 CEST LOG: startup process (PID 28492) exited with exit code 1 2015-07-12 10:12:45 CEST LOG: abo[FAIL startup due to startup process failure ... failed!
failed!

If I put the recovery.conf the database is waiting for the wal to relaunch the replication.

postgres 27817 0.7 0.9 631212 39892 ? S 09:55 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55 0:00 \_ postgres: startup process waiting for 0000000100000178000000B9
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
postgres@p2prddnmdbc:~$ psql energycomm
psql: FATAL: the database system is starting up

Have you got an idea to stop the replication process and start the database ?

What did you put in the recovery.conf file? (hint: standby_mode must be off)

Kind regards
Matthieu

Le 10/07/15 16:46, Keith a écrit :
A recent, relevant post

http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html

On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:
Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?
Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com

Re: Automatic restore corruption problem

From

Guillaume Lelarge

Date:

16 July 2015, 18:12:02

2015-07-14 12:32 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:

Hi,

The master server : 10.10.11.1 is p2prddnmdbm
I got a real time slave with the recovery process : p2prddnmdbs

I have a third copy with a H-24 data : p2prddnmdbc
On this server I need to have the copy of the master at 23H00 every day.
So I'm using the script send before.
During the day the buisness modify the database on server p2prddnmdbc and that's why I use a rsync --delete every night to get a clean copy of the prod server.

So what's the best way to do the restore ?

There were two issues on your script: it removed the backup_label file and didn't create the recovery.conf file. Did you fix this? if yes, can you send back your script? Thanks.

Thanks

Matthieu

Le 14/07/15 08:50, Guillaume Lelarge a écrit :
2015-07-14 8:09 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:
Hi,

I had no recovery.conf on this server because I launch my replication every night I need a H-24 copy database.

On your first email, you said: "I have a script for restoring a database every night to an other postgresql database". A restore is not replication, even it's a PITR restore.

Still on your first email, you execute pg_start_backup() and pg_stop_backup() on p2prddnmdbm, and the rsync on 10.10.11.1. I suppose p2prddnmdbm and 10.10.11.1 are the same server?

This is my recovery.conf
root@p2prddnmdbc:/var/lib/postgresql# cat recovery.conf
standby_mode = 'off'
primary_conninfo = 'host=10.10.11.1 port=5432 user=replicator password=XXXXXX'
trigger_file = '/var/lib/postgresql/9.1/main/trigger'
restore_command = 'cp /mnt/p2prddnmdbm_pg_xlog/%f %p'
root@p2prddnmdbc:/var/lib/postgresql#

You don't need primary_conninfo if you only want to restore your database. Though that isn't your issue right now.

But with or without a recovery.conf file I can't start the database service :
root@p2prddnmdbc:/var/lib/postgresql# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-14 08:02:59 CEST LOG: database system was interrupted; last known up at 2015-07-13 23:33:46 CEST 2015-07-14 08:02:59 CEST LOG: invalid checkpoint record 2015-07-14 08:02:59 CEST FATAL: could not locate required checkpoint record 2015-07-14 08:02:59 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-14 08:02:59 CEST LOG: startup process (PID 24617) exited with exit co[FAIL2015-07-14 08:02:59 CEST LOG: aborting startup due to startup process failure ... failed!
failed!

In your previous email, you had the startup process waiting for a file. These logs can't be the good ones.

What you should probably do is tell us exactly what you want to do, and state what you do right now, and what logs you get, and what processes are on the server. That would help us to help you.

Thanks
Matthieu

Le 12/07/15 13:46, Guillaume Lelarge a écrit :
Hi,

2015-07-12 10:18 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:
Hi thank for your reply

My target is to give a database for buisness testing query and they are modify the database during the buisness day.

Now I got this error if I keep the file backup_label :

root@p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-12 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10 CEST 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record 2015-07-12 10:12:45 CEST FATAL: could not locate required checkpoint record 2015-07-12 10:12:45 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12 10:12:45 CEST LOG: startup process (PID 28492) exited with exit code 1 2015-07-12 10:12:45 CEST LOG: abo[FAIL startup due to startup process failure ... failed!
failed!

If I put the recovery.conf the database is waiting for the wal to relaunch the replication.

postgres 27817 0.7 0.9 631212 39892 ? S 09:55 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55 0:00 \_ postgres: startup process waiting for 0000000100000178000000B9
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
postgres@p2prddnmdbc:~$ psql energycomm
psql: FATAL: the database system is starting up

Have you got an idea to stop the replication process and start the database ?

What did you put in the recovery.conf file? (hint: standby_mode must be off)

Kind regards
Matthieu

Le 10/07/15 16:46, Keith a écrit :
A recent, relevant post

http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html

On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:
Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?
Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com

Guillaume.

http://blog.guillaume.lelarge.info

http://www.dalibo.com

Re: Automatic restore corruption problem

From

Kevin Grittner

Date:

16 July 2015, 18:40:23

Guillaume Lelarge <guillaume@lelarge.info> wrote:

> There were two issues on your script: it removed the backup_label
> file and didn't create the recovery.conf file.

There were more flaws in the script than that. See:

http://www.postgresql.org/docs/current/static/continuous-archiving.html#BACKUP-PITR-RECOVERY

That says, in part:

| Remove any files present in pg_xlog/; these came from the file
| system backup and are therefore probably obsolete rather than
| current.

The point is, if you are unfortunate enough to have recovery use
one of the WAL files copied during the backup, it might not be
complete, since they may have been copied at any point in the
backup -- quite likely before all the WAL records needed for
reaching consistency were added. The archived WAL files will be
complete; set up the recovery.conf file to use them.

Elsewhere on that page it says:

| You might also want to exclude postmaster.pid and
| postmaster.opts, which record information about the running
| postmaster, not about the postmaster which will eventually use
| this backup. (These files can confuse pg_ctl.)

If these files are removed, it is much harder to accidentally bring
up the database with corruption.

> Did you fix this? if yes, can you send back your script? Thanks.

Yeah, if you think you are now following the instructions shown at
the link I show above and are still having problems, please show
all steps in the process -- copy and paste scripts and commands,
and the relevant portions of logs.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company