Thread: Automatic restore corruption problem
Hi all, I have a script for restoring a database every night to an other postgresql database root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh #/bin/bash /etc/init.d/postgresql stop mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old cd /var/lib/postgresql/9.3/main psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');" rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/ rm backup_label chown -R postgres:postgres * psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();" /etc/init.d/postgresql start chmod 777 /var/log/postgresql/postgresql-9.3-main.log psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';" psql -U postgres xxxx -c "CREATE EXTENSION dblink;" root@p2prddnmdbc:~# But during the day when the user are using the new database we got error like this : 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes I have check the : filesystem on the vm, on the HW SAN,... Any idea to fix this problem? Thanks Matthieu
Regards,
Michael
Friday, July 10, 2015 8:59 AMHi all,
I have a script for restoring a database every night to an other postgresql database
root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
#/bin/bash
/etc/init.d/postgresql stop
mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
cd /var/lib/postgresql/9.3/main
psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
rm backup_label
chown -R postgres:postgres *
psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
/etc/init.d/postgresql start
chmod 777 /var/log/postgresql/postgresql-9.3-main.log
psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
root@p2prddnmdbc:~#
But during the day when the user are using the new database we got error like this :
2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
I have check the : filesystem on the vm, on the HW SAN,...
Any idea to fix this problem?
Thanks
Matthieu
Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?
Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume
Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume
My target is to give a database for buisness testing query and they are modify the database during the buisness day.
Now I got this error if I keep the file backup_label :
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-12 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10 CEST 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record 2015-07-12 10:12:45 CEST FATAL: could not locate required checkpoint record 2015-07-12 10:12:45 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12 10:12:45 CEST LOG: startup process (PID 28492) exited with exit code 1 2015-07-12 10:12:45 CEST LOG: abo[FAIL startup due to startup process failure ... failed!
failed!
If I put the recovery.conf the database is waiting for the wal to relaunch the replication.
postgres 27817 0.7 0.9 631212 39892 ? S 09:55 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55 0:00 \_ postgres: startup process waiting for 0000000100000178000000B9
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
postgres@p2prddnmdbc:~$ psql energycomm
psql: FATAL: the database system is starting up
Have you got an idea to stop the replication process and start the database ?
Kind regards
Matthieu
Le 10/07/15 16:46, Keith a écrit :
On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume
2015-07-12 10:18 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:
Hi thank for your reply
My target is to give a database for buisness testing query and they are modify the database during the buisness day.
Now I got this error if I keep the file backup_label :
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-12 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10 CEST 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record 2015-07-12 10:12:45 CEST FATAL: could not locate required checkpoint record 2015-07-12 10:12:45 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12 10:12:45 CEST LOG: startup process (PID 28492) exited with exit code 1 2015-07-12 10:12:45 CEST LOG: abo[FAIL startup due to startup process failure ... failed!
failed!
If I put the recovery.conf the database is waiting for the wal to relaunch the replication.
postgres 27817 0.7 0.9 631212 39892 ? S 09:55 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55 0:00 \_ postgres: startup process waiting for 0000000100000178000000B9
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
postgres@p2prddnmdbc:~$ psql energycomm
psql: FATAL: the database system is starting up
Have you got an idea to stop the replication process and start the database ?
Kind regards
Matthieu
Le 10/07/15 16:46, Keith a écrit :On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume
--
I had no recovery.conf on this server because I launch my replication every night I need a H-24 copy database.
This is my recovery.conf
root@p2prddnmdbc:/var/lib/postgresql# cat recovery.conf
standby_mode = 'off'
primary_conninfo = 'host=10.10.11.1 port=5432 user=replicator password=XXXXXX'
trigger_file = '/var/lib/postgresql/9.1/main/trigger'
restore_command = 'cp /mnt/p2prddnmdbm_pg_xlog/%f %p'
root@p2prddnmdbc:/var/lib/postgresql#
But with or without a recovery.conf file I can't start the database service :
root@p2prddnmdbc:/var/lib/postgresql# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-14 08:02:59 CEST LOG: database system was interrupted; last known up at 2015-07-13 23:33:46 CEST 2015-07-14 08:02:59 CEST LOG: invalid checkpoint record 2015-07-14 08:02:59 CEST FATAL: could not locate required checkpoint record 2015-07-14 08:02:59 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-14 08:02:59 CEST LOG: startup process (PID 24617) exited with exit co[FAIL2015-07-14 08:02:59 CEST LOG: aborting startup due to startup process failure ... failed!
failed!
Thanks
Matthieu
Le 12/07/15 13:46, Guillaume Lelarge a écrit :
Hi,
2015-07-12 10:18 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:Hi thank for your reply
My target is to give a database for buisness testing query and they are modify the database during the buisness day.
Now I got this error if I keep the file backup_label :
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-12 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10 CEST 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record 2015-07-12 10:12:45 CEST FATAL: could not locate required checkpoint record 2015-07-12 10:12:45 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12 10:12:45 CEST LOG: startup process (PID 28492) exited with exit code 1 2015-07-12 10:12:45 CEST LOG: abo[FAIL startup due to startup process failure ... failed!
failed!
If I put the recovery.conf the database is waiting for the wal to relaunch the replication.
postgres 27817 0.7 0.9 631212 39892 ? S 09:55 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55 0:00 \_ postgres: startup process waiting for 0000000100000178000000B9
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
postgres@p2prddnmdbc:~$ psql energycomm
psql: FATAL: the database system is starting up
Have you got an idea to stop the replication process and start the database ?What did you put in the recovery.conf file? (hint: standby_mode must be off)
Kind regards
Matthieu
Le 10/07/15 16:46, Keith a écrit :On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume
--
Hi,
I had no recovery.conf on this server because I launch my replication every night I need a H-24 copy database.
This is my recovery.conf
root@p2prddnmdbc:/var/lib/postgresql# cat recovery.conf
standby_mode = 'off'
primary_conninfo = 'host=10.10.11.1 port=5432 user=replicator password=XXXXXX'
trigger_file = '/var/lib/postgresql/9.1/main/trigger'
restore_command = 'cp /mnt/p2prddnmdbm_pg_xlog/%f %p'
root@p2prddnmdbc:/var/lib/postgresql#
But with or without a recovery.conf file I can't start the database service :
root@p2prddnmdbc:/var/lib/postgresql# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-14 08:02:59 CEST LOG: database system was interrupted; last known up at 2015-07-13 23:33:46 CEST 2015-07-14 08:02:59 CEST LOG: invalid checkpoint record 2015-07-14 08:02:59 CEST FATAL: could not locate required checkpoint record 2015-07-14 08:02:59 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-14 08:02:59 CEST LOG: startup process (PID 24617) exited with exit co[FAIL2015-07-14 08:02:59 CEST LOG: aborting startup due to startup process failure ... failed!
failed!
Thanks
Matthieu
Le 12/07/15 13:46, Guillaume Lelarge a écrit :Hi,
2015-07-12 10:18 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:Hi thank for your reply
My target is to give a database for buisness testing query and they are modify the database during the buisness day.
Now I got this error if I keep the file backup_label :
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-12 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10 CEST 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record 2015-07-12 10:12:45 CEST FATAL: could not locate required checkpoint record 2015-07-12 10:12:45 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12 10:12:45 CEST LOG: startup process (PID 28492) exited with exit code 1 2015-07-12 10:12:45 CEST LOG: abo[FAIL startup due to startup process failure ... failed!
failed!
If I put the recovery.conf the database is waiting for the wal to relaunch the replication.
postgres 27817 0.7 0.9 631212 39892 ? S 09:55 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55 0:00 \_ postgres: startup process waiting for 0000000100000178000000B9
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
postgres@p2prddnmdbc:~$ psql energycomm
psql: FATAL: the database system is starting up
Have you got an idea to stop the replication process and start the database ?What did you put in the recovery.conf file? (hint: standby_mode must be off)
Kind regards
Matthieu
Le 10/07/15 16:46, Keith a écrit :On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume
--
--
The master server : 10.10.11.1 is p2prddnmdbm
I got a real time slave with the recovery process : p2prddnmdbs
I have a third copy with a H-24 data : p2prddnmdbc
On this server I need to have the copy of the master at 23H00 every day.
So I'm using the script send before.
During the day the buisness modify the database on server p2prddnmdbc and that's why I use a rsync --delete every night to get a clean copy of the prod server.
So what's the best way to do the restore ?
Thanks
Matthieu
Le 14/07/15 08:50, Guillaume Lelarge a écrit :
2015-07-14 8:09 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:Hi,
I had no recovery.conf on this server because I launch my replication every night I need a H-24 copy database.On your first email, you said: "I have a script for restoring a database every night to an other postgresql database". A restore is not replication, even it's a PITR restore.Still on your first email, you execute pg_start_backup() and pg_stop_backup() on p2prddnmdbm, and the rsync on 10.10.11.1. I suppose p2prddnmdbm and 10.10.11.1 are the same server?This is my recovery.conf
root@p2prddnmdbc:/var/lib/postgresql# cat recovery.conf
standby_mode = 'off'
primary_conninfo = 'host=10.10.11.1 port=5432 user=replicator password=XXXXXX'
trigger_file = '/var/lib/postgresql/9.1/main/trigger'
restore_command = 'cp /mnt/p2prddnmdbm_pg_xlog/%f %p'
root@p2prddnmdbc:/var/lib/postgresql#You don't need primary_conninfo if you only want to restore your database. Though that isn't your issue right now.
But with or without a recovery.conf file I can't start the database service :
root@p2prddnmdbc:/var/lib/postgresql# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-14 08:02:59 CEST LOG: database system was interrupted; last known up at 2015-07-13 23:33:46 CEST 2015-07-14 08:02:59 CEST LOG: invalid checkpoint record 2015-07-14 08:02:59 CEST FATAL: could not locate required checkpoint record 2015-07-14 08:02:59 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-14 08:02:59 CEST LOG: startup process (PID 24617) exited with exit co[FAIL2015-07-14 08:02:59 CEST LOG: aborting startup due to startup process failure ... failed!
failed!In your previous email, you had the startup process waiting for a file. These logs can't be the good ones.What you should probably do is tell us exactly what you want to do, and state what you do right now, and what logs you get, and what processes are on the server. That would help us to help you.Thanks
Matthieu
Le 12/07/15 13:46, Guillaume Lelarge a écrit :Hi,
2015-07-12 10:18 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:Hi thank for your reply
My target is to give a database for buisness testing query and they are modify the database during the buisness day.
Now I got this error if I keep the file backup_label :
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-12 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10 CEST 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record 2015-07-12 10:12:45 CEST FATAL: could not locate required checkpoint record 2015-07-12 10:12:45 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12 10:12:45 CEST LOG: startup process (PID 28492) exited with exit code 1 2015-07-12 10:12:45 CEST LOG: abo[FAIL startup due to startup process failure ... failed!
failed!
If I put the recovery.conf the database is waiting for the wal to relaunch the replication.
postgres 27817 0.7 0.9 631212 39892 ? S 09:55 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55 0:00 \_ postgres: startup process waiting for 0000000100000178000000B9
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
postgres@p2prddnmdbc:~$ psql energycomm
psql: FATAL: the database system is starting up
Have you got an idea to stop the replication process and start the database ?What did you put in the recovery.conf file? (hint: standby_mode must be off)
Kind regards
Matthieu
Le 10/07/15 16:46, Keith a écrit :On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume
--
--
Hi,
The master server : 10.10.11.1 is p2prddnmdbm
I got a real time slave with the recovery process : p2prddnmdbs
I have a third copy with a H-24 data : p2prddnmdbc
On this server I need to have the copy of the master at 23H00 every day.
So I'm using the script send before.
During the day the buisness modify the database on server p2prddnmdbc and that's why I use a rsync --delete every night to get a clean copy of the prod server.
So what's the best way to do the restore ?
Thanks
Matthieu
Le 14/07/15 08:50, Guillaume Lelarge a écrit :2015-07-14 8:09 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:Hi,
I had no recovery.conf on this server because I launch my replication every night I need a H-24 copy database.On your first email, you said: "I have a script for restoring a database every night to an other postgresql database". A restore is not replication, even it's a PITR restore.Still on your first email, you execute pg_start_backup() and pg_stop_backup() on p2prddnmdbm, and the rsync on 10.10.11.1. I suppose p2prddnmdbm and 10.10.11.1 are the same server?This is my recovery.conf
root@p2prddnmdbc:/var/lib/postgresql# cat recovery.conf
standby_mode = 'off'
primary_conninfo = 'host=10.10.11.1 port=5432 user=replicator password=XXXXXX'
trigger_file = '/var/lib/postgresql/9.1/main/trigger'
restore_command = 'cp /mnt/p2prddnmdbm_pg_xlog/%f %p'
root@p2prddnmdbc:/var/lib/postgresql#You don't need primary_conninfo if you only want to restore your database. Though that isn't your issue right now.
But with or without a recovery.conf file I can't start the database service :
root@p2prddnmdbc:/var/lib/postgresql# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-14 08:02:59 CEST LOG: database system was interrupted; last known up at 2015-07-13 23:33:46 CEST 2015-07-14 08:02:59 CEST LOG: invalid checkpoint record 2015-07-14 08:02:59 CEST FATAL: could not locate required checkpoint record 2015-07-14 08:02:59 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-14 08:02:59 CEST LOG: startup process (PID 24617) exited with exit co[FAIL2015-07-14 08:02:59 CEST LOG: aborting startup due to startup process failure ... failed!
failed!In your previous email, you had the startup process waiting for a file. These logs can't be the good ones.What you should probably do is tell us exactly what you want to do, and state what you do right now, and what logs you get, and what processes are on the server. That would help us to help you.Thanks
Matthieu
Le 12/07/15 13:46, Guillaume Lelarge a écrit :Hi,
2015-07-12 10:18 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:Hi thank for your reply
My target is to give a database for buisness testing query and they are modify the database during the buisness day.
Now I got this error if I keep the file backup_label :
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-12 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10 CEST 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record 2015-07-12 10:12:45 CEST FATAL: could not locate required checkpoint record 2015-07-12 10:12:45 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12 10:12:45 CEST LOG: startup process (PID 28492) exited with exit code 1 2015-07-12 10:12:45 CEST LOG: abo[FAIL startup due to startup process failure ... failed!
failed!
If I put the recovery.conf the database is waiting for the wal to relaunch the replication.
postgres 27817 0.7 0.9 631212 39892 ? S 09:55 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
postgres 27818 0.0 0.0 631472 2076 ? Ss 09:55 0:00 \_ postgres: startup process waiting for 0000000100000178000000B9
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
postgres@p2prddnmdbc:~$ psql energycomm
psql: FATAL: the database system is starting up
Have you got an idea to stop the replication process and start the database ?What did you put in the recovery.conf file? (hint: standby_mode must be off)
Kind regards
Matthieu
Le 10/07/15 16:46, Keith a écrit :On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:Hi,
Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR: could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR: could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?Sure. Don't remove the backup_label file, and add the recovery.conf file.
--
Guillaume
--
--
--
Guillaume Lelarge <guillaume@lelarge.info> wrote: > There were two issues on your script: it removed the backup_label > file and didn't create the recovery.conf file. There were more flaws in the script than that. See: http://www.postgresql.org/docs/current/static/continuous-archiving.html#BACKUP-PITR-RECOVERY That says, in part: | Remove any files present in pg_xlog/; these came from the file | system backup and are therefore probably obsolete rather than | current. The point is, if you are unfortunate enough to have recovery use one of the WAL files copied during the backup, it might not be complete, since they may have been copied at any point in the backup -- quite likely before all the WAL records needed for reaching consistency were added. The archived WAL files will be complete; set up the recovery.conf file to use them. Elsewhere on that page it says: | You might also want to exclude postmaster.pid and | postmaster.opts, which record information about the running | postmaster, not about the postmaster which will eventually use | this backup. (These files can confuse pg_ctl.) If these files are removed, it is much harder to accidentally bring up the database with corruption. > Did you fix this? if yes, can you send back your script? Thanks. Yeah, if you think you are now following the instructions shown at the link I show above and are still having problems, please show all steps in the process -- copy and paste scripts and commands, and the relevant portions of logs. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company