Re: Automatic restore corruption problem - Mailing list pgsql-admin

From Guillaume Lelarge
Subject Re: Automatic restore corruption problem
Date
Msg-id CAECtzeVB+28FXvNW=51wiqQVrNaNEu1iZ-LOnGyafJvnWxXrzw@mail.gmail.com
Whole thread Raw
In response to Re: Automatic restore corruption problem  (Matthieu Lejeune <matthieu.lejeune@exxoss.com>)
Responses Re: Automatic restore corruption problem
List pgsql-admin
2015-07-14 12:32 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:
Hi,

The master server : 10.10.11.1 is p2prddnmdbm
I got a real time slave with the recovery process : p2prddnmdbs

I have a third copy with a H-24 data : p2prddnmdbc
On this server I need to have the copy of the master at 23H00 every day.
So I'm using the script send before.
During the day the buisness modify the database on server p2prddnmdbc and that's why I use a rsync --delete every night to get a clean copy of the prod server.

So what's the best way to do the restore ?


There were two issues on your script: it removed the backup_label file and didn't create the recovery.conf file. Did you fix this? if yes, can you send back your script? Thanks.

Thanks

Matthieu


Le 14/07/15 08:50, Guillaume Lelarge a écrit :
2015-07-14 8:09 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:
Hi,

I had no recovery.conf on this server because I launch my replication every night I need a H-24 copy database.


On your first email, you said: "I have a script for restoring a database every night to an other postgresql database". A restore is not replication, even it's a PITR restore.

Still on your first email, you execute pg_start_backup() and pg_stop_backup() on p2prddnmdbm, and the rsync on 10.10.11.1. I suppose p2prddnmdbm and 10.10.11.1 are the same server?

This is my recovery.conf
root@p2prddnmdbc:/var/lib/postgresql# cat recovery.conf
standby_mode = 'off'
primary_conninfo = 'host=10.10.11.1 port=5432 user=replicator password=XXXXXX'
trigger_file = '/var/lib/postgresql/9.1/main/trigger'
restore_command = 'cp /mnt/p2prddnmdbm_pg_xlog/%f %p'
root@p2prddnmdbc:/var/lib/postgresql#


You don't need primary_conninfo if you only want to restore your database. Though that isn't your issue right now.
 
But with or without a recovery.conf file I can't start the database service :
root@p2prddnmdbc:/var/lib/postgresql# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-14 08:02:59 CEST LOG: database system was interrupted; last known up at 2015-07-13 23:33:46 CEST 2015-07-14 08:02:59 CEST LOG: invalid checkpoint record 2015-07-14 08:02:59 CEST FATAL: could not locate required checkpoint record 2015-07-14 08:02:59 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-14 08:02:59 CEST LOG: startup process (PID 24617) exited with exit co[FAIL2015-07-14 08:02:59 CEST LOG: aborting startup due to startup process failure ... failed!
 failed!


In your previous email, you had the startup process waiting for a file. These logs can't be the good ones.

What you should probably do is tell us exactly what you want to do, and state what you do right now, and what logs you get, and what processes are on the server. That would help us to help you.

Thanks
Matthieu

Le 12/07/15 13:46, Guillaume Lelarge a écrit :
Hi,

2015-07-12 10:18 GMT+02:00 Matthieu Lejeune <matthieu.lejeune@exxoss.com>:
Hi thank for your reply

My target is to give a database for buisness testing query and they are modify the database during the buisness day.

Now I got this error if I keep the file backup_label :

root@p2prddnmdbc:/var/lib/postgresql/9.3/main# /etc/init.d/postgresql start
[....] Starting PostgreSQL 9.3 database server: main[....] The PostgreSQL server failed to start. Please check the log output: 2015-07-12 10:12:45 CEST LOG: database system was shut down at 2015-07-12 10:07:10 CEST 2015-07-12 10:12:45 CEST LOG: invalid checkpoint record 2015-07-12 10:12:45 CEST FATAL: could not locate required checkpoint record 2015-07-12 10:12:45 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.3/main/backup_label". 2015-07-12 10:12:45 CEST LOG: startup process (PID 28492) exited with exit code 1 2015-07-12 10:12:45 CEST LOG: abo[FAIL startup due to startup process failure ... failed!
 failed!


If I put the recovery.conf the database is waiting for the wal to relaunch the replication.

postgres 27817  0.7  0.9 631212 39892 ?        S    09:55   0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main
postgres 27818  0.0  0.0 631472  2076 ?        Ss   09:55   0:00  \_ postgres: startup process   waiting for 0000000100000178000000B9
root@p2prddnmdbc:/var/lib/postgresql/9.3/main# su - postgres
postgres@p2prddnmdbc:~$ psql energycomm
psql: FATAL:  the database system is starting up

Have you got an idea to stop the replication process and start the database ?


What did you put in the recovery.conf file? (hint: standby_mode must be off)

 
Kind regards
Matthieu

Le 10/07/15 16:46, Keith a écrit :

On Fri, Jul 10, 2015 at 10:07 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:

Hi,

Le 10 juil. 2015 3:02 PM, "Matthieu Lejeune" <matthieu.lejeune@exxoss.com> a écrit :
>
> Hi all,
>
> I have a script for restoring a database every night to an other postgresql database
>
> root@p2prddnmdbc:~# cat /var/admin/script/restoredb.sh
> #/bin/bash
> /etc/init.d/postgresql stop
> mv /var/log/postgresql/postgresql-9.3-main.log /var/log/postgresql/postgresql-9.3-main.log.old
> cd /var/lib/postgresql/9.3/main
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_start_backup('sync');"
> rsync -av --delete root@10.10.11.1:/var/lib/postgresql/9.3/main/* /var/lib/postgresql/9.3/main/
> rm backup_label
> chown -R postgres:postgres *
> psql --host=p2prddnmdbm --username=replicator postgres -c "SELECT pg_stop_backup();"
> /etc/init.d/postgresql start
> chmod 777 /var/log/postgresql/postgresql-9.3-main.log
> psql -U postgres -c "ALTER USER xxxx WITH PASSWORD 'XXXX';"
> psql -U postgres xxxx -c "CREATE EXTENSION dblink;"
> root@p2prddnmdbc:~#
>
>
> But during the day when the user are using the new database we got error like this :
>
> 2015-06-25 16:20:58 CEST ERROR:  could not read block 257985 in file "base/16386/14064061.1": read only 0 of 8192 bytes
> 2015-06-22 15:21:11 CEST ERROR:  could not read block 256801 in file "base/16386/14064061.1": read only 0 of 8192 bytes
>
> I have check the : filesystem on the vm, on the HW SAN,...
>
> Any idea to fix this problem?

Sure. Don't remove the backup_label file, and add the recovery.conf file.

--
Guillaume







--




--




--

pgsql-admin by date:

Previous
From: Natalie Wenz
Date:
Subject: postgres_fdw user mapping and role inheritance
Next
From: Tom Lane
Date:
Subject: Re: segfault when creating index on huge table