Re: Initiating log shipping backups when the initial filesystem-level backup went out of sync - Mailing list pgsql-admin
From | Wouter Verhelst |
---|---|
Subject | Re: Initiating log shipping backups when the initial filesystem-level backup went out of sync |
Date | |
Msg-id | 2440DD2D6E0BA346934AFE42CB9080AB77BC0B15@lhreml503-mbx Whole thread Raw |
Responses |
Re: Initiating log shipping backups when the initial
filesystem-level backup went out of sync
|
List | pgsql-admin |
> Hi, > > A while back, I had to replace the hardware for a postgres server > containing about 600G of data. While at it, we did an upgrade of the > postgresql version used, and we had intended to also modify the way > backups were being done, by moving from a daily pg_dumpall in cron to > shipping WAL logs to an offsite location. > > Due to the many changes involved, I decided that it would be easier to > copy the data by running "ssh oldserver pg_dumpall | psql postgres", > but I did (approximately) the following: > > - Create the cluster > - Set "wal_level = hot_standby", "archive_mode = on", and > "archive_command = 'test ! -f /srv/pg_wal/%f && cp %p /srv/pg_wal/%f' > in postgresql.conf. > - Set up everything so that files written to /srv/pg_wal end up > (eventually) on the remote server. > - rsync the data directory to the off-site server > - Start the server > - Run the pg_dumpall thing > - Start the server on the off-site location, to test if it could open > everything, and stop it again > - Create a file "recovery.conf" in the wrong location (at first) and > then in the correct location (later on, after reading the documentation > more carefully), with a line saying "restore_command = 'cp > /srv/pg_wal/%f "%p"'" and one saying "standby_mode = on" > - Start the server > > At this point, the log file shows the following message: > > 2016-08-29 13:53:22 CEST [25504-1] LOG: database system was shut down > in recovery at (...) > 2016-08-29 13:53:22 CEST [25504-2] LOG: entering standby mode > 2016-08-29 13:53:22 CEST [25504-3] LOG: record with zero length at > 0/2000200 > 2016-08-29 13:53:22 CEST [25504-4] LOG: invalid primary checkpoint > record > 2016-08-29 13:53:22 CEST [25504-5] LOG: record with zero length at > 0/2000198 > 2016-08-29 13:53:22 CEST [25504-6] LOG: invalid secondary checkpoint > record > 2016-08-29 13:53:22 CEST [25504-7] PANIC: could not locate a valid > checkpoint record > 2016-08-29 13:53:22 CEST [25480-1] LOG: startup process (PID 25504) > was terminated by signal 6: Aborted > 2016-08-29 13:53:22 CEST [25480-2] LOG: aborting startup due to > startup process failure > > In the pg_xlog directory on the off-site server, there is a file > 000000010000000000000002 and one 000000010000000000000003. When I run > md5sum, I find that the first has the same checksum as the one with the > same name in the pg_wal directory containing the files from the > original server; the second does not, but it *does* have the same > checksum as the file 000000010000000000000001 in that directory (which > does not exist in the pg_xlog directory on the off-site server). > > I have retained all WAL files from 000000010000000000000001 all the way > to the current one. > > Is it still possible for me to load these WAL files into the server at > the remote site, without having to do a filesystem-level copy of all > the files in the postgresql database directory? If not, any hints on > what I did wrong? > > Thanks, Ping? Is this possible, or should I spend time to rsync files over? Thanks, -- Wouter Verhelst
pgsql-admin by date: