Thread: Re: Initiating log shipping backups when the initial filesystem-level backup went out of sync
Re: Initiating log shipping backups when the initial filesystem-level backup went out of sync
From
Wouter Verhelst
Date:
> Hi, > > A while back, I had to replace the hardware for a postgres server > containing about 600G of data. While at it, we did an upgrade of the > postgresql version used, and we had intended to also modify the way > backups were being done, by moving from a daily pg_dumpall in cron to > shipping WAL logs to an offsite location. > > Due to the many changes involved, I decided that it would be easier to > copy the data by running "ssh oldserver pg_dumpall | psql postgres", > but I did (approximately) the following: > > - Create the cluster > - Set "wal_level = hot_standby", "archive_mode = on", and > "archive_command = 'test ! -f /srv/pg_wal/%f && cp %p /srv/pg_wal/%f' > in postgresql.conf. > - Set up everything so that files written to /srv/pg_wal end up > (eventually) on the remote server. > - rsync the data directory to the off-site server > - Start the server > - Run the pg_dumpall thing > - Start the server on the off-site location, to test if it could open > everything, and stop it again > - Create a file "recovery.conf" in the wrong location (at first) and > then in the correct location (later on, after reading the documentation > more carefully), with a line saying "restore_command = 'cp > /srv/pg_wal/%f "%p"'" and one saying "standby_mode = on" > - Start the server > > At this point, the log file shows the following message: > > 2016-08-29 13:53:22 CEST [25504-1] LOG: database system was shut down > in recovery at (...) > 2016-08-29 13:53:22 CEST [25504-2] LOG: entering standby mode > 2016-08-29 13:53:22 CEST [25504-3] LOG: record with zero length at > 0/2000200 > 2016-08-29 13:53:22 CEST [25504-4] LOG: invalid primary checkpoint > record > 2016-08-29 13:53:22 CEST [25504-5] LOG: record with zero length at > 0/2000198 > 2016-08-29 13:53:22 CEST [25504-6] LOG: invalid secondary checkpoint > record > 2016-08-29 13:53:22 CEST [25504-7] PANIC: could not locate a valid > checkpoint record > 2016-08-29 13:53:22 CEST [25480-1] LOG: startup process (PID 25504) > was terminated by signal 6: Aborted > 2016-08-29 13:53:22 CEST [25480-2] LOG: aborting startup due to > startup process failure > > In the pg_xlog directory on the off-site server, there is a file > 000000010000000000000002 and one 000000010000000000000003. When I run > md5sum, I find that the first has the same checksum as the one with the > same name in the pg_wal directory containing the files from the > original server; the second does not, but it *does* have the same > checksum as the file 000000010000000000000001 in that directory (which > does not exist in the pg_xlog directory on the off-site server). > > I have retained all WAL files from 000000010000000000000001 all the way > to the current one. > > Is it still possible for me to load these WAL files into the server at > the remote site, without having to do a filesystem-level copy of all > the files in the postgresql database directory? If not, any hints on > what I did wrong? > > Thanks, Ping? Is this possible, or should I spend time to rsync files over? Thanks, -- Wouter Verhelst
Re: Initiating log shipping backups when the initial filesystem-level backup went out of sync
From
Robert Burgholzer
Date:
My experienc is that rsync is the way to go when a replicant gets stale -- you can at least be certain you didn't miss anything and it will take much less time than the initial rsync since it only takes things that are new.
--
Hth,
/r/b
On Monday, September 5, 2016, Wouter Verhelst <wouter.verhelst.ext@huawei.com> wrote:
On Monday, September 5, 2016, Wouter Verhelst <wouter.verhelst.ext@huawei.com> wrote:
> Hi,
>
> A while back, I had to replace the hardware for a postgres server
> containing about 600G of data. While at it, we did an upgrade of the
> postgresql version used, and we had intended to also modify the way
> backups were being done, by moving from a daily pg_dumpall in cron to
> shipping WAL logs to an offsite location.
>
> Due to the many changes involved, I decided that it would be easier to
> copy the data by running "ssh oldserver pg_dumpall | psql postgres",
> but I did (approximately) the following:
>
> - Create the cluster
> - Set "wal_level = hot_standby", "archive_mode = on", and
> "archive_command = 'test ! -f /srv/pg_wal/%f && cp %p /srv/pg_wal/%f'
> in postgresql.conf.
> - Set up everything so that files written to /srv/pg_wal end up
> (eventually) on the remote server.
> - rsync the data directory to the off-site server
> - Start the server
> - Run the pg_dumpall thing
> - Start the server on the off-site location, to test if it could open
> everything, and stop it again
> - Create a file "recovery.conf" in the wrong location (at first) and
> then in the correct location (later on, after reading the documentation
> more carefully), with a line saying "restore_command = 'cp
> /srv/pg_wal/%f "%p"'" and one saying "standby_mode = on"
> - Start the server
>
> At this point, the log file shows the following message:
>
> 2016-08-29 13:53:22 CEST [25504-1] LOG: database system was shut down
> in recovery at (...)
> 2016-08-29 13:53:22 CEST [25504-2] LOG: entering standby mode
> 2016-08-29 13:53:22 CEST [25504-3] LOG: record with zero length at
> 0/2000200
> 2016-08-29 13:53:22 CEST [25504-4] LOG: invalid primary checkpoint
> record
> 2016-08-29 13:53:22 CEST [25504-5] LOG: record with zero length at
> 0/2000198
> 2016-08-29 13:53:22 CEST [25504-6] LOG: invalid secondary checkpoint
> record
> 2016-08-29 13:53:22 CEST [25504-7] PANIC: could not locate a valid
> checkpoint record
> 2016-08-29 13:53:22 CEST [25480-1] LOG: startup process (PID 25504)
> was terminated by signal 6: Aborted
> 2016-08-29 13:53:22 CEST [25480-2] LOG: aborting startup due to
> startup process failure
>
> In the pg_xlog directory on the off-site server, there is a file
> 000000010000000000000002 and one 000000010000000000000003. When I run
> md5sum, I find that the first has the same checksum as the one with the
> same name in the pg_wal directory containing the files from the
> original server; the second does not, but it *does* have the same
> checksum as the file 000000010000000000000001 in that directory (which
> does not exist in the pg_xlog directory on the off-site server).
>
> I have retained all WAL files from 000000010000000000000001 all the way
> to the current one.
>
> Is it still possible for me to load these WAL files into the server at
> the remote site, without having to do a filesystem-level copy of all
> the files in the postgresql database directory? If not, any hints on
> what I did wrong?
>
> Thanks,
Ping?
Is this possible, or should I spend time to rsync files over?
Thanks,
--
Wouter Verhelst
--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
--
--
Robert W. Burgholzer
'Making the simple complicated is commonplace; making the complicated simple, awesomely simple, that's creativity.' - Charles Mingus
Robert W. Burgholzer
'Making the simple complicated is commonplace; making the complicated simple, awesomely simple, that's creativity.' - Charles Mingus
Athletics: http://athleticalgorithm.wordpress.com/
Science: http://robertwb.wordpress.com/