Thread: Recovery continually requests new WAL files
Hey!

I have a simple setup with one master and one backup server. I performed a backup, copied it to the data directory for the slave, wrote a recovery.conf, copied in the backup_label file and then started the server. It happily restores everything up to and including the WAL file mentioned in the backup_label and then attempts to obtain the next archive file, which has not yet been archived. I can't for the life of me figure out what is going on.

Here's a breakdown of what I do:

    call pg_start_backup('label')
    tar -zcf backup.tar.gz base global pg_clog pg_multixact pg_notify pg_serial pg_subtrans pg_tblspc pg_twophase backup_label
    call pg_stop_backup()
    scp pgsql.tar.gz slave_hostname:/var/lib/postgresql/9.1/main
    move to slave server
    rm -rf global base pg_clog pg_multixact pg_notify pg_serial pg_subtrans pg_tblspc pg_twophase pg_xlog/*
    mkdir pg_xlog/archive_status
    tar -xvf backup.tar.gz
    restart postgresql

---------------- recovery.conf -----------------
restore_command = 'scp master-hostname:/var/lib/postgresql/9.1/main/wal_archives/%f %p'
standby_mode=on

And here's what I'm seeing in the logs on the recovering server:

2012-06-12 16:31:26 UTC FATAL: the database system is starting up
2012-06-12 16:31:27 UTC FATAL: the database system is starting up
2012-06-12 16:31:27 UTC FATAL: the database system is starting up
2012-06-12 16:31:27 UTC LOG: incomplete startup packet
2012-06-12 16:31:30 UTC LOG: restored log file "00000001000000000000000A" from archive
2012-06-12 16:31:30 UTC LOG: redo starts at 0/A000078
2012-06-12 16:31:30 UTC LOG: consistent recovery state reached at 0/B000000
scp: /var/lib/postgresql/9.1/main/wal_archives/00000001000000000000000B: No such file or directory
scp: /var/lib/postgresql/9.1/main/wal_archives/00000001000000000000000B: No such file or directory
scp: /var/lib/postgresql/9.1/main/wal_archives/00000001000000000000000B: No such file or directory

I'm confused by this because the 00000001000000000000000B archive wasn't created until after the pg_stop_backup call, so why is it needed?

Any help would be appreciated, I've been banging my head against this one for a while.

Thanks
Alex
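The file names and LSNs in the log above are related, and working that out by hand may help when reading such logs. The following is only a rough sketch: `wal_file_name` is a hypothetical helper (not a PostgreSQL tool), and it assumes the default 16 MB WAL segment size and the 9.1-era naming scheme of timeline, log id and segment number, each printed as 8 hex digits. It uses plain integer division, so it may not match PostgreSQL's own `pg_xlogfile_name()` for an LSN that sits exactly on a segment boundary.

```shell
#!/bin/sh
# wal_file_name TIMELINE LSN
# Hypothetical helper: print the WAL segment file name that contains
# the given LSN (format "logid/offset", both hex, as shown in the log).
# Assumes 16 MB (0x1000000 byte) segments.
wal_file_name() {
    tli=$1
    lsn=$2
    logid=${lsn%%/*}    # hex digits before the slash
    offset=${lsn##*/}   # hex digits after the slash
    # Segment number within the log id = byte offset / segment size.
    seg=$(( 0x$offset / 0x1000000 ))
    printf '%08X%08X%08X\n' "$tli" "$(( 0x$logid ))" "$seg"
}

wal_file_name 1 0/A000078   # "redo starts at" position from the log
wal_file_name 1 0/B000000   # "consistent recovery state" position
```

Under these assumptions the first call gives the name of the file that was restored, and the second gives the name of the file that scp reports as missing, since 0/B000000 falls at the very start of the next segment.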
Alex Good wrote:
> I have a simple setup with one master and one backup server. I have an
> issue where I have performed a backup and copied it to the data
> directory for the slave, written a recovery.conf and copied in the
> backup_label file and then started the server, it happily restores
> everything up until and including the WAL file mentioned in the
> backup_label and then attempts to obtain the next archive file which has
> not yet been archived. I can't for the life of me figure out what is
> going on.

What else would you expect?

Are you planning to use streaming replication?

If yes, what are your configuration parameters for replication?

Yours,
Laurenz Albe
Alex Good wrote:
> What I expected to see was the server requesting each WAL file up until
> the one which was archived during pg_stop_backup and then the server
> would consider itself to be recovered. Clearly I have misunderstood
> something here.
>
> These two servers are actually sat behind pgpool which is in replication
> mode (so I don't have streaming replication set up) which I chose
> because it gives me synchronous replication as well as automatic
> failover. I am trying to understand the recovery process so I can use it
> to set up pgpool's online recovery feature.

Oh, you didn't say that it is about pgpool.

You might try to ask their mailing lists:
http://www.pgpool.net/mediawiki/index.php/Mailing_lists

Yours,
Laurenz Albe
On 13/06/12 10:29, Albe Laurenz wrote:
> Alex Good wrote:
>> What I expected to see was the server requesting each WAL file up until
>> the one which was archived during pg_stop_backup and then the server
>> would consider itself to be recovered. Clearly I have misunderstood
>> something here.
>>
>> These two servers are actually sat behind pgpool which is in replication
>> mode (so I don't have streaming replication set up) which I chose
>> because it gives me synchronous replication as well as automatic
>> failover. I am trying to understand the recovery process so I can use it
>> to set up pgpool's online recovery feature.
> Oh, you didn't say that it is about pgpool.
>
> You might try to ask their mailing lists:
> http://www.pgpool.net/mediawiki/index.php/Mailing_lists
>
> Yours,
> Laurenz Albe

Although pgpool is involved, this isn't actually about pgpool. I've been running through the recovery process manually to try to understand what needs to be done in order to get online recovery working with pgpool. Pgpool isn't actually running at the moment.

Anyway, I think what I had misunderstood was the meaning of the 'standby_mode' parameter in recovery.conf. If I remove that, the process behaves as I expect, except that the restoring server ends up on a new timeline. I would prefer that it be on the same timeline as the master. I have set recovery_target_timeline = 'latest' in recovery.conf, but this still increments the timeline. Is there any way to get the recovery to stay on the same timeline other than explicitly specifying the timeline?

Thanks
Alex
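To make the difference concrete: with standby_mode on, a 9.1 server keeps retrying restore_command for the next WAL segment instead of ending recovery, which matches the repeated scp errors in the original log. Without it, recovery ends once the archive has nothing more to offer. A minimal sketch of the two recovery.conf variants follows. `write_recovery_conf` is a hypothetical helper name, and the restore_command is simply the one from the original post.

```shell
#!/bin/sh
# write_recovery_conf DIR MODE
# Hypothetical helper: write DIR/recovery.conf for a 9.1-style setup.
#   MODE=standby  keep polling the archive for new WAL (never finishes
#                 recovery on its own)
#   MODE=archive  replay what the archive has, then end recovery
write_recovery_conf() {
    dir=$1
    mode=$2
    # restore_command taken from the original post.
    cat > "$dir/recovery.conf" <<'EOF'
restore_command = 'scp master-hostname:/var/lib/postgresql/9.1/main/wal_archives/%f %p'
EOF
    if [ "$mode" = "standby" ]; then
        echo "standby_mode = on" >> "$dir/recovery.conf"
    fi
}

# Example (path is the data directory used in this thread):
#   write_recovery_conf /var/lib/postgresql/9.1/main archive
```

The "archive" variant corresponds to what Alex describes after removing standby_mode. The server then finishes recovery and, as discussed in the replies, comes up on a new timeline.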
Alex Good wrote:
> Although pgpool is involved this isn't actually about pgpool, I've been
> running through the recovery process manually to try and understand what
> needs to be done in order to get online recovery working with pgpool.
> Pgpool isn't actually running at the moment.

Oh, I see.

> Anyway, I think what I had misunderstood was the meaning of the
> 'standby_mode' parameter in recovery.conf. If I remove that then the
> process behaves as I expect it to except that the restoring server ends
> up restoring to a new timeline, I would prefer that it be on the same
> timeline as the master, I have set recovery_target_timeline = 'latest'
> in recovery.conf but this still increments the timeline. Is there any
> way to get the recovery to stay on the same timeline other than
> explicitly specifying the timeline?

That's why I asked if this is about streaming replication.

It is by design that a new timeline is opened after recovery.
This is to tell the WAL sequence from before and after recovery apart.
Is it a problem for you?

Yours,
Laurenz Albe
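The timeline is visible in the WAL file names themselves: the first 8 hex digits of a segment name are the timeline ID, which is how files written before and after a recovery can be told apart. As a small illustration, `wal_timeline` below is a hypothetical helper that extracts it. The second file name in the example is made up to show what a segment on a second timeline would look like; it does not appear in this thread.

```shell
#!/bin/sh
# wal_timeline NAME
# Hypothetical helper: print the timeline ID (decimal) encoded in the
# first 8 hex digits of a WAL segment file name.
wal_timeline() {
    tli_hex=$(printf '%s' "$1" | cut -c1-8)
    echo $(( 0x$tli_hex ))
}

wal_timeline 00000001000000000000000B   # file name from the log in this thread
wal_timeline 00000002000000000000000B   # made-up name on a later timeline
```

With these inputs the helper prints 1 and then 2.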
On 13/06/12 11:10, Albe Laurenz wrote:
> Alex Good wrote:
>> Although pgpool is involved this isn't actually about pgpool, I've been
>> running through the recovery process manually to try and understand what
>> needs to be done in order to get online recovery working with pgpool.
>> Pgpool isn't actually running at the moment.
> Oh, I see.
>
>> Anyway, I think what I had misunderstood was the meaning of the
>> 'standby_mode' parameter in recovery.conf. If I remove that then the
>> process behaves as I expect it to except that the restoring server ends
>> up restoring to a new timeline, I would prefer that it be on the same
>> timeline as the master, I have set recovery_target_timeline = 'latest'
>> in recovery.conf but this still increments the timeline. Is there any
>> way to get the recovery to stay on the same timeline other than
>> explicitly specifying the timeline?
> That's why I asked if this is about streaming replication.
>
> It is by design that a new timeline is opened after recovery.
> This is to tell the WAL sequence from before and after recovery apart.
> Is it a problem for you?
>
> Yours,
> Laurenz Albe

Well, I had assumed that it was a bad thing, as I intend to use the recovery procedure to add backup servers to the pgpool cluster, and it seemed to make more sense that they all be on the same timeline. Having thought about it, though, I don't think it matters. Thanks very much for your help, I've been banging my head against this for a while.

Thanks
Alex Good
On 13/06/12 09:10, Albe Laurenz wrote:
> Alex Good wrote:
>> I have a simple setup with one master and one backup server. I have an
>> issue where I have performed a backup and copied it to the data
>> directory for the slave, written a recovery.conf and copied in the
>> backup_label file and then started the server, it happily restores
>> everything up until and including the WAL file mentioned in the
>> backup_label and then attempts to obtain the next archive file which has
>> not yet been archived. I can't for the life of me figure out what is
>> going on.
> What else would you expect?
>
> Are you planning to use streaming replication?
>
> If yes, what are your configuration parameters for replication?
>
> Yours,
> Laurenz Albe

What I expected to see was the server requesting each WAL file up until the one which was archived during pg_stop_backup, and then the server would consider itself to be recovered. Clearly I have misunderstood something here.

These two servers are actually sat behind pgpool, which is in replication mode (so I don't have streaming replication set up). I chose that because it gives me synchronous replication as well as automatic failover. I am trying to understand the recovery process so I can use it to set up pgpool's online recovery feature.

Thanks
Alex Good