Problem with streaming replication, backups, and recovery (9.0.x) - Mailing list pgsql-hackers

From hubert depesz lubaczewski
Subject Problem with streaming replication, backups, and recovery (9.0.x)
Msg-id 20110325203120.GA5946@depesz.com
List pgsql-hackers
hi,

So, I hit a strange problem with Streaming Replication that I cannot explain.

Executive summary:

when using a hot backup made on a streaming replication slave, sometimes
(depending on load) the backup is created in such a way that, while it can be
brought back as a standalone Pg, and it can be brought back as a streaming
slave, such a slave (created off the backup) cannot be promoted to standalone.

Disclaimer:

I know that making hot backups on a slave is not the suggested way, yet I was
doing it without any problem on earlier Postgres versions (8.2, 8.3, 8.4), and I do
not have this problem with backups generated from the masters, so the problem I
hit now is so peculiar that I thought it might be just an effect of some
underlying, more serious, condition.

Longer explanation:

First, let me explain how omnipitr-backup-slave works, because it's the tool
that I use to make backups on the slave.

Steps that it does:

1. gets pg_controldata for $PGDATADIR
2. compresses $PGDATA to data tar.gz, putting a backup_label file inside, which contains:

   START WAL LOCATION: %s (file %s)
   CHECKPOINT LOCATION: %s
   START TIME: %s
   LABEL: OmniPITR_Slave_Hot_Backup

   where START WAL LOCATION uses the value from "Latest checkpoint's REDO location"
   from pg_controldata from step #1, "CHECKPOINT LOCATION" is taken from "Latest
   checkpoint location" from pg_controldata taken in step #1, and START TIME is the
   current timestamp as of before starting compression of $PGDATA.

3. gets another copy of pg_controldata for $PGDATA
4. repeats step #3 until the value of "Latest checkpoint location" changes
5. waits until the WAL file that contains the "Minimum recovery ending location"
   from pg_controldata from step #4 is available
6. creates a .backup file, named based on "START WAL LOCATION" (from step #2),
   which contains the same lines as the backup_label file from step #2, plus two more lines:

   STOP WAL LOCATION: %s (file %s)
   STOP TIME: %s

   where STOP WAL LOCATION is taken from "Minimum recovery ending location" from
   pg_controldata from step #4, and STOP TIME is the current timestamp as of before
   starting compression of the WAL segments.

7. compresses xlogs plus the .backup file generated in step #6.
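
To make the polling in steps #3-#5 concrete, here is a rough Python sketch of that
part (this is not the actual omnipitr code; the pg_controldata parsing, the default
16 MB segment size, and the wal_archive location are just assumptions for illustration):

import os
import subprocess
import time

def controldata(pgdata):
    # Parse "pg_controldata $PGDATA" output into a field -> value dict.
    out = subprocess.check_output(['pg_controldata', pgdata], text=True)
    fields = {}
    for line in out.splitlines():
        key, sep, value = line.partition(':')
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def wal_file_name(lsn, timeline=1):
    # Map an LSN like '0/12C9D7E8' to its WAL segment name, assuming
    # the default 16 MB segment size (segment number = offset >> 24).
    xlogid, offset = (int(part, 16) for part in lsn.split('/'))
    return '%08X%08X%08X' % (timeline, xlogid, offset >> 24)

def wait_for_backup_stop(pgdata, wal_archive, interval=5):
    # Steps #3-#5: poll until "Latest checkpoint location" changes, then
    # wait until the segment holding "Minimum recovery ending location"
    # shows up in the WAL archive directory.
    initial = controldata(pgdata)['Latest checkpoint location']
    while controldata(pgdata)['Latest checkpoint location'] == initial:
        time.sleep(interval)
    stop_lsn = controldata(pgdata)['Minimum recovery ending location']
    needed = wal_file_name(stop_lsn)
    while not os.path.exists(os.path.join(wal_archive, needed)):
        time.sleep(interval)
    return stop_lsn, needed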

This approach worked for a long time on various hosts, systems, versions, etc.

But now, it fails.

For my tests I'm using PostgreSQL 9.0.2 and 9.0.3 (mostly 9.0.2, as this is the
most critical one for me, but I tested on 9.0.3 too, and the problem is the same),
on Linux (Ubuntu), 64-bit.

I do the procedure as always, and it produces a backup. With this backup I can
set up a new standalone server, and it works.

I can also set up a streaming slave, and it also works, but when I create the
trigger file to promote this slave to master it fails with this error:
2011-03-24 21:01:58.051 CET @ 9680  LOG:  trigger file found: /home/depesz/slave2/finish.recovery
2011-03-24 21:01:58.051 CET @ 9930  FATAL:  terminating walreceiver process due to administrator command
2011-03-24 21:01:58.151 CET @ 9680  LOG:  redo done at 0/1F000058
2011-03-24 21:01:58.151 CET @ 9680  LOG:  last completed transaction was at log time 2011-03-24 20:58:25.836333+01
2011-03-24 21:01:58.238 CET @ 9680  FATAL:  WAL ends before consistent recovery point
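
For reference, the slave is promoted via a trigger file declared in recovery.conf -
something along these lines (the primary_conninfo details here are placeholders; the
trigger_file path is the one from the log above):

standby_mode = 'on'
primary_conninfo = 'host=127.0.0.1 port=5432'
trigger_file = '/home/depesz/slave2/finish.recovery'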

This failure is interesting, because this particular backup was done using a .backup file containing:

START WAL LOCATION: 0/A000020 (file 00000001000000000000000A)
STOP WAL LOCATION: 0/12C9D7E8 (file 000000010000000000000012)
CHECKPOINT LOCATION: 0/B803050
START TIME: 2011-03-24 20:52:46 CET
STOP TIME: 2011-03-24 20:53:41 CET
LABEL: OmniPITR_Slave_Hot_Backup

Which means that the minimum recovery ending location was in fact reached (it was
at 0/12C9D7E8, and recovery continued until 0/1F000058).
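
Just to spell that comparison out, treating each LSN as a single number (a throwaway
Python helper, not part of any of the tools involved):

def lsn_as_int(value):
    # '0/1F000058' -> (xlogid << 32) | offset, good enough for ordering
    xlogid, offset = (int(part, 16) for part in value.split('/'))
    return (xlogid << 32) | offset

lsn_as_int('0/1F000058') > lsn_as_int('0/12C9D7E8')  # True: redo ended past STOP WAL LOCATION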

I have a set of scripts that can be used to replicate the problem, but the test
takes some time (~30 minutes).

What's most interesting is that this problem does not always happen. It happens
only when there was non-trivial load on the db server - this is in my tests,
where master and slave are the same machine. I think that in normal cases the
load on the slave is what matters.

If anyone is able to help, I can give you access to a test machine and/or
provide the set of scripts which (usually) replicate the problem.

Alternatively - if there is anything I can do to help you solve the mystery
- I'd be very willing to.

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
                  http://depesz.com/
 

