Thread: help with error "unexpected pageaddr"
Hey everyone,
We have a PG 8.3.7 server that is doing WAL log shipping to 2 other servers that are remote mirrors. This has been working well for almost two years. Last night we did some massive data and structure changes to one of our databases. Since then I get these errors on the two mirrors:
2010-09-15 08:35:05 EDT: LOG: restored log file "0000000100000301000000D9" from archive
2010-09-15 08:35:27 EDT: LOG: restored log file "0000000100000301000000DA" from archive
2010-09-15 08:35:40 EDT: LOG: restored log file "0000000100000301000000DB" from archive
2010-09-15 08:35:40 EDT: LOG: unexpected pageaddr 301/47000000 in log file 769, segment 219, offset 0
2010-09-15 08:35:40 EDT: LOG: redo done at 301/DA370780
2010-09-15 08:35:40 EDT: LOG: last completed transaction was at log time 2010-09-15 08:30:01.24936-04
2010-09-15 08:35:40 EDT: LOG: restored log file "0000000100000301000000DA" from archive
2010-09-15 08:36:26 EDT: LOG: selected new timeline ID: 2
2010-09-15 08:37:11 EDT: LOG: archive recovery complete
I've taken two separate file-level backups and tried to restart the mirrors, and every time, on both servers, I get a similar error message. I seem to recall reading that it may have something to do with corruption in the timeline, which is why it's jumping to a new timeline ID.
1. Can anyone tell me what this means?
2. Is there some corruption in the database?
3. If so, is there an easy way to fix it?
Also, one additional question. I don't have a 00000001.history file, which makes PITRTools complain constantly. Is there any way to regenerate this file?
Any help would be much appreciated. I'm rather worried that I've got corruption, and not having the mirrors running puts us at risk for data loss.
"Scot Kreienkamp" <SKreien@la-z-boy.com> writes:
> We have a PG 8.3.7 server that is doing WAL log shipping to 2 other
> servers that are remote mirrors. This has been working well for almost
> two years. Last night we did some massive data and structure changes to
> one of our databases. Since then I get these errors on the two mirrors:
> 2010-09-15 08:35:05 EDT: LOG: restored log file "0000000100000301000000D9" from archive
> 2010-09-15 08:35:27 EDT: LOG: restored log file "0000000100000301000000DA" from archive
> 2010-09-15 08:35:40 EDT: LOG: restored log file "0000000100000301000000DB" from archive
> 2010-09-15 08:35:40 EDT: LOG: unexpected pageaddr 301/47000000 in log file 769, segment 219, offset 0
This appears to indicate that you archived the wrong contents of log file 0000000100000301000000DB. If you don't still have the correct contents on the master, I think the only way to recover is to take a fresh base backup so you can make the slaves roll forward from a point later than this log segment.
There's no reason to suppose that there's data corruption on the master, just bad data in the WAL archive. You'd probably be well advised to look closely at your WAL archiving script to see if it has any race conditions that might be triggered by very fast generation of WAL.
> Also, one additional question. I don't have a 00000001.history file which
> makes the PITRTools complain constantly. Is there any way to regenerate
> this file?
Just ignore that, it's cosmetic (the file isn't supposed to exist).
regards, tom lane
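The numbers in these messages map directly onto segment file names: "log file 769, segment 219" is just the hex pair 0x301/0xDB from 0000000100000301000000DB printed in decimal. The Python sketch below is an illustration written for this thread, not a PostgreSQL tool; the helper names are hypothetical, and it assumes the 8.3 default of 16 MB segments and 8.3's "%X/%X" style of printing addresses. It decodes a segment name, computes the page address its first page should carry, and names the segment a reported page address actually belongs to.

```python
import re

# Assumption: the 8.3 default WAL segment size of 16 MB (a compile-time setting).
XLOG_SEG_SIZE = 16 * 1024 * 1024

# Timeline, log file and segment, each written as 8 upper-case hex digits.
WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


def parse_wal_name(name):
    """Split a WAL segment file name into (timeline, log file, segment)."""
    match = WAL_NAME.match(name)
    if match is None:
        raise ValueError(f"not a WAL segment file name: {name!r}")
    tli, log, seg = (int(part, 16) for part in match.groups())
    return tli, log, seg


def expected_pageaddr(name, offset=0):
    """Page address the page at `offset` in this segment should carry.

    Printed like the 8.3 log messages: log file and byte offset, both in hex.
    """
    _, log, seg = parse_wal_name(name)
    return f"{log:X}/{seg * XLOG_SEG_SIZE + offset:X}"


def segment_for_pageaddr(pageaddr, tli=1):
    """Name of the segment file that a page address such as '301/47000000' falls in."""
    log_hex, recoff_hex = pageaddr.split("/")
    log, recoff = int(log_hex, 16), int(recoff_hex, 16)
    return f"{tli:08X}{log:08X}{recoff // XLOG_SEG_SIZE:08X}"


# The two failures reported in this thread: (segment opened, pageaddr found).
for segment, found in [("0000000100000301000000DB", "301/47000000"),
                       ("000000010000030200000003", "301/50000000")]:
    tli, log, seg = parse_wal_name(segment)
    print(f"{segment}: log file {log}, segment {seg}; "
          f"expected {expected_pageaddr(segment)}, "
          f"found a page from {segment_for_pageaddr(found, tli)}")
```

Under those assumptions, the file restored as 0000000100000301000000DB should begin with page address 301/DB000000 but carries a page from 000000010000030100000047, and 000000010000030200000003 should begin at 302/3000000 but carries a page from 000000010000030100000050. A header pointing back at a much older segment is consistent with Tom's reading that the archived copy held stale contents, presumably an old segment file that PostgreSQL had recycled under a new name, rather than damage to the data itself.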
"Scot Kreienkamp" <SKreien@la-z-boy.com> writes:
> I tried to take a new base backup about 45 minutes ago. The master has
> rolled forward a number of WAL files since I last tried, but it still
> fails.
> LOG: restored log file "0000000100000301000000FE" from archive
> LOG: restored log file "000000010000030200000000" from archive
> LOG: restored log file "000000010000030200000001" from archive
> LOG: restored log file "000000010000030200000002" from archive
> LOG: restored log file "000000010000030200000003" from archive
> LOG: unexpected pageaddr 301/50000000 in log file 770, segment 3, offset 0
Hmmm ... is it possible that your WAL archive contains log files numbered higher than where your master is?
regards, tom lane
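One way to answer Tom's question is to compare the archive's file listing with the master's current segment, which 8.3 reports via SELECT pg_xlogfile_name(pg_current_xlog_location()); on the master. The Python sketch below is a hypothetical helper, not part of PITRTools or PostgreSQL; it assumes you supply the listing yourself (for example from os.listdir on the archive directory) and it only looks at plain segment names, skipping .history and .backup files.

```python
import re

# Plain segment names only; *.history and *.backup files are skipped.
WAL_SEGMENT = re.compile(r"^[0-9A-F]{24}$")


def segment_key(name):
    """(timeline, log file, segment) as integers, for ordering segment names."""
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def segments_ahead_of(current, archive_names):
    """Archived segment names that sort after the master's current segment."""
    limit = segment_key(current)
    ahead = [name for name in archive_names
             if WAL_SEGMENT.match(name) and segment_key(name) > limit]
    return sorted(ahead, key=segment_key)


# Hypothetical archive listing; the master is currently writing ...0003.
listing = ["000000010000030200000002", "000000010000030200000003",
           "000000010000030200000003.00000020.backup", "00000002.history",
           "000000010000030200000010"]
print(segments_ahead_of("000000010000030200000003", listing))
```

Comparing parsed integers rather than raw strings keeps the ordering explicit across log-file boundaries (for example ...0301000000FE followed by ...030200000000). Any name this returns is a segment the master has not written yet, so its contents in the archive must be left over from somewhere else.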
Tom, I tried to take a new base backup about 45 minutes ago. The master has rolled forward a number of WAL files since I last tried, but it still fails.
LOG: restored log file "0000000100000301000000FE" from archive
LOG: restored log file "000000010000030200000000" from archive
LOG: restored log file "000000010000030200000001" from archive
LOG: restored log file "000000010000030200000002" from archive
LOG: restored log file "000000010000030200000003" from archive
LOG: unexpected pageaddr 301/50000000 in log file 770, segment 3, offset 0
LOG: redo done at 302/2BCE828
LOG: last completed transaction was at log time 2010-09-15 15:07:01.040854-04
LOG: restored log file "000000010000030200000002" from archive
LOG: selected new timeline ID: 2
My entire WAL archiving script is four cp %p %f commands. It's so short I don't even have a script; it's directly in the archive_command setting in postgresql.conf.
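Tom's point about race conditions applies even to a plain cp: cp writes straight into the destination path, so anything reading the archive can see a half-written segment, and a rerun silently overwrites a segment that is already there. The example in the PostgreSQL documentation guards against overwriting with test ! -f before the cp. Below is a minimal Python sketch of a stricter archive helper; the script name, its argument order and the archive directories are assumptions for illustration, not part of the original setup. It refuses to overwrite a different existing file and publishes each segment with an atomic rename.

```python
import filecmp
import os
import shutil
import sys
import tempfile


def archive_segment(src_path, dest_dir, file_name):
    """Copy one WAL segment into dest_dir without exposing a partial file.

    Returns 0 on success and 1 on failure; PostgreSQL keeps retrying a
    segment for as long as archive_command exits non-zero.
    """
    dest = os.path.join(dest_dir, file_name)
    tmp = None
    try:
        if os.path.exists(dest):
            # Never overwrite: an identical copy is a harmless retry,
            # different contents mean something upstream is wrong.
            return 0 if filecmp.cmp(src_path, dest, shallow=False) else 1
        # Write to a temporary name in the same directory first.
        fd, tmp = tempfile.mkstemp(prefix=file_name + ".", suffix=".tmp", dir=dest_dir)
        with os.fdopen(fd, "wb") as out, open(src_path, "rb") as src:
            shutil.copyfileobj(src, out)
            out.flush()
            os.fsync(out.fileno())
        # rename() is atomic within one POSIX filesystem, so readers see either
        # no file or the complete segment, never a partial copy.
        os.rename(tmp, dest)
        tmp = None
        return 0
    except OSError:
        return 1
    finally:
        # Clean up the temporary file if anything failed after creating it.
        if tmp is not None and os.path.exists(tmp):
            os.unlink(tmp)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage, one call per mirror directory chained with &&:
    # archive_command = 'python3 safe_archive.py %p /path/to/archive %f'
    sys.exit(archive_segment(sys.argv[1], sys.argv[2], sys.argv[3]))
```

Because the check happens before any copy, a destination that already holds different bytes makes the command fail loudly instead of quietly replacing what the mirrors may already have replayed.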
Shouldn't have. The only thing we did to the server was restart it and run our database queries. Clearing out all the WAL files from pg_xlog, along with taking a new base backup, did fix it, though. Thanks for the help, Tom!
Scot Kreienkamp
skreien@la-z-boy.com