recovery question - Mailing list pgsql-admin
From | Mark Steben |
---|---|
Subject | recovery question |
Msg-id | 88E7EEC199DB4809971DA47B6F801D30@dei26g028534 |
List | pgsql-admin |
Hi listers,
Here is my problem. I am running a PITR restore on a machine remote from my production machine.
I’m shipping logs over there, compressed, then uncompressing them and copying them to pg_xlog.
Everything works fine until a network outage creates a gap in my logs.
The recovery terminates at log "0000000100000C28000000B1" and brings the database up
because it can’t find "0000000100000C28000000B2".
Log "0000000100000C28000000B3" has been copied over, but I wish to restart recovery at B2.
So I scp B2 over from my primary machine from a folder that I created for just such an occasion.
Now I rename recovery.done to recovery.conf (copied here for your convenience):
restore_command = 'sh /usr/local/postgresql-8.2.5/bin/copy.sh %f %p 2>>/tmp/recovery.log'
(and copy.sh:)
REQ_FILE=$1
DEST=$2
LF="${REQ_FILE}.lock"
SUFFIX=${REQ_FILE##*.}
###############################################################
## check if file is transaction log or informational file
## if transaction log, cat from archlog and uncompress into unzipped folder
## if informational simply copy into unzipped folder (it came over uncompressed)
#####################################################################################
if [ "${SUFFIX}" != 'history' ] && [ "${SUFFIX}" != 'backup' ]; then
    cat "/logs/var/backups/archlog/${REQ_FILE}" | gzip -dc > "/logs/var/backups/unzipped/${REQ_FILE}"
    ## save the status of gzip now; the test below would overwrite $?
    RC=$?
    if [ "${RC}" = "0" ]; then
        echo 'successful uncompress of ' "/logs/var/backups/unzipped/${REQ_FILE}" >> /tmp/restore.mavmail.log
    else
        echo 'unsuccessful uncompress of ' "/logs/var/backups/unzipped/${REQ_FILE}" >> /tmp/restore.mavmail.log
        echo 'the return code is ' "${RC}" >> /tmp/restore.mavmail.log
    fi
else
    cp "/logs/var/backups/archlog/${REQ_FILE}" "/logs/var/backups/unzipped/${REQ_FILE}"
fi
#######################################################################################
## check for size. If not a full size (16777216) trans log, the copy from
## cobra is still in progress. Don't copy this file. Stop recovery here.
#######################################################################################
SIZE=$(ls -gG1 "/logs/var/backups/unzipped/${REQ_FILE}" | awk '{ print $3}' )
echo "The size of the log to be restored is " "${SIZE}" >> /tmp/restore.mavmail.log
if [ "${SUFFIX}" != 'history' ] && [ "${SUFFIX}" != 'backup' ]; then
    if [ "${SIZE}" != '16777216' ]; then
        echo 'partially written log - not restored - finishing recovery' >> /tmp/restore.mavmail.log
        exit 0
    fi
fi
/usr/bin/lockfile "${LF}"
################################################################
## copy either full sized trans log or informational file
## into pg_xlog data cluster.
################################################################
cp "/logs/var/backups/unzipped/${REQ_FILE}" "${DEST}"
rm -f "${LF}"
rm "/logs/var/backups/unzipped/${REQ_FILE}"
(END)
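The gap described earlier (B2 missing while B3 has already arrived) could in principle be detected by a script like copy.sh before recovery ends. The following is only a minimal sketch, not part of my copy.sh: `next_segment` and `is_gap` are hypothetical helper names, and the wrap-around rule assumes the 8.2-era naming in which segment numbers run from 00 to FE within each log id.

```shell
#!/bin/sh
# next_segment NAME: print the WAL segment name that follows NAME.
# Assumption: 24 hex digits (timeline, log id, segment number) and
# segment numbers 00..FE per log id, as in pre-9.3 releases.
next_segment() {
    tli=$(printf '%s' "$1" | cut -c1-8)
    log_hex=$(printf '%s' "$1" | cut -c9-16)
    seg_hex=$(printf '%s' "$1" | cut -c17-24)
    log=$(( 0x$log_hex ))
    seg=$(( 0x$seg_hex ))
    if [ "$seg" -ge 254 ]; then
        # FE was the last segment of a log id; roll over to the next one.
        log=$(( log + 1 ))
        seg=0
    else
        seg=$(( seg + 1 ))
    fi
    printf '%s%08X%08X\n' "$tli" "$log" "$seg"
}

# is_gap DIR NAME: succeed when NAME is absent from DIR although its
# successor is already there, i.e. a hole rather than the end of the archive.
is_gap() {
    [ ! -e "$1/$2" ] && [ -e "$1/$(next_segment "$2")" ]
}
```

A restore script could call `is_gap /logs/var/backups/archlog "${REQ_FILE}"` when the requested file is missing and log a warning, so that a hole is not mistaken for the end of the shipped logs.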
Now when I try to restart, hoping to begin recovery with the B2 log, I get an invalid checkpoint error:
: LOG: starting archive recovery
Feb 25 10:08:10 ar-db3 postgres[32538]: [3-1] @: LOG: restore_command = "sh /usr/local/postgresql-8.2.5/bin/copy.sh %f %p 2>>/tmp/recovery.log"
Feb 25 10:08:11 ar-db3 postgres[32538]: [4-1] @: LOG: restored log file "0000000100000C28000000B1" from archive
Feb 25 10:08:11 ar-db3 postgres[32538]: [5-1] @: LOG: invalid record length at C28/B1FFECA4
Feb 25 10:08:11 ar-db3 postgres[32538]: [6-1] @: LOG: invalid primary checkpoint record
Feb 25 10:08:12 ar-db3 postgres[32538]: [7-1] @: LOG: restored log file "0000000100000C28000000B1" from archive
Feb 25 10:08:12 ar-db3 postgres[32538]: [8-1] @: LOG: invalid record length at C28/B1FFEC5C
Feb 25 10:08:12 ar-db3 postgres[32538]: [9-1] @: LOG: invalid secondary checkpoint record
Feb 25 10:08:12 ar-db3 postgres[32538]: [10-1] @: PANIC: could not locate a valid checkpoint record
Feb 25 10:08:12 ar-db3 postgres[32537]: [1-1] @: LOG: startup process (PID 32538) was terminated by signal 6
Feb 25 10:08:12 ar-db3 postgres[32537]: [2-1] @: LOG: aborting startup due to startup process failure
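For reading the log above: the checkpoint locations (C28/B1FFECA4 and C28/B1FFEC5C) both fall inside segment B1, which is why B1 is the file requested from the archive. A rough sketch of that mapping follows, assuming the 16777216-byte segment size checked in copy.sh and the 8.2-era location format in which the part after the slash is a byte offset within the log id. `lsn_to_segment` is a hypothetical helper name, not a PostgreSQL tool.

```shell
#!/bin/sh
# lsn_to_segment TIMELINE LOCATION: print the WAL segment file name that
# contains LOCATION (written as LOGID/OFFSET in hex).
# Assumption: 16 MB (16777216-byte) segments, pre-9.3 naming.
lsn_to_segment() {
    tli=$1
    log=${2%%/*}    # hex digits before the slash: the log id
    off=${2##*/}    # hex digits after the slash: byte offset in that log id
    printf '%08X%08X%08X\n' "$(( 0x$tli ))" "$(( 0x$log ))" "$(( 0x$off / 16777216 ))"
}

lsn_to_segment 1 C28/B1FFECA4    # prints 0000000100000C28000000B1
```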
I remove the recovery.conf file, successfully start the database and issue a checkpoint. I try the restore again and get the same error.
So, is there a way that I can force the recovery to begin at B2, or am I dead in the water and need to bring in another full file copy and
start from scratch?
Thanks for your time.
Mark Steben│Database Administrator│
@utoRevenue® "Join the Revenue-tion"
95 Ashley Ave. West Springfield, MA., 01089
413-243-4800 x1512 (Phone) │ 413-732-1824 (Fax)
@utoRevenue is a registered trademark and a division of Dominion Enterprises