recovery question - Mailing list pgsql-admin

From: Mark Steben
Subject: recovery question
Msg-id: 88E7EEC199DB4809971DA47B6F801D30@dei26g028534
List: pgsql-admin

Hi listers,

 

Here is my problem. I am running a PITR restore on a machine remote from my production machine.

I'm shipping logs over there compressed, then uncompressing them and copying them to pg_xlog.

Everything works fine until a network outage creates a gap in my logs.

The recovery terminates at log "0000000100000C28000000B1" and brings the database up

because it can't find "0000000100000C28000000B2".

Log "0000000100000C28000000B3" is copied over, but I wish to restart recovery at B2.

So I scp B2 over from my primary machine, from a folder that I created for just such an occasion.
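A gap like the one between B1 and B3 can be spotted by computing which segment name should follow a given one. The following is only a sketch: `next_wal_segment` is a hypothetical helper, not part of my scripts, and it assumes the segment naming used by this server generation, where the last eight hex digits run from 00 to FE before the middle eight digits advance.

```shell
#!/bin/sh
# Hypothetical helper (not part of copy.sh): print the WAL segment name that
# should follow the one given as $1.
# Assumption: the last 8 hex digits run 00..FE, after which the middle
# 8 digits (the log id) advance by one and the last 8 restart at 0.
next_wal_segment() {
  tli=$(printf '%s' "$1" | cut -c1-8)     # timeline id
  log=$(printf '%s' "$1" | cut -c9-16)    # log id
  seg=$(printf '%s' "$1" | cut -c17-24)   # segment number within the log id
  log_n=$((0x$log))
  seg_n=$((0x$seg + 1))
  if [ "$seg_n" -gt 254 ]; then           # 254 = 0xFE, assumed last segment
    seg_n=0
    log_n=$((log_n + 1))
  fi
  printf '%s%08X%08X\n' "$tli" "$log_n" "$seg_n"
}

next_wal_segment 0000000100000C28000000B1   # prints 0000000100000C28000000B2
```

If the printed name is missing from the archive folder while a later one is present, the archive has a gap at that name.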

 

Now I rename recovery.done to recovery.conf (copied here for your convenience):

restore_command = 'sh /usr/local/postgresql-8.2.5/bin/copy.sh %f %p 2>>/tmp/recovery.log'

 

(and copy.sh:)

 

REQ_FILE=$1

DEST=$2

LF="${REQ_FILE}.lock"

SUFFIX=${REQ_FILE##*.}

###############################################################

## check if file is transaction log or informational file

## if transaction log, cat from archlog and uncompress into unzipped folder

## if informational simply copy into unzipped folder (it came over uncompressed)

#####################################################################################

if [ "${SUFFIX}" != 'history' ] && [ "${SUFFIX}" != 'backup' ]; then

  gzip -dc "/logs/var/backups/archlog/${REQ_FILE}" > "/logs/var/backups/unzipped/${REQ_FILE}"

  RC=$?   ## save gzip's exit status; the echo below would otherwise overwrite $?

  if [ "${RC}" = "0" ] ;

  then

     echo 'successful uncompress of  ' "/logs/var/backups/unzipped/${REQ_FILE}" >> /tmp/restore.mavmail.log

  else

     echo 'unsuccessful uncompress of  ' "/logs/var/backups/unzipped/${REQ_FILE}" >> /tmp/restore.mavmail.log

     echo 'the return code is ' "${RC}" >> /tmp/restore.mavmail.log

  fi

else

  cp "/logs/var/backups/archlog/${REQ_FILE}"  "/logs/var/backups/unzipped/${REQ_FILE}"

fi

#######################################################################################

##  check for size.  If not a full size (16777216) trans log, the copy from

##   cobra is still in progress. Don't copy this file. Stop recovery here.

#######################################################################################

SIZE=$(ls -gG1 "/logs/var/backups/unzipped/${REQ_FILE}" | awk '{ print $3}' )

echo "The size of the log to be restored is " "${SIZE}" >> /tmp/restore.mavmail.log

if [ "${SUFFIX}" != 'history' ] && [ "${SUFFIX}" != 'backup' ]; then

  if [ "${SIZE}" != '16777216' ]; then

    echo 'partially written log - not restored - finishing recovery' >> /tmp/restore.mavmail.log

    exit 0

  fi

fi

     

/usr/bin/lockfile "${LF}" 

################################################################

## copy either full sized trans log or informational file

## into pg_xlog data cluster.

################################################################

 cp "/logs/var/backups/unzipped/${REQ_FILE}"  "${DEST}"

rm -f "${LF}"

rm "/logs/var/backups/unzipped/${REQ_FILE}"

 

(END)
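One point about the script above: the PostgreSQL documentation says a restore_command should return a zero exit status only if it succeeds, and must return nonzero when asked for a file that is not in the archive. copy.sh exits 0 for a partially written log. A minimal sketch of a size check that reports failure instead follows; `segment_ready` and the `WAL_SEGMENT_SIZE` override are hypothetical names introduced for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file named by $1 exists and has the
# expected size. WAL_SEGMENT_SIZE is an assumed override hook for testing; the
# default is the 16777216-byte size that copy.sh checks for.
segment_ready() {
  expected=${WAL_SEGMENT_SIZE:-16777216}
  [ -f "$1" ] || return 1
  size=$(($(wc -c < "$1")))   # arithmetic expansion strips any padding from wc
  [ "$size" -eq "$expected" ]
}

# Possible use inside a restore script (paths as in copy.sh):
# segment_ready "/logs/var/backups/unzipped/${REQ_FILE}" || exit 1
```

With this shape the server is told the segment is not available, rather than being told the restore succeeded when nothing was copied to the destination.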

 

Now when I try to restart, hoping to begin recovery with the B2 log, I get an invalid checkpoint error:

 

: LOG:  starting archive recovery

Feb 25 10:08:10 ar-db3 postgres[32538]: [3-1] @: LOG:  restore_command = "sh /usr/local/postgresql-8.2.5/bin/copy.sh %f %p 2>>/tmp/recovery.log"

Feb 25 10:08:11 ar-db3 postgres[32538]: [4-1] @: LOG:  restored log file "0000000100000C28000000B1" from archive

Feb 25 10:08:11 ar-db3 postgres[32538]: [5-1] @: LOG:  invalid record length at C28/B1FFECA4

Feb 25 10:08:11 ar-db3 postgres[32538]: [6-1] @: LOG:  invalid primary checkpoint record

Feb 25 10:08:12 ar-db3 postgres[32538]: [7-1] @: LOG:  restored log file "0000000100000C28000000B1" from archive

Feb 25 10:08:12 ar-db3 postgres[32538]: [8-1] @: LOG:  invalid record length at C28/B1FFEC5C

Feb 25 10:08:12 ar-db3 postgres[32538]: [9-1] @: LOG:  invalid secondary checkpoint record

Feb 25 10:08:12 ar-db3 postgres[32538]: [10-1] @: PANIC:  could not locate a valid checkpoint record

Feb 25 10:08:12 ar-db3 postgres[32537]: [1-1] @: LOG:  startup process (PID 32538) was terminated by signal 6

Feb 25 10:08:12 ar-db3 postgres[32537]: [2-1] @: LOG:  aborting startup due to startup process failure
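To relate the locations in these messages to segment files, the arithmetic can be sketched as below. `lsn_to_segment` is a hypothetical helper and assumes 16 MB segments: the part before the slash is the log id, and the part after the slash divided by 16777216 gives the segment number. Both locations reported above (C28/B1FFECA4 and C28/B1FFEC5C) fall near the end of segment B1.

```shell
#!/bin/sh
# Hypothetical helper: map a log location such as C28/B1FFECA4 to the name of
# the 16 MB segment file that contains it.
# $1 = timeline id (8 hex digits), $2 = location as printed in the server log.
lsn_to_segment() {
  log=${2%/*}   # hex digits before the slash: log id
  off=${2#*/}   # hex digits after the slash: byte offset within the log id
  printf '%s%08X%08X\n' "$1" "$((0x$log))" "$((0x$off / 16777216))"
}

lsn_to_segment 00000001 C28/B1FFECA4   # prints 0000000100000C28000000B1
```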

 

I remove the recovery.conf file, successfully start the database and issue a checkpoint.  I try the restore again and get the same error.

 

So, is there a way that I can force the recovery to begin at B2, or am I dead in the water and need to bring in another full file copy and start from scratch?

 

Thanks for your time.

 

Mark Steben
Database Administrator

@utoRevenue® "Join the Revenue-tion"
95 Ashley Ave. West Springfield, MA., 01089
413-243-4800 x1512 (Phone) | 413-732-1824 (Fax)

@utoRevenue is a registered trademark and a division of Dominion Enterprises

 

 
