Thread: recovery question

recovery question

From
"Mark Steben"
Date:

Hi listers,

Here is my problem. I am running a PITR restore on a machine remote from my production machine. I'm shipping logs over there compressed, then uncompressing them and copying them to pg_xlog.

Everything works fine until a network outage creates a gap in my logs. The recovery terminates at log "0000000100000C28000000B1" and brings the database up because it can't find "0000000100000C28000000B2". Log "0000000100000C28000000B3" is copied over, but I wish to restart recovery at B2. So I scp B2 over from my primary machine, from a folder that I created for just such an occasion.
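
One way to catch a gap like the missing B2 before recovery trips over it is to walk the archive listing and check that each segment name is the successor of the previous one. The sketch below assumes 24-hex-digit segment names and the 8.2-era layout in which segment numbers run 00 through FE within each log id; next_segment and first_gap are hypothetical helper names, not anything from copy.sh.

```shell
#!/bin/sh
# Hypothetical helper: print the WAL segment name that follows $1.
# Assumes names of the form <timeline><log id><segment>, 8 hex digits each,
# and that FE is the last segment number within a log id.
next_segment() {
    tli=$(printf '%s' "$1" | cut -c1-8)
    log=$(printf '%s' "$1" | cut -c9-16)
    seg=$(printf '%s' "$1" | cut -c17-24)
    log_n=$((0x$log))
    seg_n=$((0x$seg + 1))
    if [ "$seg_n" -gt 254 ]; then
        # Past FE: roll over to segment 0 of the next log id.
        seg_n=0
        log_n=$((log_n + 1))
    fi
    printf '%s%08X%08X\n' "$tli" "$log_n" "$seg_n"
}

# Hypothetical helper: read sorted segment names on stdin, one per line.
# Prints the first missing name and returns 1, or returns 0 if none is missing.
first_gap() {
    prev=""
    while read -r name; do
        if [ -n "$prev" ] && [ "$(next_segment "$prev")" != "$name" ]; then
            next_segment "$prev"
            return 1
        fi
        prev=$name
    done
    return 0
}
```

Fed a sorted listing of the archlog folder (filtered to the 24-character names, for example with grep -E '^[0-9A-F]{24}$'), first_gap would report 0000000100000C28000000B2 for the situation described here.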

 

Now I rename recovery.done to recovery.conf (copied here for your convenience):

restore_command = 'sh /usr/local/postgresql-8.2.5/bin/copy.sh %f %p 2>>/tmp/recovery.log'

 

(and copy.sh:)

REQ_FILE=$1
DEST=$2
LF="${REQ_FILE}.lock"
SUFFIX=${REQ_FILE##*.}
###############################################################
## check if file is transaction log or informational file
## if transaction log, cat from archlog and uncompress into unzipped folder
## if informational simply copy into unzipped folder (it came over uncompressed)
###############################################################
if [ "${SUFFIX}" != 'history' ] && [ "${SUFFIX}" != 'backup' ]; then
  cat "/logs/var/backups/archlog/${REQ_FILE}" | gzip -dc  > "/logs/var/backups/unzipped/${REQ_FILE}"
  ## save gzip's exit status; the test below would otherwise overwrite $?
  RC=$?
  if [ "${RC}" = "0" ] ;
  then
     echo 'successful uncompress of  ' "/logs/var/backups/unzipped/${REQ_FILE}" >> /tmp/restore.mavmail.log
  else
     echo 'unsuccessful uncompress of  ' "/logs/var/backups/unzipped/${REQ_FILE}" >> /tmp/restore.mavmail.log
     echo 'the return code is ' "${RC}" >> /tmp/restore.mavmail.log
  fi
else
  cp "/logs/var/backups/archlog/${REQ_FILE}"  "/logs/var/backups/unzipped/${REQ_FILE}"
fi
###############################################################
##  check for size.  If not a full size (16777216) trans log, the copy from
##   cobra is still in progress. Don't copy this file. Stop recovery here.
###############################################################
SIZE=$(ls -gG1 "/logs/var/backups/unzipped/${REQ_FILE}" | awk '{ print $3}' )
echo "The size of the log to be restored is " "${SIZE}" >> /tmp/restore.mavmail.log
if [ "${SUFFIX}" != 'history' ] && [ "${SUFFIX}" != 'backup' ]; then
  if [ "${SIZE}" != '16777216' ]; then
    echo 'partially written log - not restored - finishing recovery' >> /tmp/restore.mavmail.log
    exit 0
  fi
fi

/usr/bin/lockfile "${LF}"
################################################################
## copy either full sized trans log or informational file
## into pg_xlog data cluster.
################################################################
cp "/logs/var/backups/unzipped/${REQ_FILE}"  "${DEST}"
rm -f "${LF}"
rm "/logs/var/backups/unzipped/${REQ_FILE}"

 

(END)
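
The PostgreSQL documentation for restore_command asks the command to return a nonzero exit status when the requested file cannot be supplied, whereas the script above exits 0 on a partially written log. A minimal sketch of the decompress-and-check step written to report failure that way follows; restore_segment, EXPECTED_SIZE and the temporary-file naming are illustrative assumptions, and the lock file handling from copy.sh is left out.

```shell
#!/bin/sh
# Sketch only: restore_segment and EXPECTED_SIZE are hypothetical names,
# not part of copy.sh. Default is one full 16 MB segment.
EXPECTED_SIZE=${EXPECTED_SIZE:-16777216}

restore_segment() {
    src=$1
    dest=$2
    tmp="${dest}.tmp.$$"

    # Decompress to a temporary file and keep gzip's own exit status.
    gzip -dc "$src" > "$tmp"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "gzip failed for $src (exit status $rc)" >&2
        rm -f "$tmp"
        return 1
    fi

    # wc -c gives the byte count without parsing ls output.
    size=$(wc -c < "$tmp" | tr -d ' ')
    if [ "$size" != "$EXPECTED_SIZE" ]; then
        echo "$src is $size bytes, expected $EXPECTED_SIZE" >&2
        rm -f "$tmp"
        return 1
    fi

    # Only a complete segment ever appears under the destination name.
    mv "$tmp" "$dest"
}
```

Called as restore_segment "/logs/var/backups/archlog/$1" "$2" for transaction logs, the function's return value could be passed straight through as the script's exit status, so a missing or partial segment is reported to the server as "not available" rather than as success.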

 

Now when I try to restart, hoping to begin recovery with the B2 log, I get an invalid checkpoint error:

 

: LOG:  starting archive recovery
Feb 25 10:08:10 ar-db3 postgres[32538]: [3-1] @: LOG:  restore_command = "sh /usr/local/postgresql-8.2.5/bin/copy.sh %f %p 2>>/tmp/recovery.log"
Feb 25 10:08:11 ar-db3 postgres[32538]: [4-1] @: LOG:  restored log file "0000000100000C28000000B1" from archive
Feb 25 10:08:11 ar-db3 postgres[32538]: [5-1] @: LOG:  invalid record length at C28/B1FFECA4
Feb 25 10:08:11 ar-db3 postgres[32538]: [6-1] @: LOG:  invalid primary checkpoint record
Feb 25 10:08:12 ar-db3 postgres[32538]: [7-1] @: LOG:  restored log file "0000000100000C28000000B1" from archive
Feb 25 10:08:12 ar-db3 postgres[32538]: [8-1] @: LOG:  invalid record length at C28/B1FFEC5C
Feb 25 10:08:12 ar-db3 postgres[32538]: [9-1] @: LOG:  invalid secondary checkpoint record
Feb 25 10:08:12 ar-db3 postgres[32538]: [10-1] @: PANIC:  could not locate a valid checkpoint record
Feb 25 10:08:12 ar-db3 postgres[32537]: [1-1] @: LOG:  startup process (PID 32538) was terminated by signal 6
Feb 25 10:08:12 ar-db3 postgres[32537]: [2-1] @: LOG:  aborting startup due to startup process failure
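
The locations in the log above can be mapped back to segment file names by hand: with 16 MB segments, the part after the slash divided by 16777216 is the segment number within that log id. A small sketch of that arithmetic, where lsn_to_segment is a hypothetical helper and timeline 1 is taken from the file names in the log:

```shell
#!/bin/sh
# Hypothetical helper: print the segment file name that holds a WAL location.
#   $1 = timeline id (hex), $2 = location as shown in the log, e.g. C28/B1FFECA4
# Assumes the default 16 MB (16777216-byte) segment size.
lsn_to_segment() {
    tli=$1
    logid=${2%%/*}      # part before the slash: log id
    off=${2##*/}        # part after the slash: byte offset within the log id
    seg=$(( 0x$off / 16777216 ))
    printf '%08X%08X%08X\n' "$((0x$tli))" "$((0x$logid))" "$seg"
}
```

For C28/B1FFECA4 and C28/B1FFEC5C this gives 0000000100000C28000000B1 in both cases, so the primary and secondary checkpoint records the startup process is looking for both sit near the end of segment B1, the file being restored from the archive.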

 

I remove the recovery.conf file, successfully start the database and issue a checkpoint. I try the restore again and get the same error.

So, is there a way that I can force the recovery to begin at B2, or am I dead in the water and need to bring in another full file copy and start from scratch?

Thanks for your time.

Mark Steben│Database Administrator│
@utoRevenue® "Join the Revenue-tion"
95 Ashley Ave. West Springfield, MA., 01089
413-243-4800 x1512 (Phone) │ 413-732-1824 (Fax)

@utoRevenue is a registered trademark and a division of Dominion Enterprises

 

 

Re: recovery question

From
Lee Azzarello
Date:
Is 0000000100000C28000000B1 the same size as the other segments?

-lee


Re: recovery question

From
"Mark Steben"
Date:
Hi Lee, just got your reply.

Every segment comes over compressed (gzip), so every segment is a
different size in the compressed folder. But we decompress each one into
another folder (gzip), and they always decompress to the standard 16 MB
size when we copy them back into xlog.
So 0000000100000C28000000B1 came into xlog as 16777216 bytes, just like
the others.

Thanks for the response.





