Thread: postgres crash

postgres crash

From: mattfairley@netscape.net
Date: 03 August 2011, 14:49:16

Hi,

I am a complete newb with postgres. In fact, I don't even use it as such but have a problem with a crashed postgresql db associated with using Calendar Server on Mac OS X 10.6 on a Mac Mini. To cut a long story short, I had to reformat my hard drive and restore from a Time Machine backup. Now, when I try to start the calendar server, I get a crash message in the console log and some of the following in the postgres log. There are no dates / times in the log file so I don't know which entries relate to which activities - sorry!

FATAL: the database system is starting up
LOG: startup process (PID 313) was terminated by signal 6: Abort trap
LOG: aborting startup due to startup process failure
LOG: database system was interrupted; last known up at 2011-07-15 07:48:24 BST
LOG: unexpected pageaddr 0/2058000 in log file 0, segment 4, offset 360448
LOG: invalid primary checkpoint record
LOG: unexpected pageaddr 0/2056000 in log file 0, segment 4, offset 352256
LOG: invalid secondary checkpoint record
PANIC: could not locate a valid checkpoint record
FATAL: the database system is starting up
LOG: startup process (PID 646) was terminated by signal 6: Abort trap
LOG: aborting startup due to startup process failure

I've tried using pg_resetxlog with the -f switch but I don't really know what I'm doing.

I've also asked on the Calendar Server mailing list but was asked to come to this forum instead.

Any hints / tips please?

Many thanks,

Matt

Re: postgres crash

From: Tom Lane
Date: 03 August 2011, 15:27:56

mattfairley@netscape.net writes:
> I am a complete newb with postgres. In fact, I don't even use it as
> such but have a problem with a crashed postgresql db associated with
> using Calendar Server on Mac OS X 10.6 on a Mac Mini. To cut a long
> story short, I had to reformat my hard drive and restore from a Time
> Machine backup.

Yeah, that's not exactly the approved way to back up a Postgres
instance; you're likely to get a collection of files that are somewhat
out-of-sync with each other, which seems to be exactly what this is
about:

> LOG:  database system was interrupted; last known up at 2011-07-15 07:48:24 BST
> LOG:  unexpected pageaddr 0/2058000 in log file 0, segment 4, offset 360448
> LOG:  invalid primary checkpoint record
> LOG:  unexpected pageaddr 0/2056000 in log file 0, segment 4, offset 352256
> LOG:  invalid secondary checkpoint record
> PANIC:  could not locate a valid checkpoint record

> I've tried using pg_resetxlog with the -f switch but I don't really know what I'm doing.

pg_resetxlog is pretty much the only way out, given that you don't have
any other form of backup.  But you haven't shown us exactly what you did
or exactly how it failed.
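
As an illustration of one way to capture that information, here is a small bash sketch. `log_run` is a hypothetical helper, not part of PostgreSQL: it appends the command line, everything the command prints (stdout and stderr together) and its exit status to a log file whose contents can then be pasted into a reply.

```bash
# log_run LOGFILE COMMAND [ARGS...]
# Hypothetical helper: append the command line, its combined output
# (stderr merged into stdout) and its exit status to LOGFILE, then
# return the command's own exit status.
log_run() {
  local logfile="$1"
  shift
  local status=1
  {
    echo "\$ $*"                   # the command as it was run
    "$@" 2>&1                      # everything it printed
    status=$?
    echo "exit status: $status"    # how it ended
  } >> "$logfile"
  return "$status"
}
```

For example, `log_run ~/pg-recovery.log pg_controldata /usr/local/pgsql/data` (the data directory Matt mentions later in the thread) would record what pg_controldata reports about the latest checkpoint. The log file name is arbitrary.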

The man page for pg_resetxlog is reasonably thorough, did you read it?
http://www.postgresql.org/docs/9.0/static/app-pgresetxlog.html

(Once you do get it to start again, you'll at minimum want to do a
complete database reindex, and ideally a dump/re-initdb/reload, to try
to make sure there's not lingering database corruption.)
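
The follow-up steps mentioned above could look roughly like the bash sketch below once the server starts again. It is a hedged outline, not a tested recovery procedure: it assumes the PostgreSQL client tools are on PATH, that it runs as the database owner, and that `/usr/local/pgsql/data` (the path used later in this thread) is the data directory, which may not be where Calendar Server keeps its data. initdb should ideally be given the same encoding and locale as the old cluster; this sketch just uses initdb's defaults. `rebuild_cluster`, `run` and `run_to` are hypothetical names, and setting `DRY_RUN=1` prints the commands instead of executing them.

```bash
# Hypothetical helpers: run a command, or with DRY_RUN=1 only print it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}
# Same, but send the command's stdout to a file.
run_to() {
  local out="$1"; shift
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$* > $out"; else "$@" > "$out"; fi
}

# rebuild_cluster PGDATA DUMPFILE
rebuild_cluster() {
  local pgdata="$1" dump="$2"
  # Minimum step: rebuild every index in every database.
  run reindexdb --all || return 1
  # Fuller option: dump everything while the old cluster is still running...
  run_to "$dump" pg_dumpall || return 1
  # ...stop it and keep the old directory rather than deleting it...
  run pg_ctl stop -D "$pgdata" -m fast || return 1
  run mv "$pgdata" "$pgdata.damaged" || return 1
  # ...create a fresh, empty cluster and wait for it to start...
  run initdb -D "$pgdata" || return 1
  run pg_ctl start -w -D "$pgdata" -l "$pgdata.log" || return 1
  # ...and reload the dump into it.
  run psql -f "$dump" postgres
}
```

Running `DRY_RUN=1 rebuild_cluster /usr/local/pgsql/data ~/all.sql` first shows the whole sequence without touching anything. Keeping the old directory as `.damaged` leaves a way back if the reload fails.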

            regards, tom lane

Re: postgres crash

From: Matthew Fairley
Date: 03 August 2011, 17:35:03

Thanks for the reply, Tom. I'll give the man page a read. A couple of comments below, just to clarify though.

Regards,

Matt

On 3 Aug 2011, at 19:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> mattfairley@netscape.net writes:
>> I am a complete newb with postgres. In fact, I don't even use it as
>> such but have a problem with a crashed postgresql db associated with
>> using Calendar Server on Mac OS X 10.6 on a Mac Mini. To cut a long
>> story short, I had to reformat my hard drive and restore from a Time
>> Machine backup.
>
> Yeah, that's not exactly the approved way to back up a Postgres
> instance; you're likely to get a collection of files that are somewhat
> out-of-sync with each other, which seems to be exactly what this is
> about:
>

To clarify: the whole hard drive was restored from the Time Machine backup, not just the db, so I would have thought it would have been OK and that things would have been in sync rather than out of sync. However, due to hard drive corruption, is it possible that the restored database is screwy? From what I can gather, things went a bit wrong with the hard drive about a week before I noticed it.

>> LOG:  database system was interrupted; last known up at 2011-07-15 07:48:24 BST
>> LOG:  unexpected pageaddr 0/2058000 in log file 0, segment 4, offset 360448
>> LOG:  invalid primary checkpoint record
>> LOG:  unexpected pageaddr 0/2056000 in log file 0, segment 4, offset 352256
>> LOG:  invalid secondary checkpoint record
>> PANIC:  could not locate a valid checkpoint record
>
>> I've tried using pg_resetxlog with the -f switch but I don't really know what I'm doing.
>
> pg_resetxlog is pretty much the only way out, given that you don't have
> any other form of backup.  But you haven't shown us exactly what you did
> or exactly how it failed.
>

From memory, I ran:

pg_resetxlog -f /usr/local/pgsql/data

Was that right?

> The man page for pg_resetxlog is reasonably thorough, did you read it?
> http://www.postgresql.org/docs/9.0/static/app-pgresetxlog.html
>
> (Once you do get it to start again, you'll at minimum want to do a
> complete database reindex, and ideally a dump/re-initdb/reload, to try
> to make sure there's not lingering database corruption.)
>

How do I do all that?

>            regards, tom lane

Re: postgres crash

From: Michael Wood
Date: 04 August 2011, 05:39:51

Hi

On 3 August 2011 22:15, Matthew Fairley <mattfairley@netscape.net> wrote:
> Thanks for the reply Tom - I'll give the man page a read. A couple of comments below, just to clarify though.
>
> Regards,
>
> Matt
>
> On 3 Aug 2011, at 19:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> mattfairley@netscape.net writes:
>>> I am a complete newb with postgres. In fact, I don't even use it as
>>> such but have a problem with a crashed postgresql db associated with
>>> using Calendar Server on Mac OS X 10.6 on a Mac Mini. To cut a long
>>> story short, I had to reformat my hard drive and restore from a Time
>>> Machine backup.
>>
>> Yeah, that's not exactly the approved way to back up a Postgres
>> instance; you're likely to get a collection of files that are somewhat
>> out-of-sync with each other, which seems to be exactly what this is
>> about:
>
> To clarify - the whole hard drive was restored from the Time Machine back up, not just the db, so I would have thought it would have been ok, that things would have been in sync, rather than out of sync.

That is most likely not good enough.  If Time Machine creates a
snapshot of the whole hard drive before copying the files to the
backup location then it would be almost OK (as if you pulled the power
cable out the back of the machine while it was running.)  I don't
think Time Machine does that, though.  So while it's backing up one of
Postgres' files, the others could still be modified.  Then while it
backs up the next one, again others might be modified.  So they can be
out of sync with each other as Tom says.
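
To make the backup point concrete: rather than relying on Time Machine to copy a running cluster's files one by one, one option is to have PostgreSQL write a consistent logical dump with pg_dumpall and let Time Machine back up that file. A rough bash sketch, assuming pg_dumpall can reach the server with its default connection settings and without a password prompt; `dump_for_time_machine`, `prune_dumps`, the file naming and the default of seven dumps are all hypothetical choices. It could be scheduled with cron or launchd.

```bash
# dump_for_time_machine DIR [KEEP]
# Hypothetical helper: write a dated pg_dumpall file into DIR (a folder
# that Time Machine backs up), then keep only the newest KEEP dumps.
dump_for_time_machine() {
  local dir="$1" keep="${2:-7}"
  local file="$dir/pgdumpall-$(date +%Y%m%d-%H%M%S).sql"
  mkdir -p "$dir" || return 1
  # Write under a temporary name so a failed dump never looks complete.
  if pg_dumpall > "$file.partial"; then
    mv "$file.partial" "$file"
  else
    rm -f "$file.partial"
    return 1
  fi
  prune_dumps "$dir" "$keep"
}

# prune_dumps DIR KEEP: delete all but the KEEP newest dumps. The dated
# names sort chronologically, so a reverse name sort lists newest first.
prune_dumps() {
  local dir="$1" keep="$2"
  ls -1 "$dir" | grep '^pgdumpall-.*\.sql$' | sort -r |
    tail -n +"$((keep + 1))" |
    while read -r name; do
      rm -f "$dir/$name"
    done
}
```

For example, `dump_for_time_machine ~/PostgresDumps 7`. Restoring would then go through `psql -f` like any other pg_dumpall output, instead of depending on the copied data files being in sync.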

> However, due to hard drive corruption, is it possible that the restored database is screwy?

Yes, but as mentioned above this could also just be a result of the
way Time Machine does the backups.

> From what I can gather things went a bit wrong with the hard drive about a week before I noticed it.

[...]
>>> I've tried using pg_resetxlog with the -f switch but I don't really know what I'm doing.
>>
>> pg_resetxlog is pretty much the only way out, given that you don't have
>> any other form of backup.  But you haven't shown us exactly what you did
>> or exactly how it failed.
>>
> From memory, I ran:
>
> pg_resetxlog -f /usr/local/pgsql/data
>
> Was that right?
[...]

You will need to use some of the other options mentioned in the
documentation.  Have a look at that and then ask again if you don't
understand the documentation.

--
Michael Wood <esiotrot@gmail.com>

Re: postgres crash

From: Tom Lane
Date: 04 August 2011, 11:35:45

Michael Wood <esiotrot@gmail.com> writes:
> On 3 August 2011 22:15, Matthew Fairley <mattfairley@netscape.net> wrote:
>> To clarify - the whole hard drive was restored from the Time Machine back up, not just the db, so I would have thought it would have been ok, that things would have been in sync, rather than out of sync.

> That is most likely not good enough.  If Time Machine creates a
> snapshot of the whole hard drive before copying the files to the
> backup location then it would be almost OK (as if you pulled the power
> cable out the back of the machine while it was running.)  I don't
> think Time Machine does that, though.  So while it's backing up one of
> Postgres' files, the others could still be modified.  Then while it
> backs up the next one, again others might be modified.  So they can be
> out of sync with each other as Tom says.

If you've ever watched Time Machine do its thing, you'll notice that it
actually makes two passes over your drive per backup session --- the
second pass copies any files that changed during the first pass.  So
it's not even trying to deliver an exact single-instant filesystem
snapshot.  (I doubt OS X has support for such a thing anyway.)

>> From memory, I ran:
>> pg_resetxlog -f /usr/local/pgsql/data
>> Was that right?

> You will need to use some of the other options mentioned in the
> documentation.  Have a look at that and then ask again if you don't
> understand the documentation.

He may or may not need any other options.  -f might not have been a good
idea though ... it would have been nice to see the output from
pg_resetxlog before that.  For that matter, we still haven't seen what
happened after trying pg_resetxlog.
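
One way to get that output before forcing anything is pg_resetxlog's no-operation option `-n`, which prints the values it has reconstructed from pg_control and exits without modifying anything. The server must be stopped first (pg_resetxlog refuses to run while a postmaster.pid lock file exists). A hedged bash sketch: `reset_xlog_after_review` and `run` are hypothetical names, the data directory is the one from this thread, and `DRY_RUN=1` only prints the commands.

```bash
# Hypothetical helper: run a command, or with DRY_RUN=1 only print it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# reset_xlog_after_review PGDATA
# Show what pg_resetxlog would use (-n changes nothing), then ask before
# running the forced reset (-f). Anything other than "yes" aborts.
reset_xlog_after_review() {
  local pgdata="$1" answer
  run pg_resetxlog -n "$pgdata" || return 1
  printf 'Run "pg_resetxlog -f %s" now? Type yes to continue: ' "$pgdata" >&2
  read -r answer
  if [ "$answer" = "yes" ]; then
    run pg_resetxlog -f "$pgdata"
  else
    echo "Skipped; the data directory was not modified." >&2
    return 1
  fi
}
```

Saving what the `-n` step prints (for example by redirecting the output to a file) would also give the list something concrete to look at before the irreversible `-f` step.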

            regards, tom lane