Thread: PANIC: unable to locate a valid checkpoint record

From: Ganesan R
Hi,

I am using PostgreSQL 7.3.2. The postmaster prints this on startup:

----
LOG:  database system was interrupted at 2003-02-20 14:23:36 IST
LOG:  ReadRecord: bad resource manager data checksum in record at 0/E42144
LOG:  invalid primary checkpoint record
LOG:  ReadRecord: bad resource manager data checksum in record at 0/E42104
LOG:  invalid secondary checkpoint record
PANIC:  unable to locate a valid checkpoint record
LOG:  startup process (pid 7530) was terminated by signal 6
LOG:  aborting startup due to startup process failure
----

pg_resetxlog is able to recover from the problem, but I am concerned because
I can reproduce the scenario very easily. I originally encountered the
problem in 7.2.1, then upgraded to 7.2.4 and now 7.3.2; the problem occurs
with every version.

The scenario is as follows:

I have an application that performs database updates over JDBC, using a
PostgreSQL 7.2 JDBC2 driver. While it is running, I kill the postmaster
with kill -9. The application detects that the postmaster is down and
restarts it, and I kill it with kill -9 again. After a couple of such
forced crashes, the postmaster refuses to come up.

I wrote a Python application to try to recreate the problem, without
success. However, I can consistently reproduce the problem with the
Java application. Any suggestions on how I should proceed?

Please CC me on any replies; I am not (yet) subscribed to the lists. Thanks.

Ganesan
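The log in the first message records a fixed startup sequence: the server reads the primary checkpoint record (the most recent checkpoint), and when that record fails its checksum test it falls back to the secondary one (the checkpoint before it); only when both fail does it PANIC. The Python sketch below models just that decision. It is a simplified, hypothetical model: `CheckpointRecord`, its `crc_ok` flag, and `locate_checkpoint` are illustrative names, not PostgreSQL APIs, and the real logic is C code inside the server that checks more than the checksum.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckpointRecord:
    """Hypothetical stand-in for a WAL checkpoint record (not a PostgreSQL type)."""
    location: str  # WAL position as printed in the log, e.g. "0/E42144"
    crc_ok: bool   # result of the record's checksum test


def locate_checkpoint(primary: Optional[CheckpointRecord],
                      secondary: Optional[CheckpointRecord],
                      log: List[str]) -> CheckpointRecord:
    """Try the primary checkpoint record first, then the secondary one.

    Appends log lines in the style of the postmaster output and raises
    RuntimeError (standing in for PANIC) if neither record is usable.
    """
    for label, rec in (("primary", primary), ("secondary", secondary)):
        if rec is not None and rec.crc_ok:
            return rec  # first usable checkpoint wins
        if rec is not None:
            log.append("LOG:  ReadRecord: bad resource manager data checksum "
                       f"in record at {rec.location}")
        log.append(f"LOG:  invalid {label} checkpoint record")
    # Both candidates failed: recovery has no starting point.
    raise RuntimeError("PANIC:  unable to locate a valid checkpoint record")


if __name__ == "__main__":
    messages: List[str] = []
    try:
        locate_checkpoint(CheckpointRecord("0/E42144", crc_ok=False),
                          CheckpointRecord("0/E42104", crc_ok=False),
                          messages)
    except RuntimeError as exc:
        messages.append(str(exc))
    print("\n".join(messages))
```

Run as a script, it prints the same ReadRecord and "invalid ... checkpoint record" lines as the report, followed by the PANIC message. The model makes one point visible: a single bad record is survivable, and startup fails only when both checkpoint records are rejected.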

Re: [ADMIN] PANIC: unable to locate a valid checkpoint record

From: Tom Lane
Ganesan R <rganesan@myrealbox.com> writes:
> I am using 7.3.2. postmaster prints this on starting up:

> LOG:  ReadRecord: bad resource manager data checksum in record at 0/E42144

> pg_resetxlog is able to recover from the problem; but I am concerned because
> I can reproduce the scenario very easily.

You should definitely be concerned :-(.  It sounds like the CRC code
isn't working at all on your platform.  What is your platform --- what
hardware, what OS, which C compiler?  How did you configure and install
Postgres?

            regards, tom lane
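The "bad resource manager data checksum" message means that the CRC recomputed over a WAL record's data did not match the CRC stored with the record. A mismatch can come from bytes on disk that differ from what was written, or, as suggested above, from a CRC routine that computes the wrong value, in which case even undamaged records would be rejected. The sketch below illustrates only the general store-and-verify idea. It assumes a hypothetical record layout (a 4-byte CRC header followed by the payload), and `zlib.crc32` is a readily available stand-in, not the CRC variant or record format PostgreSQL actually uses.

```python
import struct
import zlib

# Hypothetical record layout for illustration only: a 4-byte little-endian
# CRC-32 of the payload, followed by the payload itself. This is NOT
# PostgreSQL's WAL record format or its CRC variant.


def make_record(payload: bytes) -> bytes:
    """Build a record whose header stores the CRC of its payload."""
    return struct.pack("<I", zlib.crc32(payload)) + payload


def record_is_valid(record: bytes) -> bool:
    """Recompute the payload CRC and compare it with the stored value."""
    if len(record) < 4:
        return False  # too short to even hold the CRC header
    (stored,) = struct.unpack("<I", record[:4])
    return stored == zlib.crc32(record[4:])


if __name__ == "__main__":
    rec = make_record(b"UPDATE accounts SET balance = balance - 10")
    print(record_is_valid(rec))      # True
    # Corrupt the last payload byte; CRC-32 detects every single-byte change.
    damaged = rec[:-1] + bytes([rec[-1] ^ 0xFF])
    print(record_is_valid(damaged))  # False
```

In this model, "the CRC code isn't working" would correspond to `make_record` and `record_is_valid` disagreeing about the CRC of the same bytes. Every record would then be rejected on recovery, whether or not it was damaged.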

Re: [ADMIN] PANIC: unable to locate a valid checkpoint record

From: Tom Lane
Ganesan R <rganesan@myrealbox.com> writes:
> We've able recreate the problem on another platform;

It seems pretty dang odd that you should be able to reproduce the
problem on two different platforms, when no one else has reported it
at all.  Can you think of anything unusual that might be shared by
these two installations?

            regards, tom lane

Re: [ADMIN] PANIC: unable to locate a valid checkpoint record

From: Ganesan R
On Thu, Feb 20, 2003 at 11:07:55PM -0500, Tom Lane wrote:
> Ganesan R <rganesan@myrealbox.com> writes:
> > I am using 7.3.2. postmaster prints this on starting up:
>
> > LOG:  ReadRecord: bad resource manager data checksum in record at 0/E42144
>
> > pg_resetxlog is able to recover from the problem; but I am concerned because
> > I can reproduce the scenario very easily.
>
> You should definitely be concerned :-(.  It sounds like the CRC code
> isn't working at all on your platform.  What is your platform --- what
> hardware, what OS, which C compiler?  How did you configure and install
> Postgres?
>

Hi,

We've been able to recreate the problem on another platform, this time a
Dell PowerEdge 1650. See
http://www.dell.com/us/en/esg/topics/esg_pedge_rackmain_servers_1_pedge_1650.htm
The configuration is nearly identical to the first machine: a single
Pentium III 1133 MHz CPU, dual mirrored SCSI drives, and Red Hat Linux 7.3
running kernel 2.4.18. Please let me know if you need additional
information. Thank you.

Ganesan

Re: [ADMIN] PANIC: unable to locate a valid checkpoint record

From: Ganesan R
On Thu, Feb 20, 2003 at 11:07:55PM -0500, Tom Lane wrote:
> Ganesan R <rganesan@myrealbox.com> writes:
> > I am using 7.3.2. postmaster prints this on starting up:
>
> > LOG:  ReadRecord: bad resource manager data checksum in record at 0/E42144
>
> > pg_resetxlog is able to recover from the problem; but I am concerned because
> > I can reproduce the scenario very easily.
>
> You should definitely be concerned :-(.  It sounds like the CRC code
> isn't working at all on your platform.  What is your platform --- what
> hardware, what OS, which C compiler?  How did you configure and install
> Postgres?

The hardware is an IBM xSeries 340 with a single Xeon Pentium IV 2.40 GHz
processor and dual mirrored SCSI disks. The OS is Red Hat Linux 7.3 with a
Linux 2.4.18 SMP kernel (the CPU supports hyperthreading). We did not
compile PostgreSQL ourselves; all binaries were precompiled. The 7.2.1
binaries are the Red Hat build shipped with Red Hat Linux 7.3; the 7.2.4
and 7.3.2 binaries were downloaded directly from the FTP mirrors.

Please let me know if you need additional information.

Ganesan