Thread: PANIC: unable to locate a valid checkpoint record

PANIC: unable to locate a valid checkpoint record

From: Ganesan R

Hi,

I am using 7.3.2. postmaster prints this on starting up:

----
LOG:  database system was interrupted at 2003-02-20 14:23:36 IST
LOG:  ReadRecord: bad resource manager data checksum in record at 0/E42144
LOG:  invalid primary checkpoint record
LOG:  ReadRecord: bad resource manager data checksum in record at 0/E42104
LOG:  invalid secondary checkpoint record
PANIC:  unable to locate a valid checkpoint record
LOG:  startup process (pid 7530) was terminated by signal 6
LOG:  aborting startup due to startup process failure
----

pg_resetxlog is able to recover from the problem; but I am concerned because
I can reproduce the scenario very easily. I originally encountered the
problem in 7.2.1; tried upgrading to 7.2.4 and now 7.3.2 and this scenario
happens for every version.

The scenario is like this:

I have an application that is doing database updates using JDBC. I do a
kill -9 on postmaster. The application detects that postmaster is down
and restarts it; I then do kill -9 on postmaster again. After a couple of
such forced crashes postmaster refuses to come up. The application uses a
PostgreSQL 7.2 JDBC2 driver.

I wrote a python application and tried to recreate the problem but wasn't
successful. However, I can consistently reproduce the problem with the
Java application. Any suggestions on how I can proceed?
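
For anyone trying to reproduce this, the kill/restart cycle described above
could be scripted roughly as follows. This is only a sketch under stated
assumptions: the function name crash_loop, the timings, and the start command
are hypothetical, and it does not generate the JDBC update load that the real
application produces while postmaster is running.

```python
import signal
import subprocess
import time


def crash_loop(start_cmd, cycles=3, run_seconds=1.0):
    """Start a server process, let it run briefly, then SIGKILL it.

    start_cmd is whatever launches the server (hypothetical here).
    Returns the exit status seen after each forced crash; on POSIX a
    process killed by SIGKILL reports -9.
    """
    exit_codes = []
    for _ in range(cycles):
        proc = subprocess.Popen(start_cmd)
        time.sleep(run_seconds)            # window in which updates would run
        proc.send_signal(signal.SIGKILL)   # equivalent of kill -9
        exit_codes.append(proc.wait())
    return exit_codes
```

With a real installation the start command would be something like
["postmaster", "-D", "/path/to/data"], where the data directory is a
placeholder. A status other than -9 means the process had already exited
on its own before the kill, which in this scenario would correspond to
postmaster refusing to come up.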

Please CC me on any replies; I am not (yet) subscribed to the lists. Thanks.

Ganesan

Re: PANIC: unable to locate a valid checkpoint record

From: Tom Lane

Ganesan R <rganesan@myrealbox.com> writes:
> I am using 7.3.2. postmaster prints this on starting up:

> LOG:  ReadRecord: bad resource manager data checksum in record at 0/E42144

> pg_resetxlog is able to recover from the problem; but I am concerned because
> I can reproduce the scenario very easily.

You should definitely be concerned :-(.  It sounds like the CRC code
isn't working at all on your platform.  What is your platform --- what
hardware, what OS, which C compiler?  How did you configure and install
Postgres?

            regards, tom lane
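
As background on the checksum mentioned above: a log record carries a CRC
computed over its data, and a reader recomputes the CRC and rejects the
record if the two values differ. The sketch below illustrates that idea
only. It uses Python's zlib.crc32 and a made-up record layout (a 4-byte CRC
followed by the payload), which are assumptions for illustration and not
the PostgreSQL on-disk format or its CRC routine.

```python
import struct
import zlib


def make_record(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 (hypothetical layout)."""
    return struct.pack("<I", zlib.crc32(payload)) + payload


def record_is_valid(record: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the stored one."""
    if len(record) < 4:
        return False  # too short to even hold the checksum
    (stored,) = struct.unpack("<I", record[:4])
    return stored == zlib.crc32(record[4:])


good = make_record(b"checkpoint data")
# Simulate a damaged record by inverting the bits of the last byte.
damaged = good[:-1] + bytes([good[-1] ^ 0xFF])
```

Here record_is_valid(good) is true and record_is_valid(damaged) is false.
A "bad checksum" message at startup is the second case: the stored and
recomputed values disagree, either because the data on disk is damaged or
because the checksum code itself is producing wrong results.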

Re: PANIC: unable to locate a valid checkpoint record

From: Ganesan R

On Thu, Feb 20, 2003 at 11:07:55PM -0500, Tom Lane wrote:
> Ganesan R <rganesan@myrealbox.com> writes:
> > I am using 7.3.2. postmaster prints this on starting up:
>
> > LOG:  ReadRecord: bad resource manager data checksum in record at 0/E42144
>
> > pg_resetxlog is able to recover from the problem; but I am concerned because
> > I can reproduce the scenario very easily.
>
> You should definitely be concerned :-(.  It sounds like the CRC code
> isn't working at all on your platform.  What is your platform --- what
> hardware, what OS, which C compiler?  How did you configure and install
> Postgres?

The hardware is an IBM xSeries 340 with a single Xeon Pentium IV 2.40GHz
processor and dual mirrored SCSI disks. The OS is Red Hat Linux 7.3 with a
Linux 2.4.18 SMP kernel (the CPU supports hyperthreading). The PostgreSQL
binaries were precompiled. The PostgreSQL 7.2.1 version is the Red Hat build
shipping with Red Hat Linux 7.3. The PostgreSQL 7.2.4 and 7.3.2 binaries were
downloaded directly from the ftp mirrors.

Please let me know if you need additional information.

Ganesan