Thread: Crash with pg_clog file not found

Crash with pg_clog file not found

From: "Matthieu Roger"

Hello,

OS: Windows 2003 64-bit
CPU: 2x dual-core Opteron
RAM: 8 GB
Disk: Areca RAID 10, 6x 200 GB SATA
PostgreSQL 8.3.4

For some weeks now we have been experiencing crashes, always with the
same error (same transaction 0 and same pg_clog file 0000):

2008-10-31 20:14:27 CET 10.0.0.119 exile PANIC:  could not access
status of transaction 0
2008-10-31 20:14:27 CET 10.0.0.119 exile DETAIL:  Could not open file
"pg_clog/0000": No such file or directory.
2008-10-31 20:14:27 CET 10.0.0.119 exile STATEMENT:  SELECT
sp_execute_processes();

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

2008-10-31 20:17:32 CET 10.0.0.119 exile FATAL:  connection limit
exceeded for non-superusers

[...]

We then tried to stop the service:

2008-10-31 20:39:30 CET   LOG:  received fast shutdown request
2008-10-31 20:39:30 CET   LOG:  aborting any active transactions
2008-10-31 20:39:30 CET 10.0.0.119 exile FATAL:  terminating
connection due to administrator command

(The line above is repeated again and again.)

2008-10-31 20:44:20 CET   LOG:  server process (PID 2772) exited with
exit code 3
2008-10-31 20:44:20 CET   LOG:  terminating any other active server processes
2008-10-31 20:44:20 CET 10.0.0.119 vitevendu WARNING:  terminating
connection because of crash of another server process
2008-10-31 20:44:20 CET 10.0.0.119 vitevendu DETAIL:  The postmaster
has commanded this server process to roll back the current transaction
and exit, because another server process exited abnormally and
possibly corrupted shared memory.
2008-10-31 20:44:20 CET 10.0.0.119 vitevendu HINT:  In a moment you
should be able to reconnect to the database and repeat your command.

The last block above is repeated many times, for every database, then:

2008-10-31 20:44:21 CET 10.0.0.119 exile FATAL:  the database system
is shutting down
[...]
2008-10-31 20:44:22 CET   LOG:  abnormal database system shutdown

Event log:
20:14:32 : Faulting application postgres.exe, version 8.3.4.8262,
faulting module postgres.exe, version 8.3.4.8262, fault address
0x0024a529.

After a crash the service cannot be restarted: there is no
postmaster.pid file to remove, and it simply refuses to start. We have
to reboot the server, after which it recovers, but we then find strange
data and duplicated content, so we have to restore from a backup.

We vacuum regularly, autovacuum is enabled, and CPU usage is fine (<50%),
as is memory usage. I have set logging to debug1.

I don't know what the problem could be :-/

A new crash with debug1 (not much more information):

2008-11-03 01:15:45 CET 10.0.0.119 exile LOG:  00000: duration:
2688.000 ms  statement: SELECT sp_execute_processes();
2008-11-03 01:15:45 CET 10.0.0.119 exile LOCATION:  exec_simple_query,
.\src\backend\tcop\postgres.c:1063
2008-11-03 01:15:46 CET 10.0.0.119 exile PANIC:  58P01: could not
access status of transaction 0
2008-11-03 01:15:46 CET 10.0.0.119 exile DETAIL:  Could not open file
"pg_clog/0000": No such file or directory.
2008-11-03 01:15:46 CET 10.0.0.119 exile LOCATION:  SlruReportIOError,
.\src\backend\access\transam\slru.c:845
2008-11-03 01:15:46 CET 10.0.0.119 exile STATEMENT:  SELECT
sp_execute_processes();

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.


Matthieu
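
Note on the file name in the DETAIL line: the pairing of "transaction 0" with "pg_clog/0000" follows from how the commit log is laid out on disk. The sketch below is a minimal illustration, assuming the default 8192-byte block size and the stock constants from PostgreSQL's clog/slru code (2 status bits per transaction, 32 pages per segment). The helper name `clog_segment_file` is hypothetical and not part of PostgreSQL. It only explains which file a given transaction ID maps to, not why the lookup happened.

```python
# Assumed defaults, mirroring PostgreSQL's clog/slru constants.
BLCKSZ = 8192                      # default page size in bytes (assumption)
CLOG_BITS_PER_XACT = 2             # two status bits per transaction
CLOG_XACTS_PER_BYTE = 8 // CLOG_BITS_PER_XACT
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE
SLRU_PAGES_PER_SEGMENT = 32        # pages per segment file
XACTS_PER_SEGMENT = CLOG_XACTS_PER_PAGE * SLRU_PAGES_PER_SEGMENT


def clog_segment_file(xid: int) -> str:
    """Hypothetical helper: pg_clog segment file holding the status of xid."""
    if not 0 <= xid < 2**32:
        raise ValueError("transaction IDs are 32-bit unsigned integers")
    segment = xid // XACTS_PER_SEGMENT
    # Segment files are named with four uppercase hex digits.
    return f"pg_clog/{segment:04X}"


if __name__ == "__main__":
    for xid in (0, 1_048_575, 1_048_576):
        print(xid, clog_segment_file(xid))
```

Under these assumptions each segment file covers 1,048,576 transaction IDs, so any ID below that value maps to pg_clog/0000. Transaction ID 0 is reserved as the invalid transaction ID in PostgreSQL, so a status lookup for it is unusual in itself.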

Re: Crash with pg_clog file not found

From: Rainer Bauer

"Matthieu Roger" wrote:

>2008-10-31 20:17:32 CET 10.0.0.119 exile FATAL:  connection limit
>exceeded for non-superusers

This is just a wild guess, but how many connections are there? Especially see

<http://wiki.postgresql.org/wiki/Running_%26_Installing_PostgreSQL_On_Native_Windows#I_cannot_run_with_more_than_about_125_connections_at_once.2C_despite_having_capable_hardware>

Rainer

Re: Crash with pg_clog file not found

From: "Matthieu Roger"

>>2008-10-31 20:17:32 CET 10.0.0.119 exile FATAL:  connection limit
>>exceeded for non-superusers
>
> This is just a wild guess, but how many connections are there? Especially see

That link is interesting. Reading up on it, I found a good article on
the MSDN blog, along with a tool to check desktop heap usage:
http://www.microsoft.com/downloads/details.aspx?familyid=5CFC9B74-97AA-4510-B4B9-B2DC98C8ED8B&displaylang=en

I will check the usage; we have a number of postgres.exe processes
running even though we don't have many concurrent connections.

Matthieu

Re: Crash with pg_clog file not found

From: Tom Lane

"Matthieu Roger" <matthieu.roger@gene6.com> writes:
> 2008-10-31 20:14:27 CET 10.0.0.119 exile PANIC:  could not access
> status of transaction 0

Can you present a self-contained test case that causes that?

A stack trace from the PANIC would be mighty useful too, but I dunno
whether there's any point in asking for that from a Windows system.

            regards, tom lane

Re: Crash with pg_clog file not found

From: "Matthieu Roger"

2008/11/3 Tom Lane <tgl@sss.pgh.pa.us>:
> "Matthieu Roger" <matthieu.roger@gene6.com> writes:
>> 2008-10-31 20:14:27 CET 10.0.0.119 exile PANIC:  could not access
>> status of transaction 0
>
> Can you present a self-contained test case that causes that?

Hello Tom,

We have not been able to reproduce it at will :-( It seems to happen
after running for one week, sometimes less, and always on the same
database. I know that is not helpful at all ...

> A stack trace from the PANIC would be mighty useful too, but I dunno
> whether there's any point in asking for that from a Windows system.

Is there any way to do that under Windows?

8.3.3 was also producing the same error; the earlier version (8.3.1)
did not seem to exhibit it. However, in September we opened a new
universe in the web game, which increased the number of accounts and
players, so maybe that is what triggers the problem.

I know this is not much help. If there is a debug build I could run ...

Matthieu

Re: Crash with pg_clog file not found

From: "Scott Marlowe"

On Mon, Nov 3, 2008 at 8:49 AM, Matthieu Roger <matthieu.roger@gene6.com> wrote:
>
> 8.3.3 was also producing the same error, prior version (8.3.1) did not
> seem to exhibit it, though we've opened a new universe in the web game
> which increased the number of accounts and players in september so
> maybe this triggers the problem.

I know you probably don't want to hear this right now, but PostgreSQL
can handle a much higher load under some flavor of Unix than it can
under Windows.  Luckily, it's pretty easy to set up a machine running
CentOS 5, Ubuntu 8.x or some other flavor of Linux.  Because of basic
architectural differences, that gap isn't likely to close soon.