Strange Postgresql crash - Mailing list pgsql-general

From Eric Rousse
Subject Strange Postgresql crash
Msg-id 455CA524.40108@telmatik.com
List pgsql-general

Hello all,

I've been experiencing a strange crash. I never really looked into it, since
it only happened every 1-2 months or so. But in the past week I've seen it a
lot, and the only clue I have is the backups.

So, here's some info about it and about my machine:

When: it crashes at night, at around 4AM, during the backup:

00 3 * * * root /export/dbsystem/pg_backup.sh va > /dev/null 2>&1
00 4 * * * root /export/dbsystem/pg_backup.sh b > /dev/null 2>&1

I moved the vacuum to another time, just to make sure they don't
conflict, who knows!

Which version: 7.3.16, built from the tar.gz on the website.


Normally, when it crashes, the machine just hangs on a kernel panic. The
person on site always reboots the machine before anyone can look at it. I
have almost never had a crash during the day (maybe once), and that kernel
panic mentioned ACPI. I'm not sure the nighttime crashes are the same
thing, but I think I'll just disable ACPI in the kernel and see if that
helps.

Anyway, here's the log from around 4AM; it doesn't say much...
10.1.1.54 is our monitoring machine; it only connects to the port using telnet.

"2006-11-16 03:55:39 [8681]   LOG:  pq_recvbuf: unexpected EOF on client connection
2006-11-16 03:55:39 [8681]   LOG:  incomplete startup packet
2006-11-16 03:56:39 [8682]   LOG:  connection received: host=10.1.1.54 port=4754
2006-11-16 03:56:39 [8682]   LOG:  pq_recvbuf: unexpected EOF on client connection
2006-11-16 03:56:39 [8682]   LOG:  incomplete startup packet
2006-11-16 03:57:39 [8684]   LOG:  connection received: host=10.1.1.54 port=4775
2006-11-16 03:57:39 [8684]   LOG:  pq_recvbuf: unexpected EOF on client connection
2006-11-16 03:57:39 [8684]   LOG:  incomplete startup packet
2006-11-16 03:58:39 [8685]   LOG:  connection received: host=10.1.1.54 port=4828
2006-11-16 03:58:39 [8685]   LOG:  pq_recvbuf: unexpected EOF on client connection
2006-11-16 03:58:39 [8685]   LOG:  incomplete startup packet
2006-11-16 03:59:24 [8132]   ERROR:  parser: parse error at or near "WHEREligneid" at character 72
2006-11-16 03:59:24 [8132]   LOG:  statement: Update Appels Set controller=4506500413, agentassignedligne='1012261'  WHEREligneid=4506500420
2006-11-16 03:59:49 [8686]   LOG:  connection received: host=10.1.1.54 port=4872
2006-11-16 03:59:50 [8686]   LOG:  pq_recvbuf: unexpected EOF on client connection
2006-11-16 03:59:50 [8686]   LOG:  incomplete startup packet
2006-11-16 04:00:02 [8702]   LOG:  connection received: host=10.1.1.45 port=50457
2006-11-16 04:00:02 [8702]   LOG:  connection authorized: user=postgres database=template1
2006-11-16 04:00:02 [8726]   LOG:  connection received: host=10.1.1.45 port=50458
2006-11-16 04:00:02 [8726]   LOG:  connection authorized: user=postgres database=martin_test
2006-11-16 04:00:29 [8744]   LOG:  connection received: host=10.1.1.45 port=50459
2006-11-16 04:00:29 [8744]   LOG:  connection authorized: user=postgres database=test
2006-11-16 04:00:29 [8762]   LOG:  connection received: host=10.1.1.45 port=50460
2006-11-16 04:00:29 [8762]   LOG:  connection authorized: user=postgres database=wincentrex
2006-11-16 04:00:39 [8763]   LOG:  connection received: host=10.1.1.54 port=4894
2006-11-16 04:00:40 [8763]   LOG:  pq_recvbuf: unexpected EOF on client connection
2006-11-16 04:00:40 [8763]   LOG:  incomplete startup packet
2006-11-16 04:02:26 [2534]   LOG:  database system was interrupted at 2006-11-16 03:57:36 EST
2006-11-16 04:02:26 [2534]   LOG:  checkpoint record is at C/6733EB68
2006-11-16 04:02:26 [2534]   LOG:  redo record is at C/6733EB68; undo record is at 0/0; shutdown FALSE
2006-11-16 04:02:26 [2534]   LOG:  next transaction id: 2720349894; next oid: 14377807
2006-11-16 04:02:26 [2534]   LOG:  database system was not properly shut down; automatic recovery in progress
2006-11-16 04:02:26 [2534]   LOG:  redo starts at C/6733EBA8
2006-11-16 04:02:27 [2534]   LOG:  ReadRecord: record with zero length at C/6735AB44
2006-11-16 04:02:27 [2534]   LOG:  redo done at C/6735AB20
2006-11-16 04:02:30 [2534]   LOG:  database system is ready"
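Since the server log is the only evidence, one way to check whether every crash really lines up with the 4AM backup is to pull out, for each unclean restart, the last ordinary log line and the restart line: the crash happened somewhere between the two. This is a minimal sketch, assuming the line prefix seen in the log (`YYYY-MM-DD HH:MM:SS [pid]`, produced by `log_timestamp` and `log_pid`) and keying only on the "database system was interrupted" message, so restarts reported with other recovery messages would be missed. `crash_windows` is a hypothetical helper, not part of PostgreSQL.

```python
import re
from datetime import datetime

# Prefix written with log_timestamp = true and log_pid = true
# (format assumed from the log excerpt, e.g. "2006-11-16 04:02:26 [2534]").
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\]")
# Assumption: only this restart message is recognized.
RESTART_MARKER = "database system was interrupted"


def crash_windows(lines):
    """Return (last_seen, restarted) datetime pairs, one per unclean restart.

    last_seen is the timestamp of the last log line before the restart
    message and restarted is the timestamp of the restart message itself,
    so the crash happened somewhere in between.
    """
    windows = []
    last_seen = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # wrapped continuation lines carry no "[pid]" prefix
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if RESTART_MARKER in line and last_seen is not None:
            windows.append((last_seen, ts))
        last_seen = ts
    return windows
```

Called as `crash_windows(open("/path/to/serverlog"))` (path hypothetical) on the excerpt above, it would report one window, from 04:00:40 to 04:02:26, i.e. shortly after the backup's connections from 10.1.1.45 were authorized. Running it over a longer log would show whether the earlier crashes fall in the same window.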


Here are our active settings in postgresql.conf:

tcpip_socket = true
max_connections = 64
port = 5432
hostname_lookup = false
shared_buffers = 1520   # min max_connections*2 or 16, 8KB each
#shared_buffers = 12288  # min max_connections*2 or 16, 8KB each
max_fsm_relations = 1000        # min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 10000           # min 1000, fsm is free space map, ~6 bytes
max_locks_per_transaction = 64  # min 10
wal_buffers = 8         # min 4, typically 8KB each
sort_mem = 32168                # min 64, size in KB
fsync = false
enable_seqscan = true
enable_indexscan = true
enable_tidscan = true
enable_sort = true
enable_nestloop = true
enable_mergejoin = true
enable_hashjoin = true

effective_cache_size = 8000     # typically 8KB each
random_page_cost = 4            # units are one sequential page fetch cost
cpu_tuple_cost = 0.01           # (same)
cpu_index_tuple_cost = 0.001    # (same)
cpu_operator_cost = 0.0025      # (same)
log_connections = false
log_pid = true
log_statement = false
log_duration = false
log_timestamp = true
log_min_error_statement = notice # Values in order of increasing severity:
                                 #   debug5, debug4, debug3, debug2, debug1,
                                 #   info, notice, warning, error, panic(off)
syslog = 0                      # range 0-2
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
LC_MESSAGES = 'en_US'
LC_MONETARY = 'en_US'
LC_NUMERIC = 'en_US'
LC_TIME = 'en_US'
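For reference, the memory-related values above can be converted to megabytes using the units given in the config's own comments (8 KB pages for `shared_buffers`, `wal_buffers` and `effective_cache_size`; KB for `sort_mem`). This is plain arithmetic, a rough sketch rather than a diagnosis; the all-connections sort figure assumes one sort per backend, which is only an assumption, since a single query can use several sort areas.

```python
# Values copied from the postgresql.conf above; units follow its comments.
PAGE_KB = 8  # shared_buffers, wal_buffers, effective_cache_size count 8 KB pages

settings = {
    "shared_buffers": 1520,
    "wal_buffers": 8,
    "effective_cache_size": 8000,
    "sort_mem_kb": 32168,  # sort_mem is already in KB
    "max_connections": 64,
}


def to_mb(kb):
    """Convert kilobytes to megabytes."""
    return kb / 1024


shared_mb = to_mb(settings["shared_buffers"] * PAGE_KB)       # 11.875 MB
wal_mb = to_mb(settings["wal_buffers"] * PAGE_KB)             # 0.0625 MB
cache_mb = to_mb(settings["effective_cache_size"] * PAGE_KB)  # 62.5 MB
sort_mb = to_mb(settings["sort_mem_kb"])                      # ~31.4 MB per sort

# Rough figure if every connection ran exactly one sort at the same time;
# not an upper bound, since one query may use several sort_mem areas.
all_sorts_mb = sort_mb * settings["max_connections"]          # 2010.5 MB
```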

I tested my memory with memtest, and it passed. I also ran some stress
tests under Linux, using stress and bonnie++, to see whether it would
crash with ACPI enabled while doing a dump... So far it's okay.

The machine: Linux aquilonII 2.6.17-1.2142_FC4 #1 Tue Jul 11 22:41:14 EDT 2006 i686 i686 i386 GNU/Linux

Does anyone have a suggestion?

--
Eric Rousse
514-655-1001

Telmatik inc.
204 Montarville, suite 250
Boucherville, QC, Canada
J4B 6S2

www.telmatik.com


