Strange Postgresql crash - Mailing list pgsql-general
From | Eric Rousse |
---|---|
Subject | Strange Postgresql crash |
Date | |
Msg-id | 455CA524.40108@telmatik.com Whole thread Raw |
Responses |
Re: Strange Postgresql crash
Re: Strange Postgresql crash |
List | pgsql-general |
Hello all, I've been experiencing strange crash, never really took care of it since it was happening only every 1-2 months or so. But lately, I've seen it a lot in the past week and I have no clue about it, other than the backups. So, here's some info about it and about my machine: When: it crashes at night, at around 4AM, during the backup: 00 3 * * * root /export/dbsystem/pg_backup.sh va > /dev/null 2>&1 00 4 * * * root /export/dbsystem/pg_backup.sh b > /dev/null 2>&1 I move the vacuum to another time, just to make sure they are not in conflict, who knows! Which version: 7.3.16, I used the tar.gz version from the website. Normally, in a crash the machine just hangs on a kernel panic. The person on site always reboot the machine before taking a look at it. But I never had any crash during the day or almost, maybe once, and the kernel panic was talking about APCI. But during the night I'm not sure if it's the same thing, I think I'll just disable the APCI from the kernel and see if its okay. Anyway, here's a quick log around 4AM, it doesn't say much... 10.1.1.54, is our monitoring machine, it only to the port using telnet. "2006-11-16 03:55:39 [8681] LOG: pq_recvbuf: unexpected EOF on client connection 2006-11-16 03:55:39 [8681] LOG: incomplete startup packet 2006-11-16 03:56:39 [8682] LOG: connection received: host=10.1.1.54 port=4754 2006-11-16 03:56:39 [8682] LOG: pq_recvbuf: unexpected EOF on client connection 2006-11-16 03:56:39 [8682] LOG: incomplete startup packet 2006-11-16 03:57:39 [8684] LOG: connection received: host=10.1.1.54 port=4775 2006-11-16 03:57:39 [8684] LOG: pq_recvbuf: unexpected EOF on client connection 2006-11-16 03:57:39 [8684] LOG: incomplete startup packet 2006-11-16 03:58:39 [8685] LOG: connection received: host=10.1.1.54 port=4828 2006-11-16 03:58:39 [8685] LOG: pq_recvbuf: unexpected EOF on client connection 2006-11-16 03:58:39 [8685] LOG: incomplete startup packet 2006-11-16 03:59:24 [8132] ERROR: parser: parse error at or near "WHEREligneid" at character 72 2006-11-16 03:59:24 [8132] LOG: statement: Update Appels Set controller=4506500413, agentassignedligne='1012261' WHEREligneid=4506500420 2006-11-16 03:59:49 [8686] LOG: connection received: host=10.1.1.54 port=4872 2006-11-16 03:59:50 [8686] LOG: pq_recvbuf: unexpected EOF on client connection 2006-11-16 03:59:50 [8686] LOG: incomplete startup packet 2006-11-16 04:00:02 [8702] LOG: connection received: host=10.1.1.45 port=50457 2006-11-16 04:00:02 [8702] LOG: connection authorized: user=postgres database=template1 2006-11-16 04:00:02 [8726] LOG: connection received: host=10.1.1.45 port=50458 2006-11-16 04:00:02 [8726] LOG: connection authorized: user=postgres database=martin_test 2006-11-16 04:00:29 [8744] LOG: connection received: host=10.1.1.45 port=50459 2006-11-16 04:00:29 [8744] LOG: connection authorized: user=postgres database=test 2006-11-16 04:00:29 [8762] LOG: connection received: host=10.1.1.45 port=50460 2006-11-16 04:00:29 [8762] LOG: connection authorized: user=postgres database=wincentrex 2006-11-16 04:00:39 [8763] LOG: connection received: host=10.1.1.54 port=4894 2006-11-16 04:00:40 [8763] LOG: pq_recvbuf: unexpected EOF on client connection 2006-11-16 04:00:40 [8763] LOG: incomplete startup packet 2006-11-16 04:02:26 [2534] LOG: database system was interrupted at 2006-11-16 03:57:36 EST 2006-11-16 04:02:26 [2534] LOG: checkpoint record is at C/6733EB68 2006-11-16 04:02:26 [2534] LOG: redo record is at C/6733EB68; undo record is at 0/0; shutdown FALSE 2006-11-16 04:02:26 [2534] LOG: next transaction id: 2720349894; next oid: 14377807 2006-11-16 04:02:26 [2534] LOG: database system was not properly shut down; automatic recovery in progress 2006-11-16 04:02:26 [2534] LOG: redo starts at C/6733EBA8 2006-11-16 04:02:27 [2534] LOG: ReadRecord: record with zero length at C/6735AB44 2006-11-16 04:02:27 [2534] LOG: redo done at C/6735AB20 2006-11-16 04:02:30 [2534] LOG: database system is ready" Here's our active settings in postgresql.conf: tcpip_socket = true max_connections = 64 port = 5432 hostname_lookup = false shared_buffers = 1520 # min max_connections*2 or 16, 8KB each #shared_buffers = 12288 # min max_connections*2 or 16, 8KB each max_fsm_relations = 1000 # min 10, fsm is free space map, ~40 bytes max_fsm_pages = 10000 # min 1000, fsm is free space map, ~6 bytes max_locks_per_transaction = 64 # min 10 wal_buffers = 8 # min 4, typically 8KB each sort_mem = 32168 # min 64, size in KB fsync = false enable_seqscan = true enable_indexscan = true enable_tidscan = true enable_sort = true enable_nestloop = true enable_mergejoin = true enable_hashjoin = true effective_cache_size =8000 # typically 8KB each random_page_cost = 4 # units are one sequential page fetch cost cpu_tuple_cost = 0.01 # (same) cpu_index_tuple_cost = 0.001 # (same) cpu_operator_cost = 0.0025 # (same) log_connections = false log_pid = true log_statement = false log_duration = false log_timestamp = true log_min_error_statement = notice # Values in order of increasing severity: # debug5, debug4, debug3, debug2, debug1, # info, notice, warning, error, panic(off) syslog = 0 # range 0-2 syslog_facility = 'LOCAL0' syslog_ident = 'postgres' LC_MESSAGES = 'en_US' LC_MONETARY = 'en_US' LC_NUMERIC = 'en_US' LC_TIME = 'en_US' I tested my memory with memtest, and it's perfect. I also did some stress test within Linux, using stress and donnie++ to see if it would crash with APCI or not, while doing a dump... So far its okay. The machine: Linux aquilonII 2.6.17-1.2142_FC4 #1 Tue Jul 11 22:41:14 EDT 2006 i686 i686 i386 GNU/Linux Any one has a suggestion ? -- Eric Rousse 514-655-1001 Telmatik inc. 204 Montarville, suite 250 Boucherville, QC, Canada J4B 6S2 www.telmatik.com
pgsql-general by date: