Core dump on PG 7.1.3 - Mailing list pgsql-general
From | David Esposito |
---|---|
Subject | Core dump on PG 7.1.3 |
Date | |
Msg-id | PEEDKNLDICKECFBNGNLLOEHMEOAA.dvesposito@newnetco.com Whole thread Raw |
Responses |
Re: Core dump on PG 7.1.3
|
List | pgsql-general |
Hello, I have pretty serious problem ... ~~~~~~~~~~~ Summary: On our production database, we are consistently getting core dumps when trying to execute a particular sequence of queries. ~~~~~~~~~~~ Platform: Redhat 7.1 (the kernel appears to be 2.4.2-2) Postgres 7.1.3 (SELECT version() reports "PostgreSQL 7.1.3 on i686-pc-linux-gnu, compiled by GCC 2.96") ~~~~~~~~~~~ Here is the actual query that causes the core dump ... It is basically trying to update ~20k rows UPDATE campaign_email SET campaign_email_status = 2, campaign_email_last_modified = message_queue.last_modification FROM message_queue, message_queue_purge_lock WHERE message_queue.message_id = message_queue_purge_lock.message_id AND message_queue.campaign_email_id = campaign_email.campaign_email_id; ~~~~~~~~~~~ Relevant schema (if you want the entire schema, i can send that along too ... we have FK constraints setup) Table "message_queue" Attribute | Type | Modifier -------------------------+--------------------------+---------------------- message_id | integer | not null status | character(1) | not null default 'N' lock_holder_session_id | character varying(16) | campaign_id | integer | not null campaign_email_id | integer | not null Index: message_queue_pkey Table "message_queue_purge_lock" Attribute | Type | Modifier ------------+---------+---------- message_id | integer | not null Index: message_queue_purge_lock_pkey Table "campaign_email" Attribute | Type | Modifier ------------------------------+--------------------------+---------- campaign_email_id | numeric(11,0) | not null campaign_id | numeric(11,0) | not null campaign_email_status | numeric(2,0) | not null campaign_email_last_modified | timestamp with time zone | not null campaign_email_detail | character varying(500) | impression_date | timestamp with time zone | Indices: campaign_email_bounce_date_idx, campaign_email_optout_date_idx, campaign_email_pkey, email_campaign_idx, email_campaign_patron_unq, email_patron_idx ~~~~~~~~~~~ The Postgres log file reports the following: Server process (pid 24542) exited with status 139 at Mon Apr 1 11:40:20 2002 Terminating any active server processes... NOTICE: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. I have rolled back the current transaction and am going to terminate your database system connection and exit. Please reconnect to the database system and repeat your query. The Data Base System is in recovery mode Server processes were terminated at Mon Apr 1 11:40:20 2002 Reinitializing shared memory and semaphores DEBUG: database system was interrupted at 2002-04-01 11:40:12 EST DEBUG: CheckPoint record at (7, 3892464228) DEBUG: Redo record at (7, 3892371300); Undo record at (0, 0); Shutdown FALSE DEBUG: NextTransactionId: 14376178; NextOid: 7673608 DEBUG: database system was not properly shut down; automatic recovery in progress... DEBUG: redo starts at (7, 3892371300) The Data Base System is starting up DEBUG: ReadRecord: record with zero len at (7, 3939513968) DEBUG: redo done at (7, 3939513900) The Data Base System is starting up The Data Base System is starting up The Data Base System is starting up DEBUG: MoveOfflineLogs: remove 00000007000000E7 The Data Base System is starting up DEBUG: database system is in production state ~~~~~~~~~~~ I didn't compile Postgres with debug symbols turned on so the backtrace from the core dump is pretty useless looking ore was generated by `postgres: postgres bnmail [local] UPDATE '. Program terminated with signal 11, Segmentation fault. #0 0x080b995f in ?? () (gdb) bt #0 0x080b995f in ?? () #1 0x080ba4e5 in ?? () #2 0x080b9f37 in ?? () #3 0x080b9fb0 in ?? () #4 0x080ba1a1 in ?? () #5 0x080ba561 in ?? () #6 0x080b9f37 in ?? () #7 0x080b9fb0 in ?? () #8 0x080ba14d in ?? () #9 0x080ba555 in ?? () #10 0x080ba666 in ?? () #11 0x080bdfbd in ?? () #12 0x080bab13 in ?? () #13 0x080be194 in ?? () #14 0x080b928d in ?? () #15 0x080bfa08 in ?? () #16 0x080b92c9 in ?? () #17 0x080b8f16 in ?? () #18 0x080b8eee in ?? () #19 0x080b8847 in ?? () #20 0x080b84e1 in ?? () #21 0x080b79a8 in ?? () #22 0x080fc8d7 in ?? () #23 0x080fb3e6 in ?? () #24 0x080fc3ce in ?? () #25 0x080e7430 in ?? () #26 0x080e7023 in ?? () #27 0x080e628d in ?? () #28 0x080e5cd8 in ?? () #29 0x080c743f in ?? () #30 0x400ed306 in ?? () ~~~~~~~~~~~ Here's the ultimate irony ... I cannot reproduce this bug on our non-production database ... that means that I am unable to do the standard experimentation that would further help diagnose this problem ... Reducing the size of the data set, reducing the complexity of the tables, removing the FK constraints ... all of those things are not an option on production data ... However, since this problem needs to be fixed, I see myself being able to do one of two things ... - Recompiling PG 7.1.3 using the debug symbols .. - Upgrading to PG 7.2 Is there anything I'm missing here that I could try that is non-invasive? Thanks in advance -Dave
pgsql-general by date: