Thread: BUG #2858: postgres periodically restarts (problem with MemoryContextAllocZeroAligned)...
From: "Robert Locke"
The following bug has been logged online:

Bug reference:      2858
Logged by:          Robert Locke
Email address:      rob@mobius.ph
PostgreSQL version: 8.1.4
Operating system:   FreeBSD 6.1-RELEASE-p6
Description:        postgres periodically restarts (problem with MemoryContextAllocZeroAligned)...
Details:

We recently began experiencing a problem with postgres where the server would periodically restart with messages such as the following in the LOG file:

Dec 22 14:15:56 MOv2DB postgres[38675]: [100-1] WARNING: terminating connection because of crash of another server process
Dec 22 14:15:56 MOv2DB postgres[38675]: [100-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server
Dec 22 14:15:56 MOv2DB postgres[38675]: [100-3] process exited abnormally and possibly corrupted shared memory.
Dec 22 14:15:56 MOv2DB postgres[38675]: [100-4] HINT: In a moment you should be able to reconnect to the database and repeat your command.

"dmesg" would reveal errors such as:

pid 34866 (postgres), uid 70: exited on signal 11 (core dumped)
pid 43893 (postgres), uid 70: exited on signal 11 (core dumped)
pid 43907 (postgres), uid 70: exited on signal 11 (core dumped)
pid 46337 (postgres), uid 70: exited on signal 11 (core dumped)

We enabled query logging and found that the process would sometimes die when a function called "removeAccount" was executed:

46337 2006-12-22 14:21:56 PHT 10.48.14.246 LOG: statement: SELECT * FROM core."removeAccount"(5130175)
45166 2006-12-22 14:21:59 PHT LOG: server process (PID 46337) was terminated by signal 11

This function simply executes a number of delete statements to remove a user from the system. We discovered, however, that it was a little slow (3 - 4 seconds) because the final delete removed the record from a table which is referenced as a foreign key in a number of other tables. Adding a couple of indices greatly improved the performance of the function, and the problem has now disappeared.
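
For illustration, the kind of index involved looks roughly like the sketch below. The table and column names (account, account_login, account_id) and the index name are hypothetical stand-ins, not the actual schema behind "removeAccount"; only the account id is taken from the logged statement above.

```sql
-- Hypothetical schema: every name here is illustrative only.
CREATE TABLE account (
    id integer PRIMARY KEY
);

CREATE TABLE account_login (
    id         serial PRIMARY KEY,
    account_id integer NOT NULL REFERENCES account (id)
);

-- PostgreSQL indexes the referenced key (account.id) through its primary key,
-- but it does not automatically index the referencing column. Without an index
-- like this one, the foreign-key check fired by each DELETE on account has to
-- scan account_login sequentially.
CREATE INDEX account_login_account_id_idx ON account_login (account_id);

-- The final delete in the function would then resemble:
DELETE FROM account WHERE id = 5130175;
```

One such index per referencing table is what typically turns the foreign-key check from a sequential scan into an index lookup.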
However, we are concerned that this might indicate a more severe problem with Postgres which might cause further issues down the road. Here's a back trace of the core dump for reference:

#0  0x08079d7f in heap_modifytuple ()
#1  0x08079eb6 in slot_getattr ()
#2  0x0816344d in ExecMakeFunctionResult ()
#3  0x081675b7 in ExecQual ()
#4  0x08167bae in ExecScan ()
#5  0x08175547 in ExecSeqScan ()
#6  0x08161b52 in ExecProcNode ()
#7  0x08160a8e in ExecutorRun ()
#8  0x0817ae0f in spi_printtup ()
#9  0x0817b9b0 in SPI_execute_snapshot ()
#10 0x00000000 in ?? ()
#11 0x00000000 in ?? ()
#12 0x00000000 in ?? ()
#13 0x00000001 in ?? ()
#14 0x083ecc88 in ?? ()
#15 0xbfbfa3e8 in ?? ()
#16 0x0000000a in ?? ()
#17 0x08607018 in ?? ()
#18 0x00000001 in ?? ()
#19 0xbfbfa588 in ?? ()
#20 0x08299fa9 in RI_Initial_Check ()
#21 0x083ecad8 in ?? ()
#22 0xbfbfa458 in ?? ()
#23 0x082e08ca in MemoryContextAllocZeroAligned ()
Previous frame inner to this frame (corrupt stack?)

Any ideas?