Thread: random backend crashes - how to debug ( Is crash dump handler released ? )
random backend crashes - how to debug ( Is crash dump handler released ? )
Hi Everybody,
We are using PostgreSQL 8.4 and experiencing random backend crashes. We have enabled logging and can see some output in the pg_log directory, but it is not of much use. Here are the logs.
2011-06-14 18:06:04 IST WARNING: terminating connection because of crash of another server process
2011-06-14 18:06:04 IST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2011-06-14 18:06:04 IST HINT: In a moment you should be able to reconnect to the database and repeat your command.
2011-06-14 18:06:04 IST WARNING: terminating connection because of crash of another server process
2011-06-14 18:06:04 IST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2011-06-14 18:06:04 IST HINT: In a moment you should be able to reconnect to the database and repeat your command.
2011-06-14 18:06:04 IST WARNING: terminating connection because of crash of another server process
2011-06-14 18:06:04 IST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2011-06-14 18:06:04 IST HINT: In a moment you should be able to reconnect to the database and repeat your command.
2011-06-14 18:06:04 IST WARNING: terminating connection because of crash of another server process
2011-06-14 18:06:04 IST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
I searched online and found that a crash dump handler has been proposed and that a patch for it has already been released, if I am not wrong. Could anyone please detail the steps to install the crash dump handler on Windows? Also, could you please help me with ways to debug the crashes shown above?
Thanks,
Bangar Raju
Re: random backend crashes - how to debug ( Is crash dump handler released ? )
On Tue, Jun 14, 2011 at 9:26 AM, BangarRaju Vadapalli <BangarRaju.Vadapalli@infor.com> wrote:
> Hi Everybody,
>
> We are using PostGRE 8.4 version and experiencing random backend crashes. We have enabled logging and are able to see some logging happening in pg_log directory but not of much use. Here are the logs.
>
> 2011-06-14 18:06:04 IST WARNING: terminating connection because of crash of another server process
> 2011-06-14 18:06:04 IST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> 2011-06-14 18:06:04 IST HINT: In a moment you should be able to reconnect to the database and repeat your command.
> 2011-06-14 18:06:04 IST WARNING: terminating connection because of crash of another server process
> 2011-06-14 18:06:04 IST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> 2011-06-14 18:06:04 IST HINT: In a moment you should be able to reconnect to the database and repeat your command.
> 2011-06-14 18:06:04 IST WARNING: terminating connection because of crash of another server process
> 2011-06-14 18:06:04 IST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> 2011-06-14 18:06:04 IST HINT: In a moment you should be able to reconnect to the database and repeat your command.
> 2011-06-14 18:06:04 IST WARNING: terminating connection because of crash of another server process
> 2011-06-14 18:06:04 IST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>
> I searched online and found crash dump handler idea has been proposed and patch for that has already been released if I am not wrong. Could anyone please detail the steps to install crash dump handler in windows? Also could you please help me with the ways to debug the crashes happening as shown above.

right. well, are you running any third party code? C functions? external modules? Is it practical to log queries from the client? On the server? Which exact version of postgres 8.4 are you running? How often do you see the crashes?

merlin
Re: random backend crashes - how to debug ( Is crash dump handler released ? )
On 06/14/2011 10:26 PM, BangarRaju Vadapalli wrote:
> Hi Everybody,
>
> We are using PostGRE 8.4 version and experiencing random backend crashes. We have enabled logging and are able to see some logging happening in pg_log directory but not of much use. Here are the logs.

Thank you for collecting the logs and including your version. A little more information would be helpful, like the exact version, your OS and architecture, etc. See this link for a list of suggested information:

http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

> I searched online and found crash dump handler idea has been proposed and patch for that has already been released if I am not wrong.

It is integrated into PostgreSQL 9.0 as a core part of the server. There's no reason it can't be compiled for PostgreSQL 8.4, though I never tested that. It shouldn't take long so I'll give it a go and get back to you.

> Could anyone please detail the steps to install crash dump handler in windows? Also could you please help me with the ways to debug the crashes happening as shown above.

The first thing to do is to try to figure out if they're really random, or if they're related to a particular query or event. Enable more detailed logging in PostgreSQL - at least query logging, possibly also additional debug levels - and examine the logs to see if you can find a pattern.

--
Craig Ringer
Re: random backend crashes - how to debug ( Is crash dump handler released ? )
On 15/06/2011 7:50 AM, Craig Ringer wrote:
> There's no reason it can't be compiled for PostgreSQL 8.4, though I never tested that. It shouldn't take long so I'll give it a go and get back to you.

Okies. I've built a version for 8.4. You can download it (32-bit only) from:

http://www.postnewspapers.com.au/~craig/webfiles/crashdump_pg_84_32bit/crashdump.dll

If you're on Windows XP or Vista you will also need:

http://www.postnewspapers.com.au/~craig/webfiles/crashdump_pg_84_32bit/dbghelp.dll

... and since this DLL was compiled with VC++ 2008, you'll need the VC++ 2008 redist installed if you don't already have it:

http://www.microsoft.com/downloads/en/details.aspx?familyid=9b2da534-3e03-4391-8a4d-074b9f2bc1bf&displaylang=en

Put crashdump.dll and dbghelp.dll into C:\Program Files\PostgreSQL\8.4\lib

Edit postgresql.conf, uncomment shared_preload_libraries if it's commented out and add 'crashdump' to it, eg:

    shared_preload_libraries = 'crashdump'

Create a folder called "crashdumps" inside the data directory, at the same level as the "pg_log", "pg_xlog", "base" etc directories. Get properties on the new "crashdumps" directory and in the security tab add "Full Control" to the "postgres" user. Save your changes.

Stop and start the postgresql-8.4 service from Start->Run->services.msc.

You should now have a working crash dump handler. To test it, run:

    CREATE FUNCTION crashdump_crashme() RETURNS void AS 'crashdump.dll' LANGUAGE 'C';

then invoke it to crash your database system:

    SELECT crashdump_crashme();

If all goes well (heh) you'll lose your connection and the server will crash and - hopefully - restart. If it doesn't restart, relaunch it manually using services.msc.

You should now see a file in the "crashdumps" folder. You can email it to me directly and I'll extract a backtrace.

Alternately, if you want to get the backtrace yourself you will need to set up your _NT_SYMBOL_PATH environment variable to match your install as per the instructions here:

http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Windows#Configuring_the_symbol_path

You will then be able to open it for debugging using Microsoft Visual Studio 2008 Express Edition (or any paid Visual Studio edition). Once open, right click on the dump file in the left bar and choose "debug new instance".

Alternately you can use windbg.exe from Debugging Tools for Windows to analyse the dump as per the instructions here:

http://archives.postgresql.org/message-id/4CAB4294.2070104@postnewspapers.com.au

You'll probably want to

    DROP FUNCTION crashdump_crashme();

after running your crashme test, though it's harmless if left in place so long as it's not invoked. Dropping the crashme function doesn't affect the crashdump handler; it's loaded by shared_preload_libraries and will remain in place.

--
Craig Ringer
Tech-related writing at http://soapyfrogs.blogspot.com/
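The postgresql.conf step above is the one most easily left half-done (for example, the line is edited but still commented out). As a rough aid, here is a minimal Python sketch that reads the text of a postgresql.conf and reports which libraries the active shared_preload_libraries line names. The function name preload_libraries is made up for illustration, and the sketch assumes the simple single-quoted `name = 'value'` form shown above; it does not handle include files or unquoted values.

```python
import re


def preload_libraries(conf_text):
    """Return the libraries named by the last active shared_preload_libraries line.

    Assumption: the setting uses the single-quoted form
    shared_preload_libraries = 'a,b'. Commented-out lines are ignored,
    and a later line overrides an earlier one.
    """
    libs = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # still commented out, so the server ignores it
        match = re.match(r"shared_preload_libraries\s*=\s*'([^']*)'", stripped)
        if match:
            libs = [name.strip() for name in match.group(1).split(",") if name.strip()]
    return libs


if __name__ == "__main__":
    sample = (
        "#shared_preload_libraries = ''\n"
        "shared_preload_libraries = 'crashdump'   # added for crash dumps\n"
    )
    print(preload_libraries(sample))
```

If 'crashdump' is missing from the returned list, the handler will not be loaded when the service restarts, and the crashdump_crashme() test will not leave a file in the "crashdumps" folder.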
Re: random backend crashes - how to debug ( Is crash dump handler released ? )
On 15/06/2011 7:50 AM, Craig Ringer wrote:
>> I searched online and found crash dump handler idea has been proposed and patch for that has already been released if I am not wrong.
>
> It is integrated into PostgreSQL 9.0 as a core part of the server.

Correction - it's in 9.1, not 9.0. Whoops, I should know that!

--
Craig Ringer
Tech-related writing at http://soapyfrogs.blogspot.com/
On 06/15/2011 05:52 PM, BangarRaju Vadapalli wrote:
> 7. Attached the database side and application side logs.

Yeah, there's definitely a crash, you just chopped it off in the abbreviated logs you sent earlier. With this kind of log volume that's easy enough to do.

Anyway, the crash is:

2011-06-15 13:55:59 IST postgres epimart PANIC: XX000: cannot abort transaction 19773146, it was already committed

from:

2011-06-15 13:55:59 IST postgres epimart ERROR: XX000: could not open relation base/2850136/3344343_vm: A blocking operation was interrupted by a call to WSACancelBlockingCall.^M
2011-06-15 13:55:59 IST postgres epimart LOCATION: mdopen, .\src\backend\storage\smgr\md.c:526
2011-06-15 13:55:59 IST postgres epimart STATEMENT: COMMIT
2011-06-15 13:55:59 IST postgres epimeta DEBUG: 00000: CommitTransaction
2011-06-15 13:55:59 IST postgres epimeta LOCATION: ShowTransactionState, .\src\backend\access\transam\xact.c:4074
2011-06-15 13:55:59 IST postgres epimart WARNING: 01000: AbortTransaction while in COMMIT state
2011-06-15 13:55:59 IST postgres epimart LOCATION: AbortTransaction, .\src\backend\access\transam\xact.c:2011
2011-06-15 13:55:59 IST postgres epimart DEBUG: 00000: StartTransaction
2011-06-15 13:55:59 IST postgres epimart LOCATION: ShowTransactionState, .\src\backend\access\transam\xact.c:4074
2011-06-15 13:55:59 IST postgres epimart PANIC: XX000: cannot abort transaction 19773146, it was already committed
2011-06-15 13:55:59 IST postgres epimart LOCATION: RecordTransactionAbort, .\src\backend\access\transam\xact.c:1200

While I recall some known issues that resulted in panics because of "cannot abort transaction" errors, I seem to recall that they were related to autovacuum, which this doesn't particularly seem to be. I'd still recommend updating to 8.4.8, as 8.4.2 was patched six more times for good reasons.

--
Craig Ringer
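With log volume like this, the PANIC line is easy to lose among DEBUG and LOCATION entries. The following is a minimal Python sketch of one way to pull each PANIC out together with the few non-debug lines that preceded it. It assumes the line layout seen in the excerpt above (timestamp, user, database, then a severity such as ERROR: or PANIC:); the name panic_context and the before parameter are hypothetical, not part of any PostgreSQL tool.

```python
from collections import deque

# Severities treated as noise when looking for what led up to a PANIC.
NOISE = (" DEBUG: ", " LOCATION: ")


def panic_context(lines, before=3):
    """Return a list of (panic_line, preceding_lines) pairs.

    preceding_lines holds up to `before` earlier log lines, skipping
    DEBUG and LOCATION entries, in their original order.
    """
    recent = deque(maxlen=before)
    found = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if any(marker in line for marker in NOISE):
            continue
        if " PANIC: " in line:
            found.append((line, list(recent)))
        recent.append(line)
    return found


if __name__ == "__main__":
    sample = [
        "2011-06-15 13:55:59 IST postgres epimart ERROR: XX000: could not open relation base/2850136/3344343_vm",
        "2011-06-15 13:55:59 IST postgres epimart LOCATION: mdopen, md.c:526",
        "2011-06-15 13:55:59 IST postgres epimart STATEMENT: COMMIT",
        "2011-06-15 13:55:59 IST postgres epimeta DEBUG: 00000: CommitTransaction",
        "2011-06-15 13:55:59 IST postgres epimart PANIC: XX000: cannot abort transaction 19773146, it was already committed",
    ]
    for panic, context in panic_context(sample):
        for entry in context:
            print("   ", entry)
        print(">>>", panic)
```

To run it over a real log, pass an open file object as lines; the sketch does no filtering by database or backend, so on a busy server the context lines may come from unrelated sessions.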
Re: random backend crashes - how to debug ( Is crash dump handler released ? )
On 06/14/2011 10:26 PM, BangarRaju Vadapalli wrote:
> Hi Everybody,
>
> We are using PostGRE 8.4 version and experiencing random backend crashes. We have enabled logging and are able to see some logging happening in pg_log directory but not of much use. Here are the logs.

- Examination of the full length logs sent off-list shows these lines leading up to the crash:

(crash1):
2011-06-15 13:55:59 IST postgres epimart ERROR: XX000: could not open relation base/2850136/3344343_vm: A blocking operation was interrupted by a call to WSACancelBlockingCall.^M

(crash2):
2011-06-15 14:22:40 IST postgres epimart ERROR: XX000: could not open relation base/2850136/3352537_fsm: A blocking operation was interrupted by a call to WSACancelBlockingCall.^M

... in both cases followed by:

XX000: cannot abort transaction 19859931, it was already committed

then a Windows runtime message reporting that the backend crashed. Ideas?

After some off-list conversation, we've established that:

- The crash is part of a batch process where the OP loads data from external sources. It's the 11th stage of a 12 stage process, and cannot be easily separated into a small self-contained test case. If the OP runs just the 11th stage standalone, without having just run the prior stages, the crash does not occur. It only crashes if the whole process is run in one go. (OP: please confirm that my summary is accurate, as it's condensed from several emails).

- The crash is reproducible on 9.0. It hasn't yet been reproduced on 9.1 because the OP is having some problems with views on 9.1 that he'll be posting about separately.

- We can't seem to get a crash dump or attach a debugger to get a backtrace. I built a copy of the early version of the crash dump handler before it was integrated into 9.1 so he could load it as a DLL into 8.4, and it works when the backend is intentionally crashed but doesn't capture the crash that's causing the problem.

- I still haven't been able to confirm how the batch process in question works. Does it all run in a single connection with a single transaction? Or is it a multi-connection affair with multiple scripts / programs involved? If Bangar Raju could describe this part in more detail that would be very helpful.

--
Craig Ringer
Re: random backend crashes - how to debug ( Is crash dump handler released ? )
Answers below....

-----Original Message-----
From: Craig Ringer [mailto:craig@postnewspapers.com.au]
Sent: Sunday, June 19, 2011 3:03 PM
To: BangarRaju Vadapalli
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] random backend crashes - how to debug ( Is crash dump handler released ? )

On 06/14/2011 10:26 PM, BangarRaju Vadapalli wrote:
> Hi Everybody,
>
> We are using PostGRE 8.4 version and experiencing random backend crashes. We have enabled logging and are able to see some logging happening in pg_log directory but not of much use. Here are the logs.

- Examination of the full length logs sent off-list shows these lines leading up to the crash:

(crash1):
2011-06-15 13:55:59 IST postgres epimart ERROR: XX000: could not open relation base/2850136/3344343_vm: A blocking operation was interrupted by a call to WSACancelBlockingCall.^M

(crash2):
2011-06-15 14:22:40 IST postgres epimart ERROR: XX000: could not open relation base/2850136/3352537_fsm: A blocking operation was interrupted by a call to WSACancelBlockingCall.^M

... in both cases followed by:

XX000: cannot abort transaction 19859931, it was already committed

then a Windows runtime message reporting that the backend crashed. Ideas?

After some off-list conversation, we've established that:

- The crash is part of a batch process where the OP loads data from external sources. It's the 11th stage of a 12 stage process, and cannot be easily separated into a small self-contained test case. If the OP runs just the 11th stage standalone, without having just run the prior stages, the crash does not occur. It only crashes if the whole process is run in one go. (OP: please confirm that my summary is accurate, as it's condensed from several emails).

Bangar - correct...

- The crash is reproducible on 9.0. It hasn't yet been reproduced on 9.1 because the OP is having some problems with views on 9.1 that he'll be posting about separately.

Bangar - I have posted the question...

- We can't seem to get a crash dump or attach a debugger to get a backtrace. I built a copy of the early version of the crash dump handler before it was integrated into 9.1 so he could load it as a DLL into 8.4, and it works when the backend is intentionally crashed but doesn't capture the crash that's causing the problem.

- I still haven't been able to confirm how the batch process in question works. Does it all run in a single connection with a single transaction? Or is it a multi-connection affair with multiple scripts / programs involved? If Bangar Raju could describe this part in more detail that would be very helpful.

Bangar - The process involves multiple connections and multiple transactions, and our code has both Java and native layers using the PostgreSQL JDBC drivers and libpq libraries...

--
Craig Ringer