Re: random backend crashes - how to debug ( Is crash dump handler released ? ) - Mailing list pgsql-general

From BangarRaju Vadapalli
Subject Re: random backend crashes - how to debug ( Is crash dump handler released ? )
Msg-id 3DF304319BFE284182A530654E0263831CB280D81F@INHYWEXMB2.infor.com
In response to Re: random backend crashes - how to debug ( Is crash dump handler released ? )  (Craig Ringer <craig@postnewspapers.com.au>)
List pgsql-general
Answers below...

-----Original Message-----
From: Craig Ringer [mailto:craig@postnewspapers.com.au]
Sent: Sunday, June 19, 2011 3:03 PM
To: BangarRaju Vadapalli
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] random backend crashes - how to debug ( Is crash dump handler released ? )

On 06/14/2011 10:26 PM, BangarRaju Vadapalli wrote:
> Hi Everybody,
>
> We are using PostgreSQL 8.4 and are experiencing random backend
> crashes. We have enabled logging and can see some output in the
> pg_log directory, but it is not of much use. Here are the logs.

- Examination of the full-length logs sent off-list shows these lines leading up to the crash:

(crash1): 2011-06-15 13:55:59 IST postgres epimart ERROR:  XX000: could not open relation base/2850136/3344343_vm: A blocking operation was interrupted by a call to WSACancelBlockingCall.

(crash2): 2011-06-15 14:22:40 IST postgres epimart ERROR:  XX000: could not open relation base/2850136/3352537_fsm: A blocking operation was interrupted by a call to WSACancelBlockingCall.

... in both cases followed by:

XX000: cannot abort transaction 19859931, it was already committed

then a Windows runtime message reporting that the backend crashed.

Ideas?



After some off-list conversation, we've established that:

- The crash is part of a batch process in which the OP loads data from external sources. It occurs in the 11th stage of a 12-stage
process, and cannot easily be separated into a small self-contained test case. If the OP runs just the 11th stage
standalone, without having just run the prior stages, the crash does not occur. It only crashes if the whole process is
run in one go.

Bangar - correct...

(OP: please confirm that my summary is accurate, as it's condensed from several emails).

- The crash is reproducible on 9.0. It hasn't yet been reproduced on
9.1 because the OP is having some problems with views on 9.1 that he'll be posting about separately.

Bangar - I have posted the question...

- We can't seem to get a crash dump or attach a debugger to get a backtrace. I built a copy of an early version of
the crash dump handler (from before it was integrated into 9.1) so he could load it as a DLL into 8.4. It works when the
backend is intentionally crashed, but doesn't capture the crash that's causing the problem.

- I still haven't been able to confirm how the batch process in question works. Does it all run in a single connection
with a single transaction?
Or is it a multi-connection affair with multiple scripts / programs involved? If Bangar Raju could describe this part
in more detail, that would be very helpful.

Bangar - The process involves multiple connections and multiple transactions, and our code has both Java and native
layers using the PostgreSQL JDBC driver and the libpq library...

--
Craig Ringer
