Re: [PATCH] Log crashed backend's query (activity string) - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: [PATCH] Log crashed backend's query (activity string)
Msg-id 4E6653D80200002500040DCC@gw.wicourts.gov
In response to Re: [PATCH] Log crashed backend's query (activity string)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

Robert Haas <robertmhaas@gmail.com> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Marti Raudsepp <marti@juffo.org> writes:
>>> This patch adds the backend's current running query to the
>>> "backend crash" message.
>>
>> Sorry, this patch is entirely unacceptable.  We cannot have the
>> postmaster's functioning depending on the contents of shared
>> memory still being valid ... most especially not when we know
>> that somebody just crashed, and could have corrupted the shared
>> memory in arbitrary ways.  No, I don't think your attempts to
>> validate the data are adequate, nor do I believe they can be made
>> adequate.
> 
> Why and why not?
> 
>> And I doubt that the goal is worth taking risks for.
> 
> I am unable to count the number of times that I have had a
> customer come to me and say "well, the backend crashed".  And I go
> look at their logs and I have no idea what happened.  So then I
> tell them to include %p in log_line_prefix and set
> log_min_duration_statement=0 and call me if it happens again. 
> This is a huge nuisance and a serious interference with attempts
> to do meaningful troubleshooting.  When it doesn't add days or
> weeks to the time to resolution, it's because it prevents
> resolution altogether.  We really, really need something like
> this.

I haven't had this experience more than a few times, but a few is
enough to recognize how painful it can be.  It seems we're brave
enough to log *some* information at crash time, in spite of the risk
that memory may be corrupted in unpredictable ways.  Sure, there is
a slim chance that when you think you're writing to the log you've
actually got a handle to a segment of a heap file, but that chance
is extremely slim -- and if that's where you're at you've probably
already written a 'segfault' message there, anyway.  My gut feeling is
that this would allow diagnosis in a timely fashion often enough to save
more data than it puts at risk, to say nothing of people's time.
I don't know whether the patch on the table is coded as defensively
as it should be given the perilous times the new code would come
into play, but I don't think the idea should be rejected out of
hand.

-Kevin