Re: diagnosing a db crash - server exit code 2 - Mailing list pgsql-admin
From | Robert Burgholzer |
---|---|
Subject | Re: diagnosing a db crash - server exit code 2 |
Date | |
Msg-id | CACT-NGKP07Ku5y7=JdOFDx+PYv-5w70CAv=N8yuc4w2HeSotHQ@mail.gmail.com |
In response to | Re: diagnosing a db crash - server exit code 2 ("Burgholzer, Robert (DEQ)" <Robert.Burgholzer@deq.virginia.gov>) |
Responses |
Re: diagnosing a db crash - server exit code 2
Re: diagnosing a db crash - server exit code 2 |
List | pgsql-admin |
I have been following the wiki page that Scott sent, and attempted to trace one of my pg processes while making it crash. I have "succeeded" in causing the crash on my dev server, which at least suggests that it is not due to some spurious piece of faulty hardware on my primary. However, I had failed to turn on log file creation for the process I was tracing, so no log file resulted. Also, the ssh session that was monitoring the process died midway due to a local network routing glitch.
If anyone has suggestions on how to run the trace via a nohup command or something similar, that would be cool, since then I could let it run in the background. I can reproduce the crash, but it is somewhat episodic: it seems that I can run the same query several times before things blow up.
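For what it's worth, here is a minimal sketch of one way to get that nohup-like behavior, assuming gdb is installed and the script runs as a user allowed to attach to the backend (typically postgres or root). The function names, the log file name, and the choice to ignore SIGUSR1 are all assumptions of mine, not something taken from the wiki page. It starts gdb in batch mode in its own session, so a dropped ssh login should not kill it, and sends all gdb output to a log file.

```python
import subprocess


def build_gdb_command(pid, gdb="gdb"):
    """Build the argv that attaches gdb to `pid`, waits, then prints a backtrace.

    Hypothetical helper: the exact set of gdb commands is an assumption.
    """
    return [
        gdb, "-p", str(pid), "-batch",
        "-ex", "set pagination off",
        # Assumption: SIGUSR1 is routine traffic for a backend, so pass it
        # through without stopping instead of treating it as the crash.
        "-ex", "handle SIGUSR1 nostop noprint pass",
        "-ex", "continue",   # resume the backend; returns when a signal stops it
        "-ex", "bt full",    # dump the stack at the point where it stopped
    ]


def trace_in_background(pid, log_path="gdb_backend.log"):
    """Start gdb detached from the terminal, appending its output to log_path."""
    log = open(log_path, "ab")
    return subprocess.Popen(
        build_gdb_command(pid),
        stdin=subprocess.DEVNULL,
        stdout=log,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # own session: no SIGHUP when the ssh login drops
    )
```

Calling trace_in_background(12345), with 12345 standing in for the pid of the backend running the query, returns right away; the backtrace, if the backend stops on a signal, should end up in gdb_backend.log.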
So, in short, I am quite confident that I can get this finished shortly, but I am very short on time to devote to it for the next couple of days.
Thanks again for the help, and sorry that I am drawing this process out,
r.b.
Thanks to everyone, Tom, Joe, Scott, I will be in touch today as I move through this.
Joe - if I need to have you log in for assistance, I am more than happy to make that happen.
Regards,
r.b.
-----Original Message-----
From: Joe Conway [mailto:mail@joeconway.com]
Sent: Fri 9/23/2011 5:03 PM
To: Burgholzer, Robert (DEQ)
Cc: Scott Marlowe; pgsql-admin@postgresql.org
Subject: Re: [ADMIN] diagnosing a db crash - server exit code 2
On 09/23/2011 01:45 PM, Burgholzer, Robert (DEQ) wrote:
> Joe - it appears that it ALWAYS involves pLR - even a simple median call
> has caused it, though I must say it is something that is calculating the
> median of somewhere around 10-20,000 pieces of data if that makes any
> difference. I would be delighted to run any kind of debugging necessary
> and share the info. I have an identical system that can reproduce the
> errors (I am pretty certain that they HAVE previously). What I DON'T
> have is any knowledge of the stack-trace/debugger things, but I'm
> willing to learn, and I have a sysadmin who may be able to lend a hand.
There is some good information about using gdb with postgres here:
http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
If you need a hand, I would be happy to help you through the debugging
via phone or even log in remotely if you can allow it. Just contact me
off-list if you want to pursue that.
Note that I made a new PL/R release just a few weeks ago which fixed
several known crash-bugs. In particular these two pop out at me:
- Fix missing calls to UNPROTECT.
- Don't try to free an array element value when the
array element is NULL
Joe
--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
--
--
Robert W. Burgholzer
http://www.findingfreestyle.com/
On Facebook - http://www.facebook.com/pages/Finding-Freestyle/151918511505970
Twitter - http://www.twitter.com/findfreestyle
What's a tweeted swim set? A Sweet? No, a #swaiku! Get them by following http://twitter.com/findfreestyle