Thread: Segmentation fault
Server stopped due to a segmentation fault. The server had been running successfully for a year.
PostgreSQL: 9.0.3
from /var/log/messages
Jul 18 19:00:03 ip-10-136-22-193 kernel: [18643442.660032] postgres[6818]: segfault at 170a8c6f ip 000000000044c94d sp 00007fff9fee5b80 error 4 in postgres[400000+495000]
from pg log
LOG: server process (PID 6818) was terminated by signal 11: Segmentation fault
LOG: terminating any other active server processes
Please suggest if there is a way to find out the cause of the issue.
Suggestions on how to avoid it are also welcome.
Regards
Amod
On 07/19/2012 12:37 AM, Amod Pandey wrote:
Server stopped due to Segmentation Fault. Server was running successfully for an year.
PostgreSQL: 9.0.3
from /var/log/messages
Jul 18 19:00:03 ip-10-136-22-193 kernel: [18643442.660032] postgres[6818]: segfault at 170a8c6f ip 000000000044c94d sp 00007fff9fee5b80 error 4 in postgres[400000+495000]
from pg log
LOG: server process (PID 6818) was terminated by signal 11: Segmentation fault
LOG: terminating any other active server processes
Please suggest if there is a way to find out the issue.
Did the crash produce a core file?
You haven't mentioned what Linux distro or kernel version you're on, and defaults vary. Look in your PostgreSQL datadir and see if there are any files with "core" in the name.
Unfortunately most Linux distros default to not producing core files. Without a core file it'll be nearly impossible to find the cause, because the segfault message reported by the kernel only contains the instruction pointer and stack pointer. The stack pointer is useless without a core file to inspect, and with address space layout randomisation active the instruction pointer offsets are randomised for each execution, so the ip doesn't tell you much on ASLR systems either.
If you can show more of the PostgreSQL logs from around the incident that would possibly be helpful.
--
Craig Ringer
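
For readers who want to see what the kernel line does contain, here is a minimal Python sketch that splits it into fields. The function name `decode_segfault` and the regular expression are illustrative assumptions, not part of any PostgreSQL or kernel tooling. The error-code bits follow the x86 page-fault convention (bit 0: page was present, bit 1: write access, bit 2: fault raised in user mode). Whether the resulting offset can be mapped to a function still depends on having the exact same binary with debug symbols, so this does not replace a core file.

```python
import re

# Pattern for the x86 kernel "segfault at ..." line (all numbers are hex).
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the fields of a kernel segfault log line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "program": m["prog"],
        "pid": int(m["pid"]),
        "fault_address": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapped object.
        "ip_offset": ip - base,
        "page_present": bool(err & 1),  # 0 means the page was not mapped
        "write_access": bool(err & 2),  # 0 means the access was a read
        "user_mode": bool(err & 4),     # 1 means the fault came from user space
    }


if __name__ == "__main__":
    line = ("kernel: [18643442.660032] postgres[6818]: segfault at 170a8c6f "
            "ip 000000000044c94d sp 00007fff9fee5b80 error 4 "
            "in postgres[400000+495000]")
    info = decode_segfault(line)
    print(hex(info["ip_offset"]), info)
```

For the line in this thread, "error 4" decodes to a user-mode read of an address that was not mapped, which is consistent with a bad pointer dereference but says nothing about which code path produced it.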
On 07/19/2012 01:52 PM, Amod Pandey wrote:
Thank you Craig for explaining in such a detail. I am adding more information and would see what more I can add,
$ulimit -a
core file size (blocks, -c) 0
So I assume there to be no core dump file.
Quite likely. Limits are inherited down process trees, so the limit you see in your shell is not guaranteed to be the one PostgreSQL was started with. I haven't seen any distro explicitly configure a non-zero core ulimit for PostgreSQL or other system services, though, so it's pretty darn likely to be zero.
Just check for a core file in the PostgreSQL data dir. If there is one, the Pg ulimit obviously wasn't zero. If there isn't, then given that Pg's working directory is always the datadir, chances are the ulimit prevented a core dump.
If I set 'ulimit -c unlimited' will it generate core dump if there is another occurrence. Do I need to restart postgres for this to take effect.
You would need to put this command in the PostgreSQL startup scripts *then* restart PostgreSQL.
It can be easier to configure it globally for the server. How to do this depends a bit on your distro and version; Google will help - "enable core dumps <distro>" or "change ulimit <distro>" for example.
Linux distros
-------------------
Linux ip-xx-xx-xx-xx 2.6.35.11-83.9.amzn1.x86_64 #1 SMP Sat Feb 19 23:42:04 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Um, that's not a distro, that's a kernel. I'm assuming it's an Amazon cloud hosted machine by the kernel, and since Ubuntu (and IIRC Debian) puts its name in the uname version string it's probably RHEL/CentOS/Fedora.
--
Craig Ringer
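
Since the shell's `ulimit -a` output may differ from the limit the server inherited, the two checks above can be sketched in Python as follows. This is a hedged illustration, not an official tool: the helper names are made up for this example, it assumes a Linux `/proc/<pid>/limits` file with a "Max core file size" row, and it only looks at the top level of the data directory for files with "core" in the name. If the kernel's `core_pattern` setting redirects core files elsewhere, they will not show up here.

```python
import glob
import os


def parse_core_limit(limits_text):
    """Return (soft, hard) core size limits from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max core file size"):
            fields = line.split()
            # "Max core file size" is four words; soft and hard limits follow.
            return fields[4], fields[5]
    return None


def core_limit_of(pid):
    """Read the core file size limit of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_core_limit(f.read())


def find_core_files(datadir):
    """List regular files in datadir whose name contains 'core'."""
    pattern = os.path.join(datadir, "*core*")
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


# Example use (placeholders, adjust to your installation):
#   core_limit_of(<postmaster pid>)        e.g. ('0', 'unlimited')
#   find_core_files("<path to datadir>")
```

A soft limit of "0" for the postmaster means no core file would have been written, whatever the interactive shell reports.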
Thank you Craig for explaining in such detail. I am adding more information and will see what more I can add.
$ ulimit -a
core file size (blocks, -c) 0
So I assume there is no core dump file.
If I set 'ulimit -c unlimited', will it generate a core dump if there is another occurrence? Do I need to restart postgres for this to take effect?
Linux distros
-------------------
Linux ip-xx-xx-xx-xx 2.6.35.11-83.9.amzn1.x86_64 #1 SMP Sat Feb 19 23:42:04 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
I will see if there are queries which I can share.
Regards
Amod