Thread: Bus error in postgres 8.3
Hi all,

I'm looking to push 8.3 out this week, but I'm running into a particularly nasty bus error. I'm not sure what's causing it as it appears to be transient (or at least somewhat random), but I do know that it bites on connection time and takes down the entire server with it.

I'm going to try to force a crash on my test server to see if I can find out anything more. Does anyone know how to get useful debugging info at the time of the crash? I don't think I have things set to dump core anywhere, but that and/or stack traces would be nice (especially for child processes).

If anyone has any idea what's going on and could clue me in, that would be excellent.

Peter

P.S. Here's the syslog transcript of the crash.

Apr 28 23:02:02 mitchell postgres[3442]: [3-1] LOG: connection received: host=mitchell.cs.wisc.edu port=37588
Apr 28 23:02:02 mitchell postgres[3442]: [4-1] LOG: connection authorized: user=postgres database=template1
Apr 28 23:02:02 mitchell postgres[3442]: [5-1] LOG: disconnection: session time: 0:00:00.239 user=postgres database=template1 host=mitchell.cs.wisc.edu port=37588
Apr 28 23:02:02 mitchell postgres[3444]: [3-1] LOG: connection received: host=mitchell.cs.wisc.edu port=37589
Apr 28 23:02:02 mitchell postgres[3444]: [4-1] LOG: connection authorized: user=postgres database=564testdb
Apr 28 23:02:02 mitchell postgres[461]: [3-1] LOG: server process (PID 3444) was terminated by signal 7: Bus error
Apr 28 23:02:02 mitchell postgres[461]: [4-1] LOG: terminating any other active server processes
Apr 28 23:02:02 mitchell postgres[461]: [5-1] LOG: all server processes terminated; reinitializing
Apr 28 23:02:02 mitchell postgres[3447]: [6-1] LOG: connection received: host=mitchell.cs.wisc.edu port=37590
Apr 28 23:02:02 mitchell postgres[3447]: [7-1] FATAL: the database system is in recovery mode
Apr 28 23:02:02 mitchell postgres[3448]: [6-1] LOG: connection received: host=mitchell.cs.wisc.edu port=37591
Apr 28 23:02:02 mitchell postgres[3448]: [7-1] FATAL: the database system is in recovery mode
Apr 28 23:02:23 mitchell postgres[461]: [6-1] LOG: startup process (PID 3446) was terminated by signal 7: Bus error
Apr 28 23:02:23 mitchell postgres[461]: [7-1] LOG: aborting startup due to startup process failure
Apr 28 23:22:15 mitchell postgres[3702]: [1-1] LOG: could not load root certificate file "root.crt": No such file or directory
Apr 28 23:22:15 mitchell postgres[3702]: [1-2] DETAIL: Will not verify client certificates.
Apr 28 23:22:15 mitchell postgres[3703]: [2-1] LOG: database system was interrupted; last known up at 2008-04-28 07:02:32 CDT
Apr 28 23:22:15 mitchell postgres[3703]: [3-1] LOG: database system was not properly shut down; automatic recovery in progress
Apr 28 23:22:15 mitchell postgres[3703]: [4-1] LOG: record with zero length at 6/915320A0
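For anyone sifting through a longer syslog for the same symptom, here is a minimal Python sketch that pulls out the lines where the postmaster reports a child process killed by a signal. The regular expression is written only against the message format visible in the transcript above; the names `CRASH_RE` and `find_crashes` are made up for illustration, and other PostgreSQL versions or logging setups may well need a different pattern.

```python
import re

# Matches postmaster messages of the form seen in the transcript, e.g.
#   LOG: server process (PID 3444) was terminated by signal 7: Bus error
# (pattern is an assumption based on that transcript only)
CRASH_RE = re.compile(
    r"LOG:\s+(?P<role>[a-z ]+?) \(PID (?P<pid>\d+)\) "
    r"was terminated by signal (?P<signal>\d+)(?:: (?P<name>.+))?"
)


def find_crashes(lines):
    """Return (role, pid, signal number, signal name) for each crash line."""
    crashes = []
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            crashes.append((
                match.group("role"),
                int(match.group("pid")),
                int(match.group("signal")),
                (match.group("name") or "").strip(),
            ))
    return crashes


if __name__ == "__main__":
    sample = [
        "Apr 28 23:02:02 mitchell postgres[461]: [3-1] LOG: server process "
        "(PID 3444) was terminated by signal 7: Bus error",
        "Apr 28 23:02:02 mitchell postgres[461]: [4-1] LOG: terminating any "
        "other active server processes",
        "Apr 28 23:02:23 mitchell postgres[461]: [6-1] LOG: startup process "
        "(PID 3446) was terminated by signal 7: Bus error",
    ]
    for crash in find_crashes(sample):
        print(crash)
```

The PIDs it reports are the ones to match against any core files or stack traces collected later.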
"Peter Koczan" <pjkoczan@gmail.com> writes: > I'm going to try to force a crash on my test server to see if I can > find out anything more. Does anyone know how to get useful debugging > info at the time of the crash? I don't think I have things set to dump > core anywhere, but that and/or stack traces would be nice (especially > for child processes). Make sure the postmaster is started under ulimit -c unlimited. On a depressingly large fraction of modern platforms, daemons are started with ulimit -c 0 by default :-(. Try putting "ulimit -c unlimited" into your PG init script and restarting. regards, tom lane
On Tue, Apr 29, 2008 at 10:35 AM, Peter Koczan <pjkoczan@gmail.com> wrote:
>
> I'm going to try to force a crash on my test server to see if I can
> find out anything more. Does anyone know how to get useful debugging
> info at the time of the crash? I don't think I have things set to dump
> core anywhere, but that and/or stack traces would be nice (especially
> for child processes).
>

Yeah, a stack trace and if possible, a self contained test case to reproduce the bug would help. If you are using a custom build, then using a debug build would help a lot too.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
On Tue, Apr 29, 2008 at 1:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Make sure the postmaster is started under ulimit -c unlimited.
> On a depressingly large fraction of modern platforms, daemons are
> started with ulimit -c 0 by default :-(. Try putting "ulimit -c unlimited"
> into your PG init script and restarting.

Pavan Deolasee wrote:
> Yeah, a stack trace and if possible, a self contained test case to
> reproduce the bug would help. If you are using a custom build, then
> using a debug build would help a lot too.

So far, this problem hasn't reoccurred since I first saw it, no matter how much I pummel my test server. I was testing some patches about the time that this happened (including one in the libpq backend), so there may have been an ABI mismatch which led to the bus error.

In any case, I've set the postmaster to have ulimit -c unlimited in case this problem reoccurs. I'll also look into using a debug build in case this happens again. But until then, I'm confident enough to keep pushing out 8.3.

Thanks again.

Peter
On Tue, Apr 29, 2008 at 8:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Peter Koczan" <pjkoczan@gmail.com> writes:
> > I'm going to try to force a crash on my test server to see if I can
> > find out anything more. Does anyone know how to get useful debugging
> > info at the time of the crash? I don't think I have things set to dump
> > core anywhere, but that and/or stack traces would be nice (especially
> > for child processes).
>
> Make sure the postmaster is started under ulimit -c unlimited.
> On a depressingly large fraction of modern platforms, daemons are
> started with ulimit -c 0 by default :-(. Try putting "ulimit -c unlimited"
> into your PG init script and restarting.
>
> regards, tom lane

Do you think a too low limit on user processes could be causing the problem?

We have had similar crashes (server process terminated with Bus Error). This is running 8.2.5 on OS X (Tiger, intel), with about 130-140 connections, but the process count limit is 1000. Any reason why the process limit should be vastly higher than the number of connections?

--
Venlig Hilsen / Kind Regards
Filip Svendsen
fs@basepointmedia.com
Filip Svendsen wrote:
> On Tue, Apr 29, 2008 at 8:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> "Peter Koczan" <pjkoczan@gmail.com> writes:
>> > I'm going to try to force a crash on my test server to see if I can
>> > find out anything more. Does anyone know how to get useful debugging
>> > info at the time of the crash? I don't think I have things set to dump
>> > core anywhere, but that and/or stack traces would be nice (especially
>> > for child processes).
>>
>> Make sure the postmaster is started under ulimit -c unlimited.
>> On a depressingly large fraction of modern platforms, daemons are
>> started with ulimit -c 0 by default :-(. Try putting "ulimit -c unlimited"
>> into your PG init script and restarting.
>>
>> regards, tom lane
>>
>
> Do you think a too low limit on user processes could be causing the problem?
>
> We have had similar crashes (server process terminated with Bus
> Error). This is running 8.2.5 on OS X (Tiger, intel), with about
> 130-140 connections, but the process count limit is 1000. Any reason
> why the process limit should be vastly higher than the number of
> connections?
>

"ulimit -c unlimited" sets the core dump size to be unlimited, not the connection count. That would be related to "ulimit -n" (descriptor count.)

Darren
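To make the distinction between these limits concrete, here is a small Python sketch that reports three separate per-process limits side by side: core file size (`ulimit -c`), open file descriptors (`ulimit -n`), and user processes (`ulimit -u`, the limit Filip's question is about). The `LIMITS` table and `current_limits` function are names invented for this example, and it assumes a Unix-like system where Python's `resource` module exposes `RLIMIT_NPROC`.

```python
import resource

# Each shell `ulimit` flag corresponds to a different kernel resource limit.
LIMITS = {
    "core file size (ulimit -c)": resource.RLIMIT_CORE,
    "open file descriptors (ulimit -n)": resource.RLIMIT_NOFILE,
    "user processes (ulimit -u)": resource.RLIMIT_NPROC,
}


def current_limits():
    """Return {label: (soft, hard)} for the limits listed in LIMITS."""
    report = {}
    for label, which in LIMITS.items():
        report[label] = resource.getrlimit(which)
    return report


def show(value):
    """Print-friendly form of a limit value."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label}: soft={show(soft)} hard={show(hard)}")
```

Changing one of these values leaves the other two untouched, so raising the core file size limit has no bearing on how many processes or descriptors are allowed.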
Hi All,

On a newly created server I stupidly restored over the top of the postgres system database using the GUI...(postgresql n00b) and am wondering a couple of things.

Firstly, if this database is overwritten, is it detrimental to the running of PostgreSQL?

Secondly is it just a simple matter of restoring it using a backup from a different PostgreSQL instance installed elsewhere.

Regards,
Patrick
> On a newly created server I stupidly restored over the top of the
> postgres system database using the GUI...(postgresql n00b) and am
> wondering a couple of things.
> Firstly, if this database is overwritten, is it detrimental to the running of PostgreSQL?
> Secondly is it just a simple matter of restoring it using a backup from a different PostgreSQL instance installed elsewhere.

When you say "system database", do you mean 'template1'? Or 'template0'? Or 'postgres'? Or something else?
To be more clear it was the 'postgres' database.

Regards,
Patrick

-----Original Message-----
From: Phillip Smith [mailto:phillip.smith@weatherbeeta.com.au]
Sent: Monday, 5 May 2008 4:22 PM
To: Patrick Roberts; pgsql-admin@postgresql.org
Subject: RE: [ADMIN] restore postgres system database 8.1.11

> On a newly created server I stupidly restored over the top of the
> postgres system database using the GUI...(postgresql n00b) and am
> wondering a couple of things.
> Firstly, if this database is overwritten, is it detrimental to the running of PostgreSQL?
> Secondly is it just a simple matter of restoring it using a backup from a different PostgreSQL instance installed elsewhere.

When you say "system database", do you mean 'template1'? Or 'template0'? Or 'postgres'? Or something else?
Patrick Roberts wrote:
> Hi All,
>
> On a newly created server I stupidly restored over the top of the
> postgres system database using the GUI...(postgresql n00b) and am
> wondering a couple of things.
>
> Firstly, if this database is overwritten, is it detrimental to the
> running of PostgreSQL?

The "postgres" database is basically an empty holder so that when you do this: psql as the user postgres, you actually get into the system.

>
> Secondly is it just a simple matter of restoring it using a backup
> from a different PostgreSQL instance installed elsewhere.

If you are indeed speaking about the "postgres" database, yes.

Sincerely,
Joshua D. Drake