Thread: Problems with 8.3
I'm getting the back end closing connections early for some reason. Here is an exception report from my servlet. This first started happening with my instance of Trac, but now it's happening to my Java apps too. I hope someone can shed some light on what is going on here. Alex HTTP Status 500 - type Exception report message description The server encountered an internal error () that prevented it from fulfilling this request. exception org.apache.jasper.JasperException: javax.servlet.ServletException: javax.servlet.jsp.JspException: select venue, city, a.state from venue a, city b, subscriber c where a.city_id=b.city_id and distance(deg2rad(c.latitude),deg2rad(c.longitude), deg2rad(a.latitude), deg2rad(a.longitude))<60.0 and c.subscriber_id=? : An I/O error occured while sending to the backend. org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:541) org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:417) org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320) org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266) javax.servlet.http.HttpServlet.service(HttpServlet.java:803) root cause javax.servlet.ServletException: javax.servlet.jsp.JspException: select venue, city, a.state from venue a, city b, subscriber c where a.city_id=b.city_id and distance(deg2rad(c.latitude),deg2rad(c.longitude), deg2rad(a.latitude), deg2rad(a.longitude))<60.0 and c.subscriber_id=? : An I/O error occured while sending to the backend. org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:846) org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:779) org.apache.jsp.admin.find_005fsubscriber_jsp._jspService(find_005fsubscriber_jsp.java:420) org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) javax.servlet.http.HttpServlet.service(HttpServlet.java:803) org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393) org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320) org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266) javax.servlet.http.HttpServlet.service(HttpServlet.java:803) root cause org.postgresql.util.PSQLException: An I/O error occured while sending to the backend. org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217) org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452) org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:351) org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:255) org.apache.taglibs.standard.tag.common.sql.QueryTagSupport.doEndTag(QueryTagSupport.java:215) org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_sql_005fquery_005f2(find_005fsubscriber_jsp.java:1388) org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f6(find_005fsubscriber_jsp.java:933) org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fforEach_005f0(find_005fsubscriber_jsp.java:752) org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f2(find_005fsubscriber_jsp.java:543) org.apache.jsp.admin.find_005fsubscriber_jsp._jspService(find_005fsubscriber_jsp.java:220) org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) javax.servlet.http.HttpServlet.service(HttpServlet.java:803) org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393) org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320) org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266) javax.servlet.http.HttpServlet.service(HttpServlet.java:803) root cause java.io.EOFException org.postgresql.core.PGStream.ReceiveChar(PGStream.java:257) org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1165) org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:191) org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452) org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:351) org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:255) org.apache.taglibs.standard.tag.common.sql.QueryTagSupport.doEndTag(QueryTagSupport.java:215) org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_sql_005fquery_005f2(find_005fsubscriber_jsp.java:1388) org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f6(find_005fsubscriber_jsp.java:933) org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fforEach_005f0(find_005fsubscriber_jsp.java:752) org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f2(find_005fsubscriber_jsp.java:543) org.apache.jsp.admin.find_005fsubscriber_jsp._jspService(find_005fsubscriber_jsp.java:220) org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) javax.servlet.http.HttpServlet.service(HttpServlet.java:803) org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393) org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320) org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266) javax.servlet.http.HttpServlet.service(HttpServlet.java:803) note The full stack trace of the root cause is available in the Apache Tomcat/6.0.14 logs.
Ok - lookint at the pg log, it appears that the server process is seg faulting :(. This might conceivably be my fault. I have 3 stored procedures written in C, but they've been on the server for months, and unless I didn't deploy them correctly originally I don't think it's them. I'm thinking I need a core file. I'm guessing I just restart postgresql from a user whos core file size limit is set to non zero? Alex On Thu, Mar 6, 2008 at 3:49 PM, Alex Turner <armtuk@gmail.com> wrote: > I'm getting the back end closing connections early for some reason. > Here is an exception report from my servlet. This first started > happening with my instance of Trac, but now it's happening to my Java > apps too. I hope someone can shed some light on what is going on > here. > > Alex > > > HTTP Status 500 - > > type Exception report > > message > > description The server encountered an internal error () that prevented > it from fulfilling this request. > > exception > > org.apache.jasper.JasperException: javax.servlet.ServletException: > javax.servlet.jsp.JspException: > > > select venue, > city, a.state from venue a, city b, subscriber c where > a.city_id=b.city_id and > distance(deg2rad(c.latitude),deg2rad(c.longitude), deg2rad(a.latitude), > deg2rad(a.longitude))<60.0 and c.subscriber_id=? > > > > > : An I/O > error occured while sending to the backend. > > org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:541) > > org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:417) > > org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320) > > org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266) > > javax.servlet.http.HttpServlet.service(HttpServlet.java:803) > > > root cause > > javax.servlet.ServletException: javax.servlet.jsp.JspException: > > > select venue, > city, a.state from venue a, city b, subscriber c where > a.city_id=b.city_id and > distance(deg2rad(c.latitude),deg2rad(c.longitude), deg2rad(a.latitude), > deg2rad(a.longitude))<60.0 and c.subscriber_id=? > > > > > : An I/O > error occured while sending to the backend. > > org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:846) > > org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:779) > > org.apache.jsp.admin.find_005fsubscriber_jsp._jspService(find_005fsubscriber_jsp.java:420) > > org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) > > javax.servlet.http.HttpServlet.service(HttpServlet.java:803) > > org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393) > > org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320) > > org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266) > > javax.servlet.http.HttpServlet.service(HttpServlet.java:803) > > > root cause > > org.postgresql.util.PSQLException: An I/O error occured while > sending to the backend. > > org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217) > > org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452) > > org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:351) > > org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:255) > > org.apache.taglibs.standard.tag.common.sql.QueryTagSupport.doEndTag(QueryTagSupport.java:215) > > org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_sql_005fquery_005f2(find_005fsubscriber_jsp.java:1388) > > org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f6(find_005fsubscriber_jsp.java:933) > > org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fforEach_005f0(find_005fsubscriber_jsp.java:752) > > org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f2(find_005fsubscriber_jsp.java:543) > > org.apache.jsp.admin.find_005fsubscriber_jsp._jspService(find_005fsubscriber_jsp.java:220) > > org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) > > javax.servlet.http.HttpServlet.service(HttpServlet.java:803) > > org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393) > > org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320) > > org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266) > > javax.servlet.http.HttpServlet.service(HttpServlet.java:803) > > > root cause > > java.io.EOFException > > org.postgresql.core.PGStream.ReceiveChar(PGStream.java:257) > > org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1165) > > org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:191) > > org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452) > > org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:351) > > org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:255) > > org.apache.taglibs.standard.tag.common.sql.QueryTagSupport.doEndTag(QueryTagSupport.java:215) > > org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_sql_005fquery_005f2(find_005fsubscriber_jsp.java:1388) > > org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f6(find_005fsubscriber_jsp.java:933) > > org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fforEach_005f0(find_005fsubscriber_jsp.java:752) > > org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f2(find_005fsubscriber_jsp.java:543) > > org.apache.jsp.admin.find_005fsubscriber_jsp._jspService(find_005fsubscriber_jsp.java:220) > > org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) > > javax.servlet.http.HttpServlet.service(HttpServlet.java:803) > > org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393) > > org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320) > > org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266) > > javax.servlet.http.HttpServlet.service(HttpServlet.java:803) > > > note The full stack trace of the root cause is available in the Apache > Tomcat/6.0.14 logs. >
On 3/6/08, Alex Turner <armtuk@gmail.com> wrote: > I'm getting the back end closing connections early for some reason. > Here is an exception report from my servlet. This first started > happening with my instance of Trac, but now it's happening to my Java > apps too. I hope someone can shed some light on what is going on > here. You need to look in the server logs and see why it's crashing--all the exception traces tell us is that the server went away unexpectedly. -Doug
Ok - the connection closed thing is happening a lot, but not much is going into pg_log... Alex On Thu, Mar 6, 2008 at 4:03 PM, Douglas McNaught <doug@mcnaught.org> wrote: > On 3/6/08, Alex Turner <armtuk@gmail.com> wrote: > > Ok - lookint at the pg log, it appears that the server process is seg > > faulting :(. This might conceivably be my fault. I have 3 stored > > procedures written in C, but they've been on the server for months, > > and unless I didn't deploy them correctly originally I don't think > > it's them. > > > > I'm thinking I need a core file. I'm guessing I just restart > > postgresql from a user whos core file size limit is set to non zero? > > Or just put a 'ulimit' statement in the script that starts PG, or edit > /etc/security/limits.conf if you're on RedHat. > > -Doug >
Alex Turner wrote: > I'm getting the back end closing connections early for some reason. > Here is an exception report from my servlet. This first started > happening with my instance of Trac, but now it's happening to my Java > apps too. I hope someone can shed some light on what is going on > here. What do postgresql's server logs show? -- Richard Huxton Archonet Ltd
Sometimes I'm getting LOG: unexptected EOF on client connection Alex On Thu, Mar 6, 2008 at 4:06 PM, Alex Turner <armtuk@gmail.com> wrote: > Ok - the connection closed thing is happening a lot, but not much is > going into pg_log... > > Alex > > > > On Thu, Mar 6, 2008 at 4:03 PM, Douglas McNaught <doug@mcnaught.org> wrote: > > On 3/6/08, Alex Turner <armtuk@gmail.com> wrote: > > > Ok - lookint at the pg log, it appears that the server process is seg > > > faulting :(. This might conceivably be my fault. I have 3 stored > > > procedures written in C, but they've been on the server for months, > > > and unless I didn't deploy them correctly originally I don't think > > > it's them. > > > > > > I'm thinking I need a core file. I'm guessing I just restart > > > postgresql from a user whos core file size limit is set to non zero? > > > > Or just put a 'ulimit' statement in the script that starts PG, or edit > > /etc/security/limits.conf if you're on RedHat. > > > > -Doug > > >
Ok - more info I turned on connection logging a few other things, and normaly I get something like this: LOG: 00000: connection authorized: user=postgres database=trend LOCATION: BackendInitialize, postmaster.c:3097 LOG: 00000: disconnection: session time: 0:00:00.004 user=postgres database=trend host=localhost port=51586 but when it fails, I get nothing in the log at all... ALex On Thu, Mar 6, 2008 at 4:09 PM, Alex Turner <armtuk@gmail.com> wrote: > Sometimes I'm getting LOG: unexptected EOF on client connection > > Alex > > > > On Thu, Mar 6, 2008 at 4:06 PM, Alex Turner <armtuk@gmail.com> wrote: > > Ok - the connection closed thing is happening a lot, but not much is > > going into pg_log... > > > > Alex > > > > > > > > On Thu, Mar 6, 2008 at 4:03 PM, Douglas McNaught <doug@mcnaught.org> wrote: > > > On 3/6/08, Alex Turner <armtuk@gmail.com> wrote: > > > > Ok - lookint at the pg log, it appears that the server process is seg > > > > faulting :(. This might conceivably be my fault. I have 3 stored > > > > procedures written in C, but they've been on the server for months, > > > > and unless I didn't deploy them correctly originally I don't think > > > > it's them. > > > > > > > > I'm thinking I need a core file. I'm guessing I just restart > > > > postgresql from a user whos core file size limit is set to non zero? > > > > > > Or just put a 'ulimit' statement in the script that starts PG, or edit > > > /etc/security/limits.conf if you're on RedHat. > > > > > > -Doug > > > > > >
On 3/6/08, Alex Turner <armtuk@gmail.com> wrote: > Ok - lookint at the pg log, it appears that the server process is seg > faulting :(. This might conceivably be my fault. I have 3 stored > procedures written in C, but they've been on the server for months, > and unless I didn't deploy them correctly originally I don't think > it's them. > > I'm thinking I need a core file. I'm guessing I just restart > postgresql from a user whos core file size limit is set to non zero? Or just put a 'ulimit' statement in the script that starts PG, or edit /etc/security/limits.conf if you're on RedHat. -Doug
Alex Turner wrote: > Ok - lookint at the pg log, it appears that the server process is seg > faulting :(. This might conceivably be my fault. I have 3 stored > procedures written in C, but they've been on the server for months, > and unless I didn't deploy them correctly originally I don't think > it's them. You've had stored procedures written in C deployed on 8.3 for months? Or you had stored procedures written for 8.2 and have re-deployed onto 8.3? -- Richard Huxton Archonet Ltd
"Alex Turner" <armtuk@gmail.com> writes: > I'm thinking I need a core file. Yeah. > I'm guessing I just restart > postgresql from a user whos core file size limit is set to non zero? Probably won't help if your start script does "su postgres" as most do. What I'd try is putting the ulimit command in ~postgres/.profile or whatever init files your shell happens to like. regards, tom lane
I did a pg_ctl start from the postgres user... is that gonna work, or does pg_ctl do an su? Alex On Thu, Mar 6, 2008 at 4:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Alex Turner" <armtuk@gmail.com> writes: > > I'm thinking I need a core file. > > Yeah. > > > > I'm guessing I just restart > > postgresql from a user whos core file size limit is set to non zero? > > Probably won't help if your start script does "su postgres" as most do. > What I'd try is putting the ulimit command in ~postgres/.profile or > whatever init files your shell happens to like. > > regards, tom lane >
Alex Turner wrote: > I had stored procs in C on 8.2 for months, and I moved them over to > 8.3 when we upgraded. And re-compiled them, yes? > The thing is that it's happening on a database > that doesn't have the stored procs. It seems to be happening worst on > a system that keeps connections open between requests. It's failing > about half the time. Hmm - if it's individual backends dying, and not those with the C functions then that should rule them out. I think that core dump will be your quickest way of figuring this out. -- Richard Huxton Archonet Ltd
It was core dumping on the 5th of March, but it hasn't since. It's just failing with the connection closed problem. It seems to happen worst with queries that are going to do updates and with connections that are persistent between http requests... I downgraded to 8.2, but it hasn't made any difference :( Alex On Fri, Mar 7, 2008 at 3:00 AM, Richard Huxton <dev@archonet.com> wrote: > Alex Turner wrote: > > I had stored procs in C on 8.2 for months, and I moved them over to > > 8.3 when we upgraded. > > And re-compiled them, yes? > > > > The thing is that it's happening on a database > > that doesn't have the stored procs. It seems to be happening worst on > > a system that keeps connections open between requests. It's failing > > about half the time. > > Hmm - if it's individual backends dying, and not those with the C > functions then that should rule them out. I think that core dump will be > your quickest way of figuring this out. > > -- > Richard Huxton > Archonet Ltd >
Alex Turner wrote: > It was core dumping on the 5th of March, but it hasn't since. It's > just failing with the connection closed problem. It seems to happen > worst with queries that are going to do updates and with connections > that are persistent between http requests... > > I downgraded to 8.2, but it hasn't made any difference :( It seems odd that nothing is being logged. Is there nothing in the system logs for the machine as a whole? You're not running out of memory and PG is being killed? Even then, you'd expect something to pop up in the logs. -- Richard Huxton Archonet Ltd
Nothin worth mentioning in /var/log/messages The wierd thing I do see is there are a number of sockets in CLOSE_WAIT when doing a netstat -an | grep 5432 I think maybe I'll just reboot and see if that fixes it. Alex On Fri, Mar 7, 2008 at 11:47 AM, Richard Huxton <dev@archonet.com> wrote: > Alex Turner wrote: > > It was core dumping on the 5th of March, but it hasn't since. It's > > just failing with the connection closed problem. It seems to happen > > worst with queries that are going to do updates and with connections > > that are persistent between http requests... > > > > I downgraded to 8.2, but it hasn't made any difference :( > > It seems odd that nothing is being logged. > > Is there nothing in the system logs for the machine as a whole? You're > not running out of memory and PG is being killed? Even then, you'd > expect something to pop up in the logs. > > -- > Richard Huxton > Archonet Ltd >
I didn't. And after the reboot, I still see 8 new sockets stuck in CLOSE_WAIT - I'm wondering if this is a hardware/kernel problem... Alex On Fri, Mar 7, 2008 at 11:50 AM, Alex Turner <armtuk@gmail.com> wrote: > Nothin worth mentioning in /var/log/messages > > The wierd thing I do see is there are a number of sockets in > CLOSE_WAIT when doing a netstat -an | grep 5432 > > I think maybe I'll just reboot and see if that fixes it. > > Alex > > > > On Fri, Mar 7, 2008 at 11:47 AM, Richard Huxton <dev@archonet.com> wrote: > > Alex Turner wrote: > > > It was core dumping on the 5th of March, but it hasn't since. It's > > > just failing with the connection closed problem. It seems to happen > > > worst with queries that are going to do updates and with connections > > > that are persistent between http requests... > > > > > > I downgraded to 8.2, but it hasn't made any difference :( > > > > It seems odd that nothing is being logged. > > > > Is there nothing in the system logs for the machine as a whole? You're > > not running out of memory and PG is being killed? Even then, you'd > > expect something to pop up in the logs. > > > > -- > > Richard Huxton > > Archonet Ltd > > >
On Fri, Mar 7, 2008 at 11:17 AM, Alex Turner <armtuk@gmail.com> wrote: > I didn't. And after the reboot, I still see 8 new sockets stuck in > CLOSE_WAIT - I'm wondering if this is a hardware/kernel problem... Having sockets in CLOSE_WAIT is actually pretty normal
Well - I think it might be that some of my servlets weren't closing their database connections properly. I do have some new evidence though: I did an strace of the tomcat processes, and I noticed something that might be odd, but I'm not really qualified to say. I notice that every time a socket sends a request to Postgresql it gets some kind of reply. This is true in all cases EXCEPT when the application crashes. Here is the segment of the strace right before it throws a wobbly: [pid 4565] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 156 [pid 4565] bind(156, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 4565] getsockname(156, {sa_family=AF_INET, sin_port=htons(56550), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0 [pid 4565] connect(156, {sa_family=AF_INET, sin_port=htons(5432), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 [pid 4565] setsockopt(156, SOL_TCP, TCP_NODELAY, [1], 4) = 0 [pid 4565] send(156, "\0\0\0W\0\3\0\0user\0postgres\0database\0t"..., 87, 0) = 87 [pid 4565] recv(156, "R\0\0\0\10\0\0\0\0S\0\0\0\34client_encoding\0UN"..., 8192, 0) = 279 [pid 4565] gettimeofday({1204948966, 386187}, NULL) = 0 [pid 4565] send(156, "P\0\0\1\35\0\r\n \t\tselect"..., 334, 0) = 334 [pid 4565] recv(156, "", 8192, 0) = 0 [pid 4565] send(156, "X\0\0\0\4", 5, 0) = 5 [pid 4565] dup2(11, 156) = 156 [pid 4565] close(156) = 0 Notice that the recv(156,... after sending the query comes back blank which seems odd given that we just sent a query to the database. I'm really in bind with this one. It started happening a couple of days ago at this point, and all our admin applications are basically down :(, people can't even log the bugs that this is generating because the bugtrac (trac) is running on this postgresql and is throwing errors too. I also caught something else that seemed wierd on another trace: [pid 3553] send(28, "P\0\0\0H\0delete from result_cache w"..., 108, 0) = 108 [pid 3553] recv(28, "N\0\0\1\202SWARNING\0C57P02\0Mterminatin"..., 8192, 0) = 387 [pid 3553] gettimeofday({1204946902, 977641}, NULL) = 0 [pid 3553] gettimeofday({1204946902, 977682}, NULL) = 0 [pid 3553] gettimeofday({1204946902, 977766}, NULL) = 0 [pid 3553] gettimeofday({1204946902, 977902}, NULL) = 0 [pid 3553] gettimeofday({1204946902, 977973}, NULL) = 0 [pid 3553] gettimeofday({1204946902, 978012}, NULL) = 0 [pid 3553] gettimeofday({1204946902, 978053}, NULL) = 0 [pid 3553] gettimeofday({1204946902, 978091}, NULL) = 0 [pid 3553] recv(28, "", 8192, 0) = 0 [pid 3553] send(28, "X\0\0\0\4", 5, 0) = -1 EPIPE (Broken pipe) [pid 3553] --- SIGPIPE (Broken pipe) @ 0 (0) --- [pid 3553] rt_sigreturn(0x9) = -1 EPIPE (Broken pipe) I couldn't reproduce this though. It just randomly throws a SIGPIPE after the query. The other wierd thing is that this process also throws a SIGSEGV at another point. I wasn't expecting tomcat to crash, so alas I didn't capture a core file. I guess I should set the system default up. Alex On Fri, Mar 7, 2008 at 2:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Fri, Mar 7, 2008 at 11:17 AM, Alex Turner <armtuk@gmail.com> wrote: > > I didn't. And after the reboot, I still see 8 new sockets stuck in > > CLOSE_WAIT - I'm wondering if this is a hardware/kernel problem... > > Having sockets in CLOSE_WAIT is actually pretty normal >
Well - I know that my stored proc is segfaulting based on a strace of postgresql. Don't know how that affects trac which isn't using that stored proc... the mystery continues. Either way I didn't get a corefile, and ulimit -a show I have unlimited core file size :( Alex On Fri, Mar 7, 2008 at 11:42 PM, Alex Turner <armtuk@gmail.com> wrote: > Well - I think it might be that some of my servlets weren't closing > their database connections properly. > > I do have some new evidence though: > > I did an strace of the tomcat processes, and I noticed something that > might be odd, but I'm not really qualified to say. I notice that > every time a socket sends a request to Postgresql it gets some kind of > reply. This is true in all cases EXCEPT when the application crashes. > Here is the segment of the strace right before it throws a wobbly: > > > [pid 4565] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 156 > [pid 4565] bind(156, {sa_family=AF_INET, sin_port=htons(0), > sin_addr=inet_addr("0.0.0.0")}, 16) = 0 > [pid 4565] getsockname(156, {sa_family=AF_INET, > sin_port=htons(56550), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0 > [pid 4565] connect(156, {sa_family=AF_INET, sin_port=htons(5432), > sin_addr=inet_addr("127.0.0.1")}, 16) = 0 > [pid 4565] setsockopt(156, SOL_TCP, TCP_NODELAY, [1], 4) = 0 > [pid 4565] send(156, "\0\0\0W\0\3\0\0user\0postgres\0database\0t"..., > 87, 0) = 87 > [pid 4565] recv(156, > "R\0\0\0\10\0\0\0\0S\0\0\0\34client_encoding\0UN"..., 8192, 0) = 279 > [pid 4565] gettimeofday({1204948966, 386187}, NULL) = 0 > [pid 4565] send(156, "P\0\0\1\35\0\r\n \t\tselect"..., > 334, 0) = 334 > [pid 4565] recv(156, "", 8192, 0) = 0 > [pid 4565] send(156, "X\0\0\0\4", 5, 0) = 5 > [pid 4565] dup2(11, 156) = 156 > [pid 4565] close(156) = 0 > > > Notice that the recv(156,... after sending the query comes back blank > which seems odd given that we just sent a query to the database. > > I'm really in bind with this one. It started happening a couple of > days ago at this point, and all our admin applications are basically > down :(, people can't even log the bugs that this is generating > because the bugtrac (trac) is running on this postgresql and is > throwing errors too. > > I also caught something else that seemed wierd on another trace: > > [pid 3553] send(28, "P\0\0\0H\0delete from result_cache w"..., 108, 0) = 108 > [pid 3553] recv(28, "N\0\0\1\202SWARNING\0C57P02\0Mterminatin"..., > 8192, 0) = 387 > [pid 3553] gettimeofday({1204946902, 977641}, NULL) = 0 > [pid 3553] gettimeofday({1204946902, 977682}, NULL) = 0 > [pid 3553] gettimeofday({1204946902, 977766}, NULL) = 0 > [pid 3553] gettimeofday({1204946902, 977902}, NULL) = 0 > [pid 3553] gettimeofday({1204946902, 977973}, NULL) = 0 > [pid 3553] gettimeofday({1204946902, 978012}, NULL) = 0 > [pid 3553] gettimeofday({1204946902, 978053}, NULL) = 0 > [pid 3553] gettimeofday({1204946902, 978091}, NULL) = 0 > [pid 3553] recv(28, "", 8192, 0) = 0 > [pid 3553] send(28, "X\0\0\0\4", 5, 0) = -1 EPIPE (Broken pipe) > [pid 3553] --- SIGPIPE (Broken pipe) @ 0 (0) --- > [pid 3553] rt_sigreturn(0x9) = -1 EPIPE (Broken pipe) > > I couldn't reproduce this though. It just randomly throws a SIGPIPE > after the query. The other wierd thing is that this process also > throws a SIGSEGV at another point. I wasn't expecting tomcat to > crash, so alas I didn't capture a core file. I guess I should set the > system default up. > > Alex > > > > On Fri, Mar 7, 2008 at 2:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > > On Fri, Mar 7, 2008 at 11:17 AM, Alex Turner <armtuk@gmail.com> wrote: > > > I didn't. And after the reboot, I still see 8 new sockets stuck in > > > CLOSE_WAIT - I'm wondering if this is a hardware/kernel problem... > > > > Having sockets in CLOSE_WAIT is actually pretty normal > > >
On 3/8/08, Alex Turner <armtuk@gmail.com> wrote: > Well - I know that my stored proc is segfaulting based on a strace of > postgresql. Don't know how that affects trac which isn't using that > stored proc... the mystery continues. Either way I didn't get a > corefile, and ulimit -a show I have unlimited core file size :( * Are you absolutely sure that ulimit applies to the actual running postmaster process? * Are you sure you're looking in the right place for core files? -Doug
No I'm not. Where would a core file be if there was going to be one? I'm not sure how I can tell if the ulimit applies to the running postmaster I am the postgres user and ulimit -a show unlimited for core, and I run pg_ctl start. I have put it in that one place in /etc/ and also in ~/.bash_profile for postgres Alex On Sat, Mar 8, 2008 at 11:01 AM, Douglas McNaught <doug@mcnaught.org> wrote: > On 3/8/08, Alex Turner <armtuk@gmail.com> wrote: > > Well - I know that my stored proc is segfaulting based on a strace of > > postgresql. Don't know how that affects trac which isn't using that > > stored proc... the mystery continues. Either way I didn't get a > > corefile, and ulimit -a show I have unlimited core file size :( > > * Are you absolutely sure that ulimit applies to the actual running > postmaster process? > > * Are you sure you're looking in the right place for core files? > > -Doug >
On 3/8/08, Alex Turner <armtuk@gmail.com> wrote: > No I'm not. Where would a core file be if there was going to be one? They should appear in the data directory (e.g. /var/lib/pgsql/data). > I'm not sure how I can tell if the ulimit applies to the running > postmaster > > I am the postgres user and ulimit -a show unlimited for core, and I > run pg_ctl start. I have put it in that one place in /etc/ and also > in ~/.bash_profile for postgres That should work. It would be nice to be able to see the limits for a process through /proc, but unfortunately that's never been implemented... -Doug
On Sat, Mar 8, 2008 at 1:05 PM, Douglas McNaught <doug@mcnaught.org> wrote: > On 3/8/08, Alex Turner <armtuk@gmail.com> wrote: > > > No I'm not. Where would a core file be if there was going to be one? > > They should appear in the data directory (e.g. /var/lib/pgsql/data). > Yeah - thats where I was looking, so I'm guessing some where I don't have ulimit set up right for the process :(. My strace showed the segfault right after loading distance.so which is my shared object that contains my stored procs for that database, so I'm pretty sure it was in there. I found what was going on in that bit and fixed it so it's not crashing anymore, I'm just worried about how on earth that could have affected other back end processes that were querying unrelated databases. > > > I'm not sure how I can tell if the ulimit applies to the running > > postmaster > > > > I am the postgres user and ulimit -a show unlimited for core, and I > > run pg_ctl start. I have put it in that one place in /etc/ and also > > in ~/.bash_profile for postgres > > That should work. It would be nice to be able to see the limits for a > process through /proc, but unfortunately that's never been > implemented... > That would be nice. I wish there were more than 24 hours in a day so I could scratch some of my proverbial itches like that. > -Doug >
"Alex Turner" <armtuk@gmail.com> writes: > ... I found what was going on in that bit and fixed it so > it's not crashing anymore, I'm just worried about how on earth that > could have affected other back end processes that were querying > unrelated databases. You do know that a crash in any backend prompts the postmaster to restart the whole database cluster? I've forgotten the start of this thread, but what you say here sounds like expected behavior. regards, tom lane
Ah... no I didn't know that - that would explain all the other behaviour then!! Good to know. Alex On Sat, Mar 8, 2008 at 3:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Alex Turner" <armtuk@gmail.com> writes: > > ... I found what was going on in that bit and fixed it so > > > it's not crashing anymore, I'm just worried about how on earth that > > could have affected other back end processes that were querying > > unrelated databases. > > You do know that a crash in any backend prompts the postmaster to > restart the whole database cluster? I've forgotten the start of this > thread, but what you say here sounds like expected behavior. > > regards, tom lane >