Thread: Problems with 8.3

Problems with 8.3

From
"Alex Turner"
Date:
I'm getting the back end closing connections early for some reason.
Here is an exception report from my servlet.  This first started
happening with my instance of Trac, but now it's happening to my Java
apps too.  I hope someone can shed some light on what is going on
here.

Alex


HTTP Status 500 -

type Exception report

message

description The server encountered an internal error () that prevented
it from fulfilling this request.

exception

org.apache.jasper.JasperException: javax.servlet.ServletException:
javax.servlet.jsp.JspException:


        select venue,
city, a.state from venue a, city b, subscriber c where
a.city_id=b.city_id and
distance(deg2rad(c.latitude),deg2rad(c.longitude), deg2rad(a.latitude),
deg2rad(a.longitude))<60.0 and c.subscriber_id=?




        : An I/O
error occured while sending to the backend.

org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:541)

org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:417)

org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)

org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)

javax.servlet.http.HttpServlet.service(HttpServlet.java:803)


root cause

javax.servlet.ServletException: javax.servlet.jsp.JspException:


        select venue,
city, a.state from venue a, city b, subscriber c where
a.city_id=b.city_id and
distance(deg2rad(c.latitude),deg2rad(c.longitude), deg2rad(a.latitude),
deg2rad(a.longitude))<60.0 and c.subscriber_id=?




        : An I/O
error occured while sending to the backend.

org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:846)

org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:779)

org.apache.jsp.admin.find_005fsubscriber_jsp._jspService(find_005fsubscriber_jsp.java:420)

org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)

javax.servlet.http.HttpServlet.service(HttpServlet.java:803)

org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)

org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)

org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)

javax.servlet.http.HttpServlet.service(HttpServlet.java:803)


root cause

org.postgresql.util.PSQLException: An I/O error occured while
sending to the backend.

org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217)

org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)

org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:351)

org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:255)

org.apache.taglibs.standard.tag.common.sql.QueryTagSupport.doEndTag(QueryTagSupport.java:215)

org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_sql_005fquery_005f2(find_005fsubscriber_jsp.java:1388)

org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f6(find_005fsubscriber_jsp.java:933)

org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fforEach_005f0(find_005fsubscriber_jsp.java:752)

org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f2(find_005fsubscriber_jsp.java:543)

org.apache.jsp.admin.find_005fsubscriber_jsp._jspService(find_005fsubscriber_jsp.java:220)

org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)

javax.servlet.http.HttpServlet.service(HttpServlet.java:803)

org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)

org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)

org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)

javax.servlet.http.HttpServlet.service(HttpServlet.java:803)


root cause

java.io.EOFException

org.postgresql.core.PGStream.ReceiveChar(PGStream.java:257)

org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1165)

org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:191)

org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)

org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:351)

org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:255)

org.apache.taglibs.standard.tag.common.sql.QueryTagSupport.doEndTag(QueryTagSupport.java:215)

org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_sql_005fquery_005f2(find_005fsubscriber_jsp.java:1388)

org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f6(find_005fsubscriber_jsp.java:933)

org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fforEach_005f0(find_005fsubscriber_jsp.java:752)

org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f2(find_005fsubscriber_jsp.java:543)

org.apache.jsp.admin.find_005fsubscriber_jsp._jspService(find_005fsubscriber_jsp.java:220)

org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)

javax.servlet.http.HttpServlet.service(HttpServlet.java:803)

org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)

org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)

org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)

javax.servlet.http.HttpServlet.service(HttpServlet.java:803)


note The full stack trace of the root cause is available in the Apache
Tomcat/6.0.14 logs.

Re: Problems with 8.3

From
"Alex Turner"
Date:
Ok - lookint at the pg log, it appears that the server process is seg
faulting :(.  This might conceivably be my fault.  I have 3 stored
procedures written in C, but they've been on the server for months,
and unless I didn't deploy them correctly originally I don't think
it's them.

I'm thinking I need a core file.  I'm guessing I just restart
postgresql from a user whos core file size limit is set to non zero?

Alex

On Thu, Mar 6, 2008 at 3:49 PM, Alex Turner <armtuk@gmail.com> wrote:
> I'm getting the back end closing connections early for some reason.
>  Here is an exception report from my servlet.  This first started
>  happening with my instance of Trac, but now it's happening to my Java
>  apps too.  I hope someone can shed some light on what is going on
>  here.
>
>  Alex
>
>
>  HTTP Status 500 -
>
>  type Exception report
>
>  message
>
>  description The server encountered an internal error () that prevented
>  it from fulfilling this request.
>
>  exception
>
>  org.apache.jasper.JasperException: javax.servlet.ServletException:
>  javax.servlet.jsp.JspException:
>
>
>         select venue,
>  city, a.state from venue a, city b, subscriber c where
>  a.city_id=b.city_id and
>  distance(deg2rad(c.latitude),deg2rad(c.longitude), deg2rad(a.latitude),
>  deg2rad(a.longitude))<60.0 and c.subscriber_id=?
>
>
>
>
>         : An I/O
>  error occured while sending to the backend.
>
>  org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:541)
>
>  org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:417)
>
>  org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
>
>  org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
>
>  javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
>
>
>  root cause
>
>  javax.servlet.ServletException: javax.servlet.jsp.JspException:
>
>
>         select venue,
>  city, a.state from venue a, city b, subscriber c where
>  a.city_id=b.city_id and
>  distance(deg2rad(c.latitude),deg2rad(c.longitude), deg2rad(a.latitude),
>  deg2rad(a.longitude))<60.0 and c.subscriber_id=?
>
>
>
>
>         : An I/O
>  error occured while sending to the backend.
>
>  org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:846)
>
>  org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:779)
>
>  org.apache.jsp.admin.find_005fsubscriber_jsp._jspService(find_005fsubscriber_jsp.java:420)
>
>  org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
>
>  javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
>
>  org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
>
>  org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
>
>  org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
>
>  javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
>
>
>  root cause
>
>  org.postgresql.util.PSQLException: An I/O error occured while
>  sending to the backend.
>
>  org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217)
>
>  org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)
>
>  org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:351)
>
>  org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:255)
>
>  org.apache.taglibs.standard.tag.common.sql.QueryTagSupport.doEndTag(QueryTagSupport.java:215)
>
>  org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_sql_005fquery_005f2(find_005fsubscriber_jsp.java:1388)
>
>  org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f6(find_005fsubscriber_jsp.java:933)
>
>  org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fforEach_005f0(find_005fsubscriber_jsp.java:752)
>
>  org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f2(find_005fsubscriber_jsp.java:543)
>
>  org.apache.jsp.admin.find_005fsubscriber_jsp._jspService(find_005fsubscriber_jsp.java:220)
>
>  org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
>
>  javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
>
>  org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
>
>  org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
>
>  org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
>
>  javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
>
>
>  root cause
>
>  java.io.EOFException
>
>  org.postgresql.core.PGStream.ReceiveChar(PGStream.java:257)
>
>  org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1165)
>
>  org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:191)
>
>  org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)
>
>  org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:351)
>
>  org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:255)
>
>  org.apache.taglibs.standard.tag.common.sql.QueryTagSupport.doEndTag(QueryTagSupport.java:215)
>
>  org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_sql_005fquery_005f2(find_005fsubscriber_jsp.java:1388)
>
>  org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f6(find_005fsubscriber_jsp.java:933)
>
>  org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fforEach_005f0(find_005fsubscriber_jsp.java:752)
>
>  org.apache.jsp.admin.find_005fsubscriber_jsp._jspx_meth_c_005fif_005f2(find_005fsubscriber_jsp.java:543)
>
>  org.apache.jsp.admin.find_005fsubscriber_jsp._jspService(find_005fsubscriber_jsp.java:220)
>
>  org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
>
>  javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
>
>  org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
>
>  org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
>
>  org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
>
>  javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
>
>
>  note The full stack trace of the root cause is available in the Apache
>  Tomcat/6.0.14 logs.
>

Re: Problems with 8.3

From
"Douglas McNaught"
Date:
On 3/6/08, Alex Turner <armtuk@gmail.com> wrote:
> I'm getting the back end closing connections early for some reason.
>  Here is an exception report from my servlet.  This first started
>  happening with my instance of Trac, but now it's happening to my Java
>  apps too.  I hope someone can shed some light on what is going on
>  here.

You need to look in the server logs and see why it's crashing--all the
exception traces tell us is that the server went away unexpectedly.

-Doug

Re: Problems with 8.3

From
"Alex Turner"
Date:
Ok - the connection closed thing is happening a lot, but not much is
going into pg_log...

Alex

On Thu, Mar 6, 2008 at 4:03 PM, Douglas McNaught <doug@mcnaught.org> wrote:
> On 3/6/08, Alex Turner <armtuk@gmail.com> wrote:
>  > Ok - lookint at the pg log, it appears that the server process is seg
>  >  faulting :(.  This might conceivably be my fault.  I have 3 stored
>  >  procedures written in C, but they've been on the server for months,
>  >  and unless I didn't deploy them correctly originally I don't think
>  >  it's them.
>  >
>  >  I'm thinking I need a core file.  I'm guessing I just restart
>  >  postgresql from a user whos core file size limit is set to non zero?
>
>  Or just put a 'ulimit' statement in the script that starts PG, or edit
>  /etc/security/limits.conf if you're on RedHat.
>
>  -Doug
>

Re: Problems with 8.3

From
Richard Huxton
Date:
Alex Turner wrote:
> I'm getting the back end closing connections early for some reason.
> Here is an exception report from my servlet.  This first started
> happening with my instance of Trac, but now it's happening to my Java
> apps too.  I hope someone can shed some light on what is going on
> here.

What do postgresql's server logs show?

--
   Richard Huxton
   Archonet Ltd

Re: Problems with 8.3

From
"Alex Turner"
Date:
Sometimes I'm getting LOG: unexptected EOF on client connection

Alex

On Thu, Mar 6, 2008 at 4:06 PM, Alex Turner <armtuk@gmail.com> wrote:
> Ok - the connection closed thing is happening a lot, but not much is
>  going into pg_log...
>
>  Alex
>
>
>
>  On Thu, Mar 6, 2008 at 4:03 PM, Douglas McNaught <doug@mcnaught.org> wrote:
>  > On 3/6/08, Alex Turner <armtuk@gmail.com> wrote:
>  >  > Ok - lookint at the pg log, it appears that the server process is seg
>  >  >  faulting :(.  This might conceivably be my fault.  I have 3 stored
>  >  >  procedures written in C, but they've been on the server for months,
>  >  >  and unless I didn't deploy them correctly originally I don't think
>  >  >  it's them.
>  >  >
>  >  >  I'm thinking I need a core file.  I'm guessing I just restart
>  >  >  postgresql from a user whos core file size limit is set to non zero?
>  >
>  >  Or just put a 'ulimit' statement in the script that starts PG, or edit
>  >  /etc/security/limits.conf if you're on RedHat.
>  >
>  >  -Doug
>  >
>

Re: Problems with 8.3

From
"Alex Turner"
Date:
Ok - more info

I turned on connection logging a few other things, and normaly I get
something like this:

LOG:  00000: connection authorized: user=postgres database=trend
LOCATION:  BackendInitialize, postmaster.c:3097
LOG:  00000: disconnection: session time: 0:00:00.004 user=postgres
database=trend host=localhost port=51586

but when it fails, I get nothing in the log at all...

ALex

On Thu, Mar 6, 2008 at 4:09 PM, Alex Turner <armtuk@gmail.com> wrote:
> Sometimes I'm getting LOG: unexptected EOF on client connection
>
>  Alex
>
>
>
>  On Thu, Mar 6, 2008 at 4:06 PM, Alex Turner <armtuk@gmail.com> wrote:
>  > Ok - the connection closed thing is happening a lot, but not much is
>  >  going into pg_log...
>  >
>  >  Alex
>  >
>  >
>  >
>  >  On Thu, Mar 6, 2008 at 4:03 PM, Douglas McNaught <doug@mcnaught.org> wrote:
>  >  > On 3/6/08, Alex Turner <armtuk@gmail.com> wrote:
>  >  >  > Ok - lookint at the pg log, it appears that the server process is seg
>  >  >  >  faulting :(.  This might conceivably be my fault.  I have 3 stored
>  >  >  >  procedures written in C, but they've been on the server for months,
>  >  >  >  and unless I didn't deploy them correctly originally I don't think
>  >  >  >  it's them.
>  >  >  >
>  >  >  >  I'm thinking I need a core file.  I'm guessing I just restart
>  >  >  >  postgresql from a user whos core file size limit is set to non zero?
>  >  >
>  >  >  Or just put a 'ulimit' statement in the script that starts PG, or edit
>  >  >  /etc/security/limits.conf if you're on RedHat.
>  >  >
>  >  >  -Doug
>  >  >
>  >
>

Re: Problems with 8.3

From
"Douglas McNaught"
Date:
On 3/6/08, Alex Turner <armtuk@gmail.com> wrote:
> Ok - lookint at the pg log, it appears that the server process is seg
>  faulting :(.  This might conceivably be my fault.  I have 3 stored
>  procedures written in C, but they've been on the server for months,
>  and unless I didn't deploy them correctly originally I don't think
>  it's them.
>
>  I'm thinking I need a core file.  I'm guessing I just restart
>  postgresql from a user whos core file size limit is set to non zero?

Or just put a 'ulimit' statement in the script that starts PG, or edit
/etc/security/limits.conf if you're on RedHat.

-Doug

Re: Problems with 8.3

From
Richard Huxton
Date:
Alex Turner wrote:
> Ok - lookint at the pg log, it appears that the server process is seg
> faulting :(.  This might conceivably be my fault.  I have 3 stored
> procedures written in C, but they've been on the server for months,
> and unless I didn't deploy them correctly originally I don't think
> it's them.

You've had stored procedures written in C deployed on 8.3 for months?

Or you had stored procedures written for 8.2 and have re-deployed onto 8.3?

--
   Richard Huxton
   Archonet Ltd

Re: Problems with 8.3

From
Tom Lane
Date:
"Alex Turner" <armtuk@gmail.com> writes:
> I'm thinking I need a core file.

Yeah.

> I'm guessing I just restart
> postgresql from a user whos core file size limit is set to non zero?

Probably won't help if your start script does "su postgres" as most do.
What I'd try is putting the ulimit command in ~postgres/.profile or
whatever init files your shell happens to like.

            regards, tom lane

Re: Problems with 8.3

From
"Alex Turner"
Date:
I did a pg_ctl start from the postgres user... is that gonna work, or
does pg_ctl do an su?

Alex

On Thu, Mar 6, 2008 at 4:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Alex Turner" <armtuk@gmail.com> writes:
>  > I'm thinking I need a core file.
>
>  Yeah.
>
>
>  > I'm guessing I just restart
>  > postgresql from a user whos core file size limit is set to non zero?
>
>  Probably won't help if your start script does "su postgres" as most do.
>  What I'd try is putting the ulimit command in ~postgres/.profile or
>  whatever init files your shell happens to like.
>
>                         regards, tom lane
>

Re: Problems with 8.3

From
Richard Huxton
Date:
Alex Turner wrote:
> I had stored procs in C on 8.2 for months, and I moved them over to
> 8.3 when we upgraded.

And re-compiled them, yes?

 > The thing is that it's happening on a database
> that doesn't have the stored procs.  It seems to be happening worst on
> a system that keeps connections open between requests.  It's failing
> about half the time.

Hmm - if it's individual backends dying, and not those with the C
functions then that should rule them out. I think that core dump will be
your quickest way of figuring this out.

--
   Richard Huxton
   Archonet Ltd

Re: Problems with 8.3

From
"Alex Turner"
Date:
It was core dumping on the 5th of March, but it hasn't since.  It's
just failing with the connection closed problem.  It seems to happen
worst with queries that are going to do updates and with connections
that are persistent between http requests...

I downgraded to 8.2, but it hasn't made any difference :(

Alex

On Fri, Mar 7, 2008 at 3:00 AM, Richard Huxton <dev@archonet.com> wrote:
> Alex Turner wrote:
>  > I had stored procs in C on 8.2 for months, and I moved them over to
>  > 8.3 when we upgraded.
>
>  And re-compiled them, yes?
>
>
>   > The thing is that it's happening on a database
>  > that doesn't have the stored procs.  It seems to be happening worst on
>  > a system that keeps connections open between requests.  It's failing
>  > about half the time.
>
>  Hmm - if it's individual backends dying, and not those with the C
>  functions then that should rule them out. I think that core dump will be
>  your quickest way of figuring this out.
>
>  --
>    Richard Huxton
>    Archonet Ltd
>

Re: Problems with 8.3

From
Richard Huxton
Date:
Alex Turner wrote:
> It was core dumping on the 5th of March, but it hasn't since.  It's
> just failing with the connection closed problem.  It seems to happen
> worst with queries that are going to do updates and with connections
> that are persistent between http requests...
>
> I downgraded to 8.2, but it hasn't made any difference :(

It seems odd that nothing is being logged.

Is there nothing in the system logs for the machine as a whole? You're
not running out of memory and PG is being killed? Even then, you'd
expect something to pop up in the logs.

--
   Richard Huxton
   Archonet Ltd

Re: Problems with 8.3

From
"Alex Turner"
Date:
Nothin worth mentioning in /var/log/messages

The wierd thing I do see is there are a number of sockets in
CLOSE_WAIT when doing a netstat -an | grep 5432

I think maybe I'll just reboot and see if that fixes it.

Alex

On Fri, Mar 7, 2008 at 11:47 AM, Richard Huxton <dev@archonet.com> wrote:
> Alex Turner wrote:
>  > It was core dumping on the 5th of March, but it hasn't since.  It's
>  > just failing with the connection closed problem.  It seems to happen
>  > worst with queries that are going to do updates and with connections
>  > that are persistent between http requests...
>  >
>  > I downgraded to 8.2, but it hasn't made any difference :(
>
>  It seems odd that nothing is being logged.
>
>  Is there nothing in the system logs for the machine as a whole? You're
>  not running out of memory and PG is being killed? Even then, you'd
>  expect something to pop up in the logs.
>
>  --
>    Richard Huxton
>    Archonet Ltd
>

Re: Problems with 8.3

From
"Alex Turner"
Date:
I didn't.  And after the reboot, I still see 8 new sockets stuck in
CLOSE_WAIT - I'm wondering if this is a hardware/kernel problem...

Alex

On Fri, Mar 7, 2008 at 11:50 AM, Alex Turner <armtuk@gmail.com> wrote:
> Nothin worth mentioning in /var/log/messages
>
>  The wierd thing I do see is there are a number of sockets in
>  CLOSE_WAIT when doing a netstat -an | grep 5432
>
>  I think maybe I'll just reboot and see if that fixes it.
>
>  Alex
>
>
>
>  On Fri, Mar 7, 2008 at 11:47 AM, Richard Huxton <dev@archonet.com> wrote:
>  > Alex Turner wrote:
>  >  > It was core dumping on the 5th of March, but it hasn't since.  It's
>  >  > just failing with the connection closed problem.  It seems to happen
>  >  > worst with queries that are going to do updates and with connections
>  >  > that are persistent between http requests...
>  >  >
>  >  > I downgraded to 8.2, but it hasn't made any difference :(
>  >
>  >  It seems odd that nothing is being logged.
>  >
>  >  Is there nothing in the system logs for the machine as a whole? You're
>  >  not running out of memory and PG is being killed? Even then, you'd
>  >  expect something to pop up in the logs.
>  >
>  >  --
>  >    Richard Huxton
>  >    Archonet Ltd
>  >
>

Re: Problems with 8.3

From
"Scott Marlowe"
Date:
On Fri, Mar 7, 2008 at 11:17 AM, Alex Turner <armtuk@gmail.com> wrote:
> I didn't.  And after the reboot, I still see 8 new sockets stuck in
>  CLOSE_WAIT - I'm wondering if this is a hardware/kernel problem...

Having sockets in CLOSE_WAIT is actually pretty normal

Re: Problems with 8.3

From
"Alex Turner"
Date:
Well - I think it might be that some of my servlets weren't closing
their database connections properly.

I do have some new evidence though:

I did an strace of the tomcat processes, and I noticed something that
might be odd, but I'm not really qualified to say.  I notice that
every time a socket sends a request to Postgresql it gets some kind of
reply.  This is true in all cases EXCEPT when the application crashes.
 Here is the segment of the strace right before it throws a wobbly:


[pid  4565] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 156
[pid  4565] bind(156, {sa_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("0.0.0.0")}, 16) = 0
[pid  4565] getsockname(156, {sa_family=AF_INET,
sin_port=htons(56550), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
[pid  4565] connect(156, {sa_family=AF_INET, sin_port=htons(5432),
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
[pid  4565] setsockopt(156, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid  4565] send(156, "\0\0\0W\0\3\0\0user\0postgres\0database\0t"...,
87, 0) = 87
[pid  4565] recv(156,
"R\0\0\0\10\0\0\0\0S\0\0\0\34client_encoding\0UN"..., 8192, 0) = 279
[pid  4565] gettimeofday({1204948966, 386187}, NULL) = 0
[pid  4565] send(156, "P\0\0\1\35\0\r\n                \t\tselect"...,
334, 0) = 334
[pid  4565] recv(156, "", 8192, 0)      = 0
[pid  4565] send(156, "X\0\0\0\4", 5, 0) = 5
[pid  4565] dup2(11, 156)               = 156
[pid  4565] close(156)                  = 0


Notice that the recv(156,... after sending the query comes back blank
which seems odd given that we just sent a query to the database.

I'm really in bind with this one.  It started happening a couple of
days ago at this point, and all our admin applications are basically
down :(, people can't even log the bugs that this is generating
because the bugtrac (trac) is running on this postgresql and is
throwing errors too.

I also caught something else that seemed wierd on another trace:

[pid  3553] send(28, "P\0\0\0H\0delete from result_cache w"..., 108, 0) = 108
[pid  3553] recv(28, "N\0\0\1\202SWARNING\0C57P02\0Mterminatin"...,
8192, 0) = 387
[pid  3553] gettimeofday({1204946902, 977641}, NULL) = 0
[pid  3553] gettimeofday({1204946902, 977682}, NULL) = 0
[pid  3553] gettimeofday({1204946902, 977766}, NULL) = 0
[pid  3553] gettimeofday({1204946902, 977902}, NULL) = 0
[pid  3553] gettimeofday({1204946902, 977973}, NULL) = 0
[pid  3553] gettimeofday({1204946902, 978012}, NULL) = 0
[pid  3553] gettimeofday({1204946902, 978053}, NULL) = 0
[pid  3553] gettimeofday({1204946902, 978091}, NULL) = 0
[pid  3553] recv(28, "", 8192, 0)       = 0
[pid  3553] send(28, "X\0\0\0\4", 5, 0) = -1 EPIPE (Broken pipe)
[pid  3553] --- SIGPIPE (Broken pipe) @ 0 (0) ---
[pid  3553] rt_sigreturn(0x9)           = -1 EPIPE (Broken pipe)

I couldn't reproduce this though.  It just randomly throws a SIGPIPE
after the query.  The other wierd thing is that this process also
throws a SIGSEGV at another point.  I wasn't expecting tomcat to
crash, so alas I didn't capture a core file.  I guess I should set the
system default up.

Alex

On Fri, Mar 7, 2008 at 2:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Fri, Mar 7, 2008 at 11:17 AM, Alex Turner <armtuk@gmail.com> wrote:
>  > I didn't.  And after the reboot, I still see 8 new sockets stuck in
>  >  CLOSE_WAIT - I'm wondering if this is a hardware/kernel problem...
>
>  Having sockets in CLOSE_WAIT is actually pretty normal
>

Re: Problems with 8.3

From
"Alex Turner"
Date:
Well - I know that my stored proc is segfaulting based on a strace of
postgresql.  Don't know how that affects trac which isn't using that
stored proc... the mystery continues.  Either way I didn't get a
corefile, and ulimit -a show I have unlimited core file size :(

Alex

On Fri, Mar 7, 2008 at 11:42 PM, Alex Turner <armtuk@gmail.com> wrote:
> Well - I think it might be that some of my servlets weren't closing
>  their database connections properly.
>
>  I do have some new evidence though:
>
>  I did an strace of the tomcat processes, and I noticed something that
>  might be odd, but I'm not really qualified to say.  I notice that
>  every time a socket sends a request to Postgresql it gets some kind of
>  reply.  This is true in all cases EXCEPT when the application crashes.
>   Here is the segment of the strace right before it throws a wobbly:
>
>
>  [pid  4565] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 156
>  [pid  4565] bind(156, {sa_family=AF_INET, sin_port=htons(0),
>  sin_addr=inet_addr("0.0.0.0")}, 16) = 0
>  [pid  4565] getsockname(156, {sa_family=AF_INET,
>  sin_port=htons(56550), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
>  [pid  4565] connect(156, {sa_family=AF_INET, sin_port=htons(5432),
>  sin_addr=inet_addr("127.0.0.1")}, 16) = 0
>  [pid  4565] setsockopt(156, SOL_TCP, TCP_NODELAY, [1], 4) = 0
>  [pid  4565] send(156, "\0\0\0W\0\3\0\0user\0postgres\0database\0t"...,
>  87, 0) = 87
>  [pid  4565] recv(156,
>  "R\0\0\0\10\0\0\0\0S\0\0\0\34client_encoding\0UN"..., 8192, 0) = 279
>  [pid  4565] gettimeofday({1204948966, 386187}, NULL) = 0
>  [pid  4565] send(156, "P\0\0\1\35\0\r\n                \t\tselect"...,
>  334, 0) = 334
>  [pid  4565] recv(156, "", 8192, 0)      = 0
>  [pid  4565] send(156, "X\0\0\0\4", 5, 0) = 5
>  [pid  4565] dup2(11, 156)               = 156
>  [pid  4565] close(156)                  = 0
>
>
>  Notice that the recv(156,... after sending the query comes back blank
>  which seems odd given that we just sent a query to the database.
>
>  I'm really in bind with this one.  It started happening a couple of
>  days ago at this point, and all our admin applications are basically
>  down :(, people can't even log the bugs that this is generating
>  because the bugtrac (trac) is running on this postgresql and is
>  throwing errors too.
>
>  I also caught something else that seemed wierd on another trace:
>
>  [pid  3553] send(28, "P\0\0\0H\0delete from result_cache w"..., 108, 0) = 108
>  [pid  3553] recv(28, "N\0\0\1\202SWARNING\0C57P02\0Mterminatin"...,
>  8192, 0) = 387
>  [pid  3553] gettimeofday({1204946902, 977641}, NULL) = 0
>  [pid  3553] gettimeofday({1204946902, 977682}, NULL) = 0
>  [pid  3553] gettimeofday({1204946902, 977766}, NULL) = 0
>  [pid  3553] gettimeofday({1204946902, 977902}, NULL) = 0
>  [pid  3553] gettimeofday({1204946902, 977973}, NULL) = 0
>  [pid  3553] gettimeofday({1204946902, 978012}, NULL) = 0
>  [pid  3553] gettimeofday({1204946902, 978053}, NULL) = 0
>  [pid  3553] gettimeofday({1204946902, 978091}, NULL) = 0
>  [pid  3553] recv(28, "", 8192, 0)       = 0
>  [pid  3553] send(28, "X\0\0\0\4", 5, 0) = -1 EPIPE (Broken pipe)
>  [pid  3553] --- SIGPIPE (Broken pipe) @ 0 (0) ---
>  [pid  3553] rt_sigreturn(0x9)           = -1 EPIPE (Broken pipe)
>
>  I couldn't reproduce this though.  It just randomly throws a SIGPIPE
>  after the query.  The other wierd thing is that this process also
>  throws a SIGSEGV at another point.  I wasn't expecting tomcat to
>  crash, so alas I didn't capture a core file.  I guess I should set the
>  system default up.
>
>  Alex
>
>
>
>  On Fri, Mar 7, 2008 at 2:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>  > On Fri, Mar 7, 2008 at 11:17 AM, Alex Turner <armtuk@gmail.com> wrote:
>  >  > I didn't.  And after the reboot, I still see 8 new sockets stuck in
>  >  >  CLOSE_WAIT - I'm wondering if this is a hardware/kernel problem...
>  >
>  >  Having sockets in CLOSE_WAIT is actually pretty normal
>  >
>

Re: Problems with 8.3

From
"Douglas McNaught"
Date:
On 3/8/08, Alex Turner <armtuk@gmail.com> wrote:
> Well - I know that my stored proc is segfaulting based on a strace of
>  postgresql.  Don't know how that affects trac which isn't using that
>  stored proc... the mystery continues.  Either way I didn't get a
>  corefile, and ulimit -a show I have unlimited core file size :(

* Are you absolutely sure that ulimit applies to the actual running
postmaster process?

* Are you sure you're looking in the right place for core files?

-Doug

Re: Problems with 8.3

From
"Alex Turner"
Date:
No I'm not.  Where would a core file be if there was going to be one?
I'm not sure how I can tell if the ulimit applies to the running
postmaster

I am the postgres user and ulimit -a show unlimited for core, and I
run pg_ctl start.  I have put it in that one place in /etc/ and also
in ~/.bash_profile for postgres

Alex

On Sat, Mar 8, 2008 at 11:01 AM, Douglas McNaught <doug@mcnaught.org> wrote:
> On 3/8/08, Alex Turner <armtuk@gmail.com> wrote:
>  > Well - I know that my stored proc is segfaulting based on a strace of
>  >  postgresql.  Don't know how that affects trac which isn't using that
>  >  stored proc... the mystery continues.  Either way I didn't get a
>  >  corefile, and ulimit -a show I have unlimited core file size :(
>
>  * Are you absolutely sure that ulimit applies to the actual running
>  postmaster process?
>
>  * Are you sure you're looking in the right place for core files?
>
>  -Doug
>

Re: Problems with 8.3

From
"Douglas McNaught"
Date:
On 3/8/08, Alex Turner <armtuk@gmail.com> wrote:
> No I'm not.  Where would a core file be if there was going to be one?

They should appear in the data directory (e.g. /var/lib/pgsql/data).

>  I'm not sure how I can tell if the ulimit applies to the running
>  postmaster
>
>  I am the postgres user and ulimit -a show unlimited for core, and I
>  run pg_ctl start.  I have put it in that one place in /etc/ and also
>  in ~/.bash_profile for postgres

That should work.  It would be nice to be able to see the limits for a
process through /proc, but unfortunately that's never been
implemented...

-Doug

Re: Problems with 8.3

From
"Alex Turner"
Date:
On Sat, Mar 8, 2008 at 1:05 PM, Douglas McNaught <doug@mcnaught.org> wrote:
> On 3/8/08, Alex Turner <armtuk@gmail.com> wrote:
>
> > No I'm not.  Where would a core file be if there was going to be one?
>
>  They should appear in the data directory (e.g. /var/lib/pgsql/data).
>

Yeah - thats where I was looking, so I'm guessing some where I don't
have ulimit set up right for the process :(.  My strace showed the
segfault right after loading distance.so which is my shared object
that contains my stored procs for that database, so I'm pretty sure it
was in there.  I found what was going on in that bit and fixed it so
it's not crashing anymore, I'm just worried about how on earth that
could have affected other back end processes that were querying
unrelated databases.

>
>  >  I'm not sure how I can tell if the ulimit applies to the running
>  >  postmaster
>  >
>  >  I am the postgres user and ulimit -a show unlimited for core, and I
>  >  run pg_ctl start.  I have put it in that one place in /etc/ and also
>  >  in ~/.bash_profile for postgres
>
>  That should work.  It would be nice to be able to see the limits for a
>  process through /proc, but unfortunately that's never been
>  implemented...
>

That would be nice. I wish there were more than 24 hours in a day so I
could scratch some of my proverbial itches like that.

>  -Doug
>

Re: Problems with 8.3

From
Tom Lane
Date:
"Alex Turner" <armtuk@gmail.com> writes:
> ... I found what was going on in that bit and fixed it so
> it's not crashing anymore, I'm just worried about how on earth that
> could have affected other back end processes that were querying
> unrelated databases.

You do know that a crash in any backend prompts the postmaster to
restart the whole database cluster?  I've forgotten the start of this
thread, but what you say here sounds like expected behavior.

            regards, tom lane

Re: Problems with 8.3

From
"Alex Turner"
Date:
Ah... no I didn't know that - that would explain all the other
behaviour then!!  Good to know.

Alex

On Sat, Mar 8, 2008 at 3:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Alex Turner" <armtuk@gmail.com> writes:
>  > ... I found what was going on in that bit and fixed it so
>
> > it's not crashing anymore, I'm just worried about how on earth that
>  > could have affected other back end processes that were querying
>  > unrelated databases.
>
>  You do know that a crash in any backend prompts the postmaster to
>  restart the whole database cluster?  I've forgotten the start of this
>  thread, but what you say here sounds like expected behavior.
>
>                         regards, tom lane
>