Re: Problems with 8.3 - Mailing list pgsql-general

From Alex Turner
Subject Re: Problems with 8.3
Date
Msg-id 33c6269f0803072042w35476925hf415003a555c42fe@mail.gmail.com
In response to Re: Problems with 8.3  ("Scott Marlowe" <scott.marlowe@gmail.com>)
Responses Re: Problems with 8.3  ("Alex Turner" <armtuk@gmail.com>)
List pgsql-general
Well - I think it might be that some of my servlets weren't closing
their database connections properly.

I do have some new evidence though:

I did an strace of the Tomcat processes, and I noticed something that
might be odd, but I'm not really qualified to say.  Every time a
socket sends a request to PostgreSQL it gets some kind of reply.  This
is true in all cases EXCEPT when the application crashes.  Here is the
segment of the strace right before it throws a wobbly:


[pid  4565] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 156
[pid  4565] bind(156, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
[pid  4565] getsockname(156, {sa_family=AF_INET, sin_port=htons(56550), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
[pid  4565] connect(156, {sa_family=AF_INET, sin_port=htons(5432), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
[pid  4565] setsockopt(156, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid  4565] send(156, "\0\0\0W\0\3\0\0user\0postgres\0database\0t"..., 87, 0) = 87
[pid  4565] recv(156, "R\0\0\0\10\0\0\0\0S\0\0\0\34client_encoding\0UN"..., 8192, 0) = 279
[pid  4565] gettimeofday({1204948966, 386187}, NULL) = 0
[pid  4565] send(156, "P\0\0\1\35\0\r\n                \t\tselect"..., 334, 0) = 334
[pid  4565] recv(156, "", 8192, 0)      = 0
[pid  4565] send(156, "X\0\0\0\4", 5, 0) = 5
[pid  4565] dup2(11, 156)               = 156
[pid  4565] close(156)                  = 0


Notice that the recv(156, ...) right after sending the query comes
back empty (0 bytes), which seems odd given that we just sent a query
to the database.
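
On a stream socket, recv() returning 0 bytes is how the kernel reports
that the peer closed its end of the connection, so the empty read above
may mean the backend went away rather than that it replied with
nothing.  A minimal Python sketch that reproduces the pattern on
loopback, assuming a stand-in server thread (peer_that_hangs_up, a
hypothetical name) in place of PostgreSQL:

```python
import socket
import threading


def peer_that_hangs_up(listener: socket.socket) -> None:
    """Accept one connection, read the request, then close without replying."""
    conn, _ = listener.accept()
    conn.recv(8192)   # consume the stand-in "query"
    conn.close()      # orderly shutdown: the client will see end-of-stream


def recv_after_peer_close() -> bytes:
    """Send a request to a peer that closes the socket; return what recv() yields."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    server = threading.Thread(target=peer_that_hangs_up, args=(listener,))
    server.start()

    client = socket.create_connection(listener.getsockname())
    client.sendall(b"stand-in for a query")
    reply = client.recv(8192)  # blocks until data arrives or the peer closes
    client.close()
    server.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(repr(recv_after_peer_close()))
```

The client gets an empty byte string back, the same shape as the
recv(156, "", 8192, 0) = 0 line in the trace.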

I'm really in a bind with this one.  It started happening a couple of
days ago, and all our admin applications are basically down :(.
People can't even log the bugs that this is generating because the
bug tracker (Trac) is running on this PostgreSQL server and is
throwing errors too.

I also caught something else that seemed weird in another trace:

[pid  3553] send(28, "P\0\0\0H\0delete from result_cache w"..., 108, 0) = 108
[pid  3553] recv(28, "N\0\0\1\202SWARNING\0C57P02\0Mterminatin"..., 8192, 0) = 387
[pid  3553] gettimeofday({1204946902, 977641}, NULL) = 0
[pid  3553] gettimeofday({1204946902, 977682}, NULL) = 0
[pid  3553] gettimeofday({1204946902, 977766}, NULL) = 0
[pid  3553] gettimeofday({1204946902, 977902}, NULL) = 0
[pid  3553] gettimeofday({1204946902, 977973}, NULL) = 0
[pid  3553] gettimeofday({1204946902, 978012}, NULL) = 0
[pid  3553] gettimeofday({1204946902, 978053}, NULL) = 0
[pid  3553] gettimeofday({1204946902, 978091}, NULL) = 0
[pid  3553] recv(28, "", 8192, 0)       = 0
[pid  3553] send(28, "X\0\0\0\4", 5, 0) = -1 EPIPE (Broken pipe)
[pid  3553] --- SIGPIPE (Broken pipe) @ 0 (0) ---
[pid  3553] rt_sigreturn(0x9)           = -1 EPIPE (Broken pipe)
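
The reply to the delete in the trace above starts with 'N', which in
version 3 of the PostgreSQL frontend/backend protocol is a
NoticeResponse: a type byte, a big-endian Int32 length that counts
itself, then one-byte field codes each followed by a NUL-terminated
string, ended by a lone zero byte.  The length bytes \0\0\1\202 decode
to 386, which plus the type byte matches the 387 bytes strace reports,
and SQLSTATE 57P02 is listed as crash_shutdown in the PostgreSQL error
codes appendix.  A minimal Python sketch of decoding such a message
follows; build_notice and the sample message text are hypothetical,
since the trace truncates the real payload.

```python
import struct


def parse_notice(message: bytes) -> dict:
    """Split a NoticeResponse ('N') or ErrorResponse ('E') into its fields."""
    if message[0:1] not in (b"N", b"E"):
        raise ValueError("not a NoticeResponse/ErrorResponse")
    # Int32 length, big-endian; it counts itself but not the type byte.
    (length,) = struct.unpack("!I", message[1:5])
    body = message[5:1 + length]
    fields = {}
    pos = 0
    while pos < len(body) and body[pos] != 0:  # a lone zero byte ends the list
        code = chr(body[pos])                  # e.g. 'S', 'C', 'M'
        end = body.index(b"\0", pos + 1)       # value is NUL-terminated
        fields[code] = body[pos + 1:end].decode("utf-8", "replace")
        pos = end + 1
    return fields


def build_notice(fields: dict) -> bytes:
    """Hypothetical helper: assemble a sample message for the demo below."""
    body = b"".join(k.encode() + v.encode() + b"\0" for k, v in fields.items())
    body += b"\0"
    return b"N" + struct.pack("!I", len(body) + 4) + body


if __name__ == "__main__":
    # Placeholder text: the real message is cut off in the strace output.
    sample = build_notice(
        {"S": "WARNING", "C": "57P02", "M": "placeholder message text"}
    )
    print(parse_notice(sample))
```

Running strace with a larger -s value would capture the full message
instead of the truncated prefix.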

I couldn't reproduce this though.  It just randomly throws a SIGPIPE
after the query.  The other weird thing is that this process also
throws a SIGSEGV at another point.  I wasn't expecting Tomcat to
crash, so alas I didn't capture a core file.  I guess I should raise
the system default core file size limit.
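
Whether a core file gets written depends on the RLIMIT_CORE resource
limit, which child processes inherit from whatever launches them.  A
minimal Python sketch of raising the soft limit to the hard limit in a
launcher process, assuming a Unix system; enable_core_dumps is a
hypothetical name, and the shell equivalent would be a ulimit -c call
in the Tomcat start script:

```python
import resource


def enable_core_dumps() -> tuple:
    """Raise the soft core-size limit to the hard limit for this process."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    print("core limit (soft, hard):", soft, hard)
```

If the hard limit itself is 0, it has to be raised by root or in the
system limits configuration first.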

Alex

On Fri, Mar 7, 2008 at 2:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Fri, Mar 7, 2008 at 11:17 AM, Alex Turner <armtuk@gmail.com> wrote:
>  > I didn't.  And after the reboot, I still see 8 new sockets stuck in
>  >  CLOSE_WAIT - I'm wondering if this is a hardware/kernel problem...
>
>  Having sockets in CLOSE_WAIT is actually pretty normal
>
