Thread: Bug in libpq causes local clients to hang

Bug in libpq causes local clients to hang

From: "Jeffrey Baker"
Lately I've noticed that local (UNIX socket) clients using libpq4
8.1.9 (Debian 8.1.9-0etch1) and the same version of the server can
hang forever waiting in poll().  The symptom is that the local client
waits forever, using no CPU time, until it is interrupted by some
event (such as attaching gdb or strace to it), after which it proceeds
normally.  From the server's perspective, such clients are in the
state "<IDLE> in transaction" as reported via pg_stat_activity.  I
attached GDB to one such client, and the stack trace is as follows:

#0  0x00002b4f2f914d7f in poll () from /lib/libc.so.6
#1  0x00002b4f3038449f in PQmblen () from /usr/lib/libpq.so.4
#2  0x00002b4f30384580 in pqWaitTimed () from /usr/lib/libpq.so.4
#3  0x00002b4f30383e62 in PQgetResult () from /usr/lib/libpq.so.4
#4  0x00002b4f30383f3e in PQgetResult () from /usr/lib/libpq.so.4
#5  0x00002b4f3025f014 in dbd_st_execute () from /usr/lib/perl5/auto/DBD/Pg/Pg.so
#6  0x00002b4f302548b6 in XS_DBD__Pg__db_do () from /usr/lib/perl5/auto/DBD/Pg/Pg.so
#7  0x00002b4f2fd201f0 in XS_DBI_dispatch () from /usr/lib/perl5/auto/DBI/DBI.so
#8  0x00002b4f2f310b95 in Perl_pp_entersub () from /usr/lib/libperl.so.5.8
#9  0x00002b4f2f30f36e in Perl_runops_standard () from /usr/lib/libperl.so.5.8
#10 0x00002b4f2f2ba7dc in perl_run () from /usr/lib/libperl.so.5.8
#11 0x00000000004017ac in main ()

You'll note that I'm using the DBD::Pg Perl interface.  So far I've
never seen this happen with TCP connections, only with UNIX sockets.
I see it with about 1 in 100 local client invocations.

As a workaround I've configured my local clients to use TCP anyway,
and this seems to solve the problem.  Is this something that might
have been fixed in a post-8.1 version of libpq?

-jwb
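
For reference, the workaround described in this message comes down to the host part of the connection string. In libpq, a host value that begins with a slash is taken as the directory of a Unix-domain socket, and an omitted host defaults to the Unix socket on platforms that have one; a hostname or IP address selects TCP. The sketch below only builds DBD::Pg-style DSN strings for the two cases. The helper names, database name, socket directory, and port are assumptions for illustration, not values from the report.

```python
def pg_dsn(dbname, host=None, port=5432):
    """Build a DBD::Pg-style DSN string (hypothetical helper, illustration only)."""
    parts = [f"dbname={dbname}"]
    if host is not None:
        parts.append(f"host={host}")
    parts.append(f"port={port}")
    return "dbi:Pg:" + ";".join(parts)


def uses_unix_socket(host):
    """True if libpq would pick a Unix-domain socket for this host value."""
    # No host, or a host starting with "/", means a Unix-domain socket.
    return host is None or host.startswith("/")


# Assumed values: "mydb" and the socket directory are placeholders.
unix_dsn = pg_dsn("mydb", host="/var/run/postgresql")
tcp_dsn = pg_dsn("mydb", host="127.0.0.1")

print(unix_dsn, uses_unix_socket("/var/run/postgresql"))
print(tcp_dsn, uses_unix_socket("127.0.0.1"))
```

Switching a client from the first DSN to the second is enough to force TCP over the loopback interface, provided the server is listening on that address.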


Re: Bug in libpq causes local clients to hang

From: "Jeffrey Baker"
On Sun, Mar 23, 2008 at 7:12 PM, Jeffrey Baker <jwbaker@gmail.com> wrote:
> Lately I've noticed that local (UNIX socket) clients using libpq4
>  8.1.9 (Debian 8.1.9-0etch1) and the same version of the server can
>  hang forever waiting in poll().  The symptom is that the local client
>  waits forever, using no CPU time, until it is interrupted by some
>  event (such as attaching gdb or strace to it), after which it proceeds
>  normally.  From the server's perspective, such clients are in the
>  state "<IDLE> in transaction" as reported via pg_stat_activity.  I
>  attached GDB to one such client, and the stack trace is as follows:
>
>  #0  0x00002b4f2f914d7f in poll () from /lib/libc.so.6
>  #1  0x00002b4f3038449f in PQmblen () from /usr/lib/libpq.so.4
>  #2  0x00002b4f30384580 in pqWaitTimed () from /usr/lib/libpq.so.4
>  #3  0x00002b4f30383e62 in PQgetResult () from /usr/lib/libpq.so.4
>  #4  0x00002b4f30383f3e in PQgetResult () from /usr/lib/libpq.so.4
>  #5  0x00002b4f3025f014 in dbd_st_execute () from /usr/lib/perl5/auto/DBD/Pg/Pg.so
>  #6  0x00002b4f302548b6 in XS_DBD__Pg__db_do () from /usr/lib/perl5/auto/DBD/Pg/Pg.so
>  #7  0x00002b4f2fd201f0 in XS_DBI_dispatch () from /usr/lib/perl5/auto/DBI/DBI.so
>  #8  0x00002b4f2f310b95 in Perl_pp_entersub () from /usr/lib/libperl.so.5.8
>  #9  0x00002b4f2f30f36e in Perl_runops_standard () from /usr/lib/libperl.so.5.8
>  #10 0x00002b4f2f2ba7dc in perl_run () from /usr/lib/libperl.so.5.8
>  #11 0x00000000004017ac in main ()

Following up to myself, I note that a very similar issue was reported,
with a very similar stack, only two days ago, with subject "ecpg
program getting stuck" archived at

http://groups.google.com/group/pgsql.general/browse_thread/thread/0b7ede57faad803e/9abfd7ab1b7e1d86

-jwb


Re: Bug in libpq causes local clients to hang

From: Tom Lane
"Jeffrey Baker" <jwbaker@gmail.com> writes:
> You'll note that I'm using the DBD::Pg Perl interface.  So far I've
> never seen this happen with TCP connections, only with UNIX sockets.

If it works over TCP and not over Unix socket, it's a kernel bug.
The libpq code doesn't really know the difference after connection
setup.
        regards, tom lane


Re: Bug in libpq causes local clients to hang

From: "Jeffrey Baker"
On Sun, Mar 23, 2008 at 8:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Jeffrey Baker" <jwbaker@gmail.com> writes:
>  > You'll note that I'm using the DBD::Pg Perl interface.  So far I've
>  > never seen this happen with TCP connections, only with UNIX sockets.
>
>  If it works over TCP and not over Unix socket, it's a kernel bug.
>  The libpq code doesn't really know the difference after connection
>  setup.

The same thought occurred to me, but it could also be a race condition
which the unix socket is fast enough to trigger but the TCP socket is
not fast enough to trigger.  I'm peeking around in the code but
nothing jumps out yet.

-jwb


Re: Bug in libpq causes local clients to hang

From: Tom Lane
"Jeffrey Baker" <jwbaker@gmail.com> writes:
> On Sun, Mar 23, 2008 at 8:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If it works over TCP and not over Unix socket, it's a kernel bug.
>> The libpq code doesn't really know the difference after connection
>> setup.

> The same thought occurred to me, but it could also be a race condition
> which the unix socket is fast enough to trigger but the TCP socket is
> not fast enough to trigger.  I'm peeking around in the code but
> nothing jumps out yet.

Fairly hard to believe given that you're talking about communication
between two sequential processes.  Anyway I'd suggest that the first
thing to do is extract a reproducible test case.  It'd be useful
to see if it hangs on other platforms...
        regards, tom lane
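
One way to start on a reproducible test case, given the reported rate of roughly 1 hang in 100 invocations, is to run the client many times under a timeout and count the runs that never finish. The following is a minimal sketch, not a tested reproduction of this bug: `count_hangs` and the example command are hypothetical, and the timeout has to be chosen well above the client's normal run time. Because the hang is reported to clear when gdb or strace is attached, the harness only watches the process from outside.

```python
import subprocess
import sys


def count_hangs(cmd, runs=1000, timeout=30.0):
    """Run `cmd` `runs` times; return how many runs exceeded `timeout` seconds."""
    hangs = 0
    for _ in range(runs):
        try:
            # subprocess.run kills the child and reaps it if the timeout expires.
            subprocess.run(
                cmd,
                timeout=timeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs


# Self-check with a command that exits at once; a real run would pass the
# client command instead, e.g. ["perl", "client.pl"] (hypothetical script).
print(count_hangs([sys.executable, "-c", "pass"], runs=3, timeout=30.0))
```

Running the same harness once against a Unix-socket DSN and once against a TCP DSN, and on more than one platform, would show whether the hang really is specific to local sockets.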


Re: Bug in libpq causes local clients to hang

From: "Jeffrey Baker"
On Mon, Mar 24, 2008 at 9:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Jeffrey Baker" <jwbaker@gmail.com> writes:
>  > On Sun, Mar 23, 2008 at 8:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>  >> If it works over TCP and not over Unix socket, it's a kernel bug.
>  >> The libpq code doesn't really know the difference after connection
>  >> setup.
>
>  > The same thought occurred to me, but it could also be a race condition
>  > which the unix socket is fast enough to trigger but the TCP socket is
>  > not fast enough to trigger.  I'm peeking around in the code but
>  > nothing jumps out yet.
>
>  Fairly hard to believe given that you're talking about communication
>  between two sequential processes.  Anyway I'd suggest that the first
>  thing to do is extract a reproducible test case.  It'd be useful
>  to see if it hangs on other platforms...

The stack trace doesn't actually make sense, does it?  I think that
(at least) the PQmblen frame is spurious.

-jwb


Re: Bug in libpq causes local clients to hang

From: Tom Lane
"Jeffrey Baker" <jwbaker@gmail.com> writes:
> The stack trace doesn't actually make sense, does it?  I think that
> (at least) the PQmblen frame is spurious.

Yeah, it's obviously lying to some extent.  I think you need a copy
of libpq with debug symbols enabled to get a more accurate trace.
        regards, tom lane