Thread: Orphaned statements issue

Orphaned statements issue

From
Josh Berkus
Date:
Hackers,

One of our clients is seeing an unusual issue with statements which are
waiting going into sleep forever and never completing.   This first
e-mail is for a "has anyone else seen this problem before?" while we try
to collect additional information for a diagnosis.

This problem only happens under load and only when performing load tests
that insert large data (250 Kib) into bytea columns.  It's takes a
couple hours but we've been able to reproduce the issue with a 100%
success rate.  Sometime the locked query shows up as '<BIND>' in the
pg_stat_activity sometimes it's a complicated query using a multi-table
left outer join, other times is a simple select.  The only thing in
common is that there is never a corresponding entry for that statement
in the pg_locks table, and if you drop the connection the query goes away.

An strace on the process shows it to be in RECV, and otherwise doing
nothing.  We have not been able to run GDB because it takes a couple
hours of running a heavy load test to cause the issue.  While memory is
heavily used during the test, there is no swapping during the test which
would indicate Linux memory management as the culprit.

We can reproduce the issue on 8.1.11 and 8.1.23.  Currently we are
working on testing it on 9.0 and seeing if we can reproduce the issue.

We compiled the postgres from the Redhat source RPM.  The only
modification that we make is the config file.

The OS is Centos 5.4 32bit.

Hardware:
IBM 3650
2 x Dual Core Intel Xeon 5160 @ 3.00 GHz
16 GB memory
6 x 146 GB SAS 10K RPM in RAID-5

Please note that while we can reproduce the issue, access to the test
system is fairly restrictive and test runs take a while, so I'd like to
get requests for additional information-collecting all at once if possible.

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: Orphaned statements issue

From
Alvaro Herrera
Date:
Excerpts from Josh Berkus's message of jue ene 20 18:05:15 -0300 2011:

> 
> One of our clients is seeing an unusual issue with statements which are
> waiting going into sleep forever and never completing.   This first
> e-mail is for a "has anyone else seen this problem before?" while we try
> to collect additional information for a diagnosis.

I have seen it -- on 8.1 too.  On our case it was caused by an insert
that was doing lots of toast insertions, so it needed to grab the
extension lock frequently for the toast table; and this was slowed
down by a largish shared_buffers setting, somehow (8.1 doesn't have lock
partitioning, so this was expensive).  I don't recall details on why
these were related.  If this is your case too, I doubt you'd be able to
reproduce it in 9.0 (not even in 8.2 which is when lock partitioning was
introduced).

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Orphaned statements issue

From
Josh Berkus
Date:
> I have seen it -- on 8.1 too.  On our case it was caused by an insert
> that was doing lots of toast insertions, so it needed to grab the
> extension lock frequently for the toast table; and this was slowed
> down by a largish shared_buffers setting, somehow (8.1 doesn't have lock
> partitioning, so this was expensive).  I don't recall details on why
> these were related.  If this is your case too, I doubt you'd be able to
> reproduce it in 9.0 (not even in 8.2 which is when lock partitioning was
> introduced).

Thanks, I'll bet that's the case.  This is happening on machines with
more RAM, so they've increased shared_buffers.

Now, to get them off 8.1.  Been trying for over a year now ...

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: Orphaned statements issue

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> One of our clients is seeing an unusual issue with statements which are
> waiting going into sleep forever and never completing.   This first
> e-mail is for a "has anyone else seen this problem before?" while we try
> to collect additional information for a diagnosis.

> An strace on the process shows it to be in RECV, and otherwise doing
> nothing.

I would take that to mean that it's waiting on the client.
        regards, tom lane


Re: Orphaned statements issue

From
Josh Berkus
Date:
> I would take that to mean that it's waiting on the client.

You mean that the client timed out and isn't accepting data from the
query anymore?  Shouldn't Postgres time out on that after a while?  In
one case, the orphaned statement was 16 hours old before we killed it.

If it's relevant, the client connection is from a C application via
libpq on localhost.

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: Orphaned statements issue

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> I would take that to mean that it's waiting on the client.

> You mean that the client timed out and isn't accepting data from the
> query anymore?

No, if the backend is in RECV state, it's waiting for the client to
*send* it something.

(Although if this is an SSL connection, it's a bit harder to be sure
about what the logical state of the connection is.)

> Shouldn't Postgres time out on that after a while?

Not if the problem is the client is confused.  As long as the remote-end
kernel doesn't indicate the connection is dead, we'll wait for the
client to wake up and send us a command.
        regards, tom lane


Re: Orphaned statements issue

From
Josh Berkus
Date:
On 1/20/11 2:26 PM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>>> I would take that to mean that it's waiting on the client.
> 
>> You mean that the client timed out and isn't accepting data from the
>> query anymore?
> 
> No, if the backend is in RECV state, it's waiting for the client to
> *send* it something.

I don't think that's consistent with what we're seeing except maybe in
the <BIND> case.  In the other cases, there's a query supposedly
executing.  But, will look for that possibility.


--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com