Thread: Blocking excessively in FOR UPDATE

Blocking excessively in FOR UPDATE

From
Claudio Freire
Date:
Hi list,

I've been experiencing a weird performance issue lately.

I have a very simple (and usually very fast) query:

SELECT track_logs.id
FROM track_logs
WHERE track_logs.track_id = <some id>
  AND track_logs.track_status_id = 1
  AND track_logs.date >= now() - interval '1 hours'
FOR UPDATE

Whose plan is:

"LockRows  (cost=0.00..26.73 rows=1 width=14)"
"  ->  Index Scan using idx_track_logs_track_id on track_logs
(cost=0.00..26.72 rows=1 width=14)"
"        Index Cond: (track_id = <some id>)"
"        Filter: ((track_status_id = 1) AND (date >= (now() -
'01:00:00'::interval)))"

The same query, without FOR UPDATE, takes just 68 milliseconds.

With the FOR UPDATE, it takes like half a minute or more to finish.

Now, I understand the for update part may be blocking on some other
transaction, and it's probably the case.
But I cannot figure out which transaction it would be. There *are*, in
fact, connections in <idle in transaction> state, which makes me think
those would be the culprit. But for the life of me, I cannot make
sense of the pg_locks view, which shows all locks as granted:


PID      Relation   XID            TX             Mode               Granted   Start
14751    5551986                   154/4038460    AccessShareLock    Yes       2011-11-03 12:45:03.551516-05
14751    5526310                   154/4038460    RowShareLock       Yes       2011-11-03 12:45:03.551516-05
14751    5552008                   154/4038460    RowExclusiveLock   Yes       2011-11-03 12:45:03.551516-05
14751    5552020                   154/4038460    RowExclusiveLock   Yes       2011-11-03 12:45:03.551516-05
14751    5552008                   154/4038460    AccessShareLock    Yes       2011-11-03 12:45:03.551516-05
14751    5525296                   154/4038460    RowShareLock       Yes       2011-11-03 12:45:03.551516-05
14751    5525292                   154/4038460    RowShareLock       Yes       2011-11-03 12:45:03.551516-05
14751    5552019                   154/4038460    AccessShareLock    Yes       2011-11-03 12:45:03.551516-05
14751    5552019                   154/4038460    RowExclusiveLock   Yes       2011-11-03 12:45:03.551516-05
14751    5552020                   154/4038460    AccessShareLock    Yes       2011-11-03 12:45:03.551516-05
14751    5525292                   154/4038460    RowExclusiveLock   Yes       2011-11-03 12:45:03.551516-05
14751               154/4038460    154/4038460    ExclusiveLock      Yes       2011-11-03 12:45:03.551516-05
14751                              154/4038460    ExclusiveLock      Yes       2011-11-03 12:45:03.551516-05
14751    5526308                   154/4038460    AccessShareLock    Yes       2011-11-03 12:45:03.551516-05

Where should I look?
What other information should I provide?

Re: Blocking excessively in FOR UPDATE

From
Claudio Freire
Date:
On Thu, Nov 3, 2011 at 2:51 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> What other information should I provide?

Forgot all the usual details:

The server is PostgreSQL 9.0.3 running on Linux, quite loaded (load
average ~7), with WAL on a 2-spindle RAID 1, data on a 4-spindle RAID
10, and 16GB of RAM.

Could it be high contention between the worker processes? (because of
the high load)

Re: Blocking excessively in FOR UPDATE

From
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
> The same query, without FOR UPDATE, takes just 68 milliseconds.
> With the FOR UPDATE, it takes like half a minute or more to finish.

> Now, I understand the for update part may be blocking on some other
> transaction, and it's probably the case.

Yeah, that's what I'd guess.

> But I cannot figure out which transaction it would be. There *are*, in
> fact, connections in <idle in transaction> state, which makes me think
> those would be the culprit. But for the life of me, I cannot make
> sense of the pg_locks view, which shows all locks as granted:

A block on a row would typically show up as one transaction waiting on
another's XID.  Did you capture this *while* the query was blocked?
Also, I'm suspicious that you may be using a view that filters out
the relevant lock types --- that's obviously not a raw display of
pg_locks.
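
Against raw pg_locks, a self-join along these lines (standard columns only)
will usually surface that pairing --- an ungranted request on a transactionid
next to the transaction that holds it:

-- Ungranted transactionid waits paired with whoever holds that XID.
SELECT blocked.pid           AS blocked_pid,
       blocked.mode          AS requested_mode,
       blocker.pid           AS blocking_pid,
       blocker.mode          AS held_mode,
       blocked.transactionid
FROM pg_locks blocked
JOIN pg_locks blocker
  ON blocker.transactionid = blocked.transactionid
 AND blocker.granted
 AND blocker.pid <> blocked.pid
WHERE NOT blocked.granted
  AND blocked.locktype = 'transactionid';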

            regards, tom lane

Re: Blocking excessively in FOR UPDATE

From
Claudio Freire
Date:
On Thu, Nov 3, 2011 at 3:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> But I cannot figure out which transaction it would be. There *are*, in
>> fact, connections in <idle in transaction> state, which makes me think
>> those would be the culprit. But for the life of me, I cannot make
>> sense of the pg_locks view, which shows all locks as granted:
>
> A block on a row would typically show up as one transaction waiting on
> another's XID.  Did you capture this *while* the query was blocked?

Yes

> Also, I'm suspicious that you may be using a view that filters out
> the relevant lock types --- that's obviously not a raw display of
> pg_locks.

It's pgAdmin, which I usually use to monitor pg_stat_activity and
pg_locks in a "pretty" view.
pg_locks does not show the query, only the pid, so it's harder to spot.

Next time I find it blocking, I will check pg_locks directly and post
the output.

I did that once, and they were all granted. I didn't correlate with
other XIDs since I thought the "granted" column meant it wasn't
waiting. Is that wrong?
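
Something like this against pg_stat_activity should at least flag which
backends are actually waiting or sitting idle in transaction (9.0 column
names, so procpid / waiting / current_query):

-- "waiting" is true while a backend is blocked on a heavyweight lock.
SELECT procpid, usename, waiting, xact_start, current_query
FROM pg_stat_activity
WHERE waiting
   OR current_query = '<IDLE> in transaction';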

Re: Blocking excessively in FOR UPDATE

From
Claudio Freire
Date:
On Thu, Nov 3, 2011 at 4:29 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> Next time I find it blocking, I will check pg_locks directly and post
> the output.

Here it is, two instances of the query, while blocked:

select * from pg_locks where pid = 22636;

   locktype    | database | relation | page | tuple | virtualxid  | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |       mode       | granted
---------------+----------+----------+------+-------+-------------+---------------+---------+-------+----------+--------------------+-------+------------------+---------
 transactionid |          |          |      |       |             |     360992199 |         |       |          | 89/22579344        | 22636 | ExclusiveLock    | t
 virtualxid    |          |          |      |       | 89/22579344 |               |         |       |          | 89/22579344        | 22636 | ExclusiveLock    | t
 relation      |    16398 |  5552020 |      |       |             |               |         |       |          | 89/22579344        | 22636 | AccessShareLock  | t
 relation      |    16398 |  5552020 |      |       |             |               |         |       |          | 89/22579344        | 22636 | RowExclusiveLock | t
 relation      |    16398 |  5552019 |      |       |             |               |         |       |          | 89/22579344        | 22636 | AccessShareLock  | t
 relation      |    16398 |  5552019 |      |       |             |               |         |       |          | 89/22579344        | 22636 | RowExclusiveLock | t
 relation      |    16398 |  5525292 |      |       |             |               |         |       |          | 89/22579344        | 22636 | RowShareLock     | t
 relation      |    16398 |  5525292 |      |       |             |               |         |       |          | 89/22579344        | 22636 | RowExclusiveLock | t
 relation      |    16398 |  5552008 |      |       |             |               |         |       |          | 89/22579344        | 22636 | AccessShareLock  | t
 relation      |    16398 |  5552008 |      |       |             |               |         |       |          | 89/22579344        | 22636 | RowExclusiveLock | t
(10 rows)

select * from pg_locks where pid = 22618;

   locktype    | database | relation | page | tuple | virtualxid  | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |       mode       | granted
---------------+----------+----------+------+-------+-------------+---------------+---------+-------+----------+--------------------+-------+------------------+---------
 virtualxid    |          |          |      |       | 159/2706505 |               |         |       |          | 159/2706505        | 22618 | ExclusiveLock    | t
 relation      |    16398 |  5551986 |      |       |             |               |         |       |          | 159/2706505        | 22618 | AccessShareLock  | t
 transactionid |          |          |      |       |             |     360992478 |         |       |          | 159/2706505        | 22618 | ExclusiveLock    | t
 relation      |    16398 |  5552008 |      |       |             |               |         |       |          | 159/2706505        | 22618 | AccessShareLock  | t
 relation      |    16398 |  5552008 |      |       |             |               |         |       |          | 159/2706505        | 22618 | RowExclusiveLock | t
 relation      |    16398 |  5526310 |      |       |             |               |         |       |          | 159/2706505        | 22618 | RowShareLock     | t
 relation      |    16398 |  5552020 |      |       |             |               |         |       |          | 159/2706505        | 22618 | AccessShareLock  | t
 relation      |    16398 |  5552020 |      |       |             |               |         |       |          | 159/2706505        | 22618 | RowExclusiveLock | t
 relation      |    16398 |  5526308 |      |       |             |               |         |       |          | 159/2706505        | 22618 | AccessShareLock  | t
 relation      |    16398 |  5552019 |      |       |             |               |         |       |          | 159/2706505        | 22618 | AccessShareLock  | t
 relation      |    16398 |  5552019 |      |       |             |               |         |       |          | 159/2706505        | 22618 | RowExclusiveLock | t
 relation      |    16398 |  5525296 |      |       |             |               |         |       |          | 159/2706505        | 22618 | RowShareLock     | t
 relation      |    16398 |  5525292 |      |       |             |               |         |       |          | 159/2706505        | 22618 | RowShareLock     | t
 relation      |    16398 |  5525292 |      |       |             |               |         |       |          | 159/2706505        | 22618 | RowExclusiveLock | t
(14 rows)

Re: Blocking excessively in FOR UPDATE

From
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
> On Thu, Nov 3, 2011 at 4:29 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> Next time I find it blocking, I will check pg_locks directly and post
>> the output.

> Here it is, two instances of the query, while blocked:

Hmm ... definitely seems that you're not blocked on a FOR UPDATE tuple
lock.  If you were, there'd be an ungranted ShareLock on a transactionid
in there.

It seems possible that you're blocked on an LWLock, which would not show
in pg_locks.  But before pursuing that idea, probably first you should
back up and confirm whether the process is actually waiting, or running,
or just really slow due to CPU contention.  It might be useful to see
what strace has to say about it.
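
Something like the following against the blocked backend's pid (22636, from
the output you just posted; stock strace options) should make that
distinction quickly:

# timestamp each syscall and show how long the process spent inside it
strace -p 22636 -tt -T
# or let it run for a few seconds and summarize which syscalls dominate
strace -p 22636 -c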

            regards, tom lane

Re: Blocking excessively in FOR UPDATE

From
Claudio Freire
Date:
On Thu, Nov 3, 2011 at 8:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  But before pursuing that idea, probably first you should
> back up and confirm whether the process is actually waiting, or running,
> or just really slow due to CPU contention.  It might be useful to see
> what strace has to say about it.

Thanks for the tip, it seems locks had nothing to do with it.

strace suggests those queries get blocked on write().

This is an explain analyze without for update:

"Index Scan using idx_track_logs_track_id on track_logs
(cost=0.00..26.75 rows=1 width=8) (actual time=0.056..38.119 rows=1
loops=1)"
"  Index Cond: (track_id = <some id>)"
"  Filter: ((track_status_id = 1) AND (date >= (now() - '01:00:00'::interval)))"
"  Buffers: shared hit=140 read=3127"
"Total runtime: 38.147 ms"

This is with for update that goes fast:

"LockRows  (cost=0.00..26.76 rows=1 width=14) (actual
time=0.075..37.420 rows=1 loops=1)"
"  Buffers: shared hit=63 read=3205"
"  ->  Index Scan using idx_track_logs_track_id on track_logs
(cost=0.00..26.75 rows=1 width=14) (actual time=0.058..37.402 rows=1
loops=1)"
"        Index Cond: (track_id = <some id>)"
"        Filter: ((track_status_id = 1) AND (date >= (now() -
'01:00:00'::interval)))"
"        Buffers: shared hit=62 read=3205"
"Total runtime: 37.462 ms"

I haven't been able to catch a slow one again yet, but when I did (and
didn't capture the output :( ) it looked roughly like:


"LockRows  (cost=0.00..26.76 rows=1 width=14) (actual time=0.075..37.420 rows=1 loops=1)"
"  Buffers: shared hit=63 read=3205"
"  ->  Index Scan using idx_track_logs_track_id on track_logs  (cost=0.00..26.75 rows=1 width=14) (actual time=0.058..37.402 rows=1 loops=1)"
"        Index Cond: (track_id = <some id>)"
"        Filter: ((track_status_id = 1) AND (date >= (now() - '01:00:00'::interval)))"
"        Buffers: shared hit=62 read=3205 written=135"
"Total runtime: 37000.462 ms"

Now, I'm thinking those writes are catching the DB at a bad moment -
we do have regular very write-intensive peaks.

Maybe I should look into increasing shared buffers?
Checkpoints are well spread out and very fast.

What are those writes about? HOT vacuuming perhaps?

Re: Blocking excessively in FOR UPDATE

From
Robert Haas
Date:
On Fri, Nov 4, 2011 at 12:07 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> What are those writes about? HOT vacuuming perhaps?

Every tuple lock requires dirtying the page.  Those writes are all
those dirty pages getting flushed out to disk.  It's possible that the
OS is allowing the writes to happen asynchronously for a while, but
then when you get too much dirty data in the cache, it starts
blocking.

The only thing I'm fuzzy about is why it's locking so many rows, given
that the output says rows=1.
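
If you want to watch the dirty-page buildup directly, and assuming you have
the contrib pg_buffercache module installed, a rough gauge is:

-- How much of shared_buffers is dirty and waiting to be written out
-- (assumes the default 8kB block size).
SELECT isdirty, count(*) AS buffers,
       pg_size_pretty(count(*) * 8192) AS approx_size
FROM pg_buffercache
GROUP BY isdirty;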

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Blocking excessively in FOR UPDATE

From
"Kevin Grittner"
Date:
Claudio Freire <klaussfreire@gmail.com> wrote:

> Now, I'm thinking those writes are catching the DB at a bad moment -
> we do have regular very write-intensive peaks.
>
> Maybe I should look into increasing shared buffers?

As already pointed out, SELECT FOR UPDATE will require a disk write
of the tuple(s) read.  If these are glutting, increasing
shared_buffers would tend to make things worse.  You might want to
make the background writer more aggressive or *reduce*
shared_buffers to better spread the output.
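
As a sketch of the direction, not a prescription for your workload (these
are only starting points to test):

bgwriter_delay = 200ms           # the default; lower values wake the bgwriter more often
bgwriter_lru_maxpages = 500      # let each round write more pages (default 100)
bgwriter_lru_multiplier = 4.0    # clean further ahead of backend demand (default 2.0)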

Hopefully you have a BBU RAID controller configured for write-back.
If not you should.  If you do, another thing which might help is
increasing the cache on that controller.  Or you could move WAL to a
separate file system on a separate controller with its own BBU
write-back cache.

-Kevin

Re: Blocking excessively in FOR UPDATE

From
Claudio Freire
Date:
On Fri, Nov 4, 2011 at 1:26 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> As already pointed out, SELECT FOR UPDATE will require a disk write
> of the tuple(s) read.  If these are glutting, increasing
> shared_buffers would tend to make things worse.

I thought shared_buffers improved write caching.
We do tend to write onto the same rows over and over.

> Hopefully you have a BBU RAID controller configured for write-back.
> If not you should.  If you do, another thing which might help is
> increasing the cache on that controller.

Write-back on the RAID is another issue.
The controller is old and has a very small cache, we're already in the
process of securing a replacement.

> Or you could move WAL to a
> separate file system on a separate controller with its own BBU
> write-back cache.

WAL is already on a separate controller and array.

Re: Blocking excessively in FOR UPDATE

From
"Kevin Grittner"
Date:
Claudio Freire <klaussfreire@gmail.com> wrote:
> On Fri, Nov 4, 2011 at 1:26 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> As already pointed out, SELECT FOR UPDATE will require a disk
>> write of the tuple(s) read.  If these are glutting, increasing
>> shared_buffers would tend to make things worse.
>
> I thought shared_buffers improved write caching.
> We do tend to write onto the same rows over and over.

PostgreSQL is very aggressive about holding on to dirty buffers as
long as possible, in hopes of reducing duplicate page writes.  This
can work against the goal of consistent latency.  In our shop we
needed to make the background writer more aggressive and keep shared
buffers in the 0.5GB to 2GB range (depending on hardware and
workload) to prevent write gluts leading to latency spikes.  In the
mildly surprising department, the OS seemed to do a decent job of
spotting pages receiving repeated writes and hold back on an OS
level write, while pushing other pages out in a more timely fashion
-- there was no discernible increase in OS write activity from
making these changes.  I know other people have had other
experiences, based on their workloads (and OS versions?).
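
pg_stat_bgwriter is the quickest way to see who is actually doing the
writes; if buffers_backend climbs fast relative to buffers_clean and
buffers_checkpoint, the backends are paying for the flushing themselves:

-- Counters are cumulative since the last stats reset; sample twice and diff.
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, maxwritten_clean,
       buffers_backend, buffers_alloc
FROM pg_stat_bgwriter;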

Before anything else, you might want to make sure you've spread your
checkpoint activity as much as possible by setting
checkpoint_completion_target = 0.9.

-Kevin

Re: Blocking excessively in FOR UPDATE

From
Claudio Freire
Date:
On Fri, Nov 4, 2011 at 2:07 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Before anything else, you might want to make sure you've spread your
> checkpoint activity as much as possible by setting
> checkpoint_completion_target = 0.9.

We have

shared_buffers = 2G
bgwriter_delay = 1000ms
effective_io_concurrency=8
synchronous_commit=off
wal_buffers=16M
wal_writer_delay=2000ms
commit_delay=10000
checkpoint_segments=72
checkpoint_timeout=60min
checkpoint_completion_target=0.8

I'm thinking bgwriter_delay and wal_writer_delay might not be working
as I expected, and that maybe checkpoint_segments=72 is a bit too
high, but we were having much worse I/O storms before I pushed it that
high. Looking at the checkpoint logging for the last few days, it
almost always looks like:

checkpoint complete: wrote 589 buffers (3.6%); 0 transaction log file(s) added, 0 removed, 8 recycled; write=590.325 s, sync=0.055 s, total=590.417 s

590s seems an awful lot for 589 buffers.

Re: Blocking excessively in FOR UPDATE

From
Shaun Thomas
Date:
On 11/04/2011 12:22 PM, Claudio Freire wrote:

> bgwriter_delay = 1000ms
> wal_writer_delay=2000ms
> commit_delay=10000

!?

Maybe someone can back me up on this, but my interpretation of these
settings suggests they're *way* too high. That commit_delay especially
makes me want to cry. From the manual:

"Setting commit_delay can only help when there are many concurrently
committing transactions, and it is difficult to tune it to a value that
actually helps rather than hurt throughput."

Meaning it may halt all of your commits for up to *ten seconds* if it
doesn't think there was enough activity to warrant a write. Ouch.

Your bgwriter_delay and wal_writer_delay settings are equally odd.
You've made the background writer so passive that when it does finally
run, it's going to have a ton of work to do, causing giant write spikes.
I'm not sure whether or not this also causes compounding problems with
locks and backend write delays.

> checkpoint complete: wrote 589 buffers (3.6%); 0 transaction log
> file(s) added, 0 removed, 8 recycled; write=590.325 s, sync=0.055 s,
> total=590.417 s
>
> 590s seems an awful lot for 589 buffers.

You're misinterpreting this. The checkpoint completion target is
multiplied against your checkpoint timeout. 590 seconds is roughly ten
minutes, and for 589 buffers, it wrote one per second. That's about as
slow as it can possibly write that many buffers. It had *up to* 24
minutes, and if it had more buffers available to write, it would have
written them. The number you really care about is "sync=0.055 s" which
is how much time the controller spent syncing that data to disk.

If you're having real problems writing or lock delays due to IO stalls,
you'll see that sync parameter shoot way up. This can also be elevated
in certain NVRAM-based solutions. Once you start seeing whole seconds,
or minutes, it might actually matter.
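
If you want to keep an eye on that over time, and assuming log_checkpoints
is on and your logs land under pg_log/ (adjust the path to your setup),
something like this pulls out the worst sync times:

# largest sync= values seen in the checkpoint log lines
grep -o 'sync=[0-9.]*' pg_log/postgresql-*.log | sort -t= -k2 -n | tail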

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com


Re: Blocking excessively in FOR UPDATE

From
Claudio Freire
Date:
On Fri, Nov 4, 2011 at 3:26 PM, Shaun Thomas <sthomas@peak6.com> wrote:
> On 11/04/2011 12:22 PM, Claudio Freire wrote:
>
>> bgwriter_delay = 1000ms
>> wal_writer_delay=2000ms
>> commit_delay=10000
>
> !?
>snip
> "Setting commit_delay can only help when there are many concurrently
> committing transactions, and it is difficult to tune it to a value that
> actually helps rather than hurt throughput."
>
> Meaning it may halt all of your commits for up to *ten seconds* if it
> doesn't think there was enough activity to warrant a write. Ouch.

I think you're misinterpreting the value.
It's in microseconds; that's 10 *milli*seconds.

> Your bgwriter_delay and wal_writer_delay settings are equally odd. You've
> made the background writer so passive that when it does finally run, it's
> going to have a ton of work to do, causing giant write spikes. I'm not sure
> whether or not this also causes compounding problems with locks and backend
> write delays.

I don't think 1 second can be such a big difference for the bgwriter,
but I might be wrong.

The wal_writer makes me doubt, though. If logged activity was higher
than 8MB/s, then that setting would block it all.
I guess I really should lower it.

> The number you
> really care about is "sync=0.055 s" which is how much time the controller
> spent syncing that data to disk.
>
> If you're having real problems writing or lock delays due to IO stalls,
> you'll see that sync parameter shoot way up. This can also be elevated in
> certain NVRAM-based solutions. Once you start seeing whole seconds, or
> minutes, it might actually matter.

Nice to know, I thought so, now I know so.
:-)

So... I'm leaning towards lowering wal_writer_delay and see how it goes.

Re: Blocking excessively in FOR UPDATE

From
Robert Haas
Date:
On Fri, Nov 4, 2011 at 2:45 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> I don't think 1 second can be such a big difference for the bgwriter,
> but I might be wrong.

Well, the default value is 200 ms.   And I've never before heard of
anyone tuning it up, except maybe to save on power consumption on a
system with very low utilization.  Nearly always you want to reduce
it.

> The wal_writer makes me doubt, though. If logged activity was higher
> than 8MB/s, then that setting would block it all.
> I guess I really should lower it.

Here again, you've set it to ten times the default value.  That
doesn't seem like a good idea.  I would start with the default and
tune down.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Blocking excessively in FOR UPDATE

From
Shaun Thomas
Date:
On 11/04/2011 01:45 PM, Claudio Freire wrote:

> I think you're misinterpreting the value.
> It's in microseconds; that's 10 *milli*seconds.

Wow. My brain totally skimmed over that section. Everything else is in
milliseconds, so I never even considered it. Sorry about that!

I stand by everything else though. ;)


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com


Re: Blocking excessively in FOR UPDATE

From
Claudio Freire
Date:
On Fri, Nov 4, 2011 at 3:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Nov 4, 2011 at 2:45 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> I don't think 1 second can be such a big difference for the bgwriter,
>> but I might be wrong.
>
> Well, the default value is 200 ms.   And I've never before heard of
> anyone tuning it up, except maybe to save on power consumption on a
> system with very low utilization.  Nearly always you want to reduce
> it.

Will try

>> The wal_writer makes me doubt, though. If logged activity was higher
>> than 8MB/s, then that setting would block it all.
>> I guess I really should lower it.
>
> Here again, you've set it to ten times the default value.  That
> doesn't seem like a good idea.  I would start with the default and
> tune down.

Already did that. Waiting to see how it turns out.

Re: Blocking excessively in FOR UPDATE

From
Claudio Freire
Date:
On Fri, Nov 4, 2011 at 4:07 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> Here again, you've set it to ten times the default value.  That
>> doesn't seem like a good idea.  I would start with the default and
>> tune down.
>
> Already did that. Waiting to see how it turns out.

Nope, it's still happening with those changes.

Though those settings did look too high, lowering them didn't fix the
strange blocking.

Is it possible that the query is locking all the tuples it hits, rather
than only the ones it selects?

Because the index used to reach the tuple has to walk across around 3k
tuples before finding the one that needs locking. They're supposed to
be in memory already (they're quite hot), which is why selecting is
fast, but maybe it's trying to lock all 3k of them?

I don't know, I'm just throwing punches blindly at this point.
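
If walking those ~3k tuples under the filter is the real cost, one concrete
thing I may try is a composite index so the scan lands directly on the
qualifying rows (index name made up; the columns are the ones from the
query above):

-- Built CONCURRENTLY so it doesn't block the writers while it's created.
CREATE INDEX CONCURRENTLY idx_track_logs_track_status_date
    ON track_logs (track_id, track_status_id, date);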