Re: odd intermittent query hanging issue - Mailing list pgsql-general

From Steve Crawford
Subject Re: odd intermittent query hanging issue
Date
Msg-id 4FBA6861.7080200@pinpointresearch.com
Whole thread Raw
In response to odd intermittent query hanging issue  (Aaron Burnett <aburnett@bzzagent.com>)
List pgsql-general
On 05/20/2012 07:45 AM, Aaron Burnett wrote:
> Hey Steve,
>
> Just wanted to send a note of thanks for pointing me in a few new
> directions on this.
>
> Turns out that the query would swap but not all the time. When it swapped,
> it wouldn't finish, if it didn't swap it would finish in the expected
> time. Not sure why this isn't consistent...
>
> The work_mem was set at 100MB so as a test I increased it to 200MB on the
> session and so far it has consistently finished every time it has run. I'm
> not sure why this has reared it's head now as we run far more complex and
> data intensive queries than this one and they don't swap... so I guess I
> have some forensics to do.

You're welcome. Assuming that changing work_mem is a confirmed solution,
it sounds like the query is running right on the edge of running out of
work_mem and when nudged over that threshold starts to swap. PostgreSQL,
like most other things in computers, does not degrade linearly. Things
work great till you run out of some resource then degrade dramatically.

The work_mem setting specifies memory available for internal sorts and
hash operations. Looking at the original explain plan you posted I see
lots of such operations - perhaps your other queries don't need them:

explain plan:
------------------------------
  Unique  (cost=237234.66..239267.33 rows=406533 width=4) (actual
time=4790.823..4922.621 rows=418843 loops=1)
    ->   Sort  (cost=237234.66..238251.00 rows=406533 width=4) (actual
time=4790.818..4826.201 rows=418879 loops=1)
          Sort Key: member.id
          Sort Method: quicksort  Memory: 31923kB
          ->   Hash Join  (cost=167890.31..199359.99 rows=406533 width=4)
(actual time=3357.406..4532.952 rows=418879 loops=1)
                Hash Cond: (xrcm.member_id = member.id)
                ->   Seq Scan on xrefchannelmember xrcm
(cost=0.00..19273.69 rows=813066 width=4) (actual time=0.015..219.259 row
s=814421 loops=1)
                      Filter: (channel_id = 1)
                ->   Hash  (cost=162586.21..162586.21 rows=424328 width=4)
(actual time=3357.001..3357.001 rows=444626 loops=1)
                      Buckets: 65536  Batches: 1  Memory Usage: 15632kB
                      ->   Seq Scan on member  (cost=66114.02..162586.21
rows=424328 width=4) (actual time=2357.280..3216.076 rows
=444626 loops=1)
                            Filter: (NOT (hashed SubPlan 1))
                            SubPlan 1
                              ->   Seq Scan on xrefcampaignmember
(cost=0.00..57931.82 rows=3272882 width=4) (actual time=0.021..
670.086 rows=3272870 loops=1)
  Total runtime: 4963.134 ms

I'm still bothered by a couple things but I don't have a enough
understanding of internals to provide definitive advice:

1. Why does reindexing seem to temporarily solve the problem?...

2...especially when I don't see indexes being used in the query

I would definitely run explain on some of the alternative versions of
the query posted earlier to see if they do a better job of using
available indexes or reducing the amount of internal hashing/sorting. If
they do, you will have some "ammo" to take back to your developer.

Cheers,
Steve


pgsql-general by date:

Previous
From: Cédric Villemain
Date:
Subject: Re: Global Named Prepared Statements
Next
From: Merlin Moncure
Date:
Subject: Re: Libpq question