Hash Anti Join performance degradation - Mailing list pgsql-performance

From panam
Subject Hash Anti Join performance degradation
Date
Msg-id 1306210464806-4420974.post@n5.nabble.com
Whole thread Raw
Responses Re: Hash Anti Join performance degradation
Re: Hash Anti Join performance degradation
Re: Hash Anti Join performance degradation
List pgsql-performance
Hi,

In my DB, there is a large table containing messages and one that contains
message boxes.
Messages are assigned to boxes via a child parent link m->b.
In order to obtain the last message for a specific box, I use the following
SQL:

SELECT m1.id FROM message m1 LEFT JOIN message m2 ON (m1.box_id = m2.box_id
AND m1.id < m2.id) WHERE m2.id IS NULL AND m1.box_id = id;

This worked quite well for a long time. But now, suddenly the performance of
this query drastically degraded as new messages were added.
If these new messages are removed again, everything is back to normal. If
other messages are removed instead, the problem remains, so it does not seem
to be a memory issue. I fear I have difficulties to understand what is going
wrong here.

This is the query plan when everything is fine:

"Seq Scan on public.box this_  (cost=0.00..10467236.32 rows=128 width=696)
(actual time=0.169..7683.978 rows=128 loops=1)"
"  Output: this_.id, this_.login, (SubPlan 1)"
"  Buffers: shared hit=188413 read=94635 written=135, temp read=22530
written=22374"
"  SubPlan 1"
"    ->  Hash Anti Join  (cost=41323.25..81775.25 rows=20427 width=8)
(actual time=59.571..59.877 rows=1 loops=128)"
"          Output: m1.id"
"          Hash Cond: (m1.box_id = m2.box_id)"
"          Join Filter: (m1.id < m2.id)"
"          Buffers: shared hit=188412 read=94633 written=135, temp
read=22530 written=22374"
"          ->  Bitmap Heap Scan on public.message m1  (cost=577.97..40212.28
rows=30640 width=16) (actual time=3.152..9.514 rows=17982 loops=128)"
"                Output: m1.id, m1.box_id"
"                Recheck Cond: (m1.box_id = $0)"
"                Buffers: shared hit=131993 read=9550 written=23"
"                ->  Bitmap Index Scan on "message_box_Idx"
(cost=0.00..570.31 rows=30640 width=0) (actual time=2.840..2.840 rows=18193
loops=128)"
"                      Index Cond: (m1.box_id = $0)"
"                      Buffers: shared hit=314 read=6433 written=23"
"          ->  Hash  (cost=40212.28..40212.28 rows=30640 width=16) (actual
time=26.840..26.840 rows=20014 loops=115)"
"                Output: m2.box_id, m2.id"
"                Buckets: 4096  Batches: 4 (originally 2)  Memory Usage:
5444kB"
"                Buffers: shared hit=56419 read=85083 written=112, temp
written=7767"
"                ->  Bitmap Heap Scan on public.message m2
(cost=577.97..40212.28 rows=30640 width=16) (actual time=2.419..20.007
rows=20014 loops=115)"
"                      Output: m2.box_id, m2.id"
"                      Recheck Cond: (m2.box_id = $0)"
"                      Buffers: shared hit=56419 read=85083 written=112"
"                      ->  Bitmap Index Scan on "message_box_Idx"
(cost=0.00..570.31 rows=30640 width=0) (actual time=2.166..2.166 rows=20249
loops=115)"
"                            Index Cond: (m2.box_id = $0)"
"                            Buffers: shared hit=6708"
"Total runtime: 7685.202 ms"

This is the plan when the query gets sluggish:

"Seq Scan on public.box this_  (cost=0.00..10467236.32 rows=128 width=696)
(actual time=0.262..179333.086 rows=128 loops=1)"
"  Output: this_.id, this_.login, (SubPlan 1)"
"  Buffers: shared hit=189065 read=93983 written=10, temp read=22668
written=22512"
"  SubPlan 1"
"    ->  Hash Anti Join  (cost=41323.25..81775.25 rows=20427 width=8)
(actual time=1264.700..1400.886 rows=1 loops=128)"
"          Output: m1.id"
"          Hash Cond: (m1.box_id = m2.box_id)"
"          Join Filter: (m1.id < m2.id)"
"          Buffers: shared hit=189064 read=93981 written=10, temp read=22668
written=22512"
"          ->  Bitmap Heap Scan on public.message m1  (cost=577.97..40212.28
rows=30640 width=16) (actual time=3.109..9.850 rows=18060 loops=128)"
"                Output: m1.id, m1.box_id"
"                Recheck Cond: (m1.box_id = $0)"
"                Buffers: shared hit=132095 read=9448"
"                ->  Bitmap Index Scan on "message_box_Idx"
(cost=0.00..570.31 rows=30640 width=0) (actual time=2.867..2.867 rows=18193
loops=128)"
"                      Index Cond: (m1.box_id = $0)"
"                      Buffers: shared hit=312 read=6435"
"          ->  Hash  (cost=40212.28..40212.28 rows=30640 width=16) (actual
time=27.533..27.533 rows=20102 loops=115)"
"                Output: m2.box_id, m2.id"
"                Buckets: 4096  Batches: 4 (originally 2)  Memory Usage:
5522kB"
"                Buffers: shared hit=56969 read=84533 written=10, temp
written=7811"
"                ->  Bitmap Heap Scan on public.message m2
(cost=577.97..40212.28 rows=30640 width=16) (actual time=2.406..20.492
rows=20102 loops=115)"
"                      Output: m2.box_id, m2.id"
"                      Recheck Cond: (m2.box_id = $0)"
"                      Buffers: shared hit=56969 read=84533 written=10"
"                      ->  Bitmap Index Scan on "message_box_Idx"
(cost=0.00..570.31 rows=30640 width=0) (actual time=2.170..2.170 rows=20249
loops=115)"
"                            Index Cond: (m2.box_id = $0)"
"                            Buffers: shared hit=6708"
"Total runtime: 179334.310 ms"


So from my limited experience, the only significant difference I see is that
the Hash Anti Join takes a lot more time in plan 2, but I do not understand
why.
Ideas somebody?

Thanks
panam


--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Hash-Anti-Join-performance-degradation-tp4420974p4420974.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: reducing random_page_cost from 4 to 2 to force index scan
Next
From: Terry Schmitt
Date:
Subject: Re: Performance degradation of inserts when database size grows