Re: 8.4 optimization regression? - Mailing list pgsql-performance

From Mark Kirkwood
Subject Re: 8.4 optimization regression?
Date
Msg-id 4E601B2F.30107@catalyst.net.nz
Whole thread Raw
In response to Re: 8.4 optimization regression?  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Responses Re: 8.4 optimization regression?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On 02/09/11 11:18, Mark Kirkwood wrote:
> On 02/09/11 11:13, Tom Lane wrote:
>> I wrote:
>>> Mark Kirkwood<mark.kirkwood@catalyst.net.nz>  writes:
>>>> [ assorted examples showing that commit
>>>> 7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ]
>>> ...
>>> So, not only are you correct that we should revert the changes to
>>> eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too.
>> I've retested these examples with the patches I committed yesterday.
>> Six of the eight examples are estimated pretty nearly dead on, while the
>> other two are estimated about 50% too high (still a lot better than
>> before).  AFAICT there's no easy way to improve those estimates further;
>> eqjoinsel_semi just plain hasn't got enough information to know how many
>> matches there will be.
>>
>>
>
> Just noticed your two commits this morning and ran them through the
> examples too - results look really good! Not only are the plain join
> queries looking way better but that last semi join that was way off is
> now being estimated pretty close. Should be interesting to see how
> much this improves more complex queries!
>
>

While this is still fresh in your mind, a couple of additional anti join
queries are still managing to sneak past estimation:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000
AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-01-01'::timestamp );
                                                              QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
  Hash Anti Join  (cost=501666.88..851597.05 rows=1 width=0) (actual
time=29956.971..50933.702 rows=5914 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=1991560
width=4) (actual time=13.352..13765.749 rows=1999780 loops=1)
          Filter: (keywordid < 100000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29345.238..29345.238 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.010..22731.316 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 10000
AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-01-01'::timestamp );
                                                             QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Hash Anti Join  (cost=501666.88..821806.96 rows=1 width=0) (actual
time=46497.231..49196.057 rows=566 loops=1)
    Hash Cond: (nk.nodeid = n.nodeid)
    ->  Seq Scan on nodekeyword nk  (cost=0.00..297414.03 rows=192921
width=4) (actual time=19.916..16250.224 rows=199616 loops=1)
          Filter: (keywordid < 10000)
    ->  Hash  (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29901.178..29901.178 rows=4985269 loops=1)
          Buckets: 4096  Batches: 256  Memory Usage: 699kB
          ->  Seq Scan on node n  (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.008..23207.964 rows=4985269 loops=1)
                Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)







pgsql-performance by date:

Previous
From: Stefan Keller
Date:
Subject: Re: Summaries on SSD usage?
Next
From: Jesper Krogh
Date:
Subject: Re: Summaries on SSD usage?