10x rowcount mis-estimation favouring merge over nestloop - Mailing list pgsql-performance

From Abhijit Menon-Sen
Subject 10x rowcount mis-estimation favouring merge over nestloop
Date
Msg-id 20061110051245.GA1928@penne.toroid.org
Responses Re: 10x rowcount mis-estimation favouring merge over nestloop
List pgsql-performance
I'm executing the following query:

    select
        hf.mailbox,hf.uid,hf.position,hf.part,hf.field,hf.value,
        af.address,a.name,a.localpart,a.domain
    from
        header_fields hf
        left join address_fields af
            using ( mailbox, uid, position, part, field )
        left join addresses a
            on (af.address=a.id)
    where
        hf.field<=12 and (hf.part!='' or hf.value ilike '%,%') ;

The header_fields table contains 13.5M rows, of which only ~250K match
the where condition. I created an index like this:

    create index hffpv on header_fields(field)
    where field<=12 and (part!='' or value ilike '%,%')

By default, explain analyse shows me a plan like this:

 Hash Left Join  (cost=1225503.02..1506125.88 rows=2077771 width=143) (actual time=106467.431..117902.397 rows=1113355 loops=1)
   Hash Cond: ("outer".address = "inner".id)
   ->  Merge Left Join  (cost=1211354.65..1288896.97 rows=2077771 width=91) (actual time=104792.505..110648.253 rows=1113355 loops=1)
         Merge Cond: (("outer".field = "inner".field) AND ("outer".part = "inner".part) AND ("outer"."position" = "inner"."position") AND ("outer".uid = "inner".uid) AND ("outer".mailbox = "inner".mailbox))
         ->  Sort  (cost=665399.78..670594.21 rows=2077771 width=87) (actual time=39463.784..39724.772 rows=264180 loops=1)
               Sort Key: hf.field, hf.part, hf."position", hf.uid, hf.mailbox
               ->  Bitmap Heap Scan on header_fields hf  (cost=1505.63..325237.46 rows=2077771 width=87) (actual time=3495.308..33767.229 rows=264180 loops=1)
                     Recheck Cond: ((field <= 12) AND ((part <> ''::text) OR (value ~~* '%,%'::text)))
                     ->  Bitmap Index Scan on hffpv  (cost=0.00..1505.63 rows=2077771 width=0) (actual time=3410.069..3410.069 rows=264180 loops=1)
                           Index Cond: (field <= 12)
         ->  Sort  (cost=545954.87..553141.07 rows=2874480 width=24) (actual time=65328.437..67437.846 rows=2874230 loops=1)
               Sort Key: af.field, af.part, af."position", af.uid, af.mailbox
               ->  Seq Scan on address_fields af  (cost=0.00..163548.00 rows=2874480 width=24) (actual time=12.434..4076.694 rows=2874230 loops=1)
   ->  Hash  (cost=11714.35..11714.35 rows=190807 width=56) (actual time=1670.629..1670.629 rows=190807 loops=1)
         ->  Seq Scan on addresses a  (cost=0.00..11714.35 rows=190807 width=56) (actual time=39.944..1398.897 rows=190807 loops=1)
 Total runtime: 118381.608 ms

Note the 2M estimated rowcount in the bitmap index scan on header_fields
vs. the actual number (264180). That mis-estimation also causes the
planner to pick a sequential scan of address_fields, even though there's
a usable index. If I set both enable_mergejoin and enable_seqscan to
false, I get a plan like the following:

 Hash Left Join  (cost=8796.82..72064677.06 rows=2077771 width=143) (actual time=4400.706..58110.697 rows=1113355 loops=1)
   Hash Cond: ("outer".address = "inner".id)
   ->  Nested Loop Left Join  (cost=1505.63..71937416.17 rows=2077771 width=91) (actual time=3486.238..52351.567 rows=1113355 loops=1)
         Join Filter: (("outer"."position" = "inner"."position") AND ("outer".part = "inner".part) AND ("outer".field = "inner".field))
         ->  Bitmap Heap Scan on header_fields hf  (cost=1505.63..242126.62 rows=2077771 width=87) (actual time=3478.202..39181.477 rows=264180 loops=1)
               Recheck Cond: ((field <= 12) AND ((part <> ''::text) OR (value ~~* '%,%'::text)))
               ->  Bitmap Index Scan on hffpv  (cost=0.00..1505.63 rows=2077771 width=0) (actual time=3393.949..3393.949 rows=264180 loops=1)
                     Index Cond: (field <= 12)
         ->  Index Scan using af_mu on address_fields af  (cost=0.00..34.26 rows=11 width=24) (actual time=0.028..0.040 rows=7 loops=264180)
               Index Cond: (("outer".mailbox = af.mailbox) AND ("outer".uid = af.uid))
   ->  Hash  (cost=4857.17..4857.17 rows=190807 width=56) (actual time=764.337..764.337 rows=190807 loops=1)
         ->  Index Scan using addresses_pkey on addresses a  (cost=0.00..4857.17 rows=190807 width=56) (actual time=29.381..484.826 rows=190807 loops=1)
 Total runtime: 58459.624 ms
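(For completeness, the planner settings were toggled per session, roughly like this; SET and RESET only affect the current session:)

```sql
set enable_mergejoin to off;
set enable_seqscan to off;
-- ... re-run the explain analyse of the query above ...
reset enable_mergejoin;
reset enable_seqscan;
```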

That is a considerably nicer plan, but it still features the wild
mis-estimation, even though the partial index covers approximately the
right number of rows. I tried increasing the statistics target on
header_fields.field, part, and value to 100, but the estimate always
hovers around the 2M mark.
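(The statistics targets were raised along these lines; note that ANALYZE
has to be re-run before a new target affects the estimates:)

```sql
alter table header_fields alter column field set statistics 100;
alter table header_fields alter column part  set statistics 100;
alter table header_fields alter column value set statistics 100;
analyze header_fields;
```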

Does anyone have any ideas about what's wrong, and how to fix it?

Thanks.

-- ams
