Thread: Re: BUG #5120: Performance difference between running a query with named cursor and straight SELECT

>Steve McLellan <smclellan@mintel.com> writes:
>> Thanks! Installing 8.4 and setting cursor_tuple_fraction to 1.0 does seem
to
>> force it to execute in the same time as not using the cursor, and we'll
>> probably go with this solution (the only reason we're using cursors is to
>> avoid retrieving vast result sets through psycopg2/fetchmany). Your
>> explanation makes sense, and I'm curious to see why this particular query
>> ends up being so different, but I couldn't figure out how to run the
explain
>> - the DECLARE syntax doesn't seem to allow it.
>
> "EXPLAIN [ANALYZE] DECLARE x CURSOR ..." works for me.
>
>   regards, tom lane

Sorry, I should've tried that. The two explains are below; without cursor
then with cursor. I don't know enough to reliably say whether there's
anything wrong with either (we use default_statistics_target=100 although
the estimates don't look right for some of the query fragments), but they're
certainly very different.


========
WITHOUT CURSOR:

Hash Join  (cost=33364.36..46606.90 rows=1 width=4) (actual
time=795.690..833.577 rows=14 loops=1)
   Hash Cond: (fact_table.originator_key =
originator_dimension.originator_key)
   ->  Nested Loop  (cost=33339.89..46577.20 rows=1392 width=8) (actual
time=792.676..832.203 rows=186 loops=1)
         ->  Nested Loop  (cost=33339.89..40678.24 rows=48 width=8) (actual
time=792.653..830.017 rows=110 loops=1)
               ->  Hash Join  (cost=33339.89..40641.84 rows=66 width=12)
(actual time=467.029..790.075 rows=3884 loops=1)
                     Hash Cond: (record_type_mv.record_key =
record_dimension.record_key)
                     ->  Seq Scan on record_type_mv  (cost=0.00..7076.20
rows=30012 width=4) (actual time=0.920..270.790 rows=31056 loops=1)
                           Filter: (("Type")::text = ANY (('{"Cat 1 -
yyy","Cat 1 - www","Cat 1 - zzz"}'::character varying[])::text[]))
                     ->  Hash  (cost=33287.67..33287.67 rows=4178 width=8)
(actual time=447.819..447.819 rows=10203 loops=1)
                           ->  Bitmap Heap Scan on record_dimension
 (cost=18214.01..33287.67 rows=4178 width=8) (actual time=371.277..428.278
rows=10203 loops=1)
                                 Recheck Cond: (("Category" = 'Cat 1'::text)
AND ("Sector" = 'Sector I'::text) AND ("Dataset - A" OR "Dataset - C" OR
"Dataset - B"))
                                 Filter: (("Dataset - A" OR "Dataset - C" OR
"Dataset - B") AND ("Country" = 'USA'::text))
                                 ->  BitmapAnd  (cost=18214.01..18214.01
rows=4623 width=0) (actual time=367.336..367.336 rows=0 loops=1)
                                       ->  Bitmap Index Scan on "Category"
 (cost=0.00..3091.75 rows=123623 width=0) (actual time=53.713..53.713
rows=124053 loops=1)
                                             Index Cond: ("Category" = 'Cat
1'::text)
                                       ->  Bitmap Index Scan on "Sector"
 (cost=0.00..7525.37 rows=327577 width=0) (actual time=129.610..129.610
rows=328054 loops=1)
                                             Index Cond: ("Sector" = 'Sector
I'::text)
                                       ->  BitmapOr  (cost=7594.30..7594.30
rows=410371 width=0) (actual time=128.983..128.983 rows=0 loops=1)
                                             ->  Bitmap Index Scan on
"Dataset - A"  (cost=0.00..550.30 rows=29579 width=0) (actual
time=11.393..11.393 rows=30016 loops=1)
                                                   Index Cond: ("Dataset -
A" = true)
                                             ->  Bitmap Index Scan on
"Dataset - C"  (cost=0.00..6981.18 rows=377696 width=0) (actual
time=116.306..116.306 rows=380788 loops=1)
                                                   Index Cond: ("Dataset -
C" = true)
                                             ->  Bitmap Index Scan on
"Dataset - B"  (cost=0.00..59.69 rows=3097 width=0) (actual
time=1.272..1.272 rows=3116 loops=1)
                                                   Index Cond: ("Dataset -
B" = true)
               ->  Index Scan using date_dimension_pkey on date_dimension
"date_dimension_Published Date"  (cost=0.00..0.54 rows=1 width=4) (actual
time=0.007..0.007 rows=0 loops=3884)
                     Index Cond: ("date_dimension_Published Date".date_key =
record_dimension.published_date_key)
                     Filter: ("date_dimension_Published Date"."SQL Date" >=
(now() - '6 mons'::interval))
         ->  Index Scan using record_date on instance_core_fact fact_table
 (cost=0.00..122.48 rows=33 width=12) (actual time=0.007..0.011 rows=2
loops=110)
               Index Cond: (fact_table.record_key =
record_dimension.record_key)
   ->  Hash  (cost=24.45..24.45 rows=1 width=4) (actual time=1.004..1.004
rows=169 loops=1)
         ->  Bitmap Heap Scan on originator_dimension  (cost=20.44..24.45
rows=1 width=4) (actual time=0.313..0.662 rows=169 loops=1)
               Recheck Cond: (("Area" = ANY ('{National,"Phoenix,
AZ"}'::text[])) AND ("Originator Type" = 'NO DATA'::text))
               ->  BitmapAnd  (cost=20.44..20.44 rows=1 width=0) (actual
time=0.299..0.299 rows=0 loops=1)
                     ->  Bitmap Index Scan on publication_market_area
 (cost=0.00..8.95 rows=54 width=0) (actual time=0.165..0.165 rows=169
loops=1)
                           Index Cond: ("Area" = ANY ('{National,"Phoenix,
AZ"}'::text[]))
                     ->  Bitmap Index Scan on "Originator Type"
 (cost=0.00..11.24 rows=390 width=0) (actual time=0.121..0.121 rows=426
loops=1)
                           Index Cond: ("Originator Type" = 'NO DATA'::text)
===================
WITH CURSOR:

Nested Loop  (cost=0.00..145138.44 rows=1 width=4) (actual
time=1508.556..556939.884 rows=14 loops=1)
   Join Filter: (fact_table.originator_key =
originator_dimension.originator_key)
   ->  Index Scan using "Originator Type" on originator_dimension
 (cost=0.00..125.29 rows=1 width=4) (actual time=26.067..41.906 rows=169
loops=1)
         Index Cond: ("Originator Type" = 'NO DATA'::text)
         Filter: ("Area" = ANY ('{National,"Phoenix, AZ"}'::text[]))
   ->  Nested Loop  (cost=0.00..144995.75 rows=1392 width=8) (actual
time=1081.426..3294.915 rows=186 loops=169)
         ->  Nested Loop  (cost=0.00..139096.78 rows=48 width=8) (actual
time=1081.108..3290.440 rows=110 loops=169)
               ->  Nested Loop  (cost=0.00..124922.31 rows=3026 width=4)
(actual time=1061.766..3284.424 rows=380 loops=169)
                     ->  Seq Scan on record_dimension  (cost=0.00..122618.12
rows=4178 width=8) (actual time=0.427..3177.623 rows=10203 loops=169)
                           Filter: (("Dataset - A" OR "Dataset - C" OR
"Dataset - B") AND ("Sector" = 'Sector I'::text) AND ("Category" = 'Cat
1'::text) AND ("Country" = 'USA'::text))
                     ->  Index Scan using date_dimension_pkey on
date_dimension "date_dimension_Published Date"  (cost=0.00..0.54 rows=1
width=4) (actual time=0.007..0.007 rows=0 loops=1724307)
                           Index Cond: ("date_dimension_Published
Date".date_key = record_dimension.published_date_key)
                           Filter: ("date_dimension_Published Date"."SQL
Date" >= (now() - '6 mons'::interval))
               ->  Index Scan using record_type_mv_pkey on record_type_mv
 (cost=0.00..4.67 rows=1 width=4) (actual time=0.011..0.012 rows=0
loops=64220)
                     Index Cond: (record_type_mv.record_key =
record_dimension.record_key)
                     Filter: ((record_type_mv."Type")::text = ANY (('{"Cat 1
- yyy","Cat 1 - www","Cat 1 - zzz"}'::character varying[])::text[]))
         ->  Index Scan using record_date on instance_core_fact fact_table
 (cost=0.00..122.48 rows=33 width=12) (actual time=0.024..0.032 rows=2
loops=18590)
               Index Cond: (fact_table.record_key =
record_dimension.record_key)
Steve McLellan <smclellan@mintel.com> writes:
> Sorry, I should've tried that. The two explains are below; without cursor
> then with cursor. I don't know enough to reliably say whether there's
> anything wrong with either (we use default_statistics_target=100 although
> the estimates don't look right for some of the query fragments), but they're
> certainly very different.

Right, so the cursor plan is one that is estimated to take 3x longer,
but can deliver rows starting right away, where the other plan will have
a long delay and then spit everything out in a bunch.  It's not a bad
tradeoff, if the estimate were right.  The killer mistake is here:

>    ->  Index Scan using "Originator Type" on originator_dimension
>  (cost=0.00..125.29 rows=1 width=4) (actual time=26.067..41.906 rows=169
> loops=1)
>          Index Cond: ("Originator Type" = 'NO DATA'::text)
>          Filter: ("Area" = ANY ('{National,"Phoenix, AZ"}'::text[]))

Since that's on the outside of a nestloop, the 169x rowcount error
translates directly to a 169x error in the total runtime estimate ---
and it looks like that's not very far at all from the real ratio,
so the other estimates are pretty good.

I'm not sure whether you'd get any real improvement from a further
increase in stats target, though that's certainly an easy thing to
try.  What this looks like to me is the longstanding problem of having
correlations between the distributions of two different columns
(here, Area and Originator Type), which the planner doesn't know about.
So its estimate of the combined probability of meeting the conditions
is off.  You could try checking estimated vs real rowcount for the
area and originator conditions separately to see if that's the case
or not.  If either one is off by itself then a stats increase for that
column would help, if not there's not much to be done except lean on
the cursor target knob ...

            regards, tom lane