BUG #5120: Performance difference between running a query with named cursor and straight SELECT - Mailing list pgsql-bugs

From Steven McLellan
Subject BUG #5120: Performance difference between running a query with named cursor and straight SELECT
Date
Msg-id 200910151750.n9FHoNCD007257@wwwmaster.postgresql.org
Whole thread Raw
Responses Re: BUG #5120: Performance difference between running a query with named cursor and straight SELECT
List pgsql-bugs
The following bug has been logged online:

Bug reference:      5120
Logged by:          Steven McLellan
Email address:      smclellan@mintel.com
PostgreSQL version: 8.3.x
Operating system:   FreeBSD 6.2
Description:        Performance difference between running a query with
named cursor and straight SELECT
Details:

Hi,

I've found what appears to be a bug seriously affecting performance running
a particular query using a named cursor versus running it as a simple
SELECT. Running with SELECT, the query below takes around .5s on a
reasonably specced dev database (4 core, 3GB RAM) running postgresql 8.3.3
and is a little faster on a better specced production database running
8.3.5. Running through a named cursor with FETCH FORWARD it takes upwards of
10 minutes (about 6 on the production box). Both boxes run FreeBSD 6.2. I
can reproduce the performance problem using:
BEGIN;
DECLARE named_cursor CURSOR FOR <<query>>;
FETCH ALL FROM named_cursor;

The prompt hangs at the FETCH ALL. The CPU remains pegged the whole
execution time, disks are inactive, and the query returns around 20 results.
Curiously, FETCH NEXT returns faster (in 3-5 seconds), but subsequent calls
to FETCH <anything> take minutes.

Below is a slightly anonymized version of the query (company policy).
Reducing the filters speed things up, and removing joins or WHERE clauses
have a beneficial impact, but I can't narrow it down to any individual piece
of the query slowing things down, and in any case, I'm not able to change it
much (it's machine generated). EXPLAIN ANALYZE follows below.

Does anyone know of any reason why this particular query would take so much
longer using a named cursor than running it without? We don't know of any
other queries that exhibit this behaviour.

Any help greatly appreciated, and if any more information is required please
ask.

Steve McLellan

SELECT fact_table."Instance Id"
FROM
 instance_core_fact AS fact_table
 INNER JOIN record_dimension ON fact_table.record_key =
record_dimension.record_key
 INNER JOIN originator_dimension ON fact_table.originator_key =
originator_dimension.originator_key
 INNER JOIN date_dimension AS "date_dimension_Published Date" ON
record_dimension.published_date_key = "date_dimension_Published
Date".date_key
 INNER JOIN record_type_mv ON fact_table.record_key =
record_type_mv.record_key
WHERE
 record_dimension."Sector" = 'Sector I' AND
 record_dimension."Category" = 'Cat 1' AND
 record_dimension."Country" = 'USA' AND
 (record_dimension."Dataset - A" OR record_dimension."Dataset - C" OR
record_dimension."Dataset - B") AND
 originator_dimension."Area" IN('National', 'Phoenix, AZ') AND
 "date_dimension_Published Date"."SQL Date" >= NOW() - INTERVAL '6 MONTH'
AND
 record_type_mv."Type" IN('Cat 1 - yyy', 'Cat 1 - www', 'Cat 1 - zzz') AND
 originator_dimension."Originator Type" = 'NO DATA';

======
EXPLAIN ANALYZE

Hash Join  (cost=33364.36..46606.90 rows=1 width=4) (actual
time=795.690..833.577 rows=14 loops=1)
   Hash Cond: (fact_table.originator_key =
originator_dimension.originator_key)
   ->  Nested Loop  (cost=33339.89..46577.20 rows=1392 width=8) (actual
time=792.676..832.203 rows=186 loops=1)
         ->  Nested Loop  (cost=33339.89..40678.24 rows=48 width=8) (actual
time=792.653..830.017 rows=110 loops=1)
               ->  Hash Join  (cost=33339.89..40641.84 rows=66 width=12)
(actual time=467.029..790.075 rows=3884 loops=1)
                     Hash Cond: (record_type_mv.record_key =
record_dimension.record_key)
                     ->  Seq Scan on record_type_mv  (cost=0.00..7076.20
rows=30012 width=4) (actual time=0.920..270.790 rows=31056 loops=1)
                           Filter: (("Type")::text = ANY (('{"Cat 1 -
Medicare","Cat 1 - www","Cat 1 - zzz"}'::character varying[])::text[]))
                     ->  Hash  (cost=33287.67..33287.67 rows=4178 width=8)
(actual time=447.819..447.819 rows=10203 loops=1)
                           ->  Bitmap Heap Scan on record_dimension
(cost=18214.01..33287.67 rows=4178 width=8) (actual time=371.277..428.278
rows=10203 loops=1)
                                 Recheck Cond: (("Category" = 'Cat 1'::text)
AND ("Sector" = 'Sector I'::text) AND ("Dataset - A" OR "Dataset - C" OR
"Dataset - B"))
                                 Filter: (("Dataset - A" OR "Dataset - C" OR
"Dataset - B") AND ("Country" = 'USA'::text))
                                 ->  BitmapAnd  (cost=18214.01..18214.01
rows=4623 width=0) (actual time=367.336..367.336 rows=0 loops=1)
                                       ->  Bitmap Index Scan on "Category"
(cost=0.00..3091.75 rows=123623 width=0) (actual time=53.713..53.713
rows=124053 loops=1)
                                             Index Cond: ("Category" = 'Cat
1'::text)
                                       ->  Bitmap Index Scan on "Sector"
(cost=0.00..7525.37 rows=327577 width=0) (actual time=129.610..129.610
rows=328054 loops=1)
                                             Index Cond: ("Sector" = 'Sector
I'::text)
                                       ->  BitmapOr  (cost=7594.30..7594.30
rows=410371 width=0) (actual time=128.983..128.983 rows=0 loops=1)
                                             ->  Bitmap Index Scan on
"Dataset - A"  (cost=0.00..550.30 rows=29579 width=0) (actual
time=11.393..11.393 rows=30016 loops=1)
                                                   Index Cond: ("Dataset -
A" = true)
                                             ->  Bitmap Index Scan on
"Dataset - C"  (cost=0.00..6981.18 rows=377696 width=0) (actual
time=116.306..116.306 rows=380788 loops=1)
                                                   Index Cond: ("Dataset -
C" = true)
                                             ->  Bitmap Index Scan on
"Dataset - B"  (cost=0.00..59.69 rows=3097 width=0) (actual
time=1.272..1.272 rows=3116 loops=1)
                                                   Index Cond: ("Dataset -
B" = true)
               ->  Index Scan using date_dimension_pkey on date_dimension
"date_dimension_Published Date"  (cost=0.00..0.54 rows=1 width=4) (actual
time=0.007..0.007 rows=0 loops=3884)
                     Index Cond: ("date_dimension_Published Date".date_key =
record_dimension.published_date_key)
                     Filter: ("date_dimension_Published Date"."SQL Date" >=
(now() - '6 mons'::interval))
         ->  Index Scan using record_date on instance_core_fact fact_table
(cost=0.00..122.48 rows=33 width=12) (actual time=0.007..0.011 rows=2
loops=110)
               Index Cond: (fact_table.record_key =
record_dimension.record_key)
   ->  Hash  (cost=24.45..24.45 rows=1 width=4) (actual time=1.004..1.004
rows=169 loops=1)
         ->  Bitmap Heap Scan on originator_dimension  (cost=20.44..24.45
rows=1 width=4) (actual time=0.313..0.662 rows=169 loops=1)
               Recheck Cond: (("Area" = ANY ('{National,"Phoenix,
AZ"}'::text[])) AND ("Originator Type" = 'NO DATA'::text))
               ->  BitmapAnd  (cost=20.44..20.44 rows=1 width=0) (actual
time=0.299..0.299 rows=0 loops=1)
                     ->  Bitmap Index Scan on publication_market_area
(cost=0.00..8.95 rows=54 width=0) (actual time=0.165..0.165 rows=169
loops=1)
                           Index Cond: ("Area" = ANY ('{National,"Phoenix,
AZ"}'::text[]))
                     ->  Bitmap Index Scan on "Originator Type"
(cost=0.00..11.24 rows=390 width=0) (actual time=0.121..0.121 rows=426
loops=1)
                           Index Cond: ("Originator Type" = 'NO DATA'::text)

pgsql-bugs by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: BUG #5118: start-status-insert-fatal
Next
From: "Kevin Grittner"
Date:
Subject: Re: BUG #5118: start-status-insert-fatal