Re: BUG #5120: Performance difference between running a query with named cursor and straight SELECT - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #5120: Performance difference between running a query with named cursor and straight SELECT
Date
Msg-id 14062.1255642397@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #5120: Performance difference between running a query with named cursor and straight SELECT  (Steve McLellan <smclellan@mintel.com>)
List pgsql-bugs
Steve McLellan <smclellan@mintel.com> writes:
> Sorry, I should've tried that. The two explains are below; without cursor
> then with cursor. I don't know enough to reliably say whether there's
> anything wrong with either (we use default_statistics_target=100 although
> the estimates don't look right for some of the query fragments), but they're
> certainly very different.

Right, so the cursor plan is one that is estimated to take 3x longer,
but can deliver rows starting right away, where the other plan will have
a long delay and then spit everything out in a bunch.  It's not a bad
tradeoff, if the estimate were right.  The killer mistake is here:

>    ->  Index Scan using "Originator Type" on originator_dimension
>  (cost=0.00..125.29 rows=1 width=4) (actual time=26.067..41.906 rows=169
> loops=1)
>          Index Cond: ("Originator Type" = 'NO DATA'::text)
>          Filter: ("Area" = ANY ('{National,"Phoenix, AZ"}'::text[]))

Since that's on the outside of a nestloop, the 169x rowcount error
translates directly to a 169x error in the total runtime estimate ---
and it looks like that's not very far at all from the real ratio,
so the other estimates are pretty good.

I'm not sure whether you'd get any real improvement from a further
increase in stats target, though that's certainly an easy thing to
try.  What this looks like to me is the longstanding problem of having
correlations between the distributions of two different columns
(here, Area and Originator Type), which the planner doesn't know about.
So its estimate of the combined probability of meeting the conditions
is off.  You could try checking estimated vs real rowcount for the
area and originator conditions separately to see if that's the case
or not.  If either one is off by itself then a stats increase for that
column would help, if not there's not much to be done except lean on
the cursor target knob ...

            regards, tom lane

pgsql-bugs by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: BUG #5121: Segmentation Fault when using pam w/ krb5
Next
From: Steve McLellan
Date:
Subject: Re: BUG #5120: Performance difference between running a query with named cursor and straight SELECT