Re: Same query - Slow in production - Mailing list pgsql-performance

From Tom Lane
Subject Re: Same query - Slow in production
Date
Msg-id 25818.1147310405@sss.pgh.pa.us
Whole thread Raw
In response to Same query - Slow in production  (Brian Wipf <brian@clickspace.com>)
Responses Speed Up Offset and Limit Clause  ("Christian Paul Cosinas" <cpc@cybees.com>)
List pgsql-performance
Brian Wipf <brian@clickspace.com> writes:
> I'm trying to determine why an identical query is running
> approximately 500 to 1000 times slower on our production database
> compared to our backup database server.

It looks to me like it's pure luck that the query is fast on the backup
server.  The outer side of the EXISTS' join is being badly misestimated:

>             ->  Index Scan using
> category_product__category_id_is_active_and_status_idx on
> category_product cp  (cost=0.00..4362.64 rows=1103 width=4) (actual
> time=0.013..0.015 rows=2 loops=5)
>                   Index Cond: ((category_id = $1) AND
> ((product_is_active)::text = 'true'::text) AND
> ((product_status_code)::text = 'complete'::text))

If there actually had been 1100 matching rows instead of 2, the query
would have run 550 times slower, putting it in the same ballpark as
the other plan.  So what I'm guessing is that the planner sees these
two plans as being nearly the same cost, and small differences in the
stats between the two databases are enough to tip its choice in one
direction or the other.

So what you want, of course, is to improve that rowcount estimate.
I suppose the reason it's so bad is that we don't have multicolumn
statistics ... is there a strong correlation between product_is_active
and product_status_code?  If so, it might be worth your while to find a
way to merge them into one column.

            regards, tom lane

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Big IN() clauses etc : feature proposal
Next
From: "Christian Paul Cosinas"
Date:
Subject: Speed Up Offset and Limit Clause