Re: Query running a lot faster with enable_nestloop=false - Mailing list pgsql-performance

From Robert Haas
Subject Re: Query running a lot faster with enable_nestloop=false
Date
Msg-id CA+TgmoY1Qhi6-8iz_y6Qr_Wnjcs_BXufHHOZPuBqij_4jewVHQ@mail.gmail.com
Whole thread Raw
In response to Query running a lot faster with enable_nestloop=false  (Mohanaraj Gopala Krishnan <mohangk@gmail.com>)
List pgsql-performance
On Tue, Oct 25, 2011 at 5:09 AM, Mohanaraj Gopala Krishnan
<mohangk@gmail.com> wrote:
> I have a query that runs a lot slower (~5 minutes) when I run it with
> the default enable_nestloop=true and enable_nestloop=false (~10 secs).
> The actual query is available here http://pastie.org/2754424 . It is a
> reporting query with many joins as the database is mainly used for
> transaction processing.
>
> Explain analyse result for both cases:
>
> Machine A nestloop=true - http://explain.depesz.com/s/nkj0 (~5 minutes)
> Machine A nestloop=false - http://explain.depesz.com/s/wBM (~10 secs)

A good start might be to refactor this:

Seq Scan on retailer_categories retailer_category_leaf_nodes
(cost=0.00..18.52 rows=1 width=16) (actual time=0.016..0.392 rows=194
loops=1)
    Filter: ((tree_right - tree_left) = 1)

And this:

Seq Scan on product_categories product_category_leaf_nodes
(cost=0.00..148.22 rows=2 width=32) (actual time=0.031..1.109 rows=383
loops=1)
    Filter: ((tree_right - tree_left) = 1)

The query planner really has no idea what selectivity to assign to
that condition, and so it's basically guessing, and it's way off.  You
could probably improve the estimate a lot by adding a column that
stores the values of tree_right - tree_left and is updated manually or
by triggers as you insert and update data.  Then you could just check
tree_left_right_difference = 1, which should get a much more accurate
estimate, and hopefully therefore a better plan.

You've also got a fairly large estimation error here:

Index Scan using invoices_create_date_idx on invoices (cost=0.00..8.28
rows=1 width=4) (actual time=0.055..0.305 rows=109 loops=1)
    Index Cond: ((create_date >= '2011-09-15'::date) AND (create_date
<= '2011-09-15'::date))
    Filter: (status = 128)

Apparently, status 128 is considerably more common among rows in that
date range than it is overall.  Unfortunately, it's not so easy to fix
this kind of estimation error, unless you can refactor your schema to
avoid needing to filter on both create_date and status at the same
time.

It might be worth using temporary tables here - factor out sections of
the query that are referenced multiple times, like the join between
sales_order_items and invoices, and create a temporary table.  ANALYZE
it, and then use it to run the main query.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-performance by date:

Previous
From: Rodrigo Gonzalez
Date:
Subject: Re: function slower than the same code in an sql file
Next
From: Robert Haas
Date:
Subject: Re: function slower than the same code in an sql file