Thread: Query running a lot faster with enable_nestloop=false

Query running a lot faster with enable_nestloop=false

From

Mohanaraj Gopala Krishnan

Date:

25 October 2011, 09:09:50

Hi guys,

I have a query that runs a lot slower (~5 minutes) when I run it with
the default enable_nestloop=true and enable_nestloop=false (~10 secs).
The actual query is available here http://pastie.org/2754424 . It is a
reporting query with many joins as the database is mainly used for
transaction processing.

Explain analyse result for both cases:

Machine A nestloop=true - http://explain.depesz.com/s/nkj0 (~5 minutes)
Machine A nestloop=false - http://explain.depesz.com/s/wBM (~10 secs)

On a different slightly slower machine (Machine B), copying the
database over and leaving the default enable_nestloop=true it takes
~20 secs.

Machine B nestloop=true - http://explain.depesz.com/s/dYO (~ 20secs)

For all the cases above I ensured that I did an ANALYZE before running
the queries. There were no other queries running in parallel.
Both machines are running  PostgreSQL 8.4.6. Machine B is using the
default configuration provided by the package while for Machine A we
applied the changes suggested by pgtune - http://pastie.org/2755113.

Machine A is running Ubuntu 10.04 32 bit while Machine B is running
Ubuntu 8.04 32 bit.

Machine A spec -
Intel(R) Xeon(R) CPU X3450  @ 2.67GHz (8 Cores)
8GB RAM (2 x 4GB)
4 x 300GB 15k SAS

Machine B spec -
Intel(R) Pentium(R) D CPU 2.80GHz x 2
2GB RAM
1 x 80GB SATA HDD

1. For Machine A, what can I do to make the planner choose the faster
plan without setting enable_nestloop=false ?

2. From the research I have done it seems to be that the reason the
planner is choosing the unoptimal query is because of the huge
difference between the estimated and actual rows. How can I get this
figure closer ?

3. If I should rewrite the query, what should I change ?

4. Why is it that the planner seems to be doing the right thing for
Machine B without setting enable_nestloop=false. What should I be
comparing in both the machines to understand the difference in choice
that the planner made ?

I have tried reading through the manual section "55.1. Row Estimation
Examples", "14.2. Statistics Used by the Planner". I am still trying
to fully apply the information to my specific case above and hence any
help or pointers would be greatly appreciated.

In a last ditch effort we also tried upgrading Machine A to
PostgresSQL 9.1 and that did not rectify the issue. We have reverted
the upgrade for now.

Thank you for your time.


--
Mohan

Re: Query running a lot faster with enable_nestloop=false

From

Robins Tharakan

Date:

25 October 2011, 09:41:42

Hi Mohanaraj,

One thing you should certainly try is to increase the
default_statistics_target value from 50 up to say about 1000 for the
larger tables. Large tables tend to go off on estimates with smaller
values here.

I guess I am not helping here, but apart from your query, those
estimates on Machine B seem odd, coz they shoot up from 10k to the order
of billions without any big change in row-count. Beats me.

--
Robins Tharakan

> 1. For Machine A, what can I do to make the planner choose the faster
> plan without setting enable_nestloop=false ?
>
> 2. From the research I have done it seems to be that the reason the
> planner is choosing the unoptimal query is because of the huge
> difference between the estimated and actual rows. How can I get this
> figure closer ?
>
> 3. If I should rewrite the query, what should I change ?
>
> 4. Why is it that the planner seems to be doing the right thing for
> Machine B without setting enable_nestloop=false. What should I be
> comparing in both the machines to understand the difference in choice
> that the planner made ?
>
> I have tried reading through the manual section "55.1. Row Estimation
> Examples", "14.2. Statistics Used by the Planner". I am still trying
> to fully apply the information to my specific case above and hence any
> help or pointers would be greatly appreciated.
>
> In a last ditch effort we also tried upgrading Machine A to
> PostgresSQL 9.1 and that did not rectify the issue. We have reverted
> the upgrade for now.
>
> Thank you for your time.

Attachment

smime.p7s

Re: Query running a lot faster with enable_nestloop=false

From

Robert Haas

Date:

03 November 2011, 15:35:09

On Tue, Oct 25, 2011 at 5:09 AM, Mohanaraj Gopala Krishnan
<mohangk@gmail.com> wrote:
> I have a query that runs a lot slower (~5 minutes) when I run it with
> the default enable_nestloop=true and enable_nestloop=false (~10 secs).
> The actual query is available here http://pastie.org/2754424 . It is a
> reporting query with many joins as the database is mainly used for
> transaction processing.
>
> Explain analyse result for both cases:
>
> Machine A nestloop=true - http://explain.depesz.com/s/nkj0 (~5 minutes)
> Machine A nestloop=false - http://explain.depesz.com/s/wBM (~10 secs)

A good start might be to refactor this:

Seq Scan on retailer_categories retailer_category_leaf_nodes
(cost=0.00..18.52 rows=1 width=16) (actual time=0.016..0.392 rows=194
loops=1)
    Filter: ((tree_right - tree_left) = 1)

And this:

Seq Scan on product_categories product_category_leaf_nodes
(cost=0.00..148.22 rows=2 width=32) (actual time=0.031..1.109 rows=383
loops=1)
    Filter: ((tree_right - tree_left) = 1)

The query planner really has no idea what selectivity to assign to
that condition, and so it's basically guessing, and it's way off.  You
could probably improve the estimate a lot by adding a column that
stores the values of tree_right - tree_left and is updated manually or
by triggers as you insert and update data.  Then you could just check
tree_left_right_difference = 1, which should get a much more accurate
estimate, and hopefully therefore a better plan.

You've also got a fairly large estimation error here:

Index Scan using invoices_create_date_idx on invoices (cost=0.00..8.28
rows=1 width=4) (actual time=0.055..0.305 rows=109 loops=1)
    Index Cond: ((create_date >= '2011-09-15'::date) AND (create_date
<= '2011-09-15'::date))
    Filter: (status = 128)

Apparently, status 128 is considerably more common among rows in that
date range than it is overall.  Unfortunately, it's not so easy to fix
this kind of estimation error, unless you can refactor your schema to
avoid needing to filter on both create_date and status at the same
time.

It might be worth using temporary tables here - factor out sections of
the query that are referenced multiple times, like the join between
sales_order_items and invoices, and create a temporary table.  ANALYZE
it, and then use it to run the main query.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company