Home > mailing lists

Re: Why PG uses nested-loop join when no indexes are available? - Mailing list pgsql-general

From	David Rowley
Subject	Re: Why PG uses nested-loop join when no indexes are available?
Date	January 14, 2016 08:57:06
Msg-id	CAKJS1f8nsm-T0KMvGJz_bskUjQ=yGmGUUtUdAcFoEaZ_tuTXjA@mail.gmail.com Whole thread
In response to	Why PG uses nested-loop join when no indexes are available? (David Grelaud <dgrelaud@ideolys.com>)
Responses	Re: Why PG uses nested-loop join when no indexes are available?
List	pgsql-general

Tree view

On 14 January 2016 at 03:48, David Grelaud <dgrelaud@ideolys.com> wrote:

3) Always avoid nested-loop join when no indexes are available?

Tom Lane said "There might be some cases where this would help, but there would be many more where it would be useless or counterproductive."
Who is right between Tom Lane and the Leis Viktor's paper above?

We tried to disable nested_loop all the time in a production environment and we observed an overall improvement in all queries where Indexes are not useful or not available (CTEs), which confirms the paper.
In fact, one of our production environment is still running with "nested_loop off" because benefits are a lot greater than drawbacks as long as some tables are relatively small (Indexes not used).

I don't really think any of them are wrong. Simply Tom is talking in general terms for no specific workload, and the paper is dealing with one specific workload. Of course there are cases when a non-parameterised nested loop are the fastest way, I mean what could possibility be faster if there's only 1 row to be joined, for example. It's just that it's not that much faster since such a join is likely to perform very quickly no matter which join algorithm is used.

On the other hand, if your tables are not tiny, or you're never just joining to just a few rows, and you are suffering from stats underestimations, then it's quite probable that you'll improve your workload overall by doing enable_nestloop = off. But you have to remember that if you do this, then you miss out on parameterised inner scans on nested loops. Quite often these are the fastest option, even when the number of rows is fairly large, as it might save building a hash table on a very large relation, or having to sort that relation for a merge join.

Perhaps separating out enable_nestloop so that it only disables non-parameterised nested loops, and add another GUC for parameterised nested loops would be a good thing to do. Likely setting enable_nestloop to off in production would be a slightly easier decision to make, if that was the case.

It looks pretty simple to do this, so I hacked it up, and attached it here. There's no doc changes and I'm not that interested in fighting for this change, it's more just an idea for consideration.

David Rowley http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Training & Services

Attachment

enable_paramnestloop.patch

pgsql-general by date:

From: "David G. Johnston"
Date: 14 January 2016, 05:31:42
Subject: Re: Function error

From: Karsten Hilbert
Date: 14 January 2016, 08:59:16
Subject: Re: pg_dump problem with dropped NOT NULL on child table

Re: Why PG uses nested-loop join when no indexes are available? - Mailing list pgsql-general

Attachment

Previous

Next