Re: Why PG uses nested-loop join when no indexes are available? - Mailing list pgsql-general

From David Rowley
Subject Re: Why PG uses nested-loop join when no indexes are available?
Date
Msg-id CAKJS1f8nsm-T0KMvGJz_bskUjQ=yGmGUUtUdAcFoEaZ_tuTXjA@mail.gmail.com
Whole thread Raw
In response to Why PG uses nested-loop join when no indexes are available?  (David Grelaud <dgrelaud@ideolys.com>)
Responses Re: Why PG uses nested-loop join when no indexes are available?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On 14 January 2016 at 03:48, David Grelaud <dgrelaud@ideolys.com> wrote:
3) Always avoid nested-loop join when no indexes are available?

Tom Lane said "There might be some cases where this would help, but there would be many more where it would be useless or counterproductive."
Who is right between Tom Lane and the Leis Viktor's paper above?

We tried to disable nested_loop all the time in a production environment and we observed an overall improvement in all queries where Indexes are not useful or not available (CTEs), which confirms the paper.
In fact, one of our production environment is still running with "nested_loop off" because benefits are a lot greater than drawbacks as long as some tables are relatively small (Indexes not used).

I don't really think any of them are wrong. Simply Tom is talking in general terms for no specific workload, and the paper is dealing with one specific workload. Of course there are cases when a non-parameterised nested loop are the fastest way, I mean what could possibility be faster if there's only 1 row to be joined, for example. It's just that it's not that much faster since such a join is likely to perform very quickly no matter which join algorithm is used.

On the other hand, if your tables are not tiny, or you're never just joining to just a few rows, and you are suffering from stats underestimations, then it's quite probable that you'll improve your workload overall by doing enable_nestloop = off. But you have to remember that if you do this, then you miss out on parameterised inner scans on nested loops. Quite often these are the fastest option, even when the number of rows is fairly large, as it might save building a hash table on a very large relation, or having to sort that relation for a merge join.

Perhaps separating out enable_nestloop so that it only disables non-parameterised nested loops, and add another GUC for parameterised nested loops would be a good thing to do. Likely setting enable_nestloop to off in production would be a slightly easier decision to make, if that was the case.

It looks pretty simple to do this, so I hacked it up, and attached it here. There's no doc changes and I'm not that interested in fighting for this change, it's more just an idea for consideration.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachment

pgsql-general by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Function error
Next
From: Karsten Hilbert
Date:
Subject: Re: pg_dump problem with dropped NOT NULL on child table