Re: Re: Query planner using hash join when merge join seems orders of magnitude faster - Mailing list pgsql-general

From Tom Lane
Subject Re: Re: Query planner using hash join when merge join seems orders of magnitude faster
Date
Msg-id 20318.1470092209@sss.pgh.pa.us
Whole thread Raw
In response to Re: Query planner using hash join when merge join seems orders of magnitude faster  (Branden Visser <mrvisser@gmail.com>)
Responses Re: Re: Query planner using hash join when merge join seems orders of magnitude faster  (Branden Visser <mrvisser@gmail.com>)
List pgsql-general
Branden Visser <mrvisser@gmail.com> writes:
> I just wanted to update that I've found evidence that fixing the
> planner row estimation may not actually influence it to use the more
> performant merge join instead of hash join. I have found instances
> where the row estimation is *overestimated* by a magnitude of 4x
> (estimates 2.4m rows) and still chooses hash join over merge join,
> where merge join is much faster (45s v.s. 12s).

I wonder why the merge join is faster exactly.  It doesn't usually have a
huge benefit unless the inputs are presorted already.  The one case I can
think of where it can win quite a lot is if the range of merge keys in one
input is such that we can skip reading most of the other input.  (Extreme
example: one input has keys 1..10, but the other input has keys 1..10000.
We only need to read the first 1% of the second input, assuming there's an
index on its key column so that we don't have to read the whole thing
anyway to sort it.)

The planner is aware of that effect, but I wonder if it's misestimating it
for some reason.  Anyway it would be worth looking closely at your EXPLAIN
ANALYZE results to determine whether early-stop is happening or not.  It'd
manifest as one join input node showing an actual number of rows returned
that's less than you'd expect.

            regards, tom lane


pgsql-general by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Force pg_hba.conf user with LDAP
Next
From: Venkata Balaji N
Date:
Subject: Re: How to best archetect Multi-Tenant SaaS application using Postgres