Thread: Query planner using hash join when merge join seems orders of magnitude faster

From: Branden Visser
Hi all, I seem to be in a bit of a jam with the query planner choosing
a significantly less optimal route for a query.

First, a common checklist I've seen around when researching these
types of issues on the list:

1. pgsql version is PostgreSQL 9.4.7 running on Amazon RDS
2. VACUUM ANALYZE was run
3. default_statistics_target was increased to 10,000 for the whole
test database, which was then re-analyzed
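For reference, the statistics-target change was applied roughly like
this (the database name and the single-column variant are placeholders,
not my actual schema):

```sql
-- Raise the default sample size for the whole database
-- (new settings only take effect on the next analyze):
ALTER DATABASE testdb SET default_statistics_target = 10000;

-- Or, more surgically, raise it for one column only:
ALTER TABLE uv ALTER COLUMN content_item_id SET STATISTICS 10000;

-- Rebuild the statistics either way:
ANALYZE;
```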

Code and info referenced is in a github gist [1]. The files are:

[Q-Report.sql] The actual SQL query I'm trying to optimize, runs a
report on a large set of data
[QP-Report-Normal.txt] The query plan of Q-Report.sql without any
modification of the query plan
[QP-Report-DisableHashJoin.txt] The query plan of Q-Report.sql after
`set enable_hashjoin = false`
[Q-CC-Underestimated.sql] An isolated query that shows 1 of
potentially 2 significant row count underestimations
[QP-CC-Underestimated.txt] The query plan of Q-CC-Underestimated.sql

Essentially, my report query has a hash join that is taking about 70%
of the query time (45s), whereas a merge join seems to take less than
a second.
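For reference, the two plans were captured by toggling the planner's
hash-join support for one session only (a sketch; the full query is
Q-Report.sql in the gist):

```sql
-- Baseline: the planner picks the hash join.
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- the report query

-- Forbid hash joins for this session, forcing the merge join:
SET enable_hashjoin = off;
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- Restore the default afterwards:
RESET enable_hashjoin;
```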

It looks like there are probably 2 significant row underestimations
which may be contributing to this (first of which is the Q-CC query),
but I haven't been able to validate that this is specifically the
reason the query planner picks a hash join.

My usage of this schema is mostly queries that fetch very small slices
of data; this particular query, however, is a background workload that
generates denormalized reports. That said, any global configuration
change that would jeopardize the performance of the small-slice
queries is undesirable.
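In case it helps, planner settings can be scoped to the reporting
job's transaction so the small-slice queries never see them (a sketch,
assuming the report runs in its own transaction):

```sql
BEGIN;
-- SET LOCAL reverts automatically at COMMIT/ROLLBACK, so other
-- sessions and later transactions keep the default planner settings.
SET LOCAL enable_hashjoin = off;
SELECT ...;  -- the report query from Q-Report.sql
COMMIT;
```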

My question is: what would be the right way to go about diagnosing and
optimizing this kind of issue? I'm trying to think of ways to
restructure the query to reduce the impact of the planner's estimates,
but I'm not coming up with any good ideas there. As mentioned, the
statistics target was increased to 10,000 for testing, and while it
changed the row estimates, it was only by about 12 rows -- did I maybe
do something wrong there?

With this particular kind of query, I'd be willing to trade for a vast
"overestimation" scenario in the analyzer, but I can't find anything
that lets me tweak this for a particular query. I've come across
pg_hint_plan, but it doesn't seem to support my version of PostgreSQL,
and I'd prefer to stay on RDS if possible.

I came across a pending patch for multivariate statistics, but that
seems a little bit far out at this point.

Any advice for workarounds or solutions would be greatly appreciated!

Thanks in advance,
Branden

[1] https://gist.github.com/mrvisser/a22dddbdaa64ae1aa3dbc9637617b70d


I just wanted to update that I've found evidence that fixing the
planner's row estimation may not actually influence it to use the more
performant merge join instead of the hash join. I have found instances
where the row count is *overestimated* by a factor of 4 (an estimate
of 2.4M rows) and the planner still chooses the hash join over the
merge join, where the merge join is much faster (45s vs. 12s).

Is there maybe a cost constant that I should be increasing for RDS to
influence the planner to choose a different strategy? I'm on a
db.m4.large instance.

Any insight around optimizing postgresql for RDS on this kind of
workload or what cost constant parameters would help this situation
would be extremely helpful.
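For context, these are the cost constants I've been looking at; the
values below are only illustrative guesses for EBS-backed storage, not
recommendations:

```sql
-- The default random_page_cost of 4.0 assumes spinning disks;
-- network-attached SSD storage usually warrants something lower:
SET random_page_cost = 1.5;

-- Tell the planner how much data is likely cached (roughly
-- shared_buffers plus OS cache; a db.m4.large has 8 GB of RAM):
SET effective_cache_size = '6GB';
```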

Thanks again,
Branden


On Sat, Jul 30, 2016 at 11:21 AM, Branden Visser <mrvisser@gmail.com> wrote:
> Hi all, I seem to be in a bit of a jam with the query planner choosing
> a significantly less optimal route for a query.
>
> [...]
>
> [1] https://gist.github.com/mrvisser/a22dddbdaa64ae1aa3dbc9637617b70d


Branden Visser <mrvisser@gmail.com> writes:
> I just wanted to update that I've found evidence that fixing the
> planner row estimation may not actually influence it to use the more
> performant merge join instead of hash join. I have found instances
> where the row estimation is *overestimated* by a magnitude of 4x
> (estimates 2.4m rows) and still chooses hash join over merge join,
> where merge join is much faster (45s v.s. 12s).

I wonder why the merge join is faster exactly.  It doesn't usually have a
huge benefit unless the inputs are presorted already.  The one case I can
think of where it can win quite a lot is if the range of merge keys in one
input is such that we can skip reading most of the other input.  (Extreme
example: one input has keys 1..10, but the other input has keys 1..10000.
We only need to read the first 1% of the second input, assuming there's an
index on its key column so that we don't have to read the whole thing
anyway to sort it.)

The planner is aware of that effect, but I wonder if it's misestimating it
for some reason.  Anyway it would be worth looking closely at your EXPLAIN
ANALYZE results to determine whether early-stop is happening or not.  It'd
manifest as one join input node showing an actual number of rows returned
that's less than you'd expect.
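Concretely, the early-stop pattern shows up in EXPLAIN ANALYZE output
something like this (all node names and numbers invented for
illustration):

```
Merge Join  (... rows=2400000 ...) (actual ... rows=410000 ...)
  Merge Cond: (a.id = b.a_id)
  ->  Index Scan using a_pkey on a      (actual ... rows=410000 ...)
  ->  Index Scan using b_a_id_idx on b  (actual ... rows=4200 ...)
```

Here only 4,200 of b's rows were actually read before the merge keys
on the a side ran out -- that is the early stop.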

            regards, tom lane


Thanks for your reply Tom.

On Mon, Aug 1, 2016 at 6:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Branden Visser <mrvisser@gmail.com> writes:
>> I just wanted to update that I've found evidence that fixing the
>> planner row estimation may not actually influence it to use the more
>> performant merge join instead of hash join. I have found instances
>> where the row estimation is *overestimated* by a magnitude of 4x
>> (estimates 2.4m rows) and still chooses hash join over merge join,
>> where merge join is much faster (45s v.s. 12s).
>
> I wonder why the merge join is faster exactly.  It doesn't usually have a
> huge benefit unless the inputs are presorted already.  The one case I can
> think of where it can win quite a lot is if the range of merge keys in one
> input is such that we can skip reading most of the other input.  (Extreme
> example: one input has keys 1..10, but the other input has keys 1..10000.
> We only need to read the first 1% of the second input, assuming there's an
> index on its key column so that we don't have to read the whole thing
> anyway to sort it.)
>

The nature of the data is such that rows for the tables aliased `uv`,
`ci` and `r` tend to be created and linked in unison, so maybe their
incremental ids and references have some natural ordering in the DB
that coincidentally helps out the merge join?

> The planner is aware of that effect, but I wonder if it's misestimating it
> for some reason.

The uv.content_item_id field has 25% null values, if that helps
uncover anything. Aside from that, it's a bit of a mystery to me.

> Anyway it would be worth looking closely at your EXPLAIN
> ANALYZE results to determine whether early-stop is happening or not.  It'd
> manifest as one join input node showing an actual number of rows returned
> that's less than you'd expect.
>

I think I'd need a little more detail on this to be able to
investigate it. The merge join shows a number of rows (~410k)
consistent with the sorts that occur just before and just after it.
Also, the join with result_items shows an expected increase in rows
given my data.

All that said, adding an index on r.content_item_id has improved the
performance of this query without having to fiddle with the planner --
a pretty silly oversight :/
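Concretely, the fix was along these lines (the real table name behind
the `r` alias is in the gist; `results` here is a placeholder):

```sql
-- Without this index, the merge-join input on r.content_item_id had
-- to be produced by an explicit sort over a full scan; with it, the
-- planner can use a cheap ordered index scan instead.
CREATE INDEX results_content_item_id_idx ON results (content_item_id);
ANALYZE results;
```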

Let me know if there's anything in here that you want me to dig into
any further.

Cheers,
Branden
