Re: Problems with hash join over nested loop - Mailing list pgsql-performance

From Jim Nasby
Subject Re: Problems with hash join over nested loop
Msg-id 526FE086.4020903@enova.com
In response to Re: Problems with hash join over nested loop  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: Problems with hash join over nested loop  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On 10/29/13 9:10 AM, Merlin Moncure wrote:
> On Mon, Oct 28, 2013 at 6:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jim Nasby <jnasby@enova.com> writes:
>>> I've been working on trying to normalize a table that's got a bunch of text fields. Normalizing the first 4 has
>>> been a non-issue. But when I try and normalize 2 additional fields a bunch of query plans go belly-up.
>>
>> Try increasing join_collapse_limit/from_collapse_limit.  I'm a bit
>> confused by your description but I think maybe you've got more than 8
>> relations in the subqueries.
>
> Hm -- wondering out loud if there would be any value in terms of
> decorating explain output when that limit was hit and if it's
> practical to do so...

I think the community would *love* any method of noting potential performance problems. Hitting the GEQO limit fits in
there as well. We could eventually warn about other things too, like going just over work_mem or seqscanning a big
table for a small number of rows.

I'm also wondering if it's time to raise those limits. I constructed a somewhat contrived test query against our schema
to test this. It's a legitimate join path for our schema... I can't see why someone would ever use the *full* path, but
smaller sections of it are definitely in use. It's basically all joins, with one simple filter on top of that.
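
(To be clear about what "those limits" are: both are plain session GUCs, and both default to 8. Roughly:)

SHOW from_collapse_limit;   -- 8 by default; subqueries get folded into the parent query only up to this many FROM items
SHOW join_collapse_limit;   -- 8 by default; explicit JOINs get reordered freely only up to this many items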

I'd rather not share the actual query or plan, but:

grep -i scan temp.txt |wc -l
28

All tests done via EXPLAIN ... in psql with \timing turned on. I ignored obvious outliers... margin of error is ~5%
from what I saw:

Default config:                       21ms
geqo = off:                           19ms
geqo off, from_collapse = 99:         19ms
from_collapse_limit = 99:             21ms
join_collapse_limit = 99:             171ms
both collapse limits = 99:            176ms
geqo off, join_collapse = 99:         1.2s
geqo off, both collapse limits = 99:  1.2s
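
Roughly what each run looked like, in case anyone wants to repeat this against their own schema (the query itself is
elided):

\timing on
SET geqo = off;                  -- or left on, depending on the row above
SET from_collapse_limit = 99;    -- default is 8
SET join_collapse_limit = 99;    -- default is 8
EXPLAIN SELECT ...;              -- the 28-relation join; EXPLAIN doesn't execute, so \timing is basically all planning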

Obviously there are cases where 1.2 seconds of planning time will kill you... but if you're that time sensitive and using
28 tables I think it's reasonable to expect people to do some hand tuning! :)

Conversely, where you are likely to get to that sheer number of tables is when you're doing something that's going to
take a non-trivial amount of time to execute. In this particular case, if I limit the query to a single row (via blah_id
= 2, not via limit), it takes ~2ms to execute when cached with full optimization (interestingly, planning time was at
about 926ms at that point).

Now that looks horrible... 926ms to plan a query that takes 2ms to return. But I'm not even going to bother with the
20ms plan, because it's going to take minutes if not HOURS to run (it's just full scanning everything it can find).
--
Jim Nasby, Lead Data Architect   (512) 569-9461

