Re: bad estimation together with large work_mem generates terrible slow hash joins - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: bad estimation together with large work_mem generates terrible slow hash joins
Date:
Msg-id: 5436D8FC.3040901@fuzzy.cz
In response to: Re: bad estimation together with large work_mem generates terrible slow hash joins (Kevin Grittner <kgrittn@ymail.com>)
Responses: Re: bad estimation together with large work_mem generates terrible slow hash joins
List: pgsql-hackers
On 9.10.2014 16:55, Kevin Grittner wrote:
>
> I've tried various other tests using \timing rather than EXPLAIN, and
> the patched version looks even better in those cases. I have seen up
> to 4x the performance for a query using the patched version, higher
> variability in run time without the patch, and have yet to devise a
> benchmark where the patched version came out slower (although I admit
> to not being as good at creating such cases as some here).

Nice. Thanks for the testing!

The only case I've been able to come up with is when the hash table
fits into work_mem only because the bucket array is not counted
against it. The new code does count the buckets, and will start
batching in this case.

That is mostly luck, however, because it depends on the cardinality
estimates, and the best way to fix it is to increase work_mem (which
is safer now thanks to the reduced overhead).
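
To illustrate, the accounting difference is roughly this (a simplified
sketch in C, not the actual nodeHash.c code, and the variable names
are made up):

    Size    tuple_bytes  = ntuples * tuple_width;      /* hashed tuples */
    Size    bucket_bytes = nbuckets * sizeof(void *);  /* bucket array */

    /*
     * Previously only tuple_bytes was compared against work_mem; now
     * the bucket array is counted too, so a hash table that barely
     * fit before may cross the limit and start batching.
     */
    if (tuple_bytes + bucket_bytes > work_mem_bytes)
        start_batching();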

Also, Robert proposed a way to mitigate this: if we realize during the
initial sizing that we'd have to batch, we can peek whether reducing
the number of buckets (to 1/2 or maybe 1/4) would make the table fit.
I believe this is a good approach, and I will look into it after
pgconf.eu (i.e. in early November), unless someone else is interested.
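
A rough sketch of that idea (hypothetical, not an actual
implementation):

    /* during the initial sizing, peek at 1/2 and then 1/4 buckets */
    int     i;

    for (i = 0; i < 2; i++)
    {
        if (tuple_bytes + nbuckets * sizeof(void *) <= work_mem_bytes)
            break;              /* fits now, no need to batch */
        nbuckets /= 2;
    }

    if (tuple_bytes + nbuckets * sizeof(void *) > work_mem_bytes)
        start_batching();       /* reducing buckets was not enough */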

> When given a generous work_mem setting the patched version often uses
> more of what it is allowed than the unpatched version (which is
> probably one of the reasons it tends to do better). If someone has
> set a high work_mem and has gotten by only because the configured
> amount is not all being used when it would benefit performance, they
> may need to adjust work_mem down to avoid memory problems. That
> doesn't seem to me to be a reason to reject the patch.

I'm not entirely sure I understand this paragraph. What do you mean by
"configured amount is not all being used when it would benefit
performance"? Can you give an example?

The only thing I can think of is the batching behavior described above.

> This is in "Waiting on Author" status only because I never got an
> answer about why the debug code used printf() rather than elog() at
> a DEBUG level.  Other than that, I would say this patch is Ready
> for Committer.  Tomas?  You there?

I think I responded to that on October 2, quoting:

===================================================================
On 2.10.2014 09:50, Tomas Vondra wrote:
> On 2.10.2014, 2:20, Kevin Grittner wrote:
>>
>> The patch applied and built without problem, and passes `make
>> check-world`. The only thing that caught my eye was the addition of
>> debug code using printf() instead of logging at a DEBUG level. Is
>> there any reason for that?
>
> Not really. IIRC the main reason is that the other code in
> nodeHash.c uses the same approach.
===================================================================

I believe it's safe to switch the logging to elog(). IMHO the printf
logging is there from some very early version of the code, before elog
was introduced. Or something like that.
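
That is, roughly this kind of change (the actual debug messages in the
patch differ, this is just an example):

    /* before: printf-style debug output */
    printf("nbatch = %d, nbuckets = %d\n", nbatch, nbuckets);

    /* after: regular elog() at a DEBUG level */
    elog(DEBUG1, "nbatch = %d, nbuckets = %d", nbatch, nbuckets);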

Tomas
