Re: bad estimation together with large work_mem generates terrible slow hash joins - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: bad estimation together with large work_mem generates terrible slow hash joins
Date:
Msg-id: 5436D8FC.3040901@fuzzy.cz
In response to: Re: bad estimation together with large work_mem generates terrible slow hash joins (Kevin Grittner <kgrittn@ymail.com>)
Responses: Re: bad estimation together with large work_mem generates terrible slow hash joins
List: pgsql-hackers
On 9.10.2014 16:55, Kevin Grittner wrote:
>
> I've tried various other tests using \timing rather than EXPLAIN, and
> the patched version looks even better in those cases. I have seen up
> to 4x the performance for a query using the patched version, higher
> variability in run time without the patch, and have yet to devise a
> benchmark where the patched version came out slower (although I admit
> to not being as good at creating such cases as some here).

Nice. Thanks for the testing!

The only case I've been able to come up with is when the hash table
fits into work_mem only because the bucket array is not counted
against it. The new code does count the buckets, and will start
batching in this case.

That is mostly luck, however, because it depends on the cardinality
estimates, and the best way to fix it is to increase work_mem (which
is safer now thanks to the reduced overhead).
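
To illustrate, the accounting difference is roughly this (a simplified
sketch in C, not the actual nodeHash.c code, and the variable names
are made up):

    Size    tuple_bytes  = ntuples * tuple_width;      /* hashed tuples */
    Size    bucket_bytes = nbuckets * sizeof(void *);  /* bucket array */

    /*
     * Previously only tuple_bytes was compared against work_mem; now
     * the bucket array is counted too, so a hash table that barely
     * fit before may cross the limit and start batching.
     */
    if (tuple_bytes + bucket_bytes > work_mem_bytes)
        start_batching();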

Also, Robert proposed a way to mitigate this: if we realize during the
initial sizing that we'd have to batch, we can peek whether reducing
the number of buckets (to 1/2 or maybe 1/4) would make the table fit.
I believe this is a good approach, and I will look into it after
pgconf.eu (i.e. in early November), unless someone else is interested.
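
A rough sketch of that idea (hypothetical, not an actual
implementation):

    /* during the initial sizing, peek at 1/2 and then 1/4 buckets */
    int     i;

    for (i = 0; i < 2; i++)
    {
        if (tuple_bytes + nbuckets * sizeof(void *) <= work_mem_bytes)
            break;              /* fits now, no need to batch */
        nbuckets /= 2;
    }

    if (tuple_bytes + nbuckets * sizeof(void *) > work_mem_bytes)
        start_batching();       /* reducing buckets was not enough */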

> When given a generous work_mem setting the patched version often uses
> more of what it is allowed than the unpatched version (which is
> probably one of the reasons it tends to do better). If someone has
> set a high work_mem and has gotten by only because the configured
> amount is not all being used when it would benefit performance, they
> may need to adjust work_mem down to avoid memory problems. That
> doesn't seem to me to be a reason to reject the patch.

I'm not entirely sure I understand this paragraph. What do you mean by
"configured amount is not all being used when it would benefit
performance"? Can you give an example?

The only thing I can think of is the batching behavior described above.

> This is in "Waiting on Author" status only because I never got an
> answer about why the debug code used printf() rather than elog() at
> a DEBUG level.  Other than that, I would say this patch is Ready
> for Committer.  Tomas?  You there?

I think I responded to that on October 2, quoting:

===================================================================
On 2.10.2014 09:50, Tomas Vondra wrote:
> On 2.10.2014, 2:20, Kevin Grittner wrote:
>>
>> The patch applied and built without problem, and passes `make
>> check-world`. The only thing that caught my eye was the addition of
>> debug code using printf() instead of logging at a DEBUG level. Is
>> there any reason for that?
>
> Not really. IIRC the main reason is that the other code in
> nodeHash.c uses the same approach.
===================================================================

I believe it's safe to switch the logging to elog(). IMHO the printf
logging is there from some very early version of the code, before elog
was introduced. Or something like that.
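
That is, roughly this kind of change (the actual debug messages in the
patch differ, this is just an example):

    /* before: printf-style debug output */
    printf("nbatch = %d, nbuckets = %d\n", nbatch, nbuckets);

    /* after: regular elog() at a DEBUG level */
    elog(DEBUG1, "nbatch = %d, nbuckets = %d", nbatch, nbuckets);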

Tomas
