From: Jeff Janes
Subject: Re: [HACKERS] Poor memory context performance in large hash joins
Msg-id: CAMkU=1xdSg0g8JuwUq=zro=PG508_0qKaKFLQbEcGciT4hAYLw@mail.gmail.com
In response to: Re: [HACKERS] Poor memory context performance in large hash joins (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

On Fri, Feb 24, 2017 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Thu, Feb 23, 2017 at 2:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Maybe it's time to convert that to a doubly-linked list.
>
>> I don't think that would help.  You would need some heuristic to guess
>> whether the chunk you are looking for is near the front, or near the end.
>
> Uh, what?  In a doubly-linked list, you can remove an element in O(1)
> time, you don't need any searching.  It basically becomes
>       item->prev->next = item->next;
>       item->next->prev = item->prev;
> modulo possible special cases for the head and tail elements.

Currently it is walking the chain to identify which block holds the chunk in the first place, not just to get the pointer to the previous block.  But I see that that could be fixed by pointer arithmetic once there is a reason to fix it, and there is no such reason as long as the chain has to be walked to find the prev pointer anyway.  Like this, perhaps:

/* an oversized chunk is the only chunk in its block, so the block header
   immediately precedes the chunk */
targetblock = (AllocBlock) (((char *) chunk) - ALLOC_BLOCKHDRSZ);
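To spell that out: if AllocBlockData also carried a prev pointer, the whole large-chunk free path could skip the list walk entirely.  A sketch only (the prev field and the helper function below are hypothetical; aset, next, freeptr, endptr, and set->blocks are the existing names in aset.c):

typedef struct AllocBlockData
{
    AllocSet    aset;       /* aset that owns this block */
    AllocBlock  prev;       /* hypothetical back-link */
    AllocBlock  next;       /* next block in aset's blocks list */
    char       *freeptr;    /* start of free space in this block */
    char       *endptr;     /* end of space in this block */
} AllocBlockData;

static void
AllocSetFreeOversizedChunk(AllocSet set, AllocChunk chunk)
{
    /* find the block by pointer arithmetic, as above */
    AllocBlock  block = (AllocBlock) (((char *) chunk) - ALLOC_BLOCKHDRSZ);

    /* O(1) unlink, modulo the head special case */
    if (block->prev != NULL)
        block->prev->next = block->next;
    else
        set->blocks = block->next;  /* block was the list head */
    if (block->next != NULL)
        block->next->prev = block->prev;

    free(block);
}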
 
>>> Although if the
>>> hash code is producing a whole lot of requests that are only a bit bigger
>>> than the separate-block threshold, I'd say It's Doing It Wrong.  It should
>>> learn to aggregate them into larger requests.
>
>> Right now it is using compiled-in 32KB chunks.  Should it use something
>> like max(32kb,work_mem/128) instead?
>
> I'd say it should double the size of the request each time.  That's what
> we do in most places.

I thought the design goal there was that the space in old chunks could get re-used for the new chunks in a reasonably fine-grained way.  If the chunk sizes double each time, the last chunk contains just over half of all the data (each new chunk is a bit bigger than all the previous ones combined), and that fine-grained re-use couldn't happen.
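If the requests did double, presumably they would still want a cap, something like this sketch (next_request_size and max_request are names I am making up, not anything in the tree):

#define HASH_CHUNK_SIZE (32 * 1024)     /* the current compiled-in request */

/* double the request each time, but cap it; max_request might be
   derived from work_mem, e.g. work_mem / 128 as suggested above */
static Size
next_request_size(Size current, Size max_request)
{
    if (current >= max_request / 2)
        return max_request;
    return current * 2;
}

With a 4MB cap, for example, the requests run 32KB, 64KB, ..., 2MB, 4MB, 4MB, ..., and the re-use concern above applies until the cap is reached.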

Cheers,

Jeff
