Re: [HACKERS] Poor memory context performance in large hash joins - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: [HACKERS] Poor memory context performance in large hash joins
Date
Msg-id CAMkU=1zbiQYRtR_qOEkfqGLhqite-JN-J92v7SyEf7P9pg94YA@mail.gmail.com
In response to Re: [HACKERS] Poor memory context performance in large hash joins  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Poor memory context performance in large hash joins  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Feb 23, 2017 at 2:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
> The number of new chunks can be almost as large as the number of old
> chunks, especially if there is a very popular value.  The problem is that
> every time an old chunk is freed, the code in aset.c around line 968 has to
> walk over all the newly allocated chunks in the linked list before it can
> find the old one being freed.  This is an N^2 operation, and I think it has
> horrible CPU cache hit rates as well.

Maybe it's time to convert that to a doubly-linked list. 


I don't think that would help.  You would need some heuristic to guess whether the chunk you are looking for is near the front or near the end.  And in this case, the desired chunk starts out at the front, and then keeps moving down the list with each iteration as new things are added to the front, until, near the end of ExecHashIncreaseNumBatches, it is freeing them from near the end of the list.  But in between, it is freeing them from the middle, so I don't think a doubly-linked list would change it from N^2, just lower the constant, even if you always knew which end to start at.  Or am I misunderstanding what the implications of a doubly-linked list are?
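
To make the shape of the problem concrete, here is a rough sketch (not the actual aset.c code) of the pattern being described: blocks on a singly-linked list with the newest at the head, so each free scans past all the newer blocks, and freeing N old blocks while N new ones keep arriving at the front is O(N^2).  The generic doubly-linked unlink at the bottom is shown only for contrast: removal is O(1) once you already hold a pointer to the block, the open question above being whether the free path would actually have that pointer in hand.

#include <stdlib.h>

/*
 * Illustration only, not the real allocator: a singly-linked block list
 * with the newest block at the head.  Freeing a block means scanning from
 * the head to find the link that points at it, so a workload that frees
 * old blocks while new ones keep piling up at the front does O(N) work
 * per free and O(N^2) overall.
 */
typedef struct Block
{
    struct Block *next;
    /* chunk payload would follow here */
} Block;

static void
free_block(Block **head, Block *target)
{
    Block **link = head;

    while (*link != target)      /* walk past every newer block */
        link = &(*link)->next;

    *link = target->next;        /* splice the block out */
    free(target);
}

/*
 * For contrast, a generic doubly-linked unlink: O(1) once you already hold
 * a pointer to the block, because the predecessor is reachable directly.
 */
typedef struct DBlock
{
    struct DBlock *prev;
    struct DBlock *next;
} DBlock;

static void
unlink_block(DBlock **head, DBlock *b)
{
    if (b->prev)
        b->prev->next = b->next;
    else
        *head = b->next;         /* b was the head */
    if (b->next)
        b->next->prev = b->prev;
    b->prev = b->next = NULL;
}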

What would really help here is if it remembered the next pointer of the just-freed chunk, and started the scan from that location the next time, cycling around to the head pointer if it doesn't find anything.  But I don't think that that is a very general solution.  
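
A rough sketch of that idea (names invented here for illustration, not real aset.c code): keep a roving cursor into the block list and resume the next search from where the previous free left off, wrapping back to the head if nothing is found.

/*
 * The cursor records where the previous free stopped; for a FIFO-ish free
 * pattern like the one in ExecHashIncreaseNumBatches, the next target is
 * usually right at (or just past) the cursor, so most frees become O(1).
 */
typedef struct Block
{
    struct Block *next;
} Block;

typedef struct BlockList
{
    Block  *head;
    Block **cursor;              /* link the previous free stopped at */
} BlockList;

static int
free_block(BlockList *list, Block *target)
{
    Block **link = list->cursor ? list->cursor : &list->head;
    int     wrapped = 0;

    for (;;)
    {
        if (*link == NULL)
        {
            if (wrapped)
                return 0;        /* not in the list at all */
            link = &list->head;  /* cycle around to the head once */
            wrapped = 1;
        }
        else if (*link == target)
        {
            *link = target->next;    /* unlink */
            list->cursor = link;     /* resume here on the next free */
            return 1;
        }
        else
            link = &(*link)->next;
    }
}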

Or, if you could pass a flag when creating the context telling it whether memory will be freed mostly-LIFO or mostly-FIFO, and have it use a stack or a queue accordingly.
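
Again only a sketch, with a hypothetical flag that does not exist today: the creation-time hint decides which end of the block list new blocks go on, so the blocks expected to be freed first stay near the head and the scan-on-free stays short.

/* Hypothetical creation flag, not an existing API. */
typedef enum FreeOrderHint
{
    FREE_ORDER_MOSTLY_LIFO,      /* frees mostly hit recent allocations */
    FREE_ORDER_MOSTLY_FIFO       /* frees mostly hit the oldest allocations */
} FreeOrderHint;

typedef struct Block
{
    struct Block *next;
} Block;

typedef struct BlockList
{
    Block        *head;
    Block        *tail;
    FreeOrderHint hint;
} BlockList;

/*
 * Keep the blocks that are expected to be freed first near the head: push
 * like a stack for LIFO frees, append like a queue for FIFO frees, so the
 * linear scan in free stays short for the declared workload.
 */
static void
add_block(BlockList *list, Block *b)
{
    if (list->hint == FREE_ORDER_MOSTLY_LIFO || list->head == NULL)
    {
        b->next = list->head;
        list->head = b;
        if (list->tail == NULL)
            list->tail = b;
    }
    else
    {
        b->next = NULL;
        list->tail->next = b;
        list->tail = b;
    }
}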
 
Although if the
hash code is producing a whole lot of requests that are only a bit bigger
than the separate-block threshold, I'd say It's Doing It Wrong.  It should
learn to aggregate them into larger requests.

Right now it is using compiled-in 32KB chunks.  Should it use something like max(32KB, work_mem/128) instead?
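
As a sketch of that sizing rule (the divisor 128 is just the number from the question above, not a tested value, and work_mem is taken in kilobytes, as the GUC is):

#include <stddef.h>

/*
 * Hypothetical replacement for the compiled-in 32KB hash chunk size:
 * scale with work_mem so very large hash joins make fewer, larger
 * allocations, but never go below the current 32KB.
 */
static size_t
hash_chunk_size(size_t work_mem_kb)
{
    size_t base   = 32 * 1024;                  /* current fixed 32KB */
    size_t scaled = (work_mem_kb * 1024) / 128; /* work_mem / 128 */

    return scaled > base ? scaled : base;
}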

Cheers,

Jeff
