[HACKERS] Poor memory context performance in large hash joins - Mailing list pgsql-hackers

From Jeff Janes
Subject [HACKERS] Poor memory context performance in large hash joins
Date
Msg-id CAMkU=1x1hvue1XYrZoWk_omG0Ja5nBvTdvgrOeVkkeqs71CV8g@mail.gmail.com
Responses Re: [HACKERS] Poor memory context performance in large hash joins  (Peter Geoghegan <pg@bowt.ie>)
Re: [HACKERS] Poor memory context performance in large hash joins  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

When doing a hash join with large work_mem, you can have a large number of chunks.  Then, if ExecHashIncreaseNumBatches gets called, it walks through those chunks, moving the tuples to new chunks (or to disk, if they no longer match the batch's bitmask) and freeing the old chunks as it goes.

The number of new chunks can be almost as large as the number of old chunks, especially if there is a very popular value.  The problem is that every time an old chunk is freed, the code in aset.c around line 968 has to walk over all the newly allocated chunks in the linked list before it can find the old one being freed.  This is an O(N^2) operation, and I think it has horrible CPU cache hit rates as well.
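
To make the cost concrete, here is a toy standalone C program (illustration only, nothing below is PostgreSQL code; Chunk, push_chunk and free_chunk are made-up names).  It models the pattern: each big dense-allocated chunk ends up as its own block near the front of the context's singly linked block list, and freeing one means scanning that list from the head, past everything allocated since.

/*
 * Toy model of the repartitioning pattern, for illustration only.
 * New chunks are pushed onto the head of a singly linked list; freeing
 * an old chunk means scanning from the head to find it, so every free
 * walks over all of the chunks allocated after it.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct Chunk
{
    struct Chunk *next;
} Chunk;

static Chunk *head = NULL;

static Chunk *
push_chunk(void)
{
    Chunk *c = malloc(sizeof(Chunk));

    c->next = head;
    head = c;
    return c;
}

/* Unlink and free "target", scanning the list from the head. */
static long long
free_chunk(Chunk *target)
{
    Chunk      *prev = NULL;
    Chunk      *cur = head;
    long long   steps = 0;

    while (cur != target)
    {
        prev = cur;
        cur = cur->next;
        steps++;
    }
    if (prev)
        prev->next = cur->next;
    else
        head = cur->next;
    free(cur);
    return steps;
}

int
main(void)
{
    int         nchunks = 20000;
    long long   total = 0;
    Chunk     **old = malloc(nchunks * sizeof(Chunk *));

    /* the chunks that exist before repartitioning starts */
    for (int i = 0; i < nchunks; i++)
        old[i] = push_chunk();

    /* repartition: allocate one new chunk per old chunk, then free the old */
    for (int i = nchunks - 1; i >= 0; i--)
    {
        push_chunk();                   /* new chunk lands at the head...  */
        total += free_chunk(old[i]);    /* ...so the old one is behind it  */
    }

    printf("%d chunks freed, %lld pointer hops (N*(N+1)/2 = %lld)\n",
           nchunks, total, (long long) nchunks * (nchunks + 1) / 2);
    free(old);
    return 0;
}

Each hop in the real case also means touching a different block header scattered around the hash table's memory, which is presumably where the bad cache behavior comes from.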

Is there a good solution to this?  Could the new chunks be put in a different memory context, with the old context destroyed and the new one installed at the end of ExecHashIncreaseNumBatches?  I couldn't find a destroy method for memory contexts; it looks like you just reset the parent instead, but I don't think that would work here.
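
For concreteness, here is a rough sketch of that idea, assuming MemoryContextDelete() (in mcxt.c) can play the part of the destroy step, and assuming the dense tuple chunks were kept in a context of their own (chunkCxt below is a made-up field name, not something HashJoinTableData actually has):

/*
 * Sketch only, not a patch.  Assumes the dense tuple chunks live in
 * their own context, hashtable->chunkCxt (made up), whose parent is the
 * longer-lived hashCxt.
 */
MemoryContext oldchunkcxt = hashtable->chunkCxt;

/* fresh context to hold the chunks that survive repartitioning */
hashtable->chunkCxt = AllocSetContextCreate(hashtable->hashCxt,
                                            "HashTupleChunks",
                                            ALLOCSET_DEFAULT_SIZES);

/*
 * ... walk the old chunk list as now, copying each tuple either into a
 * new chunk (allocated in the new chunkCxt) or out to a batch file ...
 */

/* then release every old chunk at once, with no per-chunk list search */
MemoryContextDelete(oldchunkcxt);

That would trade one pfree per old chunk, each with its own linear search, for a single walk over the old context's blocks inside MemoryContextDelete().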

Thanks,

Jeff
