Re: Patch for ginCombineData - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Patch for ginCombineData
Date
Msg-id CAMkU=1xLz5ted=ajLJ9PBpshCByN6wJAFWQoiMhF+5WQA+nv=w@mail.gmail.com
Whole thread Raw
In response to Patch for ginCombineData  (Robert Abraham <robert.abraham86@googlemail.com>)
List pgsql-hackers
On Wed, Aug 5, 2015 at 3:17 AM, Robert Abraham <robert.abraham86@googlemail.com> wrote:
Hello,

we are using gin indexes on big tables. these tables happen to have several billion rows.
the index creation fails in ginCombineData in src/backend/access/ginbulk.c because repalloc is limited to 1 gb.
this limitation makes no sense in this context (compare comments in src/include/utils/memutils.h).
To overcome this limitation on tables with lots of rows repalloc_huge is used.
The test suite still succeeds on my machine.
Find the patch attached,

This looks good to me.  

One possible issue I see is that if accum.allocatedMemory is only slightly less than maintenance_work_mem just before we decide to repalloc, then doing the repalloc_huge could cause us to exceed maintenance_work_mem. Potentially by a lot.  In the worst case (when all the data we've seen during this round of accumulation falls into the same key), we would overrun by a factor of almost 2.  Or almost 3, if you count the time during the repalloc when the new data has been allocated but the old data not yet freed.  Perhaps the code here should look at the amount of maintenance_work_mem left and grow by less than a factor of 2 if it is too close to going over.

Mitigating that, it won't actually use very much of that memory.  Once control passes back at the end of this tuple, the caller will then realize it has overshot the maintenance_work_mem and will flush it all.  I think most modern OSes will not have a problems caused by allocated but untouched memory.

This patch doesn't introduce that problem, it just allows it to operate at a higher absolute size.  So I'm marking this as ready for committer.

Cheers,

Jeff

pgsql-hackers by date:

Previous
From: Jim Nasby
Date:
Subject: Re: Error message with plpgsql CONTINUE
Next
From: Tom Lane
Date:
Subject: Re: (full) Memory context dump considered harmful