Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Using quicksort for every external sort run
Date
Msg-id CAM3SWZQfU3Y-Zs29u4x5wVK39pb3b2WbcPmTPLrLeC4zen=e7w@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Wed, Mar 16, 2016 at 6:42 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> - I think that batchmemtuples() is somewhat weird.  Normally,
>> grow_memtuples() doubles the size of the array each time it's called.
>> So if you somehow called this function when you still had lots of
>> memory available, it would just double the size of the array.
>> However, I think the expectation is that it's only going to be called
>> when availMem is less than half of allowedMem, in which case we're
>> going to get the special "last increment of memtupsize" behavior,
>> where we expand the memtuples array by some multiple between 1.0 and
>> 2.0 based on allowedMem/memNowUsed.
>
> That's right. It might be possible for the simple doubling behavior to
> happen under artificial conditions instead, for example when we have
> enormous individual tuples, but if that does happen it's still
> correct. I just didn't think it was worth worrying about giving back
> more memory in such extreme edge-cases.

Come to think of it, maybe the pass-by-value datum sort case should
also call batchmemtuples() too (or something similar). If you look at
how beginmerge() is called, you'll see that that doesn't happen.

Obviously this case is not entitiled to a "memtupsize *
STANDARDCHUNKHEADERSIZE" refund, since of course there never was any
overhead like that at any point. And, obviously this case has no need
for batch memory at all. However, it is entitled to get a refund for
non-used tapes (accounted for, but, it turns out, never allocated
tapes). It should then get the benefit of that refund by way of
growing memtuples through a similar "final, honestly, I really mean it
this time" call to grow_memtuples().

So, while the "memtupsize * STANDARDCHUNKHEADERSIZE refund" part
should still be batch-specific (i.e. used for the complement of
tuplesort cases, never the datum pass-by-val case), the new
grow_memtuples() thing should always happen with external sorts.

The more I think about it, the more I wonder if we should commit
something like the debugging patch 0004-* (enabled only when
trace_sort = on, of course). Close scrutiny of what tuplesort.c is
doing with memory is important.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Gendered language in source
Next
From: "Joshua D. Drake"
Date:
Subject: Make primnodes.h gender neutral