
From: Heikki Linnakangas
Subject: Re: Sort performance cliff with small work_mem
Date:
Msg-id: 8aa3334f-49f7-e797-5b55-6dc7cf031500@iki.fi
In response to: Re: Sort performance cliff with small work_mem (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Sort performance cliff with small work_mem (Peter Geoghegan <pg@bowt.ie>)
           Re: Sort performance cliff with small work_mem (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On 02/05/18 19:41, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, May 2, 2018 at 11:38 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> To fix, I propose that we change the above so that we always subtract
>>> tapeSpace, but if there is less than e.g. 32 kB of memory left after that
>>> (including, if it went below 0), then we bump availMem back up to 32 kB. So
>>> we'd always reserve 32 kB to hold the tuples, even if that means that we
>>> exceed 'work_mem' slightly.
> 
>> Sounds very reasonable.
> 
> Agreed.  I think that was my code to start with, and the issue certainly
> didn't occur to me at the time.
> 
> I don't like the idea of using hardwired "32kB" though; some multiple
> of TAPE_BUFFER_OVERHEAD seems more plausible.  Perhaps MINORDER times
> TAPE_BUFFER_OVERHEAD would be good?

I don't think the amount that we reserve to hold the tuples should
depend on those things. The function is "allocating" memory for the tape
buffers, yes, but now we're talking about keeping some memory for the
tuples, while quicksorting the initial runs. That seems independent of
the number of tapes or the tape buffer size.

I'm not sure what you could derive that from, to make it less arbitrary. 
At the moment, I'm thinking of just doing something like this:

/*
 * Minimum amount of memory reserved to hold the sorted tuples in
 * TSS_BUILDRUNS phase.  This specifies a minimum size for the merge runs,
 * when work_mem is very small.
 */
#define MIN_TUPLE_MEMORY    (32 * 1024)
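
To make it concrete, the accounting would go roughly like this. This is
a standalone sketch, not the actual tuplesort.c code; the function name
and the value of TAPE_BUFFER_OVERHEAD here are just stand-ins for the
real definitions:

#include <stdint.h>

/* Illustrative stand-ins for the real definitions in tuplesort.c. */
#define TAPE_BUFFER_OVERHEAD    (8192 * 3)
#define MIN_TUPLE_MEMORY        (32 * 1024)

/*
 * Always subtract the tape buffer space from availMem, but clamp so
 * that at least MIN_TUPLE_MEMORY is left to hold the tuples while
 * quicksorting the initial runs.
 */
static int64_t
reserve_tape_buffers(int64_t availMem, int maxTapes)
{
    int64_t     tapeSpace = (int64_t) maxTapes * TAPE_BUFFER_OVERHEAD;

    availMem -= tapeSpace;
    if (availMem < MIN_TUPLE_MEMORY)
        availMem = MIN_TUPLE_MEMORY;    /* may exceed work_mem slightly */

    return availMem;
}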

Independently of this, perhaps we should put in a special case in
dumptuples(), so that it would never create a run with fewer than
maxTapes tuples. The rationale is that you'll need to hold that many
tuples in memory during the merge phase anyway, so it seems silly to
bail out before that while building the initial runs. You're going to
exceed work_mem by roughly the same amount anyway, just in a different
phase. That's not the case in this example, but it might happen when
sorting extremely wide tuples.
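
Something like this, as a hypothetical sketch (the function and
parameter names are made up for illustration, not the actual
tuplesort.c code):

#include <stdbool.h>

/*
 * Decide whether dumptuples() should start writing out a run, given
 * the number of tuples currently held in memory.
 */
static bool
worth_dumping(int memtupcount, int maxTapes, bool alltuples)
{
    /*
     * The merge phase has to hold at least maxTapes tuples in memory
     * anyway, so don't start a run smaller than that unless we're
     * flushing the final tuples.
     */
    if (!alltuples && memtupcount < maxTapes)
        return false;
    return true;
}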

- Heikki

