Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Using quicksort for every external sort run
Msg-id CA+TgmobfJGNg8wojiJgv42xrsP8op0DMVYjt2XjoiGdn3+4-gQ@mail.gmail.com
In response to Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Thu, Apr 7, 2016 at 1:17 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> I certainly agree that GUCs that aren't easy to tune are bad.  I'm
>> wondering whether the fact that this one is hard to tune is something
>> that can be fixed.  The comments about "padding" - a term I don't
>> like, because it to me implies a deliberate attempt to game the
>> benchmark when in reality wanting to sort a wide row is entirely
>> reasonable - make me wonder if this should be based on a number of
>> tuples rather than an amount of memory.  If considering the row width
>> makes us get the wrong answer, then let's not do that.
>
> That's a good point. While I don't think it will make it easy to tune
> the GUC, it will make it easier. Although, I think that it should
> probably still be GUC_UNIT_KB. That should just be something that my
> useselection() function compares to the overall size of memtuples
> alone when we must initially spill, not the value of
> work_mem/maintenance_work_mem. The degree of padding isn't entirely
> irrelevant, because not all comparisons will be resolved at the
> stup.datum1 level, but it's still clearly an improvement to not have
> wide tuples mess with things.
>
> Would you like me to revise the patch along those lines? Or, do you
> prefer units of tuples? Tuples are basically equivalent, but make it
> way less obvious what the relationship with CPU cache might be. If I
> revise the patch along these lines, I should also reduce the default
> replacement_sort_mem to produce roughly equivalent behavior for
> non-padded cases.

I prefer units of tuples, with the GUC itself therefore being
unitless.  I suggest we call the parameter replacement_sort_threshold
and document that (1) the ideal value may depend on the amount of CPU
cache available to running processes, with more cache implying higher
values; and (2) the ideal value may depend somewhat on the input data,
with more correlation implying higher values.  And then pick some
value that you think is likely to work well for most people and call
it good.
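
To make the tuple-based idea concrete, here is a minimal sketch of what the
check might look like once it is keyed on a tuple count rather than on memory.
This is an illustration under assumptions, not the patch: the function name
use_replacement_selection(), the memtupcount argument, and the default shown
are all hypothetical stand-ins, and the real decision in tuplesort.c may
weigh other factors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the proposed unitless GUC.  The value is
 * illustrative only; choosing a good default is the open question.
 */
static int replacement_sort_threshold = 150000;

/*
 * Assumed decision point: called when the sort must first spill to disk.
 * Only the number of tuples currently held in memory is consulted, so
 * tuple width has no effect on the outcome.
 */
static bool
use_replacement_selection(int64_t memtupcount)
{
    assert(memtupcount >= 0);

    /* Small inputs use replacement selection; larger ones are quicksorted. */
    return memtupcount <= (int64_t) replacement_sort_threshold;
}
```

With this shape, a sort that spills while holding 1,000 wide tuples and one
holding 1,000 narrow tuples make the same choice, which is the property being
asked for.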

If you could prepare a new patch that includes those changes and also
makes the changes requested in my other email, I will try to commit it
before the deadline.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


