
Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Using quicksort for every external sort run
Date	April 7, 2016 21:11:06
Msg-id	CA+TgmobfJGNg8wojiJgv42xrsP8op0DMVYjt2XjoiGdn3+4-gQ@mail.gmail.com
In response to	Re: Using quicksort for every external sort run (Peter Geoghegan <pg@heroku.com>)
Responses	Re: Using quicksort for every external sort run (Peter Geoghegan <pg@heroku.com>); Re: Using quicksort for every external sort run (Peter Geoghegan <pg@heroku.com>)
List	pgsql-hackers

On Thu, Apr 7, 2016 at 1:17 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> I certainly agree that GUCs that aren't easy to tune are bad.  I'm
>> wondering whether the fact that this one is hard to tune is something
>> that can be fixed.  The comments about "padding" - a term I don't
>> like, because it to me implies a deliberate attempt to game the
>> benchmark when in reality wanting to sort a wide row is entirely
>> reasonable - make me wonder if this should be based on a number of
>> tuples rather than an amount of memory.  If considering the row width
>> makes us get the wrong answer, then let's not do that.
>
> That's a good point. While I don't think it will make it easy to tune
> the GUC, it will make it easier. Although, I think that it should
> probably still be GUC_UNIT_KB. That should just be something that my
> useselection() function compares to the overall size of memtuples
> alone when we must initially spill, not the value of
> work_mem/maintenance_work_mem. The degree of padding isn't entirely
> irrelevant, because not all comparisons will be resolved at the
> stup.datum1 level, but it's still clearly an improvement to not have
> wide tuples mess with things.
>
> Would you like me to revise the patch along those lines? Or, do you
> prefer units of tuples? Tuples are basically equivalent, but make it
> way less obvious what the relationship with CPU cache might be. If I
> revise the patch along these lines, I should also reduce the default
> replacement_sort_mem to produce roughly equivalent behavior for
> non-padded cases.
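The memory-based check Peter describes, in which useselection() compares the setting only against the size of the memtuples array at the moment the sort first spills, rather than against work_mem/maintenance_work_mem, can be modeled roughly as below. This is a hypothetical sketch, not code from the patch: the function name, the 24-byte slot size (an assumed size for one SortTuple slot on a 64-bit build), and the use of plain integers are all assumptions for illustration.

```python
# Hypothetical model of the check described above; not code from the patch.
# Assumption: each memtuples slot (a SortTuple) takes 24 bytes on a 64-bit build.
SORT_TUPLE_SLOT_BYTES = 24


def use_selection_by_memory(memtupcount: int, replacement_sort_mem_kb: int,
                            slot_bytes: int = SORT_TUPLE_SLOT_BYTES) -> bool:
    """Decide, at the point of the first spill, whether replacement
    selection should build the first run.

    Only the memtuples array is measured (one fixed-size slot per tuple),
    not work_mem/maintenance_work_mem, so the width of the tuples the
    slots point to does not change the answer.
    """
    memtuples_bytes = memtupcount * slot_bytes
    return memtuples_bytes <= replacement_sort_mem_kb * 1024
```

With a 64 kB setting, 1,000 tuples (24,000 bytes of slots) stay under the limit while 10,000 tuples (240,000 bytes) do not, however wide the rows themselves are.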

I prefer units of tuples, with the GUC itself therefore being
unitless.  I suggest we call the parameter replacement_sort_threshold
and document that (1) the ideal value may depend on the amount of CPU
cache available to running processes, with more cache implying higher
values; and (2) the ideal value may depend somewhat on the input data,
with more correlation implying higher values.  And then pick some
value that you think is likely to work well for most people and call
it good.
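A unitless, tuple-count threshold of the kind proposed here reduces the decision to a count comparison, which is why row width drops out entirely. The sketch below is hypothetical: the function name and return strings are illustrative, and no default value is supplied because choosing one is explicitly left to the patch author.

```python
def first_run_strategy(memtupcount: int, replacement_sort_threshold: int) -> str:
    """Choose how to build the first run once the sort must spill to disk.

    The threshold is a plain tuple count (a unitless setting), so narrow
    and wide rows with the same count get the same answer.
    """
    if replacement_sort_threshold < 0:
        raise ValueError("threshold must be non-negative")
    if memtupcount <= replacement_sort_threshold:
        return "replacement selection"
    return "quicksort"
```

Per the documentation points above, more available CPU cache, or more correlated input, would argue for a higher threshold.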

If you could prepare a new patch with those changes and also making
the changes requested in my other email, I will try to commit that
before the deadline.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company