Re: [HACKERS] The case for removing replacement selection sort - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] The case for removing replacement selection sort
Date
Msg-id CAH2-Wz=XuK6WSUpXX-7=y7n8XmDVV5YVFTUoXfWrgWt+Djk8Xw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] The case for removing replacement selection sort  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [HACKERS] The case for removing replacement selection sort
List pgsql-hackers
On Sun, Sep 10, 2017 at 5:07 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> I'm currently re-running the benchmarks we did in 2016 for 9.6, but
> those are all sorts with a single column (see the attached script). But
> it'd be good to add a few queries testing sorts with multiple keys. We
> can either tweak some of the existing data sets + queries, or come up
> with entirely new tests.

I see that work_mem is set like this in the script:

"for wm in '1MB' '8MB' '32MB' '128MB' '512MB' '1GB'; do"

I suggest that we forget about values over 32MB, since the question of
how well quicksort does there was settled by your tests in 2016. I
would also add '4MB' to the list of wm values that you'll actually
test.

Any case with input that is initially in random order or DESC sort
order is not interesting, either. I suggest you remove those, too.

I think we're only interested in benchmarks where replacement
selection really does get its putative best case (no merge needed in
the end). Any (almost) sorted cases (the only cases that you are
interesting to test now) will always manage that, once you set
replacement_sort_tuples high enough, and provided there isn't even a
single tuple that is completely out of order. The "before" cases here
should have a replacement_sort_tuples of 1 billion (so that we're sure
to not have the limit prevent the use of replacement selection in the
first place), versus the "after" cases, which should have a
replacement_sort_tuples of 0 to represent my proposal (to represent
performance in a world where replacement selection is totally
removed).

> For the existing queries, I should have some initial results tomorrow,
> at least for the data sets with 100k and 1M rows. The tests with 10M
> rows will take much more time (it takes 1-2hours for a single work_mem
> value, and we're testing 6 of them).

I myself don't see that much value in a 10M row test.

Thanks for volunteering to test!
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: [HACKERS] Automatic testing of patches in commit fest
Next
From: "Bossart, Nathan"
Date:
Subject: Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands