Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Using quicksort for every external sort run
Date
Msg-id CAM3SWZTX5=nHxPpogPirQsH4cR+BpQS6r7Ktax0HMQiNLf-1qA@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort for every external sort run  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Using quicksort for every external sort run  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Nov 18, 2015 at 6:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Nov 18, 2015 at 6:29 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> In principle, I have no problem with doing that. Through testing, I
>> cannot see any actual upside, though. Perhaps I just missed something.
>> Even 8MB is enough to avoid the multipass merge in the event of a
>> surprisingly high volume of data (my work laptop is elsewhere, so I
>> don't have my notes on this in front of me, but I figured out the
>> crossover point for a couple of cases).
>
> I'd be interested in seeing this analysis in some detail.

Sure. Jeff mentioned 8MB as a work_mem setting, so let's examine a
case where that's the work_mem setting, and see experimentally where
the crossover point for a multi-pass sort ends up.

If this table is created:

postgres=# create unlogged table bar as select (random() * 1e9)::int4
idx, 'payload xyz'::text payload from generate_series(1, 10100000) i;
SELECT 10100000

Then, on my system, a work_mem setting of 8MB *just about* avoids
seeing the multipass_warning message with this query:

postgres=# select count(distinct idx) from bar ;
  count
------------10,047,433
(1 row)

A work_mem setting of 235MB is just enough to make the query's sort
fully internal.

Let's see how things change with a higher work_mem setting of 16MB. I
mentioned quadratic growth: Having doubled work_mem, let's *quadruple*
the number of tuples, to see where this leaves a 16MB setting WRT a
multi-pass merge:

postgres=# drop table bar ;
DROP TABLE
postgres=# create unlogged table bar as select (random() * 1e9)::int4
idx, 'payload xyz'::text payload from generate_series(1, 10100000 * 4)
i;
SELECT 40400000

Further experiments show that this is the exact point at which the
16MB work_mem setting similarly narrowly avoids a multi-pass warning.
This should be the dominant consideration, because now a fully
internal sort requires 4X the work_mem of my original 16MB work_mem
example table/query.

The quadratic growth in a simple hybrid sort-merge strategy's ability
to avoid a multi-pass merge phase (growth relative to linear increases
in work_mem) can be demonstrated with simple experiments.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: Using quicksort for every external sort run
Next
From: Greg Stark
Date:
Subject: Re: Using quicksort for every external sort run