Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Using quicksort for every external sort run
Date
Msg-id CA+Tgmob9ssG0XB4-GzPARCe0C8gXzHBCrAzNyout1oWj+9Nc9Q@mail.gmail.com
In response to Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Thu, Nov 19, 2015 at 3:43 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> I'd be interested in seeing this analysis in some detail.
>
> Sure. Jeff mentioned 8MB as a work_mem setting, so let's examine a
> case where that's the work_mem setting, and see experimentally where
> the crossover point for a multi-pass sort ends up.
>
> If this table is created:
>
> postgres=# create unlogged table bar as select (random() * 1e9)::int4
> idx, 'payload xyz'::text payload from generate_series(1, 10100000) i;
> SELECT 10100000
>
> Then, on my system, a work_mem setting of 8MB *just about* avoids
> seeing the multipass_warning message with this query:
>
> postgres=# select count(distinct idx) from bar ;
>
>    count
> ------------
>  10,047,433
> (1 row)
>
> A work_mem setting of 235MB is just enough to make the query's sort
> fully internal.
>
> Let's see how things change with a higher work_mem setting of 16MB. I
> mentioned quadratic growth: Having doubled work_mem, let's *quadruple*
> the number of tuples, to see where this leaves a 16MB setting WRT a
> multi-pass merge:
>
> postgres=# drop table bar ;
> DROP TABLE
> postgres=# create unlogged table bar as select (random() * 1e9)::int4
> idx, 'payload xyz'::text payload from generate_series(1, 10100000 * 4)
> i;
> SELECT 40400000
>
> Further experiments show that this is the exact point at which the
> 16MB work_mem setting similarly narrowly avoids the multi-pass
> warning. This should be the dominant consideration, because a fully
> internal sort of this quadrupled table now requires roughly 4X the
> work_mem (about 940MB) that my original example's table/query needed
> (235MB), while avoiding a multi-pass merge only required doubling
> work_mem from 8MB to 16MB.
>
> The quadratic growth in a simple hybrid sort-merge strategy's ability
> to avoid a multi-pass merge phase (growth relative to linear increases
> in work_mem) can be demonstrated with simple experiments.
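
A run is roughly work_mem in size, and the number of runs that can be
merged in a single pass also grows roughly in proportion to work_mem,
so the volume of data that fits in one merge pass grows with the
square of work_mem; that is the quadratic relationship you describe.
For anyone who wants to check the crossover points on their own build,
here is a minimal sketch along the lines of your session.  It does not
depend on the patch's multipass_warning message; it uses the stock
trace_sort developer option (available when the server is built with
TRACE_SORT, which is the default), and the exact row counts at which
the behavior changes will vary by platform:

postgres=# set trace_sort = on;            -- standard developer option
postgres=# set client_min_messages = log;  -- surface the sort's LOG lines in psql
postgres=# set work_mem = '8MB';
postgres=# select count(distinct idx) from bar;
postgres=# set work_mem = '235MB';
postgres=# select count(distinct idx) from bar;

The LOG output reports each run written out and each merge step
performed, so the shift from a single final merge to multiple merge
passes (and, at 235MB, to a fully internal sort) should be directly
visible.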

OK, so reversing this analysis, with the default work_mem of 4MB, we'd
need a multi-pass merge for more than 235MB/4 = 58MB of data.  That is
very, very far from being a can't-happen scenario, and I would not at
all think it would be acceptable to ignore such a case.  Even ignoring
the possibility that someone with work_mem = 8MB will try to sort
235MB of data strikes me as out of the question.  Those seem like
entirely reasonable things for users to do.  Greg's example of someone
with work_mem = 1GB trying to sort 4TB does not seem like a crazy
thing to me.  Yeah, in all of those cases you might think that users
should set work_mem higher, but that doesn't mean that they actually
do.  Most systems have to set work_mem very conservatively to make
sure they don't start swapping under heavy load.
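
To put a rough number on the reversal: halving work_mem from 8MB to
4MB should quarter both the crossover row count and the memory needed
for a fully internal sort, so roughly a quarter of your test table
ought to land right at the 4MB multi-pass boundary.  A sketch along
the same lines (the divide-by-four row count is an extrapolation from
the quadratic rule, not something I've measured):

postgres=# set work_mem = '4MB';
postgres=# create unlogged table baz as select (random() * 1e9)::int4
idx, 'payload xyz'::text payload from generate_series(1, 10100000 / 4) i;
SELECT 2525000
postgres=# select count(distinct idx) from baz;

With trace_sort enabled as above, that query should sit right around
the point where a single final merge stops being enough at work_mem =
4MB, and ~58MB of input is not much data for a query running under the
default setting to chew through.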

I think you need to revisit your assumptions here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


