Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Using quicksort for every external sort run
Date
Msg-id CAM3SWZTDzBFxRig3rsWydxDNnHfg+AJYEiEydtQL0krqVTqKgQ@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort for every external sort run  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Dec 2, 2015 at 10:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I'm not very concerned about a regression that is only seen when
>> work_mem is set below the (very conservative) postgresql.conf default
>> value of 4MB when sorting 100 million integers.
>
> Perhaps surprisingly, I tend to agree.  I'm cautious of regressions
> here, but large sorts in queries are relatively uncommon.  You're
> certainly not going to want to return a 100 million tuples to the
> client.

Right. The fact that it was only a 5% regression is also a big part of
what made me unconcerned. I am glad that we've characterized the
regression that I assumed was there, though -- I certainly knew that
Knuth and so on were not wrong to emphasize increasing run size in the
1970s. Volume 3 of The Art of Computer Programming literally has a
pull-out chart showing the timing of external sorts. This includes the
time it takes for a human operator to switch magnetic tapes, and
rewind those tapes. The underlying technology has changed rather a lot
since, of course.

> While large sorts are uncommon in queries, they are much more common
> in index builds.  Therefore, I think we ought to be worrying more
> about regressions at 64MB than at 4MB, because we ship with
> maintenance_work_mem = 64MB and a lot of people probably don't change
> it before trying to build an index.  If we make those index builds go
> faster, users will be happy.  If we make them go slower, users will be
> sad.  So I think it's worth asking the question "are there any CREATE
> INDEX commands that someone might type on a system on which they've
> done no other configuration that will be slower with this patch"?

I certainly agree that that's a good place to focus. I think that it's
far, far less likely that anything will be slowed down when you take
this as a cut-off point. I don't want to overemphasize it, but the
analysis of how many more passes are needed because of lack of a
replacement selection heap (the "quadratic growth" thing) gives me
confidence. A case with less than 4MB of work_mem is where we actually
saw *some* regression.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: cevian
Date:
Subject: Postgres_FDW optimizations
Next
From: Pavel Stehule
Date:
Subject: Re: broken tests