Re: Using quicksort for every external sort run - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Using quicksort for every external sort run
Date
Msg-id CA+TgmoZbwm-hpu2dUWKD7JZGTc=ByB9VDK0O0LiAmbPRQ8a5KQ@mail.gmail.com
Whole thread Raw
In response to Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Mon, Mar 21, 2016 at 2:01 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Mar 17, 2016 at 1:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> OK, I have now committed 0001
>
> I attach a revision of the external quicksort patch and supplementary
> small patches, rebased on top of the master branch.

I spent some time today reading through the new 0001 and in general I
think it looks pretty good.  But I think that there is some stuff in
there that logically seems to me to deserve to be separate patches.
In particular:

1. Changing cost_sort to consider disk access as 90% sequential, 10%
random rather than 75% sequential, 25% random.  As far as I can recall
from the thread, zero test results have been posted to demonstrate
that this is a good idea.  It also seems completely unprincipled.  If
the cost of sorts decreases as a result of this patch, it is because
we've reduced the CPU cost, not the I/O cost.  The changes we're
talking about here make I/O more random, not less random, because we
will now have more tapes, not fewer; which means merges will have to
seek the disk head more frequently, not less frequently.  Now, it's
tempting to say that this patch should result in some change to the
cost model: if the patch doesn't make sorting faster, we shouldn't
commit it at all, and if it does, then surely the cost model should
change accordingly.  But the question for the cost model isn't whether
the change to the model somehow reflects the increase in execution
speed.  It's whether we get better query plans with the change than
without.  I don't think there's been a degree of review of that aspect
of this patch on list that would give me confidence to commit a change
like this.

2. Restricting the maximum number of tapes to 500.  This seems like a
sound change and I don't object to it in theory.  But I've seen no
benchmark results which demonstrate that this is a good idea, and it
is quite separate from the core purpose of the patch.

Since time is short, I recommend we remove both of these things from
the patch and you can resubmit them as separate patches later.  As far
as I can see, neither of them is so tied into the rest of the patch
that the main part of the patch can't be committed without those
changes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Re: [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.
Next
From: Robert Haas
Date:
Subject: Re: Using quicksort for every external sort run