Re: A worst case for qsort - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: A worst case for qsort
Date
Msg-id CAM3SWZT1=PZRp721edUcNayWBkfeiaq9pFjLk6eZX_9TmvPnWQ@mail.gmail.com
Whole thread Raw
In response to Re: A worst case for qsort  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: A worst case for qsort
Re: A worst case for qsort
List pgsql-hackers
On Thu, Aug 7, 2014 at 11:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think that's actually not a very unrealistic case at all.  In
> general, I think that if a particular data distribution is a
> reasonable scenario, that data distribution plus "it's already sorted"
> is also reasonable.  Data gets presorted in all kinds of ways:
> sometimes it gets loaded in sorted (or nearly-sorted) order from
> another data source; sometimes we do an index scan that produces data
> in order by column A and then later sort by A, B; sometimes somebody
> clusters the table.  Respecting the right of other people to have
> different opinions, I don't think I'll ever be prepared to concede the
> presorted case as either uncommon or unimportant.

I think that pre-sorted, all-unique text datums, that have all
differences beyond the first 8 bytes, that the user happens to
actually want to sort are fairly rare. All I'm saying is: if that's a
case that has to lose out more than your more limited case in order to
get a large number of representative cases 3 - 7 times faster, and
marginal cases a good bit faster, that's probably worth it. I am
willing to fix any case that I can - as you say, it's often easier to
write the code that to argue - but let's have a sense of proportion
about these things. Even your quite reasonable case is going to lose
out a bit, because what I've done is fundamentally about deciding at
intervals during copying tuples if it's worth it. If it isn't, then
clearly that whole process went to waste, and we've merely cut our
losses. We could add a GUC to control it as suggested by Simon, or
even put something in attoptions per attribute to indicate that the
optimization should be avoided, but that's something that we've
historically resisted doing.

> That's not to prejudge anything that may or may not be in your patch,
> which I have not studied in enormous detail.  It's just what I think
> about the subject in general.

Fair enough. At the risk of sounding trite, I'll add: everything is a
trade-off.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Pg_upgrade and toast tables bug discovered
Next
From: Robert Haas
Date:
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)