Re: A worst case for qsort - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: A worst case for qsort
Date
Msg-id CAM3SWZRkSUYe9bpY9ZVxCH7ogEwHfR6OCH2E1KEMjFemSciQQw@mail.gmail.com
Whole thread Raw
In response to Re: A worst case for qsort  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: A worst case for qsort
List pgsql-hackers
On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> So here.  You may not agree that the mitigation strategies for which
> others are asking for are worthwhile, but you can't expect everyone
> else to agree with your assessment of which cases are likely to occur
> in practice.  The case of a cohort of strings to be sorted which share
> a long fixed prefix and have different stuff at the end does not seem
> particularly pathological to me.  It doesn't, in other words, require
> an adversary: some real-world data sets will look like that.  I will
> forebear repeating examples I've given before, but I've seen that kind
> of thing more than once in real data sets that people (well, me,
> anyway) actually wanted to put into a PostgreSQL database.  So I'm
> personally of the opinion that the time you've put into trying to
> protect against those cases is worthwhile.  I understand that you may
> disagree with that, and that's fine: we're not all going to agree on
> everything.

I actually agree with you here. Sorting text is important, so we
should spend a lot of time avoiding regressions. Your example is
reasonable - I believe that people do that to a non-trivial degree.
The fact is, I probably can greatly ameliorate that case. However, to
give an example of a case I have less sympathy for, I'll name the case
you mention, *plus* the strings are already in logical order, and so
our qsort() can (dubiously) take advantage of that, so that there is a
1:1 ratio of wasted strxfrm() calls and strcoll() tie-breaker calls.
There might be other cases like that that crop up.

I also thought the paper was pretty cool, and thought it might be
interesting because of the fact that is was authored by McIlroy, and
he refers to weaknesses in his and our very own implementation
specifically.
-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: John Cochran
Date:
Subject: Re: A worst case for qsort
Next
From: Gabriele Bartolini
Date:
Subject: Re: Proposal: Incremental Backup