A worst case for qsort - Mailing list pgsql-hackers

From Peter Geoghegan
Subject A worst case for qsort
Date
Msg-id CAM3SWZS3obANDaPqDy4rPSA962O7Tb1x-HDqE+AMkD8FwaFPrQ@mail.gmail.com
Whole thread Raw
Responses Re: A worst case for qsort
Re: A worst case for qsort
Re: A worst case for qsort
List pgsql-hackers
There was some informal discussion of worst case performance of my
text sort support patch at the most recent developer's meeting in
Ottawa. There is no need to repeat the details here, which aren't
terribly interesting - I made a point about putting worst case
performance in context. I also made an argument around quicksort,
which despite being a widely used sorting algorithm is known to go
quadratic in the worst case, a really bad outcome. I've heard stories
about GUIs that hit the worst case and froze when a user clicked on a
column header to sort its contents. This kind of thing occurred until
surprisingly recently, and was one reason why Postgres eventually
adopted its own qsort() rather than using the OS-supplied
implementation. A lot of the measures proposed by Sedgewick in
particular made these outcomes occur very infrequently in practice.

Our quicksort routine is almost a verbatim copy of the implementation
that appeared in NetBSD, which is itself almost a verbatim copy of the
implementation that appears in the 1993 paper by J. L. Bentley and M.
D. McIlroy, ‘Engineering a sort function’. McIlroy
(with some input from Bentley) subsequently described a way of making
many high quality implementations perform very badly, including his
own:

http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

I don't want to belabor the point here. I do not mean to imply that
everything I've proposed to date wrt sort support for text is
self-evidently reasonable, in the same way that our use of Bentley and
McIlroy's qsort() seems reasonable on balance. I expect some
interesting discussion there.

Anyway, the paper says:

"The adversarial method works for almost any polymorphic program
recognizable as quicksort. The subject quicksort may copy values at
will, or work with lists rather than arrays. It may even pick the
pivot at random. The quicksort will be vulnerable provided only that
it satisfies some mild assumptions that are met by every
implementation I have seen".

IMHO, while worst case performance is a very useful tool for analyzing
algorithms (particularly their worst case time complexity), a worst
case should be put in its practical context. For example, if we had
reason to be concerned about *adversarial* inputs, I think that there
is a good chance that our qsort() actually would be problematic to the
point of driving us to prefer some generally slower alternative.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Proposal: Incremental Backup
Next
From: Michael Paquier
Date:
Subject: Re: Proposal: Incremental Backup