Thread: A worst case for qsort
There was some informal discussion of worst case performance of my text sort support patch at the most recent developer's meeting in Ottawa. There is no need to repeat the details here, which aren't terribly interesting - I made a point about putting worst case performance in context. I also made an argument around quicksort, which despite being a widely used sorting algorithm is known to go quadratic in the worst case, a really bad outcome. I've heard stories about GUIs that hit the worst case and froze when a user clicked on a column header to sort its contents. This kind of thing occurred until surprisingly recently, and was one reason why Postgres eventually adopted its own qsort() rather than using the OS-supplied implementation. A lot of the measures proposed by Sedgewick in particular made these outcomes occur very infrequently in practice. Our quicksort routine is almost a verbatim copy of the implementation that appeared in NetBSD, which is itself almost a verbatim copy of the implementation that appears in the 1993 paper by J. L. Bentley and M. D. McIlroy, ‘Engineering a sort function’. McIlroy (with some input from Bentley) subsequently described a way of making many high quality implementations perform very badly, including his own: http://www.cs.dartmouth.edu/~doug/mdmspe.pdf I don't want to belabor the point here. I do not mean to imply that everything I've proposed to date wrt sort support for text is self-evidently reasonable, in the same way that our use of Bentley and McIlroy's qsort() seems reasonable on balance. I expect some interesting discussion there. Anyway, the paper says: "The adversarial method works for almost any polymorphic program recognizable as quicksort. The subject quicksort may copy values at will, or work with lists rather than arrays. It may even pick the pivot at random. The quicksort will be vulnerable provided only that it satisfies some mild assumptions that are met by every implementation I have seen". IMHO, while worst case performance is a very useful tool for analyzing algorithms (particularly their worst case time complexity), a worst case should be put in its practical context. For example, if we had reason to be concerned about *adversarial* inputs, I think that there is a good chance that our qsort() actually would be problematic to the point of driving us to prefer some generally slower alternative. -- Peter Geoghegan
> For example, if we had reason to be concerned about *adversarial*
> inputs, I think that there is a good chance that our qsort() actually
> would be problematic to the point of driving us to prefer some generally
> slower alternative.

That is an interesting point. Indeed, a database in general often stores user-supplied data, which may happen to be sorted for presentation purposes in an interface. The same thing occurred with hashtable algorithms and was/is a way to mount DoS attacks on web applications.

I'm not sure whether the qsort version discussed here would be applied to user-supplied data, though. If so, adding some randomness in the decision process would suffice to counter the adversarial input argument you raised.

-- Fabien.
On Tue, Aug 5, 2014 at 11:49 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> If so, adding some randomness in the decision process would suffice to
> counter the adversarial input argument you raised.

This is specifically addressed by the paper. Indeed, randomly choosing a pivot is a common strategy. It won't fix the problem.

-- Peter Geoghegan
>> If so, adding some randomness in the decision process would suffice to
>> counter the adversarial input argument you raised.
>
> This is specifically addressed by the paper. Indeed, randomly choosing
> a pivot is a common strategy. It won't fix the problem.

Too bad. I must admit that I do not see how to build a test case that would trigger worst-case behavior against a qsort that chooses its pivot randomly, but I have not read the paper, and possibly there is an element of context that is eluding me.

-- Fabien.
I just browsed the paper linked by Peter, and it looks like the attack has to be mounted against a currently executing qsort. In the paper, the comparison function is supplied by the attacker and effectively lies about the results of comparisons. It keeps its lies consistent in a very specific manner, so that qsort eventually returns its input in a properly sorted fashion.
So it seems to me that the vulnerability only exists if an attacker-supplied comparison function is permitted. For all other cases, assuming that only vetted comparison functions are permitted, the selection of a random pivot would make an attack on qsort using specially tailored input data improbable.
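For the curious, the lying comparator is remarkably small. Here is a rough sketch in C of the adversary described in the paper, reconstructed from the paper's description rather than copied from it, so details may differ from McIlroy's exact code:

#include <stdlib.h>

/*
 * Sketch of the adversarial comparator from the linked paper.  Every item
 * starts out as "gas"; gas compares high against solid (frozen) values.
 * When two gas items meet, one of them (preferably the current pivot
 * candidate) is frozen to the next solid value.  All answers given remain
 * mutually consistent, so the subject qsort terminates normally.
 */
static int *val;        /* current value of each item */
static int gas;         /* sentinel meaning "not yet frozen" */
static int nsolid;      /* number of items frozen so far */
static int candidate;   /* item suspected of being the pivot */
static long ncmp;       /* comparisons performed */

static int freeze(int x) { return val[x] = nsolid++; }

static int lying_cmp(const void *px, const void *py)
{
    int x = *(const int *) px;
    int y = *(const int *) py;

    ncmp++;
    if (val[x] == gas && val[y] == gas)
        freeze(x == candidate ? x : y);
    if (val[x] == gas)
        candidate = x;
    else if (val[y] == gas)
        candidate = y;
    return val[x] - val[y];
}

/* run the subject qsort against the adversary; returns the comparison count */
long antiqsort(int n, int *a, int *values)
{
    val = values;
    gas = n;                    /* larger than any frozen value (0..n-1) */
    nsolid = candidate = 0;
    ncmp = 0;
    for (int i = 0; i < n; i++)
    {
        a[i] = i;               /* sort item identities, not values */
        val[i] = gas;
    }
    qsort(a, n, sizeof(int), lying_cmp);
    return ncmp;
}

Once qsort() returns, val[0..n-1] is an ordinary array of integers that, fed back to the same deterministic qsort with a plain integer comparator, reproduces the same comparison outcomes and hence the same quadratic behavior - no lying comparator needed at run time.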
On Wed, Aug 6, 2014 at 9:18 AM, John Cochran <j69cochran@gmail.com> wrote:
> So it seems to me that the vulnerability only exists if an attacker
> supplied comparison function is permitted. For all other cases, assuming
> that only vetted comparison functions are permitted, the selection of a
> random pivot would make an attack on qsort using specially tailored
> input data improbable.

Whether or not random pivot selection would make it more difficult for a malicious party to generate pre-processed data that will cause very bad performance is not all that relevant IMV, since we don't do that. There are good practical reasons to prefer median of medians pivot selection, which is what we actually do, and which is clearly affected to the extent that it is possible to pre-process malicious data that causes (at least) near worst-case performance. It's possible to predict pivot selection. As the paper says, "An adversary can make such a quicksort go quadratic by arranging for the pivot to compare low against almost all items not seen during pivot selection".

Random pivot selection will certainly result in more frequent lopsided partitions without any malicious intent.

-- Peter Geoghegan
> Random pivot selection will certainly result in more frequent lopsided
> partitions without any malicious intent.

Yep. It makes "adversarial input" attacks more or less impossible, at the price of a higher average cost. Maybe a less randomized version would do, i.e. randomly select one of the three medians found, and would still keep some of the nicer performance properties. It would need proof, though.

-- Fabien.
> IMHO, while worst case performance is a very useful tool for analyzing
> algorithms (particularly their worst case time complexity), a worst
> case should be put in its practical context. For example, if we had
> reason to be concerned about *adversarial* inputs, I think that there
> is a good chance that our qsort() actually would be problematic to the
> point of driving us to prefer some generally slower alternative.

ISTM that you raise two distinct questions wrt PostgreSQL, namely whether worst case performance is really an issue:

(1) in general
(2) wrt adversarial inputs

The answer could be (1) "mostly no" and (2) "maybe yes". It suggests that where qsort is used, an administrator wary of (2) could be allowed to use an alternate implementation, maybe some merge sort, say by tweaking a configuration option in "postgresql.conf".

-- Fabien.
Hello John,

> [...]
> In fact, the mentioned paper says this about the subject "Moreover, if
> worst-case performance is important, Quicksort is the wrong algorithm."

I fully agree with this conclusion.

-- Fabien
On Tue, Aug 5, 2014 at 8:15 PM, Peter Geoghegan <pg@heroku.com> wrote:
> "The adversarial method works for almost any polymorphic program
> recognizable as quicksort. The subject quicksort may copy values at
> will, or work with lists rather than arrays. It may even pick the
> pivot at random. The quicksort will be vulnerable provided only that
> it satisfies some mild assumptions that are met by every
> implementation I have seen".
>
> IMHO, while worst case performance is a very useful tool for analyzing
> algorithms (particularly their worst case time complexity), a worst
> case should be put in its practical context. For example, if we had
> reason to be concerned about *adversarial* inputs, I think that there
> is a good chance that our qsort() actually would be problematic to the
> point of driving us to prefer some generally slower alternative.

I completely agree with this, and I think everyone else does, too. Where we don't all necessarily agree is which worst cases are realistic and which worst cases are simply pathological cases with which we need not be concerned in practice.

For example, when I proposed the patch to use MVCC catalog snapshots, Andres invented a test case which I thought was far more brutal than anything likely to be encountered in the real world. But Andres didn't agree; he thought the test case was realistic. So, I worked on the patch some more and came up with a solution that performs acceptably even if those brutal conditions are encountered in the real world. As a result, the patch as committed is better than the one originally submitted. I could certainly have spent more time debating whether that effort was worthwhile, and maybe I would have won the argument, but it was a better use of that time to instead try to improve the patch.

So here. You may not agree that the mitigation strategies others are asking for are worthwhile, but you can't expect everyone else to agree with your assessment of which cases are likely to occur in practice. The case of a cohort of strings to be sorted which share a long fixed prefix and have different stuff at the end does not seem particularly pathological to me. It doesn't, in other words, require an adversary: some real-world data sets will look like that. I will forbear repeating examples I've given before, but I've seen that kind of thing more than once in real data sets that people (well, me, anyway) actually wanted to put into a PostgreSQL database. So I'm personally of the opinion that the time you've put into trying to protect against those cases is worthwhile. I understand that you may disagree with that, and that's fine: we're not all going to agree on everything.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Aug 7, 2014 at 11:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Aug 5, 2014 at 8:15 PM, Peter Geoghegan <pg@heroku.com> wrote:
> "The adversarial method works for almost any polymorphic program
> recognizable as quicksort. The subject quicksort may copy values at
> will, or work with lists rather than arrays. It may even pick the
> pivot at random. The quicksort will be vulnerable provided only that
> it satisfies some mild assumptions that are met by every
> implementation I have seen".
>
> IMHO, while worst case performance is a very useful tool for analyzing
> algorithms (particularly their worst case time complexity), a worst
> case should be put in its practical context. For example, if we had
> reason to be concerned about *adversarial* inputs, I think that there
> is a good chance that our qsort() actually would be problematic to the
> point of driving us to prefer some generally slower alternative.
If one is concerned about adversarial inputs, then as mentioned earlier, two main steps need to be taken.
1. Don't permit potential adversaries to define the comparison function used by qsort().
2. Perform random pivot selection to prevent precomputed lists of data from invoking quadratic behavior in qsort.

And before arguing that a random pivot will be less efficient than the median of medians that's currently being used, you need to look at what is currently being done (sketched in code below).
1. If a "small" partition is being sorted, the element at n/2 is selected as the pivot, where n is the size of the partition.
2. If a "medium" partition is being sorted, then the pivot selected is the median of the elements at 0, n/2, and n.
3. And finally, if a "large" partition is being sorted, potential pivots are selected from 0, n/8, n/4, 3n/8, n/2, 5n/8, 3n/4, 7n/8, and n. Of those 9 candidates, the median of medians is selected as the pivot.
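To make that concrete, here is a rough sketch of that selection scheme in C. It mirrors the pivot selection in Bentley & McIlroy's "Engineering a sort function" (the 7 and 40 size cutoffs are theirs); this toy version works on a plain int array rather than the generic byte-addressed form the real qsort uses.

#include <stddef.h>

/* index of the median of a[i], a[j], a[k] */
static size_t med3(const int *a, size_t i, size_t j, size_t k)
{
    if (a[i] < a[j])
        return a[j] < a[k] ? j : (a[i] < a[k] ? k : i);
    return a[i] < a[k] ? i : (a[j] < a[k] ? k : j);
}

/* choose a pivot index for a[0..n-1] */
static size_t choose_pivot(const int *a, size_t n)
{
    size_t mid = n / 2;

    if (n <= 7)
        return mid;                         /* small: middle element */
    if (n <= 40)
        return med3(a, 0, mid, n - 1);      /* medium: median of 3 */

    /* large: median of the medians of three groups of three ("ninther") */
    {
        size_t d = n / 8;
        size_t lo = med3(a, 0, d, 2 * d);
        size_t m = med3(a, mid - d, mid, mid + d);
        size_t hi = med3(a, n - 1 - 2 * d, n - 1 - d, n - 1);

        return med3(a, lo, m, hi);
    }
}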
There is nothing in the world that would prevent the following.
1. Short partition - select a pivot at random.
2. Medium partition - select the median of 3 randomly selected candidates.
3. Large partition - select the median of medians of 9 randomly selected candidates.
Using the above random pivot selection methods, the performance of a hardened qsort should be close to that of an unhardened qsort. The only real difference is the cost of generating random numbers to select the candidates for the median tests. However, if a malicious compare function is permitted to be defined, it would still be vulnerable to that attack, but it would be pretty much immune to a precomputed data attack.
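A hedged sketch of what that randomized candidate selection might look like (illustrative only; random() is used for brevity, and a hardened build would want a generator whose state an adversary cannot predict or influence):

#include <stdlib.h>

/* index of the median of a[i], a[j], a[k] */
static size_t med3r(const int *a, size_t i, size_t j, size_t k)
{
    if (a[i] < a[j])
        return a[j] < a[k] ? j : (a[i] < a[k] ? k : i);
    return a[i] < a[k] ? i : (a[j] < a[k] ? k : j);
}

/* random index in [0, n - 1] */
static size_t rand_idx(size_t n)
{
    return (size_t) random() % n;
}

static size_t choose_pivot_random(const int *a, size_t n)
{
    if (n <= 7)                 /* short: one random candidate */
        return rand_idx(n);
    if (n <= 40)                /* medium: median of 3 random candidates */
        return med3r(a, rand_idx(n), rand_idx(n), rand_idx(n));

    /* large: median of the medians of 9 random candidates */
    return med3r(a,
                 med3r(a, rand_idx(n), rand_idx(n), rand_idx(n)),
                 med3r(a, rand_idx(n), rand_idx(n), rand_idx(n)),
                 med3r(a, rand_idx(n), rand_idx(n), rand_idx(n)));
}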
Or perhaps it just might be simpler to detect an attack and fall back to a sort algorithm that doesn't have quadratic behavior.
What I mean by that is that the qsort in "Engineering a Sort Function" performs approximately 1.1 n ln n comparisons on typical inputs. Upon starting, qsort would compute an upper bound on the number of comparisons it's willing to perform - say 3 n ln n or so (1.386 n log n is the estimate for a randomly selected pivot). If that upper bound is reached, the qsort function backs out, leaving the array in a partially sorted order, and then calls a slower sort that doesn't exhibit quadratic behavior to actually finish the sort.
The result of doing that would be most, if not all, sorts being handled by qsort in a timely fashion. But if bad luck or an adversary strikes, the qsort bails out once things look bad and passes the task to a different sort. I'll be the first to admit that the hybrid approach would be slower in those bad cases than it would be if the alternate sort had been selected at the beginning, but it would be faster than letting qsort continue with quadratic behavior.
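Roughly, the hybrid might look like this. This is an illustrative sketch only, not PostgreSQL code: the budget is charged per partition pass rather than per comparison, the pivot choice is simplified, and the heapsort is just one possible non-quadratic finisher.

#include <math.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* returns 1 if fully sorted within budget, 0 if it bailed out */
static int quicksort_budgeted(int *a, int lo, int hi, long *budget)
{
    while (lo < hi)
    {
        int n = hi - lo + 1;
        int pivot, i, j;

        /* charge roughly one comparison per element for this partition pass */
        *budget -= n;
        if (*budget < 0)
            return 0;           /* bail out; caller falls back to heapsort */

        pivot = a[lo + n / 2];  /* middle element; ninther omitted for brevity */
        i = lo;
        j = hi;
        while (i <= j)
        {
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j)
                swap_int(&a[i++], &a[j--]);
        }

        /* recurse into the smaller side, iterate on the larger one */
        if (j - lo < hi - i)
        {
            if (!quicksort_budgeted(a, lo, j, budget))
                return 0;
            lo = i;
        }
        else
        {
            if (!quicksort_budgeted(a, i, hi, budget))
                return 0;
            hi = j;
        }
    }
    return 1;
}

/* sift-down heapsort: always O(n log n), no quadratic worst case */
static void heapsort_int(int *a, int n)
{
    for (int start = n / 2 - 1; start >= 0; start--)
        for (int root = start, child; (child = 2 * root + 1) < n; root = child)
        {
            if (child + 1 < n && a[child] < a[child + 1]) child++;
            if (a[root] >= a[child]) break;
            swap_int(&a[root], &a[child]);
        }
    for (int end = n - 1; end > 0; end--)
    {
        swap_int(&a[0], &a[end]);
        for (int root = 0, child; (child = 2 * root + 1) < end; root = child)
        {
            if (child + 1 < end && a[child] < a[child + 1]) child++;
            if (a[root] >= a[child]) break;
            swap_int(&a[root], &a[child]);
        }
    }
}

void hybrid_sort(int *a, int n)
{
    long budget = (long) (3.0 * n * log(n > 1 ? n : 2));

    if (!quicksort_budgeted(a, 0, n - 1, &budget))
        heapsort_int(a, n);     /* bad luck or an adversary: finish safely */
}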
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. — C.A.R. Hoare
On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> So here. You may not agree that the mitigation strategies others are
> asking for are worthwhile, but you can't expect everyone else to agree
> with your assessment of which cases are likely to occur in practice.
> The case of a cohort of strings to be sorted which share a long fixed
> prefix and have different stuff at the end does not seem particularly
> pathological to me. It doesn't, in other words, require an adversary:
> some real-world data sets will look like that. I will forbear repeating
> examples I've given before, but I've seen that kind of thing more than
> once in real data sets that people (well, me, anyway) actually wanted
> to put into a PostgreSQL database. So I'm personally of the opinion
> that the time you've put into trying to protect against those cases is
> worthwhile. I understand that you may disagree with that, and that's
> fine: we're not all going to agree on everything.

I actually agree with you here. Sorting text is important, so we should spend a lot of time avoiding regressions. Your example is reasonable - I believe that people do that to a non-trivial degree. The fact is, I probably can greatly ameliorate that case.

However, to give an example of a case I have less sympathy for, I'll name the case you mention, *plus* the strings are already in logical order, and so our qsort() can (dubiously) take advantage of that, so that there is a 1:1 ratio of wasted strxfrm() calls and strcoll() tie-breaker calls. There might be other cases like that that crop up.

I also thought the paper was pretty cool, and thought it might be interesting because of the fact that it was authored by McIlroy, and he refers to weaknesses in his and our very own implementation specifically.

-- Peter Geoghegan
On Thu, Aug 7, 2014 at 1:16 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> So here. You may not agree that the mitigation strategies others are
>> asking for are worthwhile, but you can't expect everyone else to agree
>> with your assessment of which cases are likely to occur in practice.
>> The case of a cohort of strings to be sorted which share a long fixed
>> prefix and have different stuff at the end does not seem particularly
>> pathological to me. It doesn't, in other words, require an adversary:
>> some real-world data sets will look like that. I will forbear repeating
>> examples I've given before, but I've seen that kind of thing more than
>> once in real data sets that people (well, me, anyway) actually wanted
>> to put into a PostgreSQL database. So I'm personally of the opinion
>> that the time you've put into trying to protect against those cases is
>> worthwhile. I understand that you may disagree with that, and that's
>> fine: we're not all going to agree on everything.
>
> I actually agree with you here.

/me pops cork.

> Sorting text is important, so we should spend a lot of time avoiding
> regressions. Your example is reasonable - I believe that people do that
> to a non-trivial degree. The fact is, I probably can greatly ameliorate
> that case. However, to give an example of a case I have less sympathy
> for, I'll name the case you mention, *plus* the strings are already in
> logical order, and so our qsort() can (dubiously) take advantage of
> that, so that there is a 1:1 ratio of wasted strxfrm() calls and
> strcoll() tie-breaker calls. There might be other cases like that that
> crop up.

I think that's actually not a very unrealistic case at all. In general, I think that if a particular data distribution is a reasonable scenario, that data distribution plus "it's already sorted" is also reasonable. Data gets presorted in all kinds of ways: sometimes it gets loaded in sorted (or nearly-sorted) order from another data source; sometimes we do an index scan that produces data in order by column A and then later sort by A, B; sometimes somebody clusters the table. Respecting the right of other people to have different opinions, I don't think I'll ever be prepared to concede the presorted case as either uncommon or unimportant.

That's not to prejudge anything that may or may not be in your patch, which I have not studied in enormous detail. It's just what I think about the subject in general.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Aug 7, 2014 at 11:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think that's actually not a very unrealistic case at all. In
> general, I think that if a particular data distribution is a
> reasonable scenario, that data distribution plus "it's already sorted"
> is also reasonable. Data gets presorted in all kinds of ways:
> sometimes it gets loaded in sorted (or nearly-sorted) order from
> another data source; sometimes we do an index scan that produces data
> in order by column A and then later sort by A, B; sometimes somebody
> clusters the table. Respecting the right of other people to have
> different opinions, I don't think I'll ever be prepared to concede the
> presorted case as either uncommon or unimportant.

I think that pre-sorted, all-unique text datums, that have all differences beyond the first 8 bytes, that the user happens to actually want to sort are fairly rare. All I'm saying is: if that's a case that has to lose out more than your more limited case in order to get a large number of representative cases 3 - 7 times faster, and marginal cases a good bit faster, that's probably worth it. I am willing to fix any case that I can - as you say, it's often easier to write the code than to argue - but let's have a sense of proportion about these things.

Even your quite reasonable case is going to lose out a bit, because what I've done is fundamentally about deciding, at intervals while copying tuples, whether it's worth it. If it isn't, then clearly that whole process went to waste, and we've merely cut our losses. We could add a GUC to control it as suggested by Simon, or even put something in attoptions per attribute to indicate that the optimization should be avoided, but that's something that we've historically resisted doing.

> That's not to prejudge anything that may or may not be in your patch,
> which I have not studied in enormous detail. It's just what I think
> about the subject in general.

Fair enough. At the risk of sounding trite, I'll add: everything is a trade-off.

-- Peter Geoghegan
On Thu, Aug 7, 2014 at 12:06 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I think that pre-sorted, all-unique text datums, that have all
> differences beyond the first 8 bytes, that the user happens to
> actually want to sort are fairly rare.

Actually, you could use that case to justify not doing strxfrm() transformation of very large lists in the more general case - where strxfrm() blobs aren't truncated/abbreviated, but our qsort() is used - which goes against the strong recommendation of, just to pick an example at random, the glibc documentation. It states:

"Here is an example of how you can use strxfrm when you plan to do many comparisons. It does the same thing as the previous example, but much faster, because it has to transform each string only once, no matter how many times it is compared with other strings. Even the time needed to allocate and free storage is much less than the time we save, when there are many strings." [1]

Sure, that would still be better than the equivalent abbreviated keys case, since no strcoll() tie-breakers are required, but it would still be bad, and all because of our pre-check for sorted input.

[1] https://www.gnu.org/software/libc/manual/html_node/Collation-Functions.html
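For reference, the transform-once pattern the manual describes looks roughly like this (a toy sketch only - not what the patch does - with error handling omitted):

#include <stdlib.h>
#include <string.h>

/*
 * Transform each string once with strxfrm(), then sort the blobs with
 * plain bytewise comparisons, so no strcoll() call is needed per
 * comparison.  Illustrative sketch; error handling omitted.
 */
struct item
{
    const char *orig;
    char       *xfrm;           /* strxfrm() blob, compared bytewise */
};

static int cmp_xfrm(const void *a, const void *b)
{
    const struct item *ia = a;
    const struct item *ib = b;

    return strcmp(ia->xfrm, ib->xfrm);
}

static void sort_items(struct item *items, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        size_t len = strxfrm(NULL, items[i].orig, 0) + 1;

        items[i].xfrm = malloc(len);
        strxfrm(items[i].xfrm, items[i].orig, len);
    }
    qsort(items, n, sizeof(struct item), cmp_xfrm);
}

Abbreviated keys are essentially the same trick with the blobs truncated to fit in a datum, which is where the strcoll() tie-breakers come back into the picture.

-- Peter Geoghegan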
On Thu, Aug 7, 2014 at 3:06 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I think that pre-sorted, all-unique text datums, that have all
> differences beyond the first 8 bytes, that the user happens to
> actually want to sort are fairly rare.
While I'm sure it's not common, I've seen a couple of ten-million tuple tables having a URL column as primary key where 98% of the entries begin with 'http://www.'
So, that exact scenario is out there.
On Thu, Aug 7, 2014 at 2:23 PM, Rod Taylor <rod.taylor@gmail.com> wrote:
> While I'm sure it's not common, I've seen a couple of ten-million tuple
> tables having a URL column as primary key where 98% of the entries begin
> with 'http://www.'
>
> So, that exact scenario is out there.

Sure, that scenario is relatively common. My point was that that scenario, alongside a perfect logical/physical heap correlation and a frequent need to sort, is uncommon. If you're able to get away with a "bubble sort best case" there, then the sort is going to be extremely fast relative to any other database system anyway, and you don't have too much to complain about. It's exactly the same wasted cost as in the general case (where the tuples aren't in order), except that it looks more expensive next to your unrealistically fast sort that you were very lucky to be able to get away with.

I think the special pre-check for already sorted tuples within qsort() is at best only justified by scenarios like reindexing a serial primary key. That's about it. You can totally alter the outcome of this case by inserting a single tuple out of place, so that the top-level pre-check fails.
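To be clear about what that pre-check is, it's along these lines (a simplified sketch of the idea, not the actual code in our qsort):

#include <stddef.h>

/*
 * Simplified sketch of a top-level "already sorted?" pre-check: one linear
 * pass; if no element is smaller than its predecessor, the caller can
 * return immediately.  A single element out of place defeats it, and the
 * ordinary quicksort then runs.
 */
static int already_sorted(const void *a, size_t n, size_t es,
                          int (*cmp) (const void *, const void *))
{
    const char *base = a;

    for (size_t i = 1; i < n; i++)
        if (cmp(base + (i - 1) * es, base + i * es) > 0)
            return 0;
    return 1;
}

-- Peter Geoghegan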
Sigh. Found another example.
A table with 15 million entries and a unique key on filesystem location for things users created via a web interface.
Entries all begin with /usr/home/ ...
This one is frequently sorted as batch operations against the files are performed in alphabetical order to reduce conflict issues that a random ordering may cause between jobs.
regards,
Rod
On Thu, Aug 7, 2014 at 2:34 PM, Rod Taylor <rod.taylor@gmail.com> wrote:
> This one is frequently sorted as batch operations against the files are
> performed in alphabetical order to reduce conflict issues that a random
> ordering may cause between jobs.

Sure. There are cases out there. But, again, I have a hard time imagining why you'd expect those to be pre-sorted in practice, and particularly why you'd feel justified in expecting that to sort much faster than equivalent though slightly imperfectly correlated data. Without that, the fmgr-elision aspect of sort support appears to offer enough for us to still win on balance [1], assuming 9.4 is our basis of comparison.

[1] http://www.postgresql.org/message-id/CAM3SWZQHjxiyrsqBs5w3u-vTJ_jT2hp8o02big5wYb4m9Lp1jg@mail.gmail.com

-- Peter Geoghegan
On Thu, Aug 7, 2014 at 5:52 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Aug 7, 2014 at 2:34 PM, Rod Taylor <rod.taylor@gmail.com> wrote:
>> This one is frequently sorted as batch operations against the files are
>> performed in alphabetical order to reduce conflict issues that a random
>> ordering may cause between jobs.
>
> Sure. There are cases out there. But, again, I have a hard time
> imagining why you'd expect those to be pre-sorted in practice, ...

Well, I'm not sure why you're having a hard time imagining it. Presorted input is a common case in general; that's why we have a check for it. That check adds overhead in the non-pre-sorted case to improve the pre-sorted case, and nobody's ever argued for removing it that I can recall.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Aug 7, 2014 at 5:52 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Thu, Aug 7, 2014 at 2:34 PM, Rod Taylor <rod.taylor@gmail.com> wrote:
>>> This one is frequently sorted as batch operations against the files are
>>> performed in alphabetical order to reduce conflict issues that a random
>>> ordering may cause between jobs.
>>
>> Sure. There are cases out there. But, again, I have a hard time
>> imagining why you'd expect those to be pre-sorted in practice, ...
>
> Well, I'm not sure why you're having a hard time imagining it.
> Presorted input is a common case in general; that's why we have a
> check for it. That check adds overhead in the non-pre-sorted case to
> improve the pre-sorted case, and nobody's ever argued for removing it
> that I can recall.

Agreed. This is not all that uncommon to happen in practice.

That said, this is perhaps another good case for where it'd be extremely handy to have anonymized statistics information from real systems, which would allow us to more easily go *look* at this specific question. There are organizations out there who run many thousands of PG instances who have expressed interest in supporting exactly that kind of statistics gathering (indeed, I'm pretty sure Peter is familiar with one... ;) - what we need is an architecture, design, and implementation to make it happen.

I'm guessing you (Robert and Peter, especially) have already been thinking about what we could do to make the above wish/dream a reality. Perhaps if we could get the initial requirements down, someone would be able to find time to work on making it happen; it would be really great to see progress on this front for 9.5. Or, if existing products implement such metrics collection already, perhaps some numbers could be shared with the community to help address this (and other) questions.

Thanks,

Stephen
On Fri, Aug 8, 2014 at 5:54 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, I'm not sure why you're having a hard time imagining it.
> Presorted input is a common case in general; that's why we have a
> check for it. That check adds overhead in the non-pre-sorted case to
> improve the pre-sorted case, and nobody's ever argued for removing it
> that I can recall.

I think that there has been a fair amount of skepticism - e.g., [1].

But that's beside the point. What I mean is that I think that the intersection of those three things - pre-sorted input, with all differences after the first 8 bytes, and with a user requirement to sort using the column - is fairly rare in practice. It's not impossible, but it is fairly rare. If that were the only case actually regressed, even taking into account fmgr elision, I think that would be quite reasonable. It wouldn't be massively regressed either, and it's a case that's already very fast relative to other systems anyway, if you're lucky enough to get it. You'd better have exactly sorted tuples, though. It's been my observation that one slight difference can drastically alter the outcome [2].

[1] http://www.postgresql.org/message-id/18033.1361789032@sss.pgh.pa.us
[2] http://www.postgresql.org/message-id/CAEYLb_W++UhrcWprzG9TyBVF7Sn-c1s9oLbABvAvPGdeP2DFSQ@mail.gmail.com

-- Peter Geoghegan