Re: Fix for OpenSSL error queue bug - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Fix for OpenSSL error queue bug |
Date | |
Msg-id | CAM3SWZQWH=FLdZ-8Qtsc9ja-deR132_=JEoO7M=4--nZ2PAFng@mail.gmail.com Whole thread Raw |
In response to | Re: Fix for OpenSSL error queue bug (Peter Eisentraut <peter_e@gmx.net>) |
Responses |
Re: Fix for OpenSSL error queue bug
|
List | pgsql-hackers |
On Thu, Mar 10, 2016 at 7:22 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > Arguably, if everyone followed "my" approach, this should be very easy > to fix everywhere. I don't think that there is any clear indication that the OpenSSL people would share that view. Or my view. Or anything that's sensible or practical or actionable. > Instead, reading through the PHP bug report, they > are coming up with a fairly complex solution for clearing the error > queue afterwards so as to not leave "landmines" for the next guy. But > their code will (AFAICT) still be wrong because they are not clearing > the error *before* the API calls where it is required per documentation. > So "everyone" (sample of 2) is scrambling to clean up for the next guy > instead of doing the straightforward fix of following the API > documentation and cleaning up before their own calls. It will be less wrong, though. PostgreSQL is the project that doesn't trust a C90 standard library function to not blithely write passed the bounds of a passed buffer, all because of a bug in certain versions of Solaris based systems that was several years old at the time. See commit be8b06c364. My view that that wasn't really worth worrying about was clearly the minority view when this was discussed (a minority of 1, and a majority of 4 or 5, IIRC). I think that this case vastly exceeds that standard for worrying about other people's broken code; in this case, we ourselves made the same mistake for years and years. > I also see the clean-up-afterwards approach in the Python ssl module. I > fear there is de facto a second API specification that requires you to > clean up errors after yourself and gives an implicit guarantee that the > error queue is empty whenever you want to make any SSL calls. I don't > think this actually works in all cases, but maybe if everyone else is > convinced of that (in plain violation of the published OpenSSL > documentation, AFAICT) we need to get on board with that for > interoperability. I didn't know that Python's ssl module did that. That seems to lend support to my view, which is that we should similarly clear the thread's queue lest anyone else be affected. Yes, this approach is fairly scatter-gun, but frankly that's just the situation we find ourselves in. >>> Also, there is nothing that >>> says that an error produces exactly one entry in the error queue; it >>> could be multiple. Or that errors couldn't arise at random times >>> between the reset and whatever happens next. >> >> I think that it's kind of implied, since calling ERR_get_error() pops >> the stack. But even if that isn't so, it might be worth preventing bad >> things from happening to client applications only sometimes. > > The lore on the internet suggests that multiple errors could definitely > happen. So popping one error afterwards isn't going to fix it, it just > moves the edge case around. Are you sure, specifically, that an I/O function is known to add more than one error to the per-thread queue? Obviously there can be more than one error in the queue. But I haven't seen any specific indication, either in the docs or in the lore, that more than one error can be added by a single call to an I/O function such as SSL_read(). Perhaps you can share where you encountered the lore. >> doesn't it give you pause? Heikki seemed to think that clearing our >> own queue was important when he looked at this a year ago: >> >> http://www.postgresql.org/message-id/54EDD30D.5050107@vmware.com > > I think that message suggests that we should clear the queue before each > call, not after. Uh, it very clearly *is* Heikki's view that we should clear the queue *afterwards*. Certainly, I think Heikki also wanted us to clear the queue before, so we aren't screwed, "just to be sure", as he puts it -- but nobody disputes that that's necessary anyway. That it might not be *sufficient* to just call ERR_get_error() is the new information in the bug report. Heikki said: """ <pedantic> The OpenSSL manual doesn't directly require you to call ERR_clear_error() before every SSL_* call. It just requires that you ensure that the error queue is empty. Libpq ensures that by always clearing the queue *after* an error happens, in SSLerrmessage(). </pedantic> """ The problem with this, as Heikki goes on to say, it that we might not get to that point in SSLerrmessage(); we may not be able to pop the queue/call ERR_get_error(), more or less by accident (e.g. I noticed an OOM could do that). That's why I proposed to fix that by calling ERR_get_error() early and unambiguously. If we must rely on that happening, it should not be from such a long distance (i.e. from within SSLerrmessage(), which is kind of far removed from the original I/O function calls). Thanks -- Peter Geoghegan
pgsql-hackers by date: