Thread: CC_send_query_append crash

CC_send_query_append crash

From
Malcolm MacLeod
Date:
Hello,

While stress testing our software to handle network
reconnects/disconnects I have run into a crash in your ODBC driver.

The crash seems to occur because CC_send_query_append crash takes a
local copy of the pointer 'self->sock' at the top of the function,
'self' is then passed around to various functions (some of which have
the side effect of setting self->sock to NULL (and deleting) if there is
a lost connection) and then the local copy of the pointer (which is now
dangling) is dereferenced lower down in the function.
Essentially if there is a disconnect while CC_send_query_append is
running there is a risk of crash.

I hope that you can fix this soon.

Regards,
Malcolm MacLeod


Attachment

Re: CC_send_query_append crash

From
Michael Paquier
Date:
On Wed, Apr 30, 2014 at 1:44 AM, Malcolm MacLeod
<malcolm.macleod@tshwanedje.com> wrote:
> The crash seems to occur because CC_send_query_append crash takes a
> local copy of the pointer 'self->sock' at the top of the function,
> 'self' is then passed around to various functions (some of which have
> the side effect of setting self->sock to NULL (and deleting) if there is
> a lost connection) and then the local copy of the pointer (which is now
> dangling) is dereferenced lower down in the function.
> Essentially if there is a disconnect while CC_send_query_append is
> running there is a risk of crash.
Looking at the code, I am seeing that the problem is related to
CC_on_abort where conn->sock is set to NULL when the connection is
considered as dead. And I am indeed seeing two code paths (when
sending the 'C' message there is an ABORT check and in cleanup
section) that could use this NULL socket afterwards. Your patch is
perhaps a bit too much. So I am proposing the attached patch instead.
Let me know if this fixes your issue as well.
Regards,
--
Michael

Attachment

Re: CC_send_query_append crash

From
Malcolm MacLeod
Date:
> <malcolm.macleod@tshwanedje.com> wrote:
> > The crash seems to occur because CC_send_query_append crash takes a
> > local copy of the pointer 'self->sock' at the top of the function,
> > 'self' is then passed around to various functions (some of which have
> > the side effect of setting self->sock to NULL (and deleting) if there is
> > a lost connection) and then the local copy of the pointer (which is now
> > dangling) is dereferenced lower down in the function.
> > Essentially if there is a disconnect while CC_send_query_append is
> > running there is a risk of crash.
> Looking at the code, I am seeing that the problem is related to
> CC_on_abort where conn->sock is set to NULL when the connection is
> considered as dead. And I am indeed seeing two code paths (when
> sending the 'C' message there is an ABORT check and in cleanup
> section) that could use this NULL socket afterwards. Your patch is
> perhaps a bit too much. So I am proposing the attached patch instead.
> Let me know if this fixes your issue as well.

Thanks for the fast response!
Your proposed patch would also fix the issue, so I have no problem with
it being used instead.

I guess from my side I just don't personally understand the point of
keeping the local pointer copy at all (it just seems like an invitation
for this sort of thing to occur) - so it made more sense to me to remove
it entirely to prevent future occurrences of similar issues - although I
suppose also the less code disturbed the better. I am not overly
familiar with the code so can't say what is best.

Thanks,
Malcolm




Re: CC_send_query_append crash

From
Heikki Linnakangas
Date:
On 05/01/2014 02:47 PM, Malcolm MacLeod wrote:
>
>> <malcolm.macleod@tshwanedje.com> wrote:
>>> The crash seems to occur because CC_send_query_append crash takes a
>>> local copy of the pointer 'self->sock' at the top of the function,
>>> 'self' is then passed around to various functions (some of which have
>>> the side effect of setting self->sock to NULL (and deleting) if there is
>>> a lost connection) and then the local copy of the pointer (which is now
>>> dangling) is dereferenced lower down in the function.
>>> Essentially if there is a disconnect while CC_send_query_append is
>>> running there is a risk of crash.
>> Looking at the code, I am seeing that the problem is related to
>> CC_on_abort where conn->sock is set to NULL when the connection is
>> considered as dead. And I am indeed seeing two code paths (when
>> sending the 'C' message there is an ABORT check and in cleanup
>> section) that could use this NULL socket afterwards. Your patch is
>> perhaps a bit too much. So I am proposing the attached patch instead.
>> Let me know if this fixes your issue as well.
>
> Thanks for the fast response!
> Your proposed patch would also fix the issue, so I have no problem with
> it being used instead.
>
> I guess from my side I just don't personally understand the point of
> keeping the local pointer copy at all (it just seems like an invitation
> for this sort of thing to occur) - so it made more sense to me to remove
> it entirely to prevent future occurrences of similar issues - although I
> suppose also the less code disturbed the better. I am not overly
> familiar with the code so can't say what is best.

I agree that keeping the local pointer copy is an accident waiting to
happen. Michael's more surgical approach would make sense if we had to
carefully track the places where self->sock might become NULL. But we
don't, as all the SOCK_* macros and functions contain NULL checks.

I think the extra null check that both patch versions added to the
cleanup code path is also unnecessary and wrong. SOCK_get_errcode()
returns SOCKET_CLOSED error if the argument is null, which will cause
the connection to be marked as dead, which is the right thing to do.
However, when I removed that check, the "diagnostics" regression test
started to fail. What happens is that when the socket is closed while
processing results in CC_send_query_append, the incomplete QResult
object is discarded, even though it contained a useful error message. So
we lose some information that way. But that in turn is pretty simple to
fix: if we don't set ReadyToReturn in the cleanup routine, then the
partial QResult object is returned again.

Looking closer at the loop in CC_send_query_append(), I think it's
basically wrong to set ReadyToReturn when an error occurs. ReadyToReturn
indicates that the whole result set has been processed, and we're ready
to send a new query. In many of the error conditions where we currently
set ReadyToReturn, we're clearly not ready to process the next query.
For example, in the 'T' case, if QR_Constructor() fails because of
out-of-memory, we set ReadyToReturn=true and just bail out of the loop.
That's wrong - we must process the rest of the result set before returning.

Anyway, I committed your original patch with minor changes, which fixes
your original problem (commit 22b151e). But the error handling in that
function could use some more work...

- Heikki


Re: CC_send_query_append crash

From
Malcolm MacLeod
Date:
On Mon, 2014-05-05 at 11:07 +0300, Heikki Linnakangas wrote:
> On 05/01/2014 02:47 PM, Malcolm MacLeod wrote:
> >
> >> <malcolm.macleod@tshwanedje.com> wrote:
> >>> The crash seems to occur because CC_send_query_append crash takes a
> >>> local copy of the pointer 'self->sock' at the top of the function,
> >>> 'self' is then passed around to various functions (some of which have
> >>> the side effect of setting self->sock to NULL (and deleting) if there is
> >>> a lost connection) and then the local copy of the pointer (which is now
> >>> dangling) is dereferenced lower down in the function.
> >>> Essentially if there is a disconnect while CC_send_query_append is
> >>> running there is a risk of crash.
> >> Looking at the code, I am seeing that the problem is related to
> >> CC_on_abort where conn->sock is set to NULL when the connection is
> >> considered as dead. And I am indeed seeing two code paths (when
> >> sending the 'C' message there is an ABORT check and in cleanup
> >> section) that could use this NULL socket afterwards. Your patch is
> >> perhaps a bit too much. So I am proposing the attached patch instead.
> >> Let me know if this fixes your issue as well.
> >
> > Thanks for the fast response!
> > Your proposed patch would also fix the issue, so I have no problem with
> > it being used instead.
> >
> > I guess from my side I just don't personally understand the point of
> > keeping the local pointer copy at all (it just seems like an invitation
> > for this sort of thing to occur) - so it made more sense to me to remove
> > it entirely to prevent future occurrences of similar issues - although I
> > suppose also the less code disturbed the better. I am not overly
> > familiar with the code so can't say what is best.
>
> I agree that keeping the local pointer copy is an accident waiting to
> happen. Michael's more surgical approach would make sense if we had to
> carefully track the places where self->sock might become NULL. But we
> don't, as all the SOCK_* macros and functions contain NULL checks.
>
> I think the extra null check that both patch versions added to the
> cleanup code path is also unnecessary and wrong. SOCK_get_errcode()
> returns SOCKET_CLOSED error if the argument is null, which will cause
> the connection to be marked as dead, which is the right thing to do.
> However, when I removed that check, the "diagnostics" regression test
> started to fail. What happens is that when the socket is closed while
> processing results in CC_send_query_append, the incomplete QResult
> object is discarded, even though it contained a useful error message. So
> we lose some information that way. But that in turn is pretty simple to
> fix: if we don't set ReadyToReturn in the cleanup routine, then the
> partial QResult object is returned again.
>
> Looking closer at the loop in CC_send_query_append(), I think it's
> basically wrong to set ReadyToReturn when an error occurs. ReadyToReturn
> indicates that the whole result set has been processed, and we're ready
> to send a new query. In many of the error conditions where we currently
> set ReadyToReturn, we're clearly not ready to process the next query.
> For example, in the 'T' case, if QR_Constructor() fails because of
> out-of-memory, we set ReadyToReturn=true and just bail out of the loop.
> That's wrong - we must process the rest of the result set before returning.
>
> Anyway, I committed your original patch with minor changes, which fixes
> your original problem (commit 22b151e). But the error handling in that
> function could use some more work...

Thanks! Will get a copy of the latest code and use it for testing.
Unfortunately I don't have the time to get familiar with the code myself
or I would try help a bit more. I will throw our stress tests at the
latest code and see if anything else turns up though.

Any idea on when the next official psqlodbc build is likely to take
place? Would be nice to know for planning purposes so we can get our
clients onto a fixed version.

- Malcolm MacLeod



Re: CC_send_query_append crash

From
Heikki Linnakangas
Date:
On 05/07/2014 05:37 PM, Malcolm MacLeod wrote:
> Any idea on when the next official psqlodbc build is likely to take
> place? Would be nice to know for planning purposes so we can get our
> clients onto a fixed version.

CC'ing Hiroshi Saito. He has usually prepared the releases, I don't know
if he has any imminent plans.

There have been a few bug fixes since last release, including this one,
so it would be nice to get a release out. Then again, none of the bugs
fixed have been very serious, so there's no big hurry.

- Heikki


Re: CC_send_query_append crash

From
Alvaro Herrera
Date:
Heikki Linnakangas wrote:
> On 05/07/2014 05:37 PM, Malcolm MacLeod wrote:
> >Any idea on when the next official psqlodbc build is likely to take
> >place? Would be nice to know for planning purposes so we can get our
> >clients onto a fixed version.
>
> CC'ing Hiroshi Saito. He has usually prepared the releases, I don't
> know if he has any imminent plans.
>
> There have been a few bug fixes since last release, including this
> one, so it would be nice to get a release out. Then again, none of
> the bugs fixed have been very serious, so there's no big hurry.

Can I direct your attention towards
http://www.postgresql.org/message-id/20140318183906.GW6899@eldon.alvh.no-ip.org
?  I'm not in a position to produce a tested patch, but it should be
fairly trivial in nature.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services