Thread: Incorrect allocation handling for cryptohash functions with OpenSSL

Incorrect allocation handling for cryptohash functions with OpenSSL

From
Michael Paquier
Date:
Hi all,

As of the work done in 87ae9691, I have played with error injections
in the code paths using this code, but forgot to count for cases where
cascading resowner cleanups are involved.  Like other resources (JIT,
DSM, etc.), this requires an allocation in TopMemoryContext to make
sure that nothing gets forgotten or cleaned up on the way until the
resowner that did the cryptohash allocation is handled.

Attached is a small extension I have played with by doing some error
injections, and a patch.  If there are no objections, I would like to
commit this fix.

Thanks,
--
Michael

Attachment

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Heikki Linnakangas
Date:
On 18/12/2020 09:35, Michael Paquier wrote:
> Hi all,
> 
> As of the work done in 87ae9691, I have played with error injections
> in the code paths using this code, but forgot to count for cases where
> cascading resowner cleanups are involved.  Like other resources (JIT,
> DSM, etc.), this requires an allocation in TopMemoryContext to make
> sure that nothing gets forgotten or cleaned up on the way until the
> resowner that did the cryptohash allocation is handled.
> 
> Attached is a small extension I have played with by doing some error
> injections, and a patch.  If there are no objections, I would like to
> commit this fix.

pg_cryptohash_create() is now susceptible to leaking memory in 
TopMemoryContext, if the allocations fail. I think the attached should 
fix it (but I haven't tested it at all).

BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we 
need two structs? They're both allocated and controlled by the 
cryptohash implementation. It would seem simpler to have just one.

- Heikki

Attachment

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Heikki Linnakangas
Date:
On 18/12/2020 11:35, Heikki Linnakangas wrote:
> BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we
> need two structs? They're both allocated and controlled by the
> cryptohash implementation. It would seem simpler to have just one.

Something like this. Passes regression tests, but otherwise untested.

- Heikki

Attachment

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Michael Paquier
Date:
On Fri, Dec 18, 2020 at 11:35:14AM +0200, Heikki Linnakangas wrote:
> pg_cryptohash_create() is now susceptible to leaking memory in
> TopMemoryContext, if the allocations fail. I think the attached should fix
> it (but I haven't tested it at all).

Yeah, you are right here.  If the second allocation fails the first
one would leak.  I don't think that your suggested fix is completely
right though because it ignores that the callers of
pg_cryptohash_create() in the backend expect an error all the time, so
it could crash.  Perhaps we should just bite the bullet and switch the
OpenSSL and fallback implementations to use allocation APIs that never
cause an error, and always return NULL?  That would have the advantage
to be more consistent with the frontend that relies in malloc(), at
the cost of requiring more changes for the backend code where the
_create() call would need to handle the NULL case properly.  The
backend calls are already aware of errors so that would not be
invasive except for the addition of some elog(ERROR) or similar, and
we could change the fallback implementation to use palloc_extended()
with MCXT_ALLOC_NO_OOM.

> BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we need
> two structs? They're both allocated and controlled by the cryptohash
> implementation. It would seem simpler to have just one.

Depending on the implementation, the data to track can be completely
different, and this split allows to know about the resowner dependency
only in the OpenSSL part of cryptohashes, without having to include
this knowledge in neither cryptohash.h nor in the fallback
implementation that can just use palloc() in the backend.
--
Michael

Attachment

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Michael Paquier
Date:
On Fri, Dec 18, 2020 at 11:51:55AM +0200, Heikki Linnakangas wrote:
> Something like this. Passes regression tests, but otherwise untested.

...  And I wanted to keep track of the type of cryptohash directly in
the shared structure.  ;)
--
Michael

Attachment

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Heikki Linnakangas
Date:
On 18/12/2020 12:10, Michael Paquier wrote:
> On Fri, Dec 18, 2020 at 11:35:14AM +0200, Heikki Linnakangas wrote:
>> pg_cryptohash_create() is now susceptible to leaking memory in
>> TopMemoryContext, if the allocations fail. I think the attached should fix
>> it (but I haven't tested it at all).
> 
> Yeah, you are right here.  If the second allocation fails the first
> one would leak.  I don't think that your suggested fix is completely
> right though because it ignores that the callers of
> pg_cryptohash_create() in the backend expect an error all the time, so
> it could crash.

Ah, right.

> Perhaps we should just bite the bullet and switch the
> OpenSSL and fallback implementations to use allocation APIs that never
> cause an error, and always return NULL?  That would have the advantage
> to be more consistent with the frontend that relies in malloc(), at
> the cost of requiring more changes for the backend code where the
> _create() call would need to handle the NULL case properly.  The
> backend calls are already aware of errors so that would not be
> invasive except for the addition of some elog(ERROR) or similar, and
> we could change the fallback implementation to use palloc_extended()
> with MCXT_ALLOC_NO_OOM.

-1. On the contrary, I think we should reduce the number of checks 
needed in the callers, and prefer throwing errors, if possible. It's too 
easy to forget the check, and it makes the code more verbose, too.

In fact, it might be better if pg_cryptohash_init() and 
pg_cryptohash_update() didn't return errors either. If an error happens, 
they could just set a flag in the pg_cryptohash_ctx, and 
pg_cryptohash_final() function would return the error. That way, you 
would only need to check for error return in the call to 
pg_cryptohash_final().

>> BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we need
>> two structs? They're both allocated and controlled by the cryptohash
>> implementation. It would seem simpler to have just one.
> 
> Depending on the implementation, the data to track can be completely
> different, and this split allows to know about the resowner dependency
> only in the OpenSSL part of cryptohashes, without having to include
> this knowledge in neither cryptohash.h nor in the fallback
> implementation that can just use palloc() in the backend.

> ...  And I wanted to keep track of the type of cryptohash directly in
> the shared structure.  ;)

You could also define a shared header, with the rest of the struct being 
implementation-specific:

typedef struct pg_cryptohash_ctx
{
    pg_cryptohash_type type;

    /* implementation-specific data follows */
} pg_cryptohash_ctx;

- Heikki



Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Heikki Linnakangas
Date:
On 18/12/2020 12:55, Heikki Linnakangas wrote:
> On 18/12/2020 12:10, Michael Paquier wrote:
>> On Fri, Dec 18, 2020 at 11:35:14AM +0200, Heikki Linnakangas wrote:
>>> pg_cryptohash_create() is now susceptible to leaking memory in
>>> TopMemoryContext, if the allocations fail. I think the attached should fix
>>> it (but I haven't tested it at all).
>>
>> Yeah, you are right here.  If the second allocation fails the first
>> one would leak.  I don't think that your suggested fix is completely
>> right though because it ignores that the callers of
>> pg_cryptohash_create() in the backend expect an error all the time, so
>> it could crash.
> 
> Ah, right.
> 
>> Perhaps we should just bite the bullet and switch the
>> OpenSSL and fallback implementations to use allocation APIs that never
>> cause an error, and always return NULL?  That would have the advantage
>> to be more consistent with the frontend that relies in malloc(), at
>> the cost of requiring more changes for the backend code where the
>> _create() call would need to handle the NULL case properly.  The
>> backend calls are already aware of errors so that would not be
>> invasive except for the addition of some elog(ERROR) or similar, and
>> we could change the fallback implementation to use palloc_extended()
>> with MCXT_ALLOC_NO_OOM.
> 
> -1. On the contrary, I think we should reduce the number of checks
> needed in the callers, and prefer throwing errors, if possible. It's too
> easy to forget the check, and it makes the code more verbose, too.
> 
> In fact, it might be better if pg_cryptohash_init() and
> pg_cryptohash_update() didn't return errors either. If an error happens,
> they could just set a flag in the pg_cryptohash_ctx, and
> pg_cryptohash_final() function would return the error. That way, you
> would only need to check for error return in the call to
> pg_cryptohash_final().

BTW, it's a bit weird that the pg_cryptohash_init/update/final() 
functions return success, if the ctx argument is NULL. It would seem 
more sensible for them to return an error. That way, if a caller forgets 
to check for NULL result from pg_cryptohash_create(), but correctly 
checks the result of those other functions, it would catch the error. In 
fact, if we documented that pg_cryptohash_create() can return NULL, and 
pg_cryptohash_final() always returns error on NULL argument, then it 
would be sufficient for the callers to only check the return value of 
pg_cryptohash_final(). So the usage pattern would be:

ctx = pg_cryptohash_create(PG_MD5);
pg_cryptohash_inti(ctx);
pg_update(ctx, data, size);
pg_update(ctx, moredata, size);
if (pg_cryptohash_final(ctx, &hash) < 0)
     elog(ERROR, "md5 calculation failed");
pg_cryptohash_free(ctx);

- Heikki



Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Michael Paquier
Date:
On Fri, Dec 18, 2020 at 01:04:27PM +0200, Heikki Linnakangas wrote:
> BTW, it's a bit weird that the pg_cryptohash_init/update/final() functions
> return success, if the ctx argument is NULL. It would seem more sensible for
> them to return an error.

Okay.

> That way, if a caller forgets to check for NULL
> result from pg_cryptohash_create(), but correctly checks the result of those
> other functions, it would catch the error. In fact, if we documented that
> pg_cryptohash_create() can return NULL, and pg_cryptohash_final() always
> returns error on NULL argument, then it would be sufficient for the callers
> to only check the return value of pg_cryptohash_final(). So the usage
> pattern would be:
>
> ctx = pg_cryptohash_create(PG_MD5);
> pg_cryptohash_inti(ctx);
> pg_update(ctx, data, size);
> pg_update(ctx, moredata, size);
> if (pg_cryptohash_final(ctx, &hash) < 0)
>     elog(ERROR, "md5 calculation failed");
> pg_cryptohash_free(ctx);

I'd rather keep the init and update routines to return an error code
directly.  This is more consistent with OpenSSL (note that libnss does
not return error codes for the init, update and final but it is
possible to grab for errors then react on that).  And we have even in
our tree code paths a-la-pgcrypto that have callbacks for each phase
with some processing in-between.  HMAC also gets a bit cleaner by
keeping this flexibility IMO.
--
Michael

Attachment

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Michael Paquier
Date:
On Fri, Dec 18, 2020 at 11:51:55AM +0200, Heikki Linnakangas wrote:
> On 18/12/2020 11:35, Heikki Linnakangas wrote:
> > BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we
> > need two structs? They're both allocated and controlled by the
> > cryptohash implementation. It would seem simpler to have just one.
>
> Something like this. Passes regression tests, but otherwise untested.

Interesting.  I have looked at that with a fresh mind, thanks for the
idea.  This reduces the number of allocations to one making the error
handling a no-brainer, at the cost of hiding the cryptohash type
directly to the caller.  I originally thought that this would be
useful as I recall reading cases in the OpenSSL code doing checks on
hash type used, but perhaps that's just some over-engineered thoughts
from my side.  I have found a couple of small-ish issues, please see
my comments below.

+   /*
+    * FIXME: this always allocates enough space for the largest hash.
+    * A smaller allocation would be enough for md5, sha224 and sha256.
+    */
I am not sure that this is worth complicating more, and we are not
talking about a lot of memory (sha512 requires 208 bytes, sha224/256
104 bytes, md5 96 bytes with a quick measurement).  This makes free()
equally more simple.  So just allocating the amount of memory based on
the max size in the union looks fine to me.  I would add a memset(0)
after this allocation though.

-#define ALLOC(size) palloc(size)
+#define ALLOC(size) MemoryContextAllocExtended(TopMemoryContext, size, MCXT_ALLOC_NO_OOM)
As the only allocation in TopMemoryContext is for the context, it
would be fine to not use MCXT_ALLOC_NO_OOM here, and fail so as
callers in the backend don't need to worry about create() returning
NULL.

-       state->evpctx = EVP_MD_CTX_create();
+       ctx->evpctx = EVP_MD_CTX_create();

-       if (state->evpctx == NULL)
+       if (ctx->evpctx == NULL)
        {
If EVP_MD_CTX_create() fails, you would leak memory with the context
allocated in TopMemoryContext.  So this requires a free of the context
before the elog(ERROR).

+   /*
+    * Make sure that the resource owner has space to remember this
+    * reference. This can error out with "out of memory", so do this
+    * before any other allocation to avoid leaking.
+    */
 #ifndef FRONTEND
     ResourceOwnerEnlargeCryptoHash(CurrentResourceOwner);
 #endif
Right.  Good point.

-       /* OpenSSL internals return 1 on success, 0 on failure */
+       /* openssl internals return 1 on success, 0 on failure */
It seems to me that this was not wanted.

At the same time, I have taken care of your comment from upthread to
return a failure if the caller passes NULL for the context, and
adjusted some comments.  What do you think of the attached?
--
Michael

Attachment

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Robert Haas
Date:
On Fri, Dec 18, 2020 at 6:04 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> BTW, it's a bit weird that the pg_cryptohash_init/update/final()
> functions return success, if the ctx argument is NULL. It would seem
> more sensible for them to return an error. That way, if a caller forgets
> to check for NULL result from pg_cryptohash_create(), but correctly
> checks the result of those other functions, it would catch the error. In
> fact, if we documented that pg_cryptohash_create() can return NULL, and
> pg_cryptohash_final() always returns error on NULL argument, then it
> would be sufficient for the callers to only check the return value of
> pg_cryptohash_final(). So the usage pattern would be:
>
> ctx = pg_cryptohash_create(PG_MD5);
> pg_cryptohash_inti(ctx);
> pg_update(ctx, data, size);
> pg_update(ctx, moredata, size);
> if (pg_cryptohash_final(ctx, &hash) < 0)
>      elog(ERROR, "md5 calculation failed");
> pg_cryptohash_free(ctx);

TBH, I think there's no point in return an error here at all, because
it's totally non-specific. You have no idea what failed, just that
something failed. Blech. If we want to check that ctx is non-NULL, we
should do that with an Assert(). Complicating the code with error
checks that have to be added in multiple places that are far removed
from where the actual problem was detected stinks.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Michael Paquier
Date:
On Mon, Dec 21, 2020 at 04:28:26PM -0500, Robert Haas wrote:
> TBH, I think there's no point in return an error here at all, because
> it's totally non-specific. You have no idea what failed, just that
> something failed. Blech. If we want to check that ctx is non-NULL, we
> should do that with an Assert(). Complicating the code with error
> checks that have to be added in multiple places that are far removed
> from where the actual problem was detected stinks.

You could technically do that, but only for the backend at the cost of
painting the code of src/common/ with more #ifdef FRONTEND.  Even if
we do that, enforcing an error in the backend could be a problem when
it comes to some code paths.  One of them is the SCRAM mock
authentication where we had better generate a generic error message.
Using an Assert() or just letting the code go through is not good
either, as we should avoid incorrect computations or crash on OOM, not
to mention that this would fail the detection of bugs coming directly
from OpenSSL or any other SSL library this code plugs with.  In short,
I think that there are more benefits in letting the caller control the
error handling.
--
Michael

Attachment

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Michael Paquier
Date:
On Sat, Dec 19, 2020 at 03:13:50PM +0900, Michael Paquier wrote:
> At the same time, I have taken care of your comment from upthread to
> return a failure if the caller passes NULL for the context, and
> adjusted some comments.  What do you think of the attached?

I have looked again at this thread with a fresher mind and I did not
see a problem with the previous patch, except some indentation
issues.  So if there are no objections, I'd like to commit the
attached.
--
Michael

Attachment

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Heikki Linnakangas
Date:
On 06/01/2021 13:42, Michael Paquier wrote:
> On Sat, Dec 19, 2020 at 03:13:50PM +0900, Michael Paquier wrote:
>> At the same time, I have taken care of your comment from upthread to
>> return a failure if the caller passes NULL for the context, and
>> adjusted some comments.  What do you think of the attached?
> 
> I have looked again at this thread with a fresher mind and I did not
> see a problem with the previous patch, except some indentation
> issues.  So if there are no objections, I'd like to commit the
> attached.

Looks fine to me.

- Heikki



Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Heikki Linnakangas
Date:
On 25/12/2020 02:57, Michael Paquier wrote:
> On Mon, Dec 21, 2020 at 04:28:26PM -0500, Robert Haas wrote:
>> TBH, I think there's no point in return an error here at all, because
>> it's totally non-specific. You have no idea what failed, just that
>> something failed. Blech. If we want to check that ctx is non-NULL, we
>> should do that with an Assert(). Complicating the code with error
>> checks that have to be added in multiple places that are far removed
>> from where the actual problem was detected stinks.
> 
> You could technically do that, but only for the backend at the cost of
> painting the code of src/common/ with more #ifdef FRONTEND.  Even if
> we do that, enforcing an error in the backend could be a problem when
> it comes to some code paths.

Yeah, you would still need to remember to check for the error in 
frontend code. Maybe it would still be a good idea, not sure. It would 
be a nice backstop, if you forget to check for the error.

I had a quick look at the callers:

contrib/pgcrypto/internal-sha2.c and 
src/backend/utils/adt/cryptohashfuncs.c: the call to 
pg_cryptohash_create() is missing check for NULL result. With your 
latest patch, that's OK because the subsequent pg_cryptohash_update() 
calls will fail if passed a NULL context. But seems sloppy.

contrib/pgcrypto/internal.c: all the calls to pg_cryptohash_* functions 
are missing checks for error return codes.

contrib/uuid-ossp/uuid-ossp.c: uses pg_cryptohash for MD5, but borrows 
the built-in implementation of SHA1 on some platforms. Should we add 
support for SHA1 in pg_cryptohash and use that for consistency?

src/backend/libpq/auth-scram.c: SHA256 is used in the mock 
authentication. If the pg_cryptohash functions fail, we throw a distinct 
error "could not encode salt" that reveals that it was a mock 
authentication. I don't think this is a big deal in practice, it would 
be hard for an attacker to induce the SHA256 computation to fail, and 
there are probably better ways to distinguish mock authentication from 
real, like timing attacks. But still.

src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we 
still need separate fields for the different sha variants.

- Heikki



Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Michael Paquier
Date:
On Wed, Jan 06, 2021 at 03:27:03PM +0200, Heikki Linnakangas wrote:
> Looks fine to me.

Thanks, I have been able to get this part done as of 55fe26a.
--
Michael

Attachment

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Michael Paquier
Date:
On Wed, Jan 06, 2021 at 03:58:22PM +0200, Heikki Linnakangas wrote:
> contrib/pgcrypto/internal-sha2.c and
> src/backend/utils/adt/cryptohashfuncs.c: the call to pg_cryptohash_create()
> is missing check for NULL result. With your latest patch, that's OK because
> the subsequent pg_cryptohash_update() calls will fail if passed a NULL
> context. But seems sloppy.

Is it?  pg_cryptohash_create() will never return NULL for the backend.

> contrib/pgcrypto/internal.c: all the calls to pg_cryptohash_* functions are
> missing checks for error return codes.

Indeed, these are incorrect, thanks.  I'll go fix that separately.

> contrib/uuid-ossp/uuid-ossp.c: uses pg_cryptohash for MD5, but borrows the
> built-in implementation of SHA1 on some platforms. Should we add support for
> SHA1 in pg_cryptohash and use that for consistency?

Yeah, I have sent a separate patch for that:
https://commitfest.postgresql.org/31/2868/
The cleanups produced by this patch are kind of nice.

> src/backend/libpq/auth-scram.c: SHA256 is used in the mock authentication.
> If the pg_cryptohash functions fail, we throw a distinct error "could not
> encode salt" that reveals that it was a mock authentication. I don't think
> this is a big deal in practice, it would be hard for an attacker to induce
> the SHA256 computation to fail, and there are probably better ways to
> distinguish mock authentication from real, like timing attacks. But still.

This maps with the second error in the mock routine, so wouldn't it be
better to change both and back-patch?  The last place where this error
message is used is pg_be_scram_build_secret() for the generation of
what's stored in pg_authid.  An idea would be to use "out of memory".
That can be faced for any palloc() calls.

> src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we
> still need separate fields for the different sha variants.

Using separate fields looked cleaner to me if it came to debugging,
and the cleanup was rather minimal except if we use more switch/case
to set up the various variables.  What about something like the
attached?
--
Michael

Attachment

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Heikki Linnakangas
Date:
On 07/01/2021 06:15, Michael Paquier wrote:
> On Wed, Jan 06, 2021 at 03:58:22PM +0200, Heikki Linnakangas wrote:
>> contrib/pgcrypto/internal-sha2.c and
>> src/backend/utils/adt/cryptohashfuncs.c: the call to pg_cryptohash_create()
>> is missing check for NULL result. With your latest patch, that's OK because
>> the subsequent pg_cryptohash_update() calls will fail if passed a NULL
>> context. But seems sloppy.
> 
> Is it?  pg_cryptohash_create() will never return NULL for the backend.

Ah, you're right.

>> src/backend/libpq/auth-scram.c: SHA256 is used in the mock authentication.
>> If the pg_cryptohash functions fail, we throw a distinct error "could not
>> encode salt" that reveals that it was a mock authentication. I don't think
>> this is a big deal in practice, it would be hard for an attacker to induce
>> the SHA256 computation to fail, and there are probably better ways to
>> distinguish mock authentication from real, like timing attacks. But still.
> 
> This maps with the second error in the mock routine, so wouldn't it be
> better to change both and back-patch?  The last place where this error
> message is used is pg_be_scram_build_secret() for the generation of
> what's stored in pg_authid.  An idea would be to use "out of memory".
> That can be faced for any palloc() calls.

Hmm. Perhaps it would be best to change all the errors in mock 
authentication to COMMERROR. It'd be nice to have an accurate error 
message in the log, but no need to send it to the client.

>> src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we
>> still need separate fields for the different sha variants.
> 
> Using separate fields looked cleaner to me if it came to debugging,
> and the cleanup was rather minimal except if we use more switch/case
> to set up the various variables.  What about something like the
> attached?

+1

- Heikki



Re: Incorrect allocation handling for cryptohash functions with OpenSSL

From
Michael Paquier
Date:
On Thu, Jan 07, 2021 at 09:51:00AM +0200, Heikki Linnakangas wrote:
> Hmm. Perhaps it would be best to change all the errors in mock
> authentication to COMMERROR. It'd be nice to have an accurate error message
> in the log, but no need to send it to the client.

Yeah, we could do that.  Still, this mode still requires a hard
failure because COMMERROR is just a log, and if only COMMERROR is done
we still expect a salt to be generated to send a challenge back to the
client, which would require a fallback for the salt if the one
generated from the mock nonce cannot.  Need to think more about that.

>> Using separate fields looked cleaner to me if it came to debugging,
>> and the cleanup was rather minimal except if we use more switch/case
>> to set up the various variables.  What about something like the
>> attached?
>
> +1

Thanks, I have committed this part.
--
Michael

Attachment