Thread: Incorrect allocation handling for cryptohash functions with OpenSSL
Hi all,

As of the work done in 87ae9691, I have played with error injections in the code paths using this code, but forgot to account for cases where cascading resowner cleanups are involved. Like other resources (JIT, DSM, etc.), this requires an allocation in TopMemoryContext to make sure that nothing gets forgotten or cleaned up on the way until the resowner that did the cryptohash allocation is handled.

Attached is a small extension I have played with by doing some error injections, and a patch. If there are no objections, I would like to commit this fix.

Thanks,
--
Michael
Re: Incorrect allocation handling for cryptohash functions with OpenSSL
From: Heikki Linnakangas
On 18/12/2020 09:35, Michael Paquier wrote:
> Hi all,
>
> As of the work done in 87ae9691, I have played with error injections
> in the code paths using this code, but forgot to account for cases where
> cascading resowner cleanups are involved. Like other resources (JIT,
> DSM, etc.), this requires an allocation in TopMemoryContext to make
> sure that nothing gets forgotten or cleaned up on the way until the
> resowner that did the cryptohash allocation is handled.
>
> Attached is a small extension I have played with by doing some error
> injections, and a patch. If there are no objections, I would like to
> commit this fix.

pg_cryptohash_create() is now susceptible to leaking memory in TopMemoryContext, if the allocations fail. I think the attached should fix it (but I haven't tested it at all).

BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we need two structs? They're both allocated and controlled by the cryptohash implementation. It would seem simpler to have just one.

- Heikki
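The two-step allocation hazard described above can be sketched outside PostgreSQL with stub types. Everything here (demo_ctx, demo_state, the fail_second_alloc switch) is a hypothetical stand-in for illustration, not the actual pg_cryptohash code: if the second allocation fails, the first must be released before returning, or it lingers in the long-lived context.

```c
#include <stdlib.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two allocations a create() function
 * performs: the caller-visible context and the implementation state. */
typedef struct demo_state { int dummy; } demo_state;
typedef struct demo_ctx { demo_state *state; } demo_ctx;

/* Toggle to simulate an allocation failure, mimicking error injection. */
static bool fail_second_alloc = false;

static void *demo_alloc(size_t size, bool *fail_flag)
{
    if (fail_flag && *fail_flag)
        return NULL;
    return malloc(size);
}

/*
 * Leak-free create(): if the second allocation fails, the first one is
 * freed before returning, so nothing is left behind in the long-lived
 * context on the error path.
 */
demo_ctx *demo_create(void)
{
    demo_ctx *ctx = demo_alloc(sizeof(demo_ctx), NULL);

    if (ctx == NULL)
        return NULL;
    ctx->state = demo_alloc(sizeof(demo_state), &fail_second_alloc);
    if (ctx->state == NULL)
    {
        free(ctx);              /* the fix: release the first allocation */
        return NULL;
    }
    return ctx;
}
```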
Re: Incorrect allocation handling for cryptohash functions with OpenSSL
From: Heikki Linnakangas
On 18/12/2020 11:35, Heikki Linnakangas wrote:
> BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we
> need two structs? They're both allocated and controlled by the
> cryptohash implementation. It would seem simpler to have just one.

Something like this. Passes regression tests, but otherwise untested.

- Heikki
On Fri, Dec 18, 2020 at 11:35:14AM +0200, Heikki Linnakangas wrote:
> pg_cryptohash_create() is now susceptible to leaking memory in
> TopMemoryContext, if the allocations fail. I think the attached should fix
> it (but I haven't tested it at all).

Yeah, you are right here. If the second allocation fails, the first one would leak. I don't think that your suggested fix is completely right, though, because it ignores that the callers of pg_cryptohash_create() in the backend expect an error all the time, so it could crash. Perhaps we should just bite the bullet and switch the OpenSSL and fallback implementations to use allocation APIs that never cause an error and always return NULL? That would have the advantage of being more consistent with the frontend, which relies on malloc(), at the cost of requiring more changes for the backend code, where the _create() call would need to handle the NULL case properly. The backend calls are already aware of errors, so that would not be invasive except for the addition of some elog(ERROR) or similar, and we could change the fallback implementation to use palloc_extended() with MCXT_ALLOC_NO_OOM.

> BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we need
> two structs? They're both allocated and controlled by the cryptohash
> implementation. It would seem simpler to have just one.

Depending on the implementation, the data to track can be completely different, and this split allows knowledge of the resowner dependency to stay in the OpenSSL part of cryptohashes, without exposing it in either cryptohash.h or the fallback implementation, which can just use palloc() in the backend.
--
Michael
On Fri, Dec 18, 2020 at 11:51:55AM +0200, Heikki Linnakangas wrote:
> Something like this. Passes regression tests, but otherwise untested.

... And I wanted to keep track of the type of cryptohash directly in the shared structure. ;)
--
Michael
Re: Incorrect allocation handling for cryptohash functions with OpenSSL
From: Heikki Linnakangas
On 18/12/2020 12:10, Michael Paquier wrote:
> On Fri, Dec 18, 2020 at 11:35:14AM +0200, Heikki Linnakangas wrote:
>> pg_cryptohash_create() is now susceptible to leaking memory in
>> TopMemoryContext, if the allocations fail. I think the attached should fix
>> it (but I haven't tested it at all).
>
> Yeah, you are right here. If the second allocation fails, the first
> one would leak. I don't think that your suggested fix is completely
> right, though, because it ignores that the callers of
> pg_cryptohash_create() in the backend expect an error all the time, so
> it could crash.

Ah, right.

> Perhaps we should just bite the bullet and switch the
> OpenSSL and fallback implementations to use allocation APIs that never
> cause an error and always return NULL? That would have the advantage
> of being more consistent with the frontend, which relies on malloc(), at
> the cost of requiring more changes for the backend code, where the
> _create() call would need to handle the NULL case properly. The
> backend calls are already aware of errors, so that would not be
> invasive except for the addition of some elog(ERROR) or similar, and
> we could change the fallback implementation to use palloc_extended()
> with MCXT_ALLOC_NO_OOM.

-1. On the contrary, I think we should reduce the number of checks needed in the callers and prefer throwing errors, if possible. It's too easy to forget the check, and it makes the code more verbose, too.

In fact, it might be better if pg_cryptohash_init() and pg_cryptohash_update() didn't return errors either. If an error happens, they could just set a flag in the pg_cryptohash_ctx, and the pg_cryptohash_final() function would return the error. That way, you would only need to check for an error return in the call to pg_cryptohash_final().

>> BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we need
>> two structs? They're both allocated and controlled by the cryptohash
>> implementation. It would seem simpler to have just one.
>
> Depending on the implementation, the data to track can be completely
> different, and this split allows knowledge of the resowner dependency
> to stay in the OpenSSL part of cryptohashes, without exposing it in
> either cryptohash.h or the fallback implementation, which can just use
> palloc() in the backend.
> ... And I wanted to keep track of the type of cryptohash directly in
> the shared structure. ;)

You could also define a shared header, with the rest of the struct being implementation-specific:

typedef struct pg_cryptohash_ctx
{
    pg_cryptohash_type type;

    /* implementation-specific data follows */
} pg_cryptohash_ctx;

- Heikki
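The deferred-error idea suggested in this message can be sketched with a toy hash outside PostgreSQL. The sketch_* names and the djb2-style accumulator are illustrative assumptions, not the real implementation; the point is the pattern: init and update record failure in the context, and only final reports it, so callers need a single check.

```c
#include <stddef.h>

/* Toy context carrying a sticky error flag, per Heikki's suggestion. */
typedef struct sketch_ctx
{
    int failed;             /* set on any earlier error */
    unsigned long acc;      /* toy "hash" accumulator */
} sketch_ctx;

void sketch_init(sketch_ctx *ctx)
{
    ctx->failed = 0;
    ctx->acc = 5381;        /* djb2-style seed, purely illustrative */
}

void sketch_update(sketch_ctx *ctx, const void *data, size_t len)
{
    const unsigned char *p = data;

    if (ctx->failed)
        return;             /* errors are sticky; stay quiet until final */
    if (data == NULL)
    {
        ctx->failed = 1;    /* record the failure instead of returning it */
        return;
    }
    for (size_t i = 0; i < len; i++)
        ctx->acc = ctx->acc * 33 + p[i];
}

/* The only call site that needs an error check. */
int sketch_final(sketch_ctx *ctx, unsigned long *out)
{
    if (ctx == NULL || ctx->failed)
        return -1;
    *out = ctx->acc;
    return 0;
}
```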
Re: Incorrect allocation handling for cryptohash functions with OpenSSL
From: Heikki Linnakangas
On 18/12/2020 12:55, Heikki Linnakangas wrote:
> On 18/12/2020 12:10, Michael Paquier wrote:
>> On Fri, Dec 18, 2020 at 11:35:14AM +0200, Heikki Linnakangas wrote:
>>> pg_cryptohash_create() is now susceptible to leaking memory in
>>> TopMemoryContext, if the allocations fail. I think the attached should fix
>>> it (but I haven't tested it at all).
>>
>> Yeah, you are right here. If the second allocation fails, the first
>> one would leak. I don't think that your suggested fix is completely
>> right, though, because it ignores that the callers of
>> pg_cryptohash_create() in the backend expect an error all the time, so
>> it could crash.
>
> Ah, right.
>
>> Perhaps we should just bite the bullet and switch the
>> OpenSSL and fallback implementations to use allocation APIs that never
>> cause an error and always return NULL? That would have the advantage
>> of being more consistent with the frontend, which relies on malloc(), at
>> the cost of requiring more changes for the backend code, where the
>> _create() call would need to handle the NULL case properly. The
>> backend calls are already aware of errors, so that would not be
>> invasive except for the addition of some elog(ERROR) or similar, and
>> we could change the fallback implementation to use palloc_extended()
>> with MCXT_ALLOC_NO_OOM.
>
> -1. On the contrary, I think we should reduce the number of checks
> needed in the callers and prefer throwing errors, if possible. It's too
> easy to forget the check, and it makes the code more verbose, too.
>
> In fact, it might be better if pg_cryptohash_init() and
> pg_cryptohash_update() didn't return errors either. If an error happens,
> they could just set a flag in the pg_cryptohash_ctx, and the
> pg_cryptohash_final() function would return the error. That way, you
> would only need to check for an error return in the call to
> pg_cryptohash_final().

BTW, it's a bit weird that the pg_cryptohash_init/update/final() functions return success if the ctx argument is NULL. It would seem more sensible for them to return an error. That way, if a caller forgets to check for a NULL result from pg_cryptohash_create(), but correctly checks the result of those other functions, it would catch the error. In fact, if we documented that pg_cryptohash_create() can return NULL, and pg_cryptohash_final() always returns an error on a NULL argument, then it would be sufficient for the callers to only check the return value of pg_cryptohash_final(). So the usage pattern would be:

ctx = pg_cryptohash_create(PG_MD5);
pg_cryptohash_init(ctx);
pg_cryptohash_update(ctx, data, size);
pg_cryptohash_update(ctx, moredata, size);
if (pg_cryptohash_final(ctx, &hash) < 0)
    elog(ERROR, "md5 calculation failed");
pg_cryptohash_free(ctx);

- Heikki
On Fri, Dec 18, 2020 at 01:04:27PM +0200, Heikki Linnakangas wrote:
> BTW, it's a bit weird that the pg_cryptohash_init/update/final() functions
> return success if the ctx argument is NULL. It would seem more sensible for
> them to return an error.

Okay.

> That way, if a caller forgets to check for a NULL
> result from pg_cryptohash_create(), but correctly checks the result of those
> other functions, it would catch the error. In fact, if we documented that
> pg_cryptohash_create() can return NULL, and pg_cryptohash_final() always
> returns an error on a NULL argument, then it would be sufficient for the callers
> to only check the return value of pg_cryptohash_final(). So the usage
> pattern would be:
>
> ctx = pg_cryptohash_create(PG_MD5);
> pg_cryptohash_init(ctx);
> pg_cryptohash_update(ctx, data, size);
> pg_cryptohash_update(ctx, moredata, size);
> if (pg_cryptohash_final(ctx, &hash) < 0)
>     elog(ERROR, "md5 calculation failed");
> pg_cryptohash_free(ctx);

I'd rather keep the init and update routines returning an error code directly. This is more consistent with OpenSSL (note that libnss does not return error codes for init, update and final, but it is possible to grab errors and react on that). And we even have code paths in our tree, a la pgcrypto, that have callbacks for each phase with some processing in-between. HMAC also gets a bit cleaner by keeping this flexibility, IMO.
--
Michael
On Fri, Dec 18, 2020 at 11:51:55AM +0200, Heikki Linnakangas wrote:
> On 18/12/2020 11:35, Heikki Linnakangas wrote:
>> BTW, looking at pg_cryptohash_ctx and pg_cryptohash_state, why do we
>> need two structs? They're both allocated and controlled by the
>> cryptohash implementation. It would seem simpler to have just one.
>
> Something like this. Passes regression tests, but otherwise untested.

Interesting. I have looked at that with a fresh mind; thanks for the idea. This reduces the number of allocations to one, making the error handling a no-brainer, at the cost of hiding the cryptohash type from the caller. I originally thought that this would be useful, as I recall reading cases in the OpenSSL code doing checks on the hash type used, but perhaps that's just some over-engineered thought from my side. I have found a couple of small-ish issues; please see my comments below.

+   /*
+    * FIXME: this always allocates enough space for the largest hash.
+    * A smaller allocation would be enough for md5, sha224 and sha256.
+    */

I am not sure that this is worth complicating more, and we are not talking about a lot of memory (sha512 requires 208 bytes, sha224/256 104 bytes, md5 96 bytes with a quick measurement). This makes free() equally simple. So just allocating the amount of memory based on the max size in the union looks fine to me. I would add a memset(0) after this allocation, though.

-#define ALLOC(size) palloc(size)
+#define ALLOC(size) MemoryContextAllocExtended(TopMemoryContext, size, MCXT_ALLOC_NO_OOM)

As the only allocation in TopMemoryContext is for the context, it would be fine not to use MCXT_ALLOC_NO_OOM here, and fail, so that callers in the backend don't need to worry about create() returning NULL.

-   state->evpctx = EVP_MD_CTX_create();
+   ctx->evpctx = EVP_MD_CTX_create();
-   if (state->evpctx == NULL)
+   if (ctx->evpctx == NULL)
    {

If EVP_MD_CTX_create() fails, you would leak memory with the context allocated in TopMemoryContext. So this requires a free of the context before the elog(ERROR).

+   /*
+    * Make sure that the resource owner has space to remember this
+    * reference. This can error out with "out of memory", so do this
+    * before any other allocation to avoid leaking.
+    */
 #ifndef FRONTEND
    ResourceOwnerEnlargeCryptoHash(CurrentResourceOwner);
 #endif

Right. Good point.

-   /* OpenSSL internals return 1 on success, 0 on failure */
+   /* openssl internals return 1 on success, 0 on failure */

It seems to me that this change was not wanted. At the same time, I have taken care of your comment from upthread to return a failure if the caller passes NULL for the context, and adjusted some comments. What do you think of the attached?
--
Michael
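The sizing question in the FIXME above can be sketched with hypothetical toy_* types using the byte counts measured in this message: one allocation sized by a union of all per-algorithm states is always large enough for any of them, and zeroing it after allocation matches the memset(0) suggestion. This is a sketch of the idea, not the actual pg_cryptohash code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-algorithm state sizes standing in for the real
 * md5/sha2 contexts (96, 104, and 208 bytes per the measurements above). */
typedef struct { unsigned char buf[96];  } toy_md5_ctx;
typedef struct { unsigned char buf[104]; } toy_sha256_ctx;
typedef struct { unsigned char buf[208]; } toy_sha512_ctx;

/* The union is automatically sized for its largest member. */
typedef union
{
    toy_md5_ctx    md5;
    toy_sha256_ctx sha256;
    toy_sha512_ctx sha512;
} toy_hash_union;

typedef struct toy_ctx
{
    int type;               /* which algorithm this context is for */
    toy_hash_union data;    /* big enough for any of them */
} toy_ctx;

/* One allocation covering the largest state, zeroed as suggested. */
toy_ctx *toy_create(int type)
{
    toy_ctx *ctx = malloc(sizeof(toy_ctx));

    if (ctx == NULL)
        return NULL;
    memset(ctx, 0, sizeof(toy_ctx));
    ctx->type = type;
    return ctx;
}
```

A single free(ctx) then releases everything, which is what makes the error handling a no-brainer.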
On Fri, Dec 18, 2020 at 6:04 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> BTW, it's a bit weird that the pg_cryptohash_init/update/final()
> functions return success if the ctx argument is NULL. It would seem
> more sensible for them to return an error. That way, if a caller forgets
> to check for a NULL result from pg_cryptohash_create(), but correctly
> checks the result of those other functions, it would catch the error. In
> fact, if we documented that pg_cryptohash_create() can return NULL, and
> pg_cryptohash_final() always returns an error on a NULL argument, then it
> would be sufficient for the callers to only check the return value of
> pg_cryptohash_final(). So the usage pattern would be:
>
> ctx = pg_cryptohash_create(PG_MD5);
> pg_cryptohash_init(ctx);
> pg_cryptohash_update(ctx, data, size);
> pg_cryptohash_update(ctx, moredata, size);
> if (pg_cryptohash_final(ctx, &hash) < 0)
>     elog(ERROR, "md5 calculation failed");
> pg_cryptohash_free(ctx);

TBH, I think there's no point in returning an error here at all, because it's totally non-specific. You have no idea what failed, just that something failed. Blech. If we want to check that ctx is non-NULL, we should do that with an Assert(). Complicating the code with error checks that have to be added in multiple places far removed from where the actual problem was detected stinks.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Dec 21, 2020 at 04:28:26PM -0500, Robert Haas wrote:
> TBH, I think there's no point in returning an error here at all, because
> it's totally non-specific. You have no idea what failed, just that
> something failed. Blech. If we want to check that ctx is non-NULL, we
> should do that with an Assert(). Complicating the code with error
> checks that have to be added in multiple places far removed
> from where the actual problem was detected stinks.

You could technically do that, but only for the backend, at the cost of painting the code of src/common/ with more #ifdef FRONTEND. Even if we do that, enforcing an error in the backend could be a problem when it comes to some code paths. One of them is the SCRAM mock authentication, where we had better generate a generic error message. Using an Assert() or just letting the code go through is not good either, as we should avoid incorrect computations or a crash on OOM, not to mention that this would fail to detect bugs coming directly from OpenSSL or any other SSL library this code plugs into. In short, I think that there are more benefits in letting the caller control the error handling.
--
Michael
On Sat, Dec 19, 2020 at 03:13:50PM +0900, Michael Paquier wrote:
> At the same time, I have taken care of your comment from upthread to
> return a failure if the caller passes NULL for the context, and
> adjusted some comments. What do you think of the attached?

I have looked again at this thread with a fresher mind, and I did not see a problem with the previous patch, except some indentation issues. So if there are no objections, I'd like to commit the attached.
--
Michael
Re: Incorrect allocation handling for cryptohash functions with OpenSSL
From: Heikki Linnakangas
On 06/01/2021 13:42, Michael Paquier wrote:
> On Sat, Dec 19, 2020 at 03:13:50PM +0900, Michael Paquier wrote:
>> At the same time, I have taken care of your comment from upthread to
>> return a failure if the caller passes NULL for the context, and
>> adjusted some comments. What do you think of the attached?
>
> I have looked again at this thread with a fresher mind, and I did not
> see a problem with the previous patch, except some indentation
> issues. So if there are no objections, I'd like to commit the
> attached.

Looks fine to me.

- Heikki
Re: Incorrect allocation handling for cryptohash functions with OpenSSL
From: Heikki Linnakangas
On 25/12/2020 02:57, Michael Paquier wrote:
> On Mon, Dec 21, 2020 at 04:28:26PM -0500, Robert Haas wrote:
>> TBH, I think there's no point in returning an error here at all, because
>> it's totally non-specific. You have no idea what failed, just that
>> something failed. Blech. If we want to check that ctx is non-NULL, we
>> should do that with an Assert(). Complicating the code with error
>> checks that have to be added in multiple places far removed
>> from where the actual problem was detected stinks.
>
> You could technically do that, but only for the backend, at the cost of
> painting the code of src/common/ with more #ifdef FRONTEND. Even if
> we do that, enforcing an error in the backend could be a problem when
> it comes to some code paths.

Yeah, you would still need to remember to check for the error in frontend code. Maybe it would still be a good idea, not sure. It would be a nice backstop, if you forget to check for the error.

I had a quick look at the callers:

contrib/pgcrypto/internal-sha2.c and src/backend/utils/adt/cryptohashfuncs.c: the call to pg_cryptohash_create() is missing a check for a NULL result. With your latest patch, that's OK because the subsequent pg_cryptohash_update() calls will fail if passed a NULL context. But it seems sloppy.

contrib/pgcrypto/internal.c: all the calls to pg_cryptohash_* functions are missing checks for error return codes.

contrib/uuid-ossp/uuid-ossp.c: uses pg_cryptohash for MD5, but borrows the built-in implementation of SHA1 on some platforms. Should we add support for SHA1 in pg_cryptohash and use that for consistency?

src/backend/libpq/auth-scram.c: SHA256 is used in the mock authentication. If the pg_cryptohash functions fail, we throw a distinct error "could not encode salt" that reveals that it was a mock authentication. I don't think this is a big deal in practice; it would be hard for an attacker to induce the SHA256 computation to fail, and there are probably better ways to distinguish mock authentication from real, like timing attacks. But still.

src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we still need separate fields for the different SHA variants?

- Heikki
On Wed, Jan 06, 2021 at 03:27:03PM +0200, Heikki Linnakangas wrote:
> Looks fine to me.

Thanks, I have been able to get this part done as of 55fe26a.
--
Michael
On Wed, Jan 06, 2021 at 03:58:22PM +0200, Heikki Linnakangas wrote:
> contrib/pgcrypto/internal-sha2.c and
> src/backend/utils/adt/cryptohashfuncs.c: the call to pg_cryptohash_create()
> is missing a check for a NULL result. With your latest patch, that's OK because
> the subsequent pg_cryptohash_update() calls will fail if passed a NULL
> context. But it seems sloppy.

Is it? pg_cryptohash_create() will never return NULL for the backend.

> contrib/pgcrypto/internal.c: all the calls to pg_cryptohash_* functions are
> missing checks for error return codes.

Indeed, these are incorrect, thanks. I'll go fix that separately.

> contrib/uuid-ossp/uuid-ossp.c: uses pg_cryptohash for MD5, but borrows the
> built-in implementation of SHA1 on some platforms. Should we add support for
> SHA1 in pg_cryptohash and use that for consistency?

Yeah, I have sent a separate patch for that:
https://commitfest.postgresql.org/31/2868/
The cleanups produced by this patch are kind of nice.

> src/backend/libpq/auth-scram.c: SHA256 is used in the mock authentication.
> If the pg_cryptohash functions fail, we throw a distinct error "could not
> encode salt" that reveals that it was a mock authentication. I don't think
> this is a big deal in practice; it would be hard for an attacker to induce
> the SHA256 computation to fail, and there are probably better ways to
> distinguish mock authentication from real, like timing attacks. But still.

This maps with the second error in the mock routine, so wouldn't it be better to change both and back-patch? The last place where this error message is used is pg_be_scram_build_secret(), for the generation of what's stored in pg_authid. An idea would be to use "out of memory". That can be faced for any palloc() call.

> src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we
> still need separate fields for the different SHA variants?

Using separate fields looked cleaner to me if it came to debugging, and the cleanup was rather minimal, except if we use more switch/case to set up the various variables. What about something like the attached?
--
Michael
Re: Incorrect allocation handling for cryptohash functions with OpenSSL
From: Heikki Linnakangas
On 07/01/2021 06:15, Michael Paquier wrote:
> On Wed, Jan 06, 2021 at 03:58:22PM +0200, Heikki Linnakangas wrote:
>> contrib/pgcrypto/internal-sha2.c and
>> src/backend/utils/adt/cryptohashfuncs.c: the call to pg_cryptohash_create()
>> is missing a check for a NULL result. With your latest patch, that's OK because
>> the subsequent pg_cryptohash_update() calls will fail if passed a NULL
>> context. But it seems sloppy.
>
> Is it? pg_cryptohash_create() will never return NULL for the backend.

Ah, you're right.

>> src/backend/libpq/auth-scram.c: SHA256 is used in the mock authentication.
>> If the pg_cryptohash functions fail, we throw a distinct error "could not
>> encode salt" that reveals that it was a mock authentication. I don't think
>> this is a big deal in practice; it would be hard for an attacker to induce
>> the SHA256 computation to fail, and there are probably better ways to
>> distinguish mock authentication from real, like timing attacks. But still.
>
> This maps with the second error in the mock routine, so wouldn't it be
> better to change both and back-patch? The last place where this error
> message is used is pg_be_scram_build_secret(), for the generation of
> what's stored in pg_authid. An idea would be to use "out of memory".
> That can be faced for any palloc() call.

Hmm. Perhaps it would be best to change all the errors in mock authentication to COMMERROR. It'd be nice to have an accurate error message in the log, but no need to send it to the client.

>> src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we
>> still need separate fields for the different SHA variants?
>
> Using separate fields looked cleaner to me if it came to debugging,
> and the cleanup was rather minimal, except if we use more switch/case
> to set up the various variables. What about something like the
> attached?

+1

- Heikki
On Thu, Jan 07, 2021 at 09:51:00AM +0200, Heikki Linnakangas wrote:
> Hmm. Perhaps it would be best to change all the errors in mock
> authentication to COMMERROR. It'd be nice to have an accurate error message
> in the log, but no need to send it to the client.

Yeah, we could do that. Still, this mode would still require a hard failure, because COMMERROR is just a log, and if only a COMMERROR is emitted we still expect a salt to be generated to send a challenge back to the client, which would require a fallback for the salt if the one generated from the mock nonce cannot be used. Need to think more about that.

>> Using separate fields looked cleaner to me if it came to debugging,
>> and the cleanup was rather minimal, except if we use more switch/case
>> to set up the various variables. What about something like the
>> attached?
>
> +1

Thanks, I have committed this part.
--
Michael