Thread: md5 issues Postgres14 on OL7
Hi all, we're currently facing a strange behavior with Postgres14.1 on Oracle Linux 7.9 using md5. a basic statement leads to an out-of-memory error: postgres=# select md5('just a test'); ERROR: out of memory Anyone else facing the same issue? It may be related to hardening though disabling SELinux didn't solve the issue. Thanks in advance, Michael
Hi! ## Michael Mühlbeyer (Michael.Muehlbeyer@trivadis.com): > postgres=# select md5('just a test'); > ERROR: out of memory Active FIPS mode (/proc/sys/crypto/fips_enabled => 1) on the server does produce this behaviour. Regards, Christoph -- Spare Space
On Mon, Dec 20, 2021 at 03:22:31PM +0100, Christoph Moench-Tegeder wrote: > Active FIPS mode (/proc/sys/crypto/fips_enabled => 1) on the server does > produce this behaviour. Most likely, this is a build linked with OpenSSL? The way MD5 hashes are computed in Postgres has largely changed in 14, and the code has been refactored so as we rely on the EVP APIs from OpenSSL when building with --with-ssl=openssl, having as direct consequence to allocate a bit more memory every time a hash is computed. My guess is that this comes from pg_cryptohash_create() in cryptohash_openssl.c, with a complain coming from OpenSSL's EVP_MD_CTX_create(), but there are other palloc() calls in this area as well. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Dec 20, 2021 at 03:22:31PM +0100, Christoph Moench-Tegeder wrote: >> Active FIPS mode (/proc/sys/crypto/fips_enabled => 1) on the server does >> produce this behaviour. > Most likely, this is a build linked with OpenSSL? The way MD5 hashes > are computed in Postgres has largely changed in 14, and the code has > been refactored so as we rely on the EVP APIs from OpenSSL when > building with --with-ssl=openssl, having as direct consequence to > allocate a bit more memory every time a hash is computed. My guess is > that this comes from pg_cryptohash_create() in cryptohash_openssl.c, > with a complain coming from OpenSSL's EVP_MD_CTX_create(), but there > are other palloc() calls in this area as well. I reproduced this on Fedora 35 with FIPS mode enabled. The problem is that OpenSSL treats MD5 as a disallowed cipher type under FIPS mode, so this call in pg_cryptohash_init fails: status = EVP_DigestInit_ex(ctx->evpctx, EVP_md5(), NULL); and then we come back to this in md5_text(): /* get the hash result */ if (pg_md5_hash(VARDATA_ANY(in_text), len, hexsum) == false) ereport(ERROR, (errcode(ERRCODE_OUT_OF_MEMORY), errmsg("out of memory"))); So there's nothing actually misbehaving, but our error reportage sucks: the hash functions have no way to report a specific failure code, and the caller(s) think the only possible failure mode is OOM. I suppose we could get around the error by using our own MD5 code even in OpenSSL-enabled builds, but that'd violate both the spirit and the letter of FIPS certification. I think the right response is to upgrade the error-reporting API in this area, so that the message could look more like "MD5 is disallowed in FIPS mode". regards, tom lane
## Michael Paquier (michael@paquier.xyz): > On Mon, Dec 20, 2021 at 03:22:31PM +0100, Christoph Moench-Tegeder wrote: > > Active FIPS mode (/proc/sys/crypto/fips_enabled => 1) on the server does > > produce this behaviour. > > Most likely, this is a build linked with OpenSSL? The way MD5 hashes > are computed in Postgres has largely changed in 14, and the code has > been refactored so as we rely on the EVP APIs from OpenSSL when > building with --with-ssl=openssl, having as direct consequence to > allocate a bit more memory every time a hash is computed. You can reproduce that behaviour with the PGDG-RPMs on CentOS 7. Enable FIPS-mode, reboot, and immediately md5() fails. The PGDG-RPMS are built with openssl ("--with-openssl" in pg_config output), as of course you need SSL today. "Supports FIPS mode" is one of the selling points for your cryptohash patches in the Release Notes, and that means no md5 when FIPS is enforced (I think FIPS is a little too strict in this regard, as people do invent horrid workarounds, which does not really improve matters; but that's another can of worms). Anyway, it's not the memory, but "out of memory" is all PostgreSQL reports when anything in the hashing operations returns a failure. Regards, Christoph -- Spare Space
On Tue, Jan 04, 2022 at 12:54:35PM -0500, Tom Lane wrote: > I reproduced this on Fedora 35 with FIPS mode enabled. The problem > is that OpenSSL treats MD5 as a disallowed cipher type under FIPS > mode, so this call in pg_cryptohash_init fails: Is that 3.0.0 or 1.1.1? I can see the following, telling that Fedora 35 uses OpenSSL 1.1.1: https://packages.fedoraproject.org/pkgs/openssl/openssl/ And there is FIPS 2.0 for 1.0.1 and 1.0.2 (funnily, I recall that 1.0.2+FIPS allows MD5 to work with the EVP routines), but the module of FIPS 3.0 does not work with 1.1.1 AFAIK, so I may be confused. Or perhaps this is OpenSSL 1.1.1 with a separate module? The upstream code does nothing special with EVP_DigestInit_ex() in 1.1.1 (see digest.c), contrary to 3.0.0 where there is specific knowledge of FIPS. > status = EVP_DigestInit_ex(ctx->evpctx, EVP_md5(), NULL); > > and then we come back to this in md5_text(): > > /* get the hash result */ > if (pg_md5_hash(VARDATA_ANY(in_text), len, hexsum) == false) > ereport(ERROR, > (errcode(ERRCODE_OUT_OF_MEMORY), > errmsg("out of memory"))); > > So there's nothing actually misbehaving, but our error reportage sucks: > the hash functions have no way to report a specific failure code, > and the caller(s) think the only possible failure mode is OOM. Indeed, this error is a pilot error with the cryptohash integration of 14. In ~13, the custom MD5 implementation would only fail on OOM, but more failure modes are possible now. > I suppose we could get around the error by using our own MD5 code > even in OpenSSL-enabled builds, but that'd violate both the spirit > and the letter of FIPS certification. I don't think we should go back to an integration of our custom MD5 if we have external libraries that provide support for it. That's against the set of FIPS policies. > I think the right response is > to upgrade the error-reporting API in this area, so that the message > could look more like "MD5 is disallowed in FIPS mode". Hmm. I am not sure how much we should try to make the backend, or even the frontend, FIPS-aware (remember for example 0182438 where we avoided this kind of complexity), and not all SSL libraries we would potentially add support for may care about such error states. The cleanest approach may be to extend the APIs to store an **errstr so as implementations are free to assign the error string they want to send back for the error reporting, rather than using more error codes. If we want to improve things in this area with FIPS (aka allow check-world to pass in this case), we would need more in terms of alternate test output, and extra tweaks for the authentication tests, as well. Perhaps the best thing to do in the long term would be to drop MD5, but we are not there yet IMO even if password_encryption default has changed, and upgrade scenarios get hairy. At the end, I agree that we should improve the error message in these two cases. However, I would stick to simplicity by not assuming that those two code paths fail only on OOM, and reword things in md5_text() and md5_bytea() with a simple "could not compute MD5 hash". Any code paths calling the routines of md5_common.c just do that as well for ages when the computation fails, and that's what we care about here. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Tue, Jan 04, 2022 at 12:54:35PM -0500, Tom Lane wrote: >> I reproduced this on Fedora 35 with FIPS mode enabled. The problem >> is that OpenSSL treats MD5 as a disallowed cipher type under FIPS >> mode, so this call in pg_cryptohash_init fails: > Is that 3.0.0 or 1.1.1? I can see the following, telling that Fedora > 35 uses OpenSSL 1.1.1: > https://packages.fedoraproject.org/pkgs/openssl/openssl/ I don't have the image booted up right at the moment, but it was a plain vanilla, fresh-out-of-the-box F35 install, so whatever the default openssl version is for that. That link does say that it should be 1.1.1l. > Indeed, this error is a pilot error with the cryptohash integration of > 14. In ~13, the custom MD5 implementation would only fail on OOM, but > more failure modes are possible now. Right, the code in md5_text() was fine when it was written ... but now, not so much. > At the end, I agree that we should improve the error message in these > two cases. However, I would stick to simplicity by not assuming that > those two code paths fail only on OOM, and reword things in md5_text() > and md5_bytea() with a simple "could not compute MD5 hash". Any code > paths calling the routines of md5_common.c just do that as well for > ages when the computation fails, and that's what we care about here. I think it's very important that the error message in this case mention "FIPS mode" explicitly. Otherwise, people will have no idea that that's where the problem originates, and they'll be frustrated and we'll get bug reports. (They may be frustrated anyway, but it was their choice, or their corporate policy's choice, to cut off their access to MD5. Not our place to dodge that decision.) regards, tom lane
On Wed, Jan 05, 2022 at 01:08:53AM -0500, Tom Lane wrote: > I think it's very important that the error message in this case > mention "FIPS mode" explicitly. Otherwise, people will have no > idea that that's where the problem originates, and they'll be > frustrated and we'll get bug reports. (They may be frustrated > anyway, but it was their choice, or their corporate policy's > choice, to cut off their access to MD5. Not our place to dodge > that decision.) I am not completely sure how to detect that in 1.1.1 in the context of Fedora, and portability may become a tricky thing. FIPS_mode() and FIPS_mode_set() are legacy APIs that should not be used, and upstream just disables them in 1.1.1. [... digs a bit ...] Ugh. Fedora patches upstream's 1.1.1 to check and react on /proc/sys/crypto/fips_enabled. Their code is here, see particularly 0009-Add-Kernel-FIPS-mode-flag-support.patch: https://src.fedoraproject.org/rpms/openssl.git So that's why you are able to use it with 1.1.1. Well, we could do something similar to that, but in 3.0.0 things are done very differently: one has to set to alg_sect fips=yes with fips = fips_sect in the OpenSSL configuration to load the FIPS provider. Providing more error context is going to be hairy here.. In order to make things portable with 14 in cryptohash.c, we don't have any need to change the existing cryptohash APIs. We could just store in each implementation context a location to a static string, and add a new routine to extract it if there is an error, defaulting to OOM. -- Michael
Attachment
On Wed, Jan 05, 2022 at 04:09:12PM +0900, Michael Paquier wrote: > In order to make things portable with 14 in cryptohash.c, we don't > have any need to change the existing cryptohash APIs. We could just > store in each implementation context a location to a static string, > and add a new routine to extract it if there is an error, defaulting > to OOM. I have been looking at that, and finished with the attached. It is close to the end of the day, so this needs an extra lookup, but I have finished by using the idea of an extra routine, called pg_cryptohash_error(), able to grab the error saved in the private contexts, so as any callers from the backend or the frontend can feed on that. This way, it is possible to make the difference between several class of errors: OOMs, a too short destination buffer, OpenSSL internal error, etc. There are a couple of things worth noting here: - Two code paths of src/backend/libpq/crypt.c rely on the result of pg_md5_encrypt() to always be an OOM, so as this skips one psnprintf(). This has been let as-is in the backend for now, but we don't pfree() the *logdetail strings passed across the various layers, so we could just pass down the cryptohash error as-is.. We'd better mention that logdetail may not be palloc'd all the time, once we do that. libpq is able to use that properly. - The routines of md5_common.c need to pass down an extra *errstr for their respective callers. That's an ABI breakage but I'd like to think that nobody uses that out-of-core. (I need to double-check this part, as well). - HMAC (hmac_openssl.c and hmac.c) could use the same infra, but I did not see a use for that yet. It is possible to compile HMACs with MD5s, but we don't have any in-core callers, and failures are just part of the SCRAM workflow with dedicated error messages. I am still not sure about the FIPS part, as per the argument of OpenSSL using something different in 3.0.0, and Fedora that patches upstream in its own way, but this could be extended in cryptohash_openssl.c to provide even more context. For now, this allows reports about OpenSSL internal failures, taking care of the disturbance. Thoughts? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > I have been looking at that, and finished with the attached. It is > close to the end of the day, so this needs an extra lookup, but I have > finished by using the idea of an extra routine, called > pg_cryptohash_error(), able to grab the error saved in the private > contexts, so as any callers from the backend or the frontend can feed > on that. This way, it is possible to make the difference between > several class of errors: OOMs, a too short destination buffer, OpenSSL > internal error, etc. I don't like the end result of this at all: postgres=# select md5('foo'); ERROR: could not compute MD5 hash: OpenSSL failure Maybe we've successfully laid off blame somewhere else, but this doesn't move the user one inch towards understanding the cause. I think we need to report the actual OpenSSL error reason. I experimented with the attached, very rough delta on top of your patch, and got postgres=# select md5('foo'); ERROR: could not compute MD5 hash: disabled for FIPS which seems far better, plus it'd work for other sorts of OpenSSL failures. There are two problems with my delta as it stands: 1. It draws a cast-away-const warning. We'd have to make the result of pg_cryptohash_error be "const char *", which would be better practice anyway, but that propagates into some other APIs and I didn't take the trouble to chase it to the end. 2. It feels a bit bogus to be fetching ERR_get_error() at this point. Maybe it's all right to assume that the OpenSSL error stack won't change state before we get to pg_cryptohash_error, but I don't like the idea much. I think it'd be better to capture ERR_get_error() sooner and store it in an additional field in pg_cryptohash_ctx. Also, I wonder if this shouldn't be unified with the SSLerrmessage() support found in be-secure-openssl.c and fe-secure-openssl.c. regards, tom lane diff -u cryptohash_openssl.c.orig cryptohash_openssl.c --- cryptohash_openssl.c.orig 2022-01-06 11:15:59.268256281 -0500 +++ cryptohash_openssl.c 2022-01-06 11:22:28.602734304 -0500 @@ -21,6 +21,7 @@ #include "postgres_fe.h" #endif +#include <openssl/err.h> #include <openssl/evp.h> #include "common/cryptohash.h" @@ -309,7 +310,14 @@ case PG_CRYPTOHASH_ERROR_DEST_LEN: return _("destination buffer too small"); case PG_CRYPTOHASH_ERROR_OPENSSL: - return _("OpenSSL failure"); + { + unsigned long ecode = ERR_get_error(); + const char *errreason = ERR_reason_error_string(ecode); + + if (errreason) + return errreason; + return _("OpenSSL failure"); + } } /* assume that the default is out-of-memory, anyway */
On Thu, Jan 06, 2022 at 11:40:04AM -0500, Tom Lane wrote: > 1. It draws a cast-away-const warning. We'd have to make the result > of pg_cryptohash_error be "const char *", which would be better > practice anyway, but that propagates into some other APIs and I didn't > take the trouble to chase it to the end. Yeah. I wanted to switch all those routines to use a const here anyway, just did not have the time to tackle that yesterday with the rest of the issues I could think about. Looking at that today, I don't see any problems in switching to const in all those places, so done this way (two places in crypt.c are more picky, though, for logdetail). > 2. It feels a bit bogus to be fetching ERR_get_error() at this point. > Maybe it's all right to assume that the OpenSSL error stack won't > change state before we get to pg_cryptohash_error, but I don't like > the idea much. I think it'd be better to capture ERR_get_error() > sooner and store it in an additional field in pg_cryptohash_ctx. Right, I forgot about the ERR_get_error() piece of it. Thanks! I'd also rather have the check done just after the OpenSSL call. If hash computations are split across multiple code paths, this could lead to issues. We don't have any problems currently in the core code, but I see no reason to not make that safer in the long run. And the structure is flexible enough, so that's not an issue. > Also, I wonder if this shouldn't be unified with the SSLerrmessage() > support found in be-secure-openssl.c and fe-secure-openssl.c. Guess so. HEAD could be poked at for this part. I recall looking at that once by that did not seem worth the complications. I have also looked at the ABI part of the patch. I cannot spot any public trace of pg_md5_hash() and pg_md5_binary(). pgbouncer and pgpool2 have each a copy of pg_md5_encrypt(), but they just share the API name with PG, nothing more. So that looks reasonably safe to change. The last thing I have on my notes is to assign logdetail in md5_crypt_verify() and plain_crypt_verify() to feed back a LOG entry to the postmaster on those failures, and saw that it is safe to assign directly the error returned by the cryptohash APIs, avoiding the extra psprintf call that could become an issue under memory pressure. What do you think? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Thu, Jan 06, 2022 at 11:40:04AM -0500, Tom Lane wrote: >> Also, I wonder if this shouldn't be unified with the SSLerrmessage() >> support found in be-secure-openssl.c and fe-secure-openssl.c. > Guess so. HEAD could be poked at for this part. I recall looking at > that once by that did not seem worth the complications. Yeah, I suppose there's only a couple lines of code to be saved, and the complexity of dealing with multiple memory allocation conventions would outweigh that. > What do you think? Hm, you still have cast-away-const in md5_crypt_verify and plain_crypt_verify. Can we adjust their APIs to make them return const char * as well (and then their API spec is that the caller must never free the string, rather than being vague about it)? The other thing that bothers me slightly is that it looks like some code paths could end up passing a NULL string pointer to ereport or sprintf, since you don't positively guarantee that an error will return a string there. I suppose this is safe since 3779ac62d, but I don't really want to start making API specs depend on it. regards, tom lane
On Fri, Jan 07, 2022 at 05:40:09PM -0500, Tom Lane wrote: > Hm, you still have cast-away-const in md5_crypt_verify and > plain_crypt_verify. Can we adjust their APIs to make them > return const char * as well (and then their API spec is that > the caller must never free the string, rather than being > vague about it)? Okay. Hmm. This requires a couple of extra const markers in the area of authentication and SASL for the backend, but not much actually. I thought first that it would be more invasive. > The other thing that bothers me slightly is that it looks like > some code paths could end up passing a NULL string pointer to > ereport or sprintf, since you don't positively guarantee that > an error will return a string there. I suppose this is safe > since 3779ac62d, but I don't really want to start making API > specs depend on it. Sounds fair to me in the long term, even for non-assert builds. I have added a small-ish wrapper routine in crytohash_openssl.c for this purpose, with a name copied from {be,fe}-secure-openssl.c to ease any future refactoring lookups if that proves to be worth in the future. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > [ v3-0001-Improve-error-reporting-for-cryptohashes.patch ] This is looking pretty solid to me. Just a couple of nitpicks: * In most places you initialize variables holding error strings to NULL: + const char *logdetail = NULL; but there are three or so spots that don't, eg PerformRadiusTransaction. They should be consistent. (Even if there's no actual bug, I'd be unsurprised to see Coverity carp about the inconsistency.) * The comments for md5_crypt_verify and plain_crypt_verify claim that the error string is "optionally" stored, but I don't see anything optional about it. To me, "optional" would imply coding like if (logdetail) *logdetail = errstr; which we don't have here, and I don't think we need it. But the comments need adjustment. (They were wrong before too, but no time like the present to clean them up.) * I'd be inclined to just drop the existing comments like - * We do not bother setting logdetail for any pg_md5_encrypt failure - * below: the only possible error is out-of-memory, which is unlikely, and - * if it did happen adding a psprintf call would only make things worse. rather than modify them. Neither the premise nor the conclusion of these comments is accurate anymore. (I think that the psprintf they are talking about is the one that will happen inside elog.c to construct an errdetail string. Yeah, there's some risk there, but I think it's minimal because of the fact that we preallocate some space in ErrorContext.) Other than those things, I think v3 is good to go. regards, tom lane
On Sat, Jan 08, 2022 at 02:00:16PM -0500, Tom Lane wrote: > This is looking pretty solid to me. Just a couple of nitpicks: > > * In most places you initialize variables holding error strings to NULL: > > + const char *logdetail = NULL; > > but there are three or so spots that don't, eg PerformRadiusTransaction. > They should be consistent. (Even if there's no actual bug, I'd be > unsurprised to see Coverity carp about the inconsistency.) Hmm. I have spotted five of them, with one in passwordcheck. > * The comments for md5_crypt_verify and plain_crypt_verify claim that > the error string is "optionally" stored, but I don't see anything > optional about it. To me, "optional" would imply coding like > > if (logdetail) > *logdetail = errstr; > > which we don't have here, and I don't think we need it. But the > comments need adjustment. (They were wrong before too, but no > time like the present to clean them up.) Makes sense. > * I'd be inclined to just drop the existing comments like > > - * We do not bother setting logdetail for any pg_md5_encrypt failure > - * below: the only possible error is out-of-memory, which is unlikely, and > - * if it did happen adding a psprintf call would only make things worse. > > rather than modify them. Neither the premise nor the conclusion > of these comments is accurate anymore. (I think that the psprintf > they are talking about is the one that will happen inside elog.c > to construct an errdetail string. Yeah, there's some risk there, > but I think it's minimal because of the fact that we preallocate > some space in ErrorContext.) Okay, that's fine by me. > Other than those things, I think v3 is good to go. I have done an extra pass on all that, and the result seemed fine to me, so applied. I have changed the non-OpenSSL code path of pgcrypto to deal with that in 14 (does not exist on HEAD). Thanks a lot for the successive reviews! The patch was invasive enough, but we could do more here: - Add the same facility for HMAC. That's not worth on REL_14_STABLE based on the existing set of callers, but I'd like to do something about that on HEAD as that could be helpful in the future. - The error areas related to checksum_helper.c and backup_manifest.c could be improved more. Now these refer only to scenarios unlikely going to happen in the field, so I have left that out. -- Michael