Thread: md5 issues Postgres14 on OL7

md5 issues Postgres14 on OL7

From
Michael Mühlbeyer
Date:
Hi all,

we're currently facing a strange behavior with Postgres14.1 on Oracle Linux 7.9
using md5.

a basic statement leads to an out-of-memory error:

postgres=# select md5('just a test');
ERROR:  out of memory

Anyone else facing the same issue?
It may be related to hardening though disabling SELinux didn't solve the issue.

Thanks in advance,
Michael



Re: md5 issues Postgres14 on OL7

From
Christoph Moench-Tegeder
Date:
Hi!

## Michael Mühlbeyer (Michael.Muehlbeyer@trivadis.com):

> postgres=# select md5('just a test');
> ERROR:  out of memory

Active FIPS mode (/proc/sys/crypto/fips_enabled => 1) on the server does
produce this behaviour.

Regards,
Christoph

-- 
Spare Space



Re: md5 issues Postgres14 on OL7

From
Michael Paquier
Date:
On Mon, Dec 20, 2021 at 03:22:31PM +0100, Christoph Moench-Tegeder wrote:
> Active FIPS mode (/proc/sys/crypto/fips_enabled => 1) on the server does
> produce this behaviour.

Most likely, this is a build linked with OpenSSL?  The way MD5 hashes
are computed in Postgres has largely changed in 14, and the code has
been refactored so as we rely on the EVP APIs from OpenSSL when
building with --with-ssl=openssl, having as direct consequence to
allocate a bit more memory every time a hash is computed.  My guess is
that this comes from pg_cryptohash_create() in cryptohash_openssl.c,
with a complain coming from OpenSSL's EVP_MD_CTX_create(), but there
are other palloc() calls in this area as well.
--
Michael

Attachment

Re: md5 issues Postgres14 on OL7

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, Dec 20, 2021 at 03:22:31PM +0100, Christoph Moench-Tegeder wrote:
>> Active FIPS mode (/proc/sys/crypto/fips_enabled => 1) on the server does
>> produce this behaviour.

> Most likely, this is a build linked with OpenSSL?  The way MD5 hashes
> are computed in Postgres has largely changed in 14, and the code has
> been refactored so as we rely on the EVP APIs from OpenSSL when
> building with --with-ssl=openssl, having as direct consequence to
> allocate a bit more memory every time a hash is computed.  My guess is
> that this comes from pg_cryptohash_create() in cryptohash_openssl.c,
> with a complain coming from OpenSSL's EVP_MD_CTX_create(), but there
> are other palloc() calls in this area as well.

I reproduced this on Fedora 35 with FIPS mode enabled.  The problem
is that OpenSSL treats MD5 as a disallowed cipher type under FIPS
mode, so this call in pg_cryptohash_init fails:

    status = EVP_DigestInit_ex(ctx->evpctx, EVP_md5(), NULL);

and then we come back to this in md5_text():

    /* get the hash result */
    if (pg_md5_hash(VARDATA_ANY(in_text), len, hexsum) == false)
        ereport(ERROR,
                (errcode(ERRCODE_OUT_OF_MEMORY),
                 errmsg("out of memory")));

So there's nothing actually misbehaving, but our error reportage sucks:
the hash functions have no way to report a specific failure code,
and the caller(s) think the only possible failure mode is OOM.

I suppose we could get around the error by using our own MD5 code
even in OpenSSL-enabled builds, but that'd violate both the spirit
and the letter of FIPS certification.  I think the right response is
to upgrade the error-reporting API in this area, so that the message
could look more like "MD5 is disallowed in FIPS mode".

            regards, tom lane



Re: md5 issues Postgres14 on OL7

From
Christoph Moench-Tegeder
Date:
## Michael Paquier (michael@paquier.xyz):

> On Mon, Dec 20, 2021 at 03:22:31PM +0100, Christoph Moench-Tegeder wrote:
> > Active FIPS mode (/proc/sys/crypto/fips_enabled => 1) on the server does
> > produce this behaviour.
> 
> Most likely, this is a build linked with OpenSSL?  The way MD5 hashes
> are computed in Postgres has largely changed in 14, and the code has
> been refactored so as we rely on the EVP APIs from OpenSSL when
> building with --with-ssl=openssl, having as direct consequence to
> allocate a bit more memory every time a hash is computed.

You can reproduce that behaviour with the PGDG-RPMs on CentOS 7.
Enable FIPS-mode, reboot, and immediately md5() fails. The PGDG-RPMS
are built with openssl ("--with-openssl" in pg_config output),
as of course you need SSL today.

"Supports FIPS mode" is one of the selling points for your cryptohash
patches in the Release Notes, and that means no md5 when FIPS is
enforced (I think FIPS is a little too strict in this regard, as
people do invent horrid workarounds, which does not really improve
matters; but that's another can of worms).

Anyway, it's not the memory, but "out of memory" is all PostgreSQL
reports when anything in the hashing operations returns a failure.

Regards,
Christoph

-- 
Spare Space



Re: md5 issues Postgres14 on OL7

From
Michael Paquier
Date:
On Tue, Jan 04, 2022 at 12:54:35PM -0500, Tom Lane wrote:
> I reproduced this on Fedora 35 with FIPS mode enabled.  The problem
> is that OpenSSL treats MD5 as a disallowed cipher type under FIPS
> mode, so this call in pg_cryptohash_init fails:

Is that 3.0.0 or 1.1.1?  I can see the following, telling that Fedora
35 uses OpenSSL 1.1.1:
https://packages.fedoraproject.org/pkgs/openssl/openssl/

And there is FIPS 2.0 for 1.0.1 and 1.0.2 (funnily, I recall that
1.0.2+FIPS allows MD5 to work with the EVP routines), but the module
of FIPS 3.0 does not work with 1.1.1 AFAIK, so I may be confused.  Or
perhaps this is OpenSSL 1.1.1 with a separate module?  The upstream
code does nothing special with EVP_DigestInit_ex() in 1.1.1 (see
digest.c), contrary to 3.0.0 where there is specific knowledge of
FIPS.

>     status = EVP_DigestInit_ex(ctx->evpctx, EVP_md5(), NULL);
>
> and then we come back to this in md5_text():
>
>     /* get the hash result */
>     if (pg_md5_hash(VARDATA_ANY(in_text), len, hexsum) == false)
>         ereport(ERROR,
>                 (errcode(ERRCODE_OUT_OF_MEMORY),
>                  errmsg("out of memory")));
>
> So there's nothing actually misbehaving, but our error reportage sucks:
> the hash functions have no way to report a specific failure code,
> and the caller(s) think the only possible failure mode is OOM.

Indeed, this error is a pilot error with the cryptohash integration of
14.  In ~13, the custom MD5 implementation would only fail on OOM, but
more failure modes are possible now.

> I suppose we could get around the error by using our own MD5 code
> even in OpenSSL-enabled builds, but that'd violate both the spirit
> and the letter of FIPS certification.

I don't think we should go back to an integration of our custom MD5 if
we have external libraries that provide support for it.  That's
against the set of FIPS policies.

> I think the right response is
> to upgrade the error-reporting API in this area, so that the message
> could look more like "MD5 is disallowed in FIPS mode".

Hmm.  I am not sure how much we should try to make the backend, or
even the frontend, FIPS-aware (remember for example 0182438 where we
avoided this kind of complexity), and not all SSL libraries we would
potentially add support for may care about such error states.  The
cleanest approach may be to extend the APIs to store an **errstr so
as implementations are free to assign the error string they want to
send back for the error reporting, rather than using more error codes.
If we want to improve things in this area with FIPS (aka allow
check-world to pass in this case), we would need more in terms of
alternate test output, and extra tweaks for the authentication tests,
as well.  Perhaps the best thing to do in the long term would be to
drop MD5, but we are not there yet IMO even if password_encryption
default has changed, and upgrade scenarios get hairy.

At the end, I agree that we should improve the error message in these
two cases.  However, I would stick to simplicity by not assuming that
those two code paths fail only on OOM, and reword things in md5_text()
and md5_bytea() with a simple "could not compute MD5 hash".  Any code
paths calling the routines of md5_common.c just do that as well for
ages when the computation fails, and that's what we care about here.
--
Michael

Attachment

Re: md5 issues Postgres14 on OL7

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Tue, Jan 04, 2022 at 12:54:35PM -0500, Tom Lane wrote:
>> I reproduced this on Fedora 35 with FIPS mode enabled.  The problem
>> is that OpenSSL treats MD5 as a disallowed cipher type under FIPS
>> mode, so this call in pg_cryptohash_init fails:

> Is that 3.0.0 or 1.1.1?  I can see the following, telling that Fedora
> 35 uses OpenSSL 1.1.1:
> https://packages.fedoraproject.org/pkgs/openssl/openssl/

I don't have the image booted up right at the moment, but it was
a plain vanilla, fresh-out-of-the-box F35 install, so whatever the
default openssl version is for that.  That link does say that it
should be 1.1.1l.

> Indeed, this error is a pilot error with the cryptohash integration of
> 14.  In ~13, the custom MD5 implementation would only fail on OOM, but
> more failure modes are possible now.

Right, the code in md5_text() was fine when it was written ... but
now, not so much.

> At the end, I agree that we should improve the error message in these
> two cases.  However, I would stick to simplicity by not assuming that 
> those two code paths fail only on OOM, and reword things in md5_text()
> and md5_bytea() with a simple "could not compute MD5 hash".  Any code
> paths calling the routines of md5_common.c just do that as well for
> ages when the computation fails, and that's what we care about here.

I think it's very important that the error message in this case
mention "FIPS mode" explicitly.  Otherwise, people will have no
idea that that's where the problem originates, and they'll be
frustrated and we'll get bug reports.  (They may be frustrated
anyway, but it was their choice, or their corporate policy's
choice, to cut off their access to MD5.  Not our place to dodge
that decision.)

            regards, tom lane



Re: md5 issues Postgres14 on OL7

From
Michael Paquier
Date:
On Wed, Jan 05, 2022 at 01:08:53AM -0500, Tom Lane wrote:
> I think it's very important that the error message in this case
> mention "FIPS mode" explicitly.  Otherwise, people will have no
> idea that that's where the problem originates, and they'll be
> frustrated and we'll get bug reports.  (They may be frustrated
> anyway, but it was their choice, or their corporate policy's
> choice, to cut off their access to MD5.  Not our place to dodge
> that decision.)

I am not completely sure how to detect that in 1.1.1 in the context of
Fedora, and portability may become a tricky thing.  FIPS_mode() and
FIPS_mode_set() are legacy APIs that should not be used, and upstream
just disables them in 1.1.1.

[... digs a bit ...]

Ugh.  Fedora patches upstream's 1.1.1 to check and react on
/proc/sys/crypto/fips_enabled.  Their code is here, see particularly
0009-Add-Kernel-FIPS-mode-flag-support.patch:
https://src.fedoraproject.org/rpms/openssl.git

So that's why you are able to use it with 1.1.1.  Well, we could do
something similar to that, but in 3.0.0 things are done very
differently: one has to set to alg_sect fips=yes with fips = fips_sect
in the OpenSSL configuration to load the FIPS provider.  Providing
more error context is going to be hairy here..

In order to make things portable with 14 in cryptohash.c, we don't
have any need to change the existing cryptohash APIs.  We could just
store in each implementation context a location to a static string,
and add a new routine to extract it if there is an error, defaulting
to OOM.
--
Michael

Attachment

Re: md5 issues Postgres14 on OL7

From
Michael Paquier
Date:
On Wed, Jan 05, 2022 at 04:09:12PM +0900, Michael Paquier wrote:
> In order to make things portable with 14 in cryptohash.c, we don't
> have any need to change the existing cryptohash APIs.  We could just
> store in each implementation context a location to a static string,
> and add a new routine to extract it if there is an error, defaulting
> to OOM.

I have been looking at that, and finished with the attached.  It is
close to the end of the day, so this needs an extra lookup, but I have
finished by using the idea of an extra routine, called
pg_cryptohash_error(), able to grab the error saved in the private
contexts, so as any callers from the backend or the frontend can feed
on that.  This way, it is possible to make the difference between
several class of errors: OOMs, a too short destination buffer, OpenSSL
internal error, etc.

There are a couple of things worth noting here:
- Two code paths of src/backend/libpq/crypt.c rely on the result of
pg_md5_encrypt() to always be an OOM, so as this skips one
psnprintf().  This has been let as-is in the backend for now, but we
don't pfree() the *logdetail strings passed across the various layers,
so we could just pass down the cryptohash error as-is..  We'd better
mention that logdetail may not be palloc'd all the time, once we do
that.  libpq is able to use that properly.
- The routines of md5_common.c need to pass down an extra *errstr for
their respective callers.  That's an ABI breakage but I'd like to
think that nobody uses that out-of-core.  (I need to double-check this
part, as well).
- HMAC (hmac_openssl.c and hmac.c) could use the same infra, but I did
not see a use for that yet.  It is possible to compile HMACs with MD5s,
but we don't have any in-core callers, and failures are just part of
the SCRAM workflow with dedicated error messages.

I am still not sure about the FIPS part, as per the argument of
OpenSSL using something different in 3.0.0, and Fedora that patches
upstream in its own way, but this could be extended in
cryptohash_openssl.c to provide even more context.  For now, this
allows reports about OpenSSL internal failures, taking care of the
disturbance.

Thoughts?
--
Michael

Attachment

Re: md5 issues Postgres14 on OL7

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> I have been looking at that, and finished with the attached.  It is
> close to the end of the day, so this needs an extra lookup, but I have
> finished by using the idea of an extra routine, called
> pg_cryptohash_error(), able to grab the error saved in the private
> contexts, so as any callers from the backend or the frontend can feed
> on that.  This way, it is possible to make the difference between
> several class of errors: OOMs, a too short destination buffer, OpenSSL
> internal error, etc.

I don't like the end result of this at all:

postgres=# select md5('foo');
ERROR:  could not compute MD5 hash: OpenSSL failure

Maybe we've successfully laid off blame somewhere else, but this
doesn't move the user one inch towards understanding the cause.
I think we need to report the actual OpenSSL error reason.

I experimented with the attached, very rough delta on top of your
patch, and got

postgres=# select md5('foo');
ERROR:  could not compute MD5 hash: disabled for FIPS

which seems far better, plus it'd work for other sorts of OpenSSL
failures.

There are two problems with my delta as it stands:

1. It draws a cast-away-const warning.  We'd have to make the result
of pg_cryptohash_error be "const char *", which would be better
practice anyway, but that propagates into some other APIs and I didn't
take the trouble to chase it to the end.

2. It feels a bit bogus to be fetching ERR_get_error() at this point.
Maybe it's all right to assume that the OpenSSL error stack won't
change state before we get to pg_cryptohash_error, but I don't like
the idea much.  I think it'd be better to capture ERR_get_error()
sooner and store it in an additional field in pg_cryptohash_ctx.

Also, I wonder if this shouldn't be unified with the SSLerrmessage()
support found in be-secure-openssl.c and fe-secure-openssl.c.

            regards, tom lane

diff -u cryptohash_openssl.c.orig cryptohash_openssl.c
--- cryptohash_openssl.c.orig    2022-01-06 11:15:59.268256281 -0500
+++ cryptohash_openssl.c    2022-01-06 11:22:28.602734304 -0500
@@ -21,6 +21,7 @@
 #include "postgres_fe.h"
 #endif
 
+#include <openssl/err.h>
 #include <openssl/evp.h>
 
 #include "common/cryptohash.h"
@@ -309,7 +310,14 @@
         case PG_CRYPTOHASH_ERROR_DEST_LEN:
             return _("destination buffer too small");
         case PG_CRYPTOHASH_ERROR_OPENSSL:
-            return _("OpenSSL failure");
+            {
+                unsigned long ecode = ERR_get_error();
+                const char *errreason = ERR_reason_error_string(ecode);
+
+                if (errreason)
+                    return errreason;
+                return _("OpenSSL failure");
+            }
     }
 
     /* assume that the default is out-of-memory, anyway */

Re: md5 issues Postgres14 on OL7

From
Michael Paquier
Date:
On Thu, Jan 06, 2022 at 11:40:04AM -0500, Tom Lane wrote:
> 1. It draws a cast-away-const warning.  We'd have to make the result
> of pg_cryptohash_error be "const char *", which would be better
> practice anyway, but that propagates into some other APIs and I didn't
> take the trouble to chase it to the end.

Yeah.  I wanted to switch all those routines to use a const here
anyway, just did not have the time to tackle that yesterday with the
rest of the issues I could think about.  Looking at that today, I
don't see any problems in switching to const in all those places, so
done this way (two places in crypt.c are more picky, though, for
logdetail).

> 2. It feels a bit bogus to be fetching ERR_get_error() at this point.
> Maybe it's all right to assume that the OpenSSL error stack won't
> change state before we get to pg_cryptohash_error, but I don't like
> the idea much.  I think it'd be better to capture ERR_get_error()
> sooner and store it in an additional field in pg_cryptohash_ctx.

Right, I forgot about the ERR_get_error() piece of it.  Thanks!  I'd
also rather have the check done just after the OpenSSL call.  If hash
computations are split across multiple code paths, this could lead to
issues.  We don't have any problems currently in the core code, but I
see no reason to not make that safer in the long run.  And the
structure is flexible enough, so that's not an issue.

> Also, I wonder if this shouldn't be unified with the SSLerrmessage()
> support found in be-secure-openssl.c and fe-secure-openssl.c.

Guess so.  HEAD could be poked at for this part.  I recall looking at
that once by that did not seem worth the complications.

I have also looked at the ABI part of the patch.  I cannot spot any
public trace of pg_md5_hash() and pg_md5_binary().  pgbouncer and
pgpool2 have each a copy of pg_md5_encrypt(), but they just share the
API name with PG, nothing more.  So that looks reasonably safe to
change.

The last thing I have on my notes is to assign logdetail in
md5_crypt_verify() and plain_crypt_verify() to feed back a LOG entry
to the postmaster on those failures, and saw that it is safe to assign
directly the error returned by the cryptohash APIs, avoiding the
extra psprintf call that could become an issue under memory pressure.

What do you think?
--
Michael

Attachment

Re: md5 issues Postgres14 on OL7

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Thu, Jan 06, 2022 at 11:40:04AM -0500, Tom Lane wrote:
>> Also, I wonder if this shouldn't be unified with the SSLerrmessage()
>> support found in be-secure-openssl.c and fe-secure-openssl.c.

> Guess so.  HEAD could be poked at for this part.  I recall looking at
> that once by that did not seem worth the complications.

Yeah, I suppose there's only a couple lines of code to be saved,
and the complexity of dealing with multiple memory allocation
conventions would outweigh that.

> What do you think?

Hm, you still have cast-away-const in md5_crypt_verify and
plain_crypt_verify.  Can we adjust their APIs to make them
return const char * as well (and then their API spec is that
the caller must never free the string, rather than being
vague about it)?

The other thing that bothers me slightly is that it looks like
some code paths could end up passing a NULL string pointer to
ereport or sprintf, since you don't positively guarantee that
an error will return a string there.  I suppose this is safe
since 3779ac62d, but I don't really want to start making API
specs depend on it.

            regards, tom lane



Re: md5 issues Postgres14 on OL7

From
Michael Paquier
Date:
On Fri, Jan 07, 2022 at 05:40:09PM -0500, Tom Lane wrote:
> Hm, you still have cast-away-const in md5_crypt_verify and
> plain_crypt_verify.  Can we adjust their APIs to make them
> return const char * as well (and then their API spec is that
> the caller must never free the string, rather than being
> vague about it)?

Okay.  Hmm.  This requires a couple of extra const markers in the area
of authentication and SASL for the backend, but not much actually.
I thought first that it would be more invasive.

> The other thing that bothers me slightly is that it looks like
> some code paths could end up passing a NULL string pointer to
> ereport or sprintf, since you don't positively guarantee that
> an error will return a string there.  I suppose this is safe
> since 3779ac62d, but I don't really want to start making API
> specs depend on it.

Sounds fair to me in the long term, even for non-assert builds.  I
have added a small-ish wrapper routine in crytohash_openssl.c for this
purpose, with a name copied from {be,fe}-secure-openssl.c to ease any
future refactoring lookups if that proves to be worth in the future.
--
Michael

Attachment

Re: md5 issues Postgres14 on OL7

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> [ v3-0001-Improve-error-reporting-for-cryptohashes.patch ]

This is looking pretty solid to me.  Just a couple of nitpicks:

* In most places you initialize variables holding error strings to NULL:

+    const char *logdetail = NULL;

but there are three or so spots that don't, eg PerformRadiusTransaction.
They should be consistent.  (Even if there's no actual bug, I'd be
unsurprised to see Coverity carp about the inconsistency.)

* The comments for md5_crypt_verify and plain_crypt_verify claim that
the error string is "optionally" stored, but I don't see anything
optional about it.  To me, "optional" would imply coding like

    if (logdetail)
        *logdetail = errstr;

which we don't have here, and I don't think we need it.  But the
comments need adjustment.  (They were wrong before too, but no
time like the present to clean them up.)

* I'd be inclined to just drop the existing comments like

-     * We do not bother setting logdetail for any pg_md5_encrypt failure
-     * below: the only possible error is out-of-memory, which is unlikely, and
-     * if it did happen adding a psprintf call would only make things worse.

rather than modify them.   Neither the premise nor the conclusion
of these comments is accurate anymore.  (I think that the psprintf
they are talking about is the one that will happen inside elog.c
to construct an errdetail string.  Yeah, there's some risk there,
but I think it's minimal because of the fact that we preallocate
some space in ErrorContext.)

Other than those things, I think v3 is good to go.

            regards, tom lane



Re: md5 issues Postgres14 on OL7

From
Michael Paquier
Date:
On Sat, Jan 08, 2022 at 02:00:16PM -0500, Tom Lane wrote:
> This is looking pretty solid to me.  Just a couple of nitpicks:
>
> * In most places you initialize variables holding error strings to NULL:
>
> +    const char *logdetail = NULL;
>
> but there are three or so spots that don't, eg PerformRadiusTransaction.
> They should be consistent.  (Even if there's no actual bug, I'd be
> unsurprised to see Coverity carp about the inconsistency.)

Hmm.  I have spotted five of them, with one in passwordcheck.

> * The comments for md5_crypt_verify and plain_crypt_verify claim that
> the error string is "optionally" stored, but I don't see anything
> optional about it.  To me, "optional" would imply coding like
>
>     if (logdetail)
>         *logdetail = errstr;
>
> which we don't have here, and I don't think we need it.  But the
> comments need adjustment.  (They were wrong before too, but no
> time like the present to clean them up.)

Makes sense.

> * I'd be inclined to just drop the existing comments like
>
> -     * We do not bother setting logdetail for any pg_md5_encrypt failure
> -     * below: the only possible error is out-of-memory, which is unlikely, and
> -     * if it did happen adding a psprintf call would only make things worse.
>
> rather than modify them.   Neither the premise nor the conclusion
> of these comments is accurate anymore.  (I think that the psprintf
> they are talking about is the one that will happen inside elog.c
> to construct an errdetail string.  Yeah, there's some risk there,
> but I think it's minimal because of the fact that we preallocate
> some space in ErrorContext.)

Okay, that's fine by me.

> Other than those things, I think v3 is good to go.

I have done an extra pass on all that, and the result seemed fine to
me, so applied.  I have changed the non-OpenSSL code path of pgcrypto
to deal with that in 14 (does not exist on HEAD).  Thanks a lot for
the successive reviews!

The patch was invasive enough, but we could do more here:
- Add the same facility for HMAC.  That's not worth on REL_14_STABLE
based on the existing set of callers, but I'd like to do something
about that on HEAD as that could be helpful in the future.
- The error areas related to checksum_helper.c and backup_manifest.c
could be improved more.  Now these refer only to scenarios unlikely
going to happen in the field, so I have left that out.
--
Michael

Attachment