Thread: Refactor MD5 implementations and switch to EVP for OpenSSL

Refactor MD5 implementations and switch to EVP for OpenSSL

From
Michael Paquier
Date:
Hi all,

Please find attached a patch set to sanitize the use of MD5 we have in
the core tree.

As of now, there are two duplicated implementations of MD5, one in
contrib/pgcrypto/ used as a fallback when not compiling with OpenSSL,
and one in src/common/ used by the backend when compiling with *or*
without OpenSSL.  This is bad on several aspects:
- There is no need to have the same implementation twice, obviously.
- When compiling with OpenSSL, we use an incorrect implementation,
causing Postgres to cheat if FIPS is enabled because MD5 should not be
authorized.  Making use of what OpenSSL provides with EVP allows us to
rely on OpenSSL to control such restrictions.  So we authorize MD5
authentications while these should be blocked, making Postgres not
completely compliant with STIG and its kind.

The attached patch set does a bit of rework to make the Postgres code
more consistent with OpenSSL, similarly to the work I did for all the
SHA2 implementations with EVP in [1]:
- 0001 is something stolen from the SHA2 set, adding to resowner.c
control of EVP contexts, so as it is possible to clean up anything
allocated by OpenSSL.
- 0002 is the central piece, that moves the duplicated
implementation.  src/common/ and pgcrypto/ use the same code, but I
have reused pgcrypto as it was already doing the init/update/final
split similarly to PostgreSQL.  New APIs are designed to control MD5
contexts, similarly to the work done for SHA2.  Upon using this patch,
note that pgcrypto+OpenSSL uses our in-core implementation instead of
OpenSSL's one, but that's fixed in 0003.  We have a set of three
convenience routines used to generate MD5-hashed passwords, that I
have moved to a new file in src/common/md5_common.c, aimed at being
shared between all the implementations.
- 0003 adds the MD5 implementation based on OpenSSL's EVP, ending the
work.

This set of patches is independent on the SHA2 refactoring, even if it
shares a part with the SHA2 refactoring in its design.  Note that 0001
and 0002 don't depend on each other, but 0003 depends on both.

Thanks,

[1]: https://www.postgresql.org/message-id/20200924025314.GE7405@paquier.xyz
--
Michael

Attachment

Re: Refactor MD5 implementations and switch to EVP for OpenSSL

From
Michael Paquier
Date:
On Fri, Nov 06, 2020 at 04:34:34PM +0900, Michael Paquier wrote:
> The attached patch set does a bit of rework to make the Postgres code
> more consistent with OpenSSL, similarly to the work I did for all the
> SHA2 implementations with EVP in [1]:
> - 0001 is something stolen from the SHA2 set, adding to resowner.c
> control of EVP contexts, so as it is possible to clean up anything
> allocated by OpenSSL.
> - 0002 is the central piece, that moves the duplicated
> implementation.  src/common/ and pgcrypto/ use the same code, but I
> have reused pgcrypto as it was already doing the init/update/final
> split similarly to PostgreSQL.  New APIs are designed to control MD5
> contexts, similarly to the work done for SHA2.  Upon using this patch,
> note that pgcrypto+OpenSSL uses our in-core implementation instead of
> OpenSSL's one, but that's fixed in 0003.  We have a set of three
> convenience routines used to generate MD5-hashed passwords, that I
> have moved to a new file in src/common/md5_common.c, aimed at being
> shared between all the implementations.
> - 0003 adds the MD5 implementation based on OpenSSL's EVP, ending the
> work.

The CF bot has been complaining on Windows and this issue is fixed in
the attached.  A refresh of src/tools/msvc for pgcrypto was just
missing.
--
Michael

Attachment

Re: Refactor MD5 implementations and switch to EVP for OpenSSL

From
Michael Paquier
Date:
On Tue, Nov 10, 2020 at 01:28:09PM +0900, Michael Paquier wrote:
> The CF bot has been complaining on Windows and this issue is fixed in
> the attached.  A refresh of src/tools/msvc for pgcrypto was just
> missing.

Now that HEAD has the necessary infrastructure to be able to plug in
easily new cryptohash routines, here is a rebased patch for MD5.  The
basics are unchanged.  Here is a summary:
- The duplication with MD5 implementations (pgcrypto, src/common/) is
removed, and gets only used when not building with OpenSSL.
- MD5 uses EVP when building with OpenSSL.
- Similarly to SHA2, the fallback implementation of MD5 is kept
internal to src/common/, with an internal header called md5_int.h.
The routines for init, update and final calls are similar to the SHA2
equivalents, making the changes of cryptohash.c straight-forward.

The amount of code shaved is still nice:
13 files changed, 641 insertions(+), 775 deletions(-)
--
Michael

Attachment

Re: Refactor MD5 implementations and switch to EVP for OpenSSL

From
Daniel Gustafsson
Date:
> On 4 Dec 2020, at 08:05, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Nov 10, 2020 at 01:28:09PM +0900, Michael Paquier wrote:
>> The CF bot has been complaining on Windows and this issue is fixed in
>> the attached.  A refresh of src/tools/msvc for pgcrypto was just
>> missing.
>
> Now that HEAD has the necessary infrastructure to be able to plug in
> easily new cryptohash routines, here is a rebased patch for MD5.

This is a pretty straightforward patch which given the cryptohash framework
added previously does what it says on the tin.  +1 on rolling MD5 into what we
already have for SHA2 now. Applies clean and all tests pass.

One (two-part) comment on the patch though:

The re-arrangement of the code does lead to some attribution confusion however
IMO.  pgcrypto/md5.c and src/common/md5.c are combined with the actual MD5
implementation from pgcrypto/md5.c and the PG glue code from src/common/md5.c.
That is in itself a good choice, but the file headers are intact and now claims
two implementations of which only one remains.

Further, bytesToHex was imported in commit 957613be18e6b7 with attribution to
"Sverre H.  Huseby <sverrehu@online.no>" without a PGDG defined copyright,
which was later added in 2ca65f716aee9ec441dda91d91b88dd7a00bffa1.  This patch
moves bytesToHex from md5.c (where his attribution is) to md5_common.c with no
maintained attribution.  Off the cuff it seems we should either attribute in a
comment, or leave the function and export it, but the usual "I'm not a lawyer"
disclaimer applies.  Do you have any thoughts?

> The amount of code shaved is still nice:
> 13 files changed, 641 insertions(+), 775 deletions(-)

Always nice with a net minus patch with sustained functionality.

cheers ./daniel


Re: Refactor MD5 implementations and switch to EVP for OpenSSL

From
Michael Paquier
Date:
On Mon, Dec 07, 2020 at 02:15:58PM +0100, Daniel Gustafsson wrote:
> This is a pretty straightforward patch which given the cryptohash framework
> added previously does what it says on the tin.  +1 on rolling MD5 into what we
> already have for SHA2 now. Applies clean and all tests pass.

Thanks.

> One (two-part) comment on the patch though:
>
> The re-arrangement of the code does lead to some attribution confusion however
> IMO.  pgcrypto/md5.c and src/common/md5.c are combined with the actual MD5
> implementation from pgcrypto/md5.c and the PG glue code from src/common/md5.c.
> That is in itself a good choice, but the file headers are intact and now claims
> two implementations of which only one remains.

I was pretty sure that at some point I got the attributions messed up.
Thanks for looking at this stuff and ringing the bell.

> Further, bytesToHex was imported in commit 957613be18e6b7 with attribution to
> "Sverre H.  Huseby <sverrehu@online.no>" without a PGDG defined copyright,
> which was later added in 2ca65f716aee9ec441dda91d91b88dd7a00bffa1.  This patch
> moves bytesToHex from md5.c (where his attribution is) to md5_common.c with no
> maintained attribution.  Off the cuff it seems we should either attribute in a
> comment, or leave the function and export it, but the usual "I'm not a lawyer"
> disclaimer applies.  Do you have any thoughts?

Hmm.  Looking at this area of this history, like d330f155, I think
that what we just need to do is to switch the attribution of Sverre
to md5_common.c for all the sub-functions dedicated to the generation
of the MD5 passwords by fixing the header comment of the file as this
is the remaining code coming from the file where this code was.  In
the new md5.c and md5_int.h, the fallback implementation that we get
to use is the KAME one from pgcrypto so we should just mention KAME
there.

I have spent some time double-checking all this stuff, adjusting some
comments, and making the style of the new files more consistent with
the surroundings while minimizing the amount of noise diffs (pgindent
has adjusted some things by itself for the new files).  In short, this
seems rather committable to me.
--
Michael

Attachment

Re: Refactor MD5 implementations and switch to EVP for OpenSSL

From
Daniel Gustafsson
Date:
> On 9 Dec 2020, at 06:47, Michael Paquier <michael@paquier.xyz> wrote:

> In short, this seems rather committable to me.

Agreed, this version of the patch looks good to me.  I've looked over the
attributions for the code movement and it seems to match now, and all tests
pass etc. +1 on going ahead with this version.

The tiniest level of nitpicking would be that md5.h doesn't have the 
/* PG_MD5_H */ comment on the #endif line (and it's even tinier since it's
not even the fault of this patch - I just happened to notice when looking
at that file).

cheers ./daniel



Re: Refactor MD5 implementations and switch to EVP for OpenSSL

From
Michael Paquier
Date:
On Wed, Dec 09, 2020 at 01:52:32PM +0100, Daniel Gustafsson wrote:
> The tiniest level of nitpicking would be that md5.h doesn't have the
> /* PG_MD5_H */ comment on the #endif line (and it's even tinier since it's
> not even the fault of this patch - I just happened to notice when looking
> at that file).

Good catch.  I have fixed this one, looked again at the code this
morning, did more tests on Linux/Windows with/without OpenSSL, and
finally committed the patch.
--
Michael

Attachment