Thread: Refactor MD5 implementations and switch to EVP for OpenSSL
Hi all, Please find attached a patch set to sanitize the use of MD5 we have in the core tree. As of now, there are two duplicated implementations of MD5, one in contrib/pgcrypto/ used as a fallback when not compiling with OpenSSL, and one in src/common/ used by the backend when compiling with *or* without OpenSSL. This is bad on several aspects: - There is no need to have the same implementation twice, obviously. - When compiling with OpenSSL, we use an incorrect implementation, causing Postgres to cheat if FIPS is enabled because MD5 should not be authorized. Making use of what OpenSSL provides with EVP allows us to rely on OpenSSL to control such restrictions. So we authorize MD5 authentications while these should be blocked, making Postgres not completely compliant with STIG and its kind. The attached patch set does a bit of rework to make the Postgres code more consistent with OpenSSL, similarly to the work I did for all the SHA2 implementations with EVP in [1]: - 0001 is something stolen from the SHA2 set, adding to resowner.c control of EVP contexts, so as it is possible to clean up anything allocated by OpenSSL. - 0002 is the central piece, that moves the duplicated implementation. src/common/ and pgcrypto/ use the same code, but I have reused pgcrypto as it was already doing the init/update/final split similarly to PostgreSQL. New APIs are designed to control MD5 contexts, similarly to the work done for SHA2. Upon using this patch, note that pgcrypto+OpenSSL uses our in-core implementation instead of OpenSSL's one, but that's fixed in 0003. We have a set of three convenience routines used to generate MD5-hashed passwords, that I have moved to a new file in src/common/md5_common.c, aimed at being shared between all the implementations. - 0003 adds the MD5 implementation based on OpenSSL's EVP, ending the work. This set of patches is independent on the SHA2 refactoring, even if it shares a part with the SHA2 refactoring in its design. Note that 0001 and 0002 don't depend on each other, but 0003 depends on both. Thanks, [1]: https://www.postgresql.org/message-id/20200924025314.GE7405@paquier.xyz -- Michael
Attachment
On Fri, Nov 06, 2020 at 04:34:34PM +0900, Michael Paquier wrote: > The attached patch set does a bit of rework to make the Postgres code > more consistent with OpenSSL, similarly to the work I did for all the > SHA2 implementations with EVP in [1]: > - 0001 is something stolen from the SHA2 set, adding to resowner.c > control of EVP contexts, so as it is possible to clean up anything > allocated by OpenSSL. > - 0002 is the central piece, that moves the duplicated > implementation. src/common/ and pgcrypto/ use the same code, but I > have reused pgcrypto as it was already doing the init/update/final > split similarly to PostgreSQL. New APIs are designed to control MD5 > contexts, similarly to the work done for SHA2. Upon using this patch, > note that pgcrypto+OpenSSL uses our in-core implementation instead of > OpenSSL's one, but that's fixed in 0003. We have a set of three > convenience routines used to generate MD5-hashed passwords, that I > have moved to a new file in src/common/md5_common.c, aimed at being > shared between all the implementations. > - 0003 adds the MD5 implementation based on OpenSSL's EVP, ending the > work. The CF bot has been complaining on Windows and this issue is fixed in the attached. A refresh of src/tools/msvc for pgcrypto was just missing. -- Michael
Attachment
On Tue, Nov 10, 2020 at 01:28:09PM +0900, Michael Paquier wrote: > The CF bot has been complaining on Windows and this issue is fixed in > the attached. A refresh of src/tools/msvc for pgcrypto was just > missing. Now that HEAD has the necessary infrastructure to be able to plug in easily new cryptohash routines, here is a rebased patch for MD5. The basics are unchanged. Here is a summary: - The duplication with MD5 implementations (pgcrypto, src/common/) is removed, and gets only used when not building with OpenSSL. - MD5 uses EVP when building with OpenSSL. - Similarly to SHA2, the fallback implementation of MD5 is kept internal to src/common/, with an internal header called md5_int.h. The routines for init, update and final calls are similar to the SHA2 equivalents, making the changes of cryptohash.c straight-forward. The amount of code shaved is still nice: 13 files changed, 641 insertions(+), 775 deletions(-) -- Michael
Attachment
> On 4 Dec 2020, at 08:05, Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Nov 10, 2020 at 01:28:09PM +0900, Michael Paquier wrote: >> The CF bot has been complaining on Windows and this issue is fixed in >> the attached. A refresh of src/tools/msvc for pgcrypto was just >> missing. > > Now that HEAD has the necessary infrastructure to be able to plug in > easily new cryptohash routines, here is a rebased patch for MD5. This is a pretty straightforward patch which given the cryptohash framework added previously does what it says on the tin. +1 on rolling MD5 into what we already have for SHA2 now. Applies clean and all tests pass. One (two-part) comment on the patch though: The re-arrangement of the code does lead to some attribution confusion however IMO. pgcrypto/md5.c and src/common/md5.c are combined with the actual MD5 implementation from pgcrypto/md5.c and the PG glue code from src/common/md5.c. That is in itself a good choice, but the file headers are intact and now claims two implementations of which only one remains. Further, bytesToHex was imported in commit 957613be18e6b7 with attribution to "Sverre H. Huseby <sverrehu@online.no>" without a PGDG defined copyright, which was later added in 2ca65f716aee9ec441dda91d91b88dd7a00bffa1. This patch moves bytesToHex from md5.c (where his attribution is) to md5_common.c with no maintained attribution. Off the cuff it seems we should either attribute in a comment, or leave the function and export it, but the usual "I'm not a lawyer" disclaimer applies. Do you have any thoughts? > The amount of code shaved is still nice: > 13 files changed, 641 insertions(+), 775 deletions(-) Always nice with a net minus patch with sustained functionality. cheers ./daniel
On Mon, Dec 07, 2020 at 02:15:58PM +0100, Daniel Gustafsson wrote: > This is a pretty straightforward patch which given the cryptohash framework > added previously does what it says on the tin. +1 on rolling MD5 into what we > already have for SHA2 now. Applies clean and all tests pass. Thanks. > One (two-part) comment on the patch though: > > The re-arrangement of the code does lead to some attribution confusion however > IMO. pgcrypto/md5.c and src/common/md5.c are combined with the actual MD5 > implementation from pgcrypto/md5.c and the PG glue code from src/common/md5.c. > That is in itself a good choice, but the file headers are intact and now claims > two implementations of which only one remains. I was pretty sure that at some point I got the attributions messed up. Thanks for looking at this stuff and ringing the bell. > Further, bytesToHex was imported in commit 957613be18e6b7 with attribution to > "Sverre H. Huseby <sverrehu@online.no>" without a PGDG defined copyright, > which was later added in 2ca65f716aee9ec441dda91d91b88dd7a00bffa1. This patch > moves bytesToHex from md5.c (where his attribution is) to md5_common.c with no > maintained attribution. Off the cuff it seems we should either attribute in a > comment, or leave the function and export it, but the usual "I'm not a lawyer" > disclaimer applies. Do you have any thoughts? Hmm. Looking at this area of this history, like d330f155, I think that what we just need to do is to switch the attribution of Sverre to md5_common.c for all the sub-functions dedicated to the generation of the MD5 passwords by fixing the header comment of the file as this is the remaining code coming from the file where this code was. In the new md5.c and md5_int.h, the fallback implementation that we get to use is the KAME one from pgcrypto so we should just mention KAME there. I have spent some time double-checking all this stuff, adjusting some comments, and making the style of the new files more consistent with the surroundings while minimizing the amount of noise diffs (pgindent has adjusted some things by itself for the new files). In short, this seems rather committable to me. -- Michael
Attachment
> On 9 Dec 2020, at 06:47, Michael Paquier <michael@paquier.xyz> wrote: > In short, this seems rather committable to me. Agreed, this version of the patch looks good to me. I've looked over the attributions for the code movement and it seems to match now, and all tests pass etc. +1 on going ahead with this version. The tiniest level of nitpicking would be that md5.h doesn't have the /* PG_MD5_H */ comment on the #endif line (and it's even tinier since it's not even the fault of this patch - I just happened to notice when looking at that file). cheers ./daniel
On Wed, Dec 09, 2020 at 01:52:32PM +0100, Daniel Gustafsson wrote: > The tiniest level of nitpicking would be that md5.h doesn't have the > /* PG_MD5_H */ comment on the #endif line (and it's even tinier since it's > not even the fault of this patch - I just happened to notice when looking > at that file). Good catch. I have fixed this one, looked again at the code this morning, did more tests on Linux/Windows with/without OpenSSL, and finally committed the patch. -- Michael