Thread: Patch: Don't set LoadedSSL unless secure_initialize succeeds
The initialization in PostmasterMain() blindly turns on LoadedSSL, irrespective of the outcome of secure_initialize(). I don't think that's how it should behave, primarily because of the pattern followed by the other places that call secure_initialize(). This patch makes PostmasterMain() behave identical to other places (SIGHUP handler, and SubPostmasterMain()) where LoadedSSL is turned on after checking success of secure_initialize() call. Upon failure of secure_initialize(), it now emits a log message, instead of setting LoadedSSL to true. Best regards, Gurjeet http://Gurje.et
Attachment
> On 22 May 2022, at 08:41, Gurjeet Singh <gurjeet@singh.im> wrote: > The initialization in PostmasterMain() blindly turns on LoadedSSL, > irrespective of the outcome of secure_initialize(). This call is invoked with isServerStart set to true so any error in secure_initialize should error out with ereport FATAL (in be_tls_init()). That could be explained in a comment though, which is currently isn't. Did you manage to get LoadedSSL set to true without SSL having been properly initialized? -- Daniel Gustafsson https://vmware.com/
On Sun, May 22, 2022 at 09:17:37AM +0200, Daniel Gustafsson wrote: > This call is invoked with isServerStart set to true so any error in > secure_initialize should error out with ereport FATAL (in be_tls_init()). That > could be explained in a comment though, which is currently isn't. All the inner routines of be_tls_init() would pull out a FATAL "goto error", and it does not look like we have a hole here, so I am a bit surprised by what's proposed, TBH. -- Michael
Attachment
Daniel Gustafsson <daniel@yesql.se> writes: >> On 22 May 2022, at 08:41, Gurjeet Singh <gurjeet@singh.im> wrote: >> The initialization in PostmasterMain() blindly turns on LoadedSSL, >> irrespective of the outcome of secure_initialize(). > This call is invoked with isServerStart set to true so any error in > secure_initialize should error out with ereport FATAL (in be_tls_init()). That > could be explained in a comment though, which is currently isn't. The comments for secure_initialize() and be_tls_init() both explain this already. It's not great that be_tls_init() implements two different error handling behaviors, perhaps. One could imagine separating those. But we've pretty much bought into such messes with the very fact that elog/ereport sometimes return and sometimes not. regards, tom lane
On Sun, May 22, 2022 at 12:17 AM Daniel Gustafsson <daniel@yesql.se> wrote: > > On 22 May 2022, at 08:41, Gurjeet Singh <gurjeet@singh.im> wrote: > > > The initialization in PostmasterMain() blindly turns on LoadedSSL, > > irrespective of the outcome of secure_initialize(). > > This call is invoked with isServerStart set to true so any error in > secure_initialize should error out with ereport FATAL (in be_tls_init()). That > could be explained in a comment though, which is currently isn't. That makes sense. I have attached a patch that adds a couple of lines of comments explaining this at call-site. > Did you manage to get LoadedSSL set to true without SSL having been properly > initialized? Fortunately, no. I'm trying to add a new network protocol, and caught this inconsistency while reading/adapting the code. If a committer sees some value in it, in the attached patch I have also attempted to improve the readability of the code a little bit in startup-packet handling, and in SSL functions. These are purely cosmetic changes, but I think they reduce repetition, and improve readability, by quite a bit. For example, every ereport call evaluates the same 'isServerStart ? FATAL : LOG', over and over again; replacing this with variable 'logLevel' reduces cognitive load for the reader. And I've replace one 'goto retry1' with a 'while' loop, like the GASSAPI does it below that occurrence. There's an symmetry, almost a diametric opposition, between how SSL initialization error is treated when it occurs during server startup, versus when the error occurs during a reload/SIGHUP. During startup an error in SSL initialization leads to FATAL, whereas during a SIGHUP it's merely a LOG message. I found this difference in treatment of SSL initialization errors quite bothersome, and there is no ready explanation for this. Either a properly initialized SSL stack is important for server operation, or it is not. What do we gain by letting the server operate normally after a reload that failed to initialize SSL stack. Conversely, why do we kill the server during startup on SSL initialization error, when it's okay to operate normally after a reload that is unable to initialize SSL. I have added a comment to be_tls_init(), which I hope explains this difference in treatment of errors. I have also added comments to be_tls_init(), explaining why we don't destroy/free the global SSL_context variable in case of an error in re-initialization of SSL; it's not just an optimization, it's essential to normal server operation. Best regards, Gurjeet http://Gurje.et
Attachment
On Mon, May 23, 2022 at 8:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Daniel Gustafsson <daniel@yesql.se> writes: > >> On 22 May 2022, at 08:41, Gurjeet Singh <gurjeet@singh.im> wrote: > >> The initialization in PostmasterMain() blindly turns on LoadedSSL, > >> irrespective of the outcome of secure_initialize(). > > > This call is invoked with isServerStart set to true so any error in > > secure_initialize should error out with ereport FATAL (in be_tls_init()). That > > could be explained in a comment though, which is currently isn't. > > The comments for secure_initialize() and be_tls_init() both explain > this already. The comments above secure_initialize() do, but there are no comments above be_tls_init(), and nothing in there attempts to explain the FATAL vs. LOG difference. I think it doesn't hurt, and may actively help, if we add a comment at the call-site just alluding to the fact that the function call will not return in case of an error, and that it's safe to assume certain state has been initialized if the function returns. The comments *inside* be_tls_init() attempt to explain allocation of a new SSL_context, but end up making it sound like it's an optimization to prevent memory leak. In the patch proposed a few minutes ago in this thread, I have tried to explain above be_tls_init() the error handling behavior, as well as the reason to retain the active SSL_context, if any. > It's not great that be_tls_init() implements two different error > handling behaviors, perhaps. One could imagine separating those. > But we've pretty much bought into such messes with the very fact > that elog/ereport sometimes return and sometimes not. I don't find the dual mode handling startling; I feel it's common in Postgres code, but it's been a while since I've touched it. What I would love to see improved around ereport() calls in SSL functions would be to shrink the 'ereport(); goto error;' pattern into one statement, so that we don't introduce an accidental "goto fail" bug [1]. [1]: https://nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-goto-fail-apples-ssl-bug-explained-plus-an-unofficial-patch/ Best regards, Gurjeet http://Gurje.et
On Wed, May 25, 2022 at 10:05 PM Gurjeet Singh <gurjeet@singh.im> wrote: > I have added a comment to be_tls_init(), which I hope explains this > difference in treatment of errors. I have also added comments to > be_tls_init(), explaining why we don't destroy/free the global > SSL_context variable in case of an error in re-initialization of SSL; > it's not just an optimization, it's essential to normal server > operation. Please see attached patch that reverts the unintentional removal of a comment in be_tls_init(). Forgot to put that comment back in after my edits. Best regards, Gurjeet http://Gurje.et
Attachment
Gurjeet Singh <gurjeet@singh.im> writes: > On Mon, May 23, 2022 at 8:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The comments for secure_initialize() and be_tls_init() both explain >> this already. > The comments above secure_initialize() do, but there are no comments > above be_tls_init(), and nothing in there attempts to explain the > FATAL vs. LOG difference. I was looking at the comments in libpq-be.h: /* * Initialize global SSL context. * * If isServerStart is true, report any errors as FATAL (so we don't return). * Otherwise, log errors at LOG level and return -1 to indicate trouble, * preserving the old SSL state if any. Returns 0 if OK. */ extern int be_tls_init(bool isServerStart); It isn't our usual practice to put such API comments with the extern rather than the function definition, so maybe those comments in libpq-be.h should be moved to their respective functions? In any case, I'm not excited about having three separate comments covering the same point. regards, tom lane
On Thu, May 26, 2022 at 1:05 AM Gurjeet Singh <gurjeet@singh.im> wrote: > There's an symmetry, almost a diametric opposition, between how SSL > initialization error is treated when it occurs during server startup, > versus when the error occurs during a reload/SIGHUP. During startup an > error in SSL initialization leads to FATAL, whereas during a SIGHUP > it's merely a LOG message. > > I found this difference in treatment of SSL initialization errors > quite bothersome, and there is no ready explanation for this. Either a > properly initialized SSL stack is important for server operation, or > it is not. What do we gain by letting the server operate normally > after a reload that failed to initialize SSL stack. Conversely, why do > we kill the server during startup on SSL initialization error, when > it's okay to operate normally after a reload that is unable to > initialize SSL. I think you're overreacting to a behavior that isn't really very surprising. If we don't initialize SSL the first time, we don't have a working SSL stack. If we didn't choose to die at that point, we'd be starting up a server that could not accept any SSL connections. I don't think users would like that. If we don't *reinitialize* it, we *do* have a working SSL stack. We haven't been able to load the updated configuration, but we still have the old one. We could fall over and die anyway, but I don't think users would like that either. People don't expect 'pg_ctl reload' to kill off a working server, even if the new configuration is bad. So I don't really know what behavior, other than what is actually implemented, would be reasonable. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, May 26, 2022 at 12:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Gurjeet Singh <gurjeet@singh.im> writes: > > On Mon, May 23, 2022 at 8:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> The comments for secure_initialize() and be_tls_init() both explain > >> this already. > > > The comments above secure_initialize() do, but there are no comments > > above be_tls_init(), and nothing in there attempts to explain the > > FATAL vs. LOG difference. > > I was looking at the comments in libpq-be.h: > > /* > * Initialize global SSL context. > * > * If isServerStart is true, report any errors as FATAL (so we don't return). > * Otherwise, log errors at LOG level and return -1 to indicate trouble, > * preserving the old SSL state if any. Returns 0 if OK. > */ > extern int be_tls_init(bool isServerStart); > > It isn't our usual practice to put such API comments with the extern > rather than the function definition, Yep, and I didn't notice these comments, or even bothered to look at the extern declaration, precisely because my knowledge of Postgres coding convention told me the comments are supposed to be on the function definition. > so maybe those comments in libpq-be.h > should be moved to their respective functions? In any case, I'm not > excited about having three separate comments covering the same point. By 3 locations, I suppose you're referring to the definition of secure_initialize(), extern declaration of be_tls_init(), and the definition of be_tls_init(). The comment on the extern declaration does not belong there, so that one definitely needs go. The other two locations need descriptive comments, even if they might sound duplicate. Because the one on secure_initialize() describes the abstraction's expectations, and the one on be_tls_init() should refer to it, and additionally mention any implementation details. Best regards, Gurjeet http://Gurje.et
On Thu, May 26, 2022 at 1:00 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, May 26, 2022 at 1:05 AM Gurjeet Singh <gurjeet@singh.im> wrote: > > There's an symmetry, almost a diametric opposition, between how SSL I meant "an asymmetry". > > initialization error is treated when it occurs during server startup, > > versus when the error occurs during a reload/SIGHUP. During startup an > > error in SSL initialization leads to FATAL, whereas during a SIGHUP > > it's merely a LOG message. > > > > I found this difference in treatment of SSL initialization errors > > quite bothersome, and there is no ready explanation for this. Either a > > properly initialized SSL stack is important for server operation, or > > it is not. What do we gain by letting the server operate normally > > after a reload that failed to initialize SSL stack. Conversely, why do > > we kill the server during startup on SSL initialization error, when > > it's okay to operate normally after a reload that is unable to > > initialize SSL. > > I think you're overreacting to a behavior that isn't really very surprising. The behaviour is not surprising. I developed those opposing views as I was reading the code. And I understood the behaviour after I was done reading the code. But I was irked that it wasn't clearly explained somewhere nearby in code. Hence my proposal: > > I have added a comment to be_tls_init(), which I hope explains this > > difference in treatment of errors. > So I don't really know what behavior, other than what is actually > implemented, would be reasonable. I just wasn't happy about the fact that I had wasted time trying to find holes (security holes!) in the behaviour. So my proposal is to improve the docs/comments about this behaviour, and not the behaviour itself. Best regards, Gurjeet http://Gurje.et
Robert Haas <robertmhaas@gmail.com> writes: > I think you're overreacting to a behavior that isn't really very surprising. > If we don't initialize SSL the first time, we don't have a working SSL > stack. If we didn't choose to die at that point, we'd be starting up a > server that could not accept any SSL connections. I don't think users > would like that. > If we don't *reinitialize* it, we *do* have a working SSL stack. We > haven't been able to load the updated configuration, but we still have > the old one. We could fall over and die anyway, but I don't think > users would like that either. People don't expect 'pg_ctl reload' to > kill off a working server, even if the new configuration is bad. The larger context here is that this is (or at least is supposed to be) exactly the same as our reaction to any other misconfiguration: die if it's detected at server start, but if it's detected during a later SIGHUP reload, soldier on with the known-good previous settings. I believe that's fairly well documented already. regards, tom lane
Gurjeet Singh <gurjeet@singh.im> writes: > On Thu, May 26, 2022 at 12:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> so maybe those comments in libpq-be.h >> should be moved to their respective functions? In any case, I'm not >> excited about having three separate comments covering the same point. > By 3 locations, I suppose you're referring to the definition of > secure_initialize(), extern declaration of be_tls_init(), and the > definition of be_tls_init(). No, I was counting the third comment as the one you proposed to add to secure_initialize's caller. I think it's not a great idea to add such comments to call sites, as they're very likely to not get maintained when somebody adjusts the API of the function. (We have a hard enough time getting people to update the comments directly next to the function :-(.) I think what we ought to do here is just move the oddly-placed comments in libpq-be.h to be adjacent to the function definitions, as attached. (I just deleted the .h comments for the GSSAPI functions, as they seem to have adequate comments in their .c file already.) regards, tom lane diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c index 3d0168a369..85ce9a4ee0 100644 --- a/src/backend/libpq/be-secure-openssl.c +++ b/src/backend/libpq/be-secure-openssl.c @@ -85,6 +85,13 @@ static const char *ssl_protocol_version_to_string(int v); /* Public interface */ /* ------------------------------------------------------------ */ +/* + * Initialize global SSL context. + * + * If isServerStart is true, report any errors as FATAL (so we don't return). + * Otherwise, log errors at LOG level and return -1 to indicate trouble, + * preserving the old SSL state if any. Returns 0 if OK. + */ int be_tls_init(bool isServerStart) { @@ -397,6 +404,9 @@ error: return -1; } +/* + * Destroy global SSL context, if any. + */ void be_tls_destroy(void) { @@ -406,6 +416,9 @@ be_tls_destroy(void) ssl_loaded_verify_locations = false; } +/* + * Attempt to negotiate SSL connection. + */ int be_tls_open_server(Port *port) { @@ -659,6 +672,9 @@ aloop: return 0; } +/* + * Close SSL connection. + */ void be_tls_close(Port *port) { @@ -689,6 +705,9 @@ be_tls_close(Port *port) } } +/* + * Read data from a secure connection. + */ ssize_t be_tls_read(Port *port, void *ptr, size_t len, int *waitfor) { @@ -748,6 +767,9 @@ be_tls_read(Port *port, void *ptr, size_t len, int *waitfor) return n; } +/* + * Write data to a secure connection. + */ ssize_t be_tls_write(Port *port, void *ptr, size_t len, int *waitfor) { @@ -1249,6 +1271,10 @@ SSLerrmessage(unsigned long ecode) return errbuf; } +/* + * The following functions return various information about the SSL connection. + */ + int be_tls_get_cipher_bits(Port *port) { @@ -1320,6 +1346,16 @@ be_tls_get_peer_serial(Port *port, char *ptr, size_t len) ptr[0] = '\0'; } +/* + * Get the server certificate hash for SCRAM channel binding type + * tls-server-end-point. + * + * The result is a palloc'd hash of the server certificate with its + * size, and NULL if there is no certificate available. + * + * This is not supported with old versions of OpenSSL that don't have + * the X509_get_signature_nid() function. + */ #ifdef HAVE_X509_GET_SIGNATURE_NID char * be_tls_get_certificate_hash(Port *port, size_t *len) diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h index 90c20da22b..fccdb6a586 100644 --- a/src/include/libpq/libpq-be.h +++ b/src/include/libpq/libpq-be.h @@ -246,43 +246,12 @@ fDKQXkYuNs474553LBgOhgObJ4Oi7Aeij7XFXfBvTFLJ3ivL9pVYFxg5lUl86pVq\n\ * SSL implementation (e.g. be-secure-openssl.c) */ -/* - * Initialize global SSL context. - * - * If isServerStart is true, report any errors as FATAL (so we don't return). - * Otherwise, log errors at LOG level and return -1 to indicate trouble, - * preserving the old SSL state if any. Returns 0 if OK. - */ extern int be_tls_init(bool isServerStart); - -/* - * Destroy global SSL context, if any. - */ extern void be_tls_destroy(void); - -/* - * Attempt to negotiate SSL connection. - */ extern int be_tls_open_server(Port *port); - -/* - * Close SSL connection. - */ extern void be_tls_close(Port *port); - -/* - * Read data from a secure connection. - */ extern ssize_t be_tls_read(Port *port, void *ptr, size_t len, int *waitfor); - -/* - * Write data to a secure connection. - */ extern ssize_t be_tls_write(Port *port, void *ptr, size_t len, int *waitfor); - -/* - * Return information about the SSL connection. - */ extern int be_tls_get_cipher_bits(Port *port); extern const char *be_tls_get_version(Port *port); extern const char *be_tls_get_cipher(Port *port); @@ -290,16 +259,6 @@ extern void be_tls_get_peer_subject_name(Port *port, char *ptr, size_t len); extern void be_tls_get_peer_issuer_name(Port *port, char *ptr, size_t len); extern void be_tls_get_peer_serial(Port *port, char *ptr, size_t len); -/* - * Get the server certificate hash for SCRAM channel binding type - * tls-server-end-point. - * - * The result is a palloc'd hash of the server certificate with its - * size, and NULL if there is no certificate available. - * - * This is not supported with old versions of OpenSSL that don't have - * the X509_get_signature_nid() function. - */ #if defined(USE_OPENSSL) && defined(HAVE_X509_GET_SIGNATURE_NID) #define HAVE_BE_TLS_GET_CERTIFICATE_HASH extern char *be_tls_get_certificate_hash(Port *port, size_t *len); @@ -314,16 +273,13 @@ extern PGDLLIMPORT openssl_tls_init_hook_typ openssl_tls_init_hook; #endif /* USE_SSL */ #ifdef ENABLE_GSS -/* - * Return information about the GSSAPI authenticated connection - */ + extern bool be_gssapi_get_auth(Port *port); extern bool be_gssapi_get_enc(Port *port); extern const char *be_gssapi_get_princ(Port *port); - -/* Read and write to a GSSAPI-encrypted connection. */ extern ssize_t be_gssapi_read(Port *port, void *ptr, size_t len); extern ssize_t be_gssapi_write(Port *port, void *ptr, size_t len); + #endif /* ENABLE_GSS */ extern PGDLLIMPORT ProtocolVersion FrontendProtocol;
On Thu, May 26, 2022 at 2:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > I think you're overreacting to a behavior that isn't really very surprising. > > > If we don't initialize SSL the first time, we don't have a working SSL > > stack. If we didn't choose to die at that point, we'd be starting up a > > server that could not accept any SSL connections. I don't think users > > would like that. > > > If we don't *reinitialize* it, we *do* have a working SSL stack. We > > haven't been able to load the updated configuration, but we still have > > the old one. We could fall over and die anyway, but I don't think > > users would like that either. People don't expect 'pg_ctl reload' to > > kill off a working server, even if the new configuration is bad. > > The larger context here is that this is (or at least is supposed to be) > exactly the same as our reaction to any other misconfiguration: die if > it's detected at server start, but if it's detected during a later SIGHUP > reload, soldier on with the known-good previous settings. I believe > that's fairly well documented already. This distinction (of server startup vs. reload) is precisely what I think should be conveyed and addressed in the comments of functions responsible for (re)initialization of resources. Such a comment, specifically calling out processing/logging/error-handling differences between startup and reload, would've definitely helped when I was trying to understand the code, and trying to figure out the different contexts these functions may be executed in. The fact that ProcessStartupPacket() function calls these functions, and then also calls itself recursively, made understanding the code and intent much harder. And since variable/parameter/function names also convey intent, their naming should also be as explicit as possible, rather than being vague. Calling variables/parameters 'isServerStart' leaves so much to interpretation; how many and what other cases is the code called in when isServerStart == false? I think a better scheme would've been to name the parameter as 'reason', and use an enum/constant to convey that there are exactly 2 higher-level cases that the code is called in context of: enum InitializationReason { ServerStartup, ServerReload }. In these functions, it's not important to know the distinction of whether the server is starting-up vs. already running (or whatever other states the server may be in) , instead it's important to know the distinction of whether the server is starting-up or being reloaded; other states of the server operation, if any, do not matter here. Best regards, Gurjeet http://Gurje.et
On Thu, May 26, 2022 at 4:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Gurjeet Singh <gurjeet@singh.im> writes: > > On Thu, May 26, 2022 at 12:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> so maybe those comments in libpq-be.h > >> should be moved to their respective functions? In any case, I'm not > >> excited about having three separate comments covering the same point. > > > By 3 locations, I suppose you're referring to the definition of > > secure_initialize(), extern declaration of be_tls_init(), and the > > definition of be_tls_init(). > > No, I was counting the third comment as the one you proposed to add to > secure_initialize's caller. I think a comment at that call-site is definitely warranted. Consider the code as it right now... if (EnableSSL) { (void) secure_initialize(true); LoadedSSL = true; } ... as a first-time reader. Reader> This is an important piece of code, not just because it is called from PostmasterMain(), early in the startup process, but also because it deals with SSL; it has 'SSL' and 'secure' plastered all over it. But wait, they don't care what the outcome of this important function call is??!! I might not have paid much attention to it, but the call is adorned with '(void)', which (i) attracts my attention more, and (ii) why are they choosing to throw away the result of such important function call?! And then, they declare SSL has been "Loaded"... somebody committed half-written code! Perhaps they we in a hurry. Perhaps this is a result of an automatic git-merge gone wrong. Let me dig through the code and see if I can find a vulnerability. <Many hours later, after learning about Postgres' weird ereport/error-handling, startup vs. reload, getting bashed on IRC or elsewhere> Duh, there's nothing wrong here. <Moves on>. Now, consider the same code, and the ensuing thought-process of the reader: if (EnableSSL) { /* Any failure during SSL initialization here will result in FATAL error. */ (void) secure_initialize(true); /* ... so here we know for sure that SSL was successfully initialized. */ LoadedSSL = true; } Reader> This is an important piece of code, not just because it is called from PostmasterMain(), early in the startup process, but also because it deals with SSL; it has 'SSL' and 'secure' plastered all over it. But wait, they don't care what the outcome of this important function call is??!! That's okay, because the explanation in the comment makes sense. <Learns about special-case handling of FATAL and above> There's nothing wrong here. <Moves on>. > I think it's not a great idea to add such > comments to call sites, as they're very likely to not get maintained > when somebody adjusts the API of the function. (We have a hard enough > time getting people to update the comments directly next to the > function :-(.) That's unfortunate. But I think we should continue to strive for more maintainable, readable, extensible code. > I think what we ought to do here is just move the oddly-placed comments > in libpq-be.h to be adjacent to the function definitions, as attached. > (I just deleted the .h comments for the GSSAPI functions, as they seem > to have adequate comments in their .c file already.) Please see if anything from my patches is usable. I did not get clarity around startup vs. reload handling until my previous email, so there may not be much of use in my patches. But I think a few words mentioning the difference in resource (re)initialization during startup vs reload would add a lot of value. Best regards, Gurjeet http://Gurje.et