Thread: [PATCH] Reload SSL certificates on SIGHUP
Hi, I have written a patch which makes it possible to change SSL certificates (and other SSL parameters, including the CRL) without restarting PostgreSQL. In fact this patch also makes it possible to turn on or off ssl entirely without restart. It does so by initializing a new SSL context when the postmaster receives a SIGHUP, and if the initialization succeeded the old context is replaced by the new. There was some previous discussion[1] on the mailing list about what the proper context should be for the SSL parameters, but as far as I can tell the discussion never reached a conclusion. I have changed the SSL GUCs to PGC_SIGUP since I felt that was the closest to the truth, but it is not a perfect fit (the backends wont reload the SSL context). Should we add a new context for the SSL GUCs? Notes 1. http://www.postgresql.org/message-id/flat/CAAS3tyLJcv-m0CqfMrrxUjwa9_FKscKuAKT9_L41wNuJZywM2Q@mail.gmail.com#CAAS3tyLJcv-m0CqfMrrxUjwa9_FKscKuAKT9_L41wNuJZywM2Q@mail.gmail.com Andreas
Attachment
On 5/30/15 10:14 PM, Andreas Karlsson wrote: > I have written a patch which makes it possible to change SSL > certificates (and other SSL parameters, including the CRL) without > restarting PostgreSQL. In fact this patch also makes it possible to turn > on or off ssl entirely without restart. It does so by initializing a new > SSL context when the postmaster receives a SIGHUP, and if the > initialization succeeded the old context is replaced by the new. I think this would be a useful feature, and the implementation looks sound. But I don't like how the reload is organized. Reinitializing the context in the sighup handler, aside from questions about how much work one should do in a signal handler, would cause SSL reinitialization for unrelated reloads. We have the GUC assign hook mechanism for handling this sort of thing. The trick would be that when multiple SSL-related settings change, you only want to do one reinitialization. You could either have the different assign hooks communicate with each other somehow, or have them set a "need SSL init" flag that is checked somewhere else. > There was some previous discussion[1] on the mailing list about what the > proper context should be for the SSL parameters, but as far as I can > tell the discussion never reached a conclusion. I have changed the SSL > GUCs to PGC_SIGUP since I felt that was the closest to the truth, but it > is not a perfect fit (the backends wont reload the SSL context). Should > we add a new context for the SSL GUCs? I think PGC_SIGHUP is fine for this.
On 07/02/2015 06:13 PM, Peter Eisentraut wrote: > I think this would be a useful feature, and the implementation looks > sound. But I don't like how the reload is organized. Reinitializing > the context in the sighup handler, aside from questions about how much > work one should do in a signal handler, would cause SSL reinitialization > for unrelated reloads. We have the GUC assign hook mechanism for > handling this sort of thing. The trick would be that when multiple > SSL-related settings change, you only want to do one reinitialization. > You could either have the different assign hooks communicate with each > other somehow, or have them set a "need SSL init" flag that is checked > somewhere else. It is not enough to just add a hook to the GUCs since I would guess most users would expect the certificate to be reloaded if just the file has been replaced and no GUC was changed. To support this we would need to also check the mtimes of the SSL files, would that complexity really be worth it? Andreas
On Wed, Jul 22, 2015 at 9:52 AM, Andreas Karlsson <andreas@proxel.se> wrote: > On 07/02/2015 06:13 PM, Peter Eisentraut wrote: >> >> I think this would be a useful feature, and the implementation looks >> sound. But I don't like how the reload is organized. Reinitializing >> the context in the sighup handler, aside from questions about how much >> work one should do in a signal handler, would cause SSL reinitialization >> for unrelated reloads. We have the GUC assign hook mechanism for >> handling this sort of thing. The trick would be that when multiple >> SSL-related settings change, you only want to do one reinitialization. >> You could either have the different assign hooks communicate with each >> other somehow, or have them set a "need SSL init" flag that is checked >> somewhere else. > > > It is not enough to just add a hook to the GUCs since I would guess most > users would expect the certificate to be reloaded if just the file has been > replaced and no GUC was changed. Why? It seems to me that the assign hook gets called once per process at reload for a SIGHUP parameter even if its value is not changed, no? > To support this we would need to also check > the mtimes of the SSL files, would that complexity really be worth it? Or we could reload the SSL context unconditionally once per reload loop. I am wondering how costly that may prove to be though. -- Michael
On 7/21/15 8:52 PM, Andreas Karlsson wrote: > It is not enough to just add a hook to the GUCs since I would guess most > users would expect the certificate to be reloaded if just the file has > been replaced and no GUC was changed. To support this we would need to > also check the mtimes of the SSL files, would that complexity really be > worth it? Actually, I misread your patch. I thought you only wanted to reload the SSL files when the GUC settings change, but of course we also want to reload them when the files are changed. I don't have a problem with rebuilding the SSL context on every reload cycle. We already do a lot of extra reloading every time, so a bit more shouldn't hurt. But I'm not so sure whether we should do that in the SIGHUP handler. I don't know how we got into the situation of doing all the file reloads directly in the handler, but at least we can control that code. Making a bunch of calls into an external library is a different thing, though. Can we find a way to do this differently?
Peter Eisentraut <peter_e@gmx.net> writes: > I don't have a problem with rebuilding the SSL context on every reload > cycle. We already do a lot of extra reloading every time, so a bit more > shouldn't hurt. But I'm not so sure whether we should do that in the > SIGHUP handler. I don't know how we got into the situation of doing all > the file reloads directly in the handler, but at least we can control > that code. Making a bunch of calls into an external library is a > different thing, though. Can we find a way to do this differently? Do we have an idea how expensive it is to load that data? A brute-force answer is to not have the postmaster load it at all, but to have new backends do so (if needed) during their connection acceptance/authentication phase. I'm not sure how much that would add to the SSL connection startup time though. It would also mean that problems with the SSL config files would only be reported during subsequent connection starts, not at SIGHUP time, and indeed that SIGHUP is more or less meaningless for SSL file changes: the instant you change a file, it's live for later connections. On the plus side, it would make Windows and Unix behavior closer, since (I suppose) we're reloading that stuff anyway in EXEC_BACKEND builds. I'm not entirely sure your concern is valid, though. We have always had the principle that almost everything of interest in the postmaster happens in signal handler functions. We could possibly change things so that reloading config files is done in the "main loop" of ServerLoop, but if we did, it would have to execute with all signals blocked, which seems like just about as much of a risk for third-party code as executing that code in a signal handler is. regards, tom lane
On 07/23/2015 07:19 AM, Michael Paquier wrote: > On Wed, Jul 22, 2015 at 9:52 AM, Andreas Karlsson <andreas@proxel.se> wrote: >> On 07/02/2015 06:13 PM, Peter Eisentraut wrote: >>> >>> I think this would be a useful feature, and the implementation looks >>> sound. But I don't like how the reload is organized. Reinitializing >>> the context in the sighup handler, aside from questions about how much >>> work one should do in a signal handler, would cause SSL reinitialization >>> for unrelated reloads. We have the GUC assign hook mechanism for >>> handling this sort of thing. The trick would be that when multiple >>> SSL-related settings change, you only want to do one reinitialization. >>> You could either have the different assign hooks communicate with each >>> other somehow, or have them set a "need SSL init" flag that is checked >>> somewhere else. >> >> >> It is not enough to just add a hook to the GUCs since I would guess most >> users would expect the certificate to be reloaded if just the file has been >> replaced and no GUC was changed. > > Why? It seems to me that the assign hook gets called once per process > at reload for a SIGHUP parameter even if its value is not changed, no? My bad, I tested it and you are correct. But I am not convinced moving the SSL initialization to a GUC assign hook would make anything clearer. It would not move any work out of the signal handler either since the assign hooks are ran inside it, and the hook will be ran in all backends which is not any interesting property for SSL initialization. Andreas
On Wed, Jul 29, 2015 at 10:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> I don't have a problem with rebuilding the SSL context on every reload >> cycle. We already do a lot of extra reloading every time, so a bit more >> shouldn't hurt. But I'm not so sure whether we should do that in the >> SIGHUP handler. I don't know how we got into the situation of doing all >> the file reloads directly in the handler, but at least we can control >> that code. Making a bunch of calls into an external library is a >> different thing, though. Can we find a way to do this differently? > > Do we have an idea how expensive it is to load that data? There are no numbers on this thread. And honestly I would be curious as well to see a run of pgbench with -C doing or similar to check how long it takes to establish a connection. I would expect it to be measurable though, but here I'm just hand-waving ;) > A brute-force answer is to not have the postmaster load it at all, > but to have new backends do so (if needed) during their connection > acceptance/authentication phase. I'm not sure how much that would > add to the SSL connection startup time though. It would also mean > that problems with the SSL config files would only be reported during > subsequent connection starts, not at SIGHUP time, and indeed that > SIGHUP is more or less meaningless for SSL file changes: the instant > you change a file, it's live for later connections. On the plus side, > it would make Windows and Unix behavior closer, since (I suppose) > we're reloading that stuff anyway in EXEC_BACKEND builds. Indeed. -- Michael
On 07/29/2015 03:24 AM, Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> I don't have a problem with rebuilding the SSL context on every reload >> cycle. We already do a lot of extra reloading every time, so a bit more >> shouldn't hurt. But I'm not so sure whether we should do that in the >> SIGHUP handler. I don't know how we got into the situation of doing all >> the file reloads directly in the handler, but at least we can control >> that code. Making a bunch of calls into an external library is a >> different thing, though. Can we find a way to do this differently? > > Do we have an idea how expensive it is to load that data? > > A brute-force answer is to not have the postmaster load it at all, > but to have new backends do so (if needed) during their connection > acceptance/authentication phase. I'm not sure how much that would > add to the SSL connection startup time though. It would also mean > that problems with the SSL config files would only be reported during > subsequent connection starts, not at SIGHUP time, and indeed that > SIGHUP is more or less meaningless for SSL file changes: the instant > you change a file, it's live for later connections. On the plus side, > it would make Windows and Unix behavior closer, since (I suppose) > we're reloading that stuff anyway in EXEC_BACKEND builds. I measured it taking ~0.3ms to build the new SSL context in a simple benchmark (certificate + CA + small crl). Personally I do not think moving this to connection start would be worth it since reporting errors that late is not nice for people who have misconfigured their database, and even if my benchmarks indicates it is relatively cheap to reload SSL adding more work to connection establishing is something I would want to avoid unless it gives us a clear benefit. > I'm not entirely sure your concern is valid, though. We have always had > the principle that almost everything of interest in the postmaster happens > in signal handler functions. We could possibly change things so that > reloading config files is done in the "main loop" of ServerLoop, but > if we did, it would have to execute with all signals blocked, which seems > like just about as much of a risk for third-party code as executing that> code in a signal handler is. Agreed, I am not sure what moving it to the main loop would gain us. -- Andreas
[ moving this discussion back to the patch thread ] Andreas Karlsson <andreas@proxel.se> writes: > On 08/25/2015 09:39 AM, Michael Paquier wrote: >> -- Reload SSL certificates on SIGHUP: returned with feedback? I think >> that this patch needs more work to be in a commitable state. > Maybe I am being dense here, but I do not feel like I have gotten any > clear feedback which gives me a way forward with the patch. I do not > really see what more I can do here other than resubmit it to the next CF > which I feel would be poor etiquette by me. I think we pretty much rejected Peter's concern about doing the work in the SIGHUP handler. There's been some other discussion about refactoring the postmaster to not do all its work in signal handlers, but that is material for a different patch. Absent hard evidence that reloading SSL config in the handler actually fails, I don't think we should ask this patch to do a half-baked job of refactoring that. However ... a concern I've got is that there's a difference between how the Unix and Windows builds work, and this patch will move that from a back-burner issue to a significant concern. Namely, that on Unix we load the SSL data once and that's what you use, while on Windows (or any EXEC_BACKEND build) what you're going to get is whatever is in the files right now when a connection starts, whether it's good or bad. What this patch does, unless I missed something, is to persuade the Unix ports to implement "reload SSL data at SIGHUP", which is good; but the Windows behavior stays where it is. It didn't matter so much as long as changing the SSL config files wasn't considered a supported operation; but if that is supported, then people are going to complain. Is it unreasonable of me to ask for the Windows behavior to be fixed at the same time? I dunno. It's perhaps less broken than the Unix behavior, but that doesn't make it desirable. OTOH it might be a significantly larger patch, and I confess I'm not even too sure what we'd have to do. So I think the way to move this forward is to investigate how to hold the SSL config constant until SIGHUP in an EXEC_BACKEND build. If we find out that that's unreasonably difficult, maybe we'll decide that we can live without it; but I'd like to see the question investigated rather than ignored. regards, tom lane
On Wed, Aug 26, 2015 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > [...] > So I think the way to move this forward is to investigate how to hold > the SSL config constant until SIGHUP in an EXEC_BACKEND build. If we > find out that that's unreasonably difficult, maybe we'll decide that > we can live without it; but I'd like to see the question investigated > rather than ignored. You have a point here. In EXEC_BACKEND, parameter updated via SIGHUP are only taken into account by newly-started backends, right? Hence, a way to do what we want is to actually copy the data needed to initialize the SSL context into alternate file(s). When postmaster starts up, or when SIGHUP shows up those alternate files are upserted by the postmaster. be-secure-openssl.c needs also to be changed such as with EXEC_BACKEND the context needs to be loaded from those alternate files. At quick glance this seems doable. For now I am moving the patch to the next CF, more investigation is surely needed. -- Michael
On Wed, Aug 26, 2015 at 12:24 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Aug 26, 2015 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> [...] >> So I think the way to move this forward is to investigate how to hold >> the SSL config constant until SIGHUP in an EXEC_BACKEND build. If we >> find out that that's unreasonably difficult, maybe we'll decide that >> we can live without it; but I'd like to see the question investigated >> rather than ignored. > > You have a point here. > > In EXEC_BACKEND, parameter updated via SIGHUP are only taken into > account by newly-started backends, right? Oops. I mistook with PGC_BACKEND here. Sorry for the noise. > Hence, a way to do what we > want is to actually copy the data needed to initialize the SSL context > into alternate file(s). When postmaster starts up, or when SIGHUP > shows up those alternate files are upserted by the postmaster. > be-secure-openssl.c needs also to be changed such as with EXEC_BACKEND > the context needs to be loaded from those alternate files. At quick > glance this seems doable. Still, this idea would be to use a set of alternate files in global/ to set the context, basically something like config_exec_ssl_cert_file, config_exec_ssl_key_file and config_exec_ssl_ca_file. It does not seem to be necessary to manipulate [read|write]_nondefault_variables() as the use of this metadata should be made only when SSL context is initialized on backend. Other thoughts welcome. -- Michael
On 08/26/2015 03:57 AM, Tom Lane wrote: > Is it unreasonable of me to ask for the Windows behavior to be fixed at > the same time? I dunno. It's perhaps less broken than the Unix behavior, > but that doesn't make it desirable. OTOH it might be a significantly > larger patch, and I confess I'm not even too sure what we'd have to do. > > So I think the way to move this forward is to investigate how to hold > the SSL config constant until SIGHUP in an EXEC_BACKEND build. If we > find out that that's unreasonably difficult, maybe we'll decide that > we can live without it; but I'd like to see the question investigated > rather than ignored. I think this is a real concern and one that I will look into, to see if it can be fixed with a reasonable amount of work. Andreas
On 2015-08-27 01:12:54 +0200, Andreas Karlsson wrote: > I think this is a real concern and one that I will look into, to see if it > can be fixed with a reasonable amount of work. This patch has been in waiting-for-author for a month. Marking it as returned-with-feedback. Greetings, Andres Freund
On 08/26/2015 07:46 AM, Michael Paquier wrote: > On Wed, Aug 26, 2015 at 12:24 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Wed, Aug 26, 2015 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> [...] >>> So I think the way to move this forward is to investigate how to hold >>> the SSL config constant until SIGHUP in an EXEC_BACKEND build. If we >>> find out that that's unreasonably difficult, maybe we'll decide that >>> we can live without it; but I'd like to see the question investigated >>> rather than ignored. >> >> You have a point here. >> >> In EXEC_BACKEND, parameter updated via SIGHUP are only taken into >> account by newly-started backends, right? > > Oops. I mistook with PGC_BACKEND here. Sorry for the noise. > >> Hence, a way to do what we >> want is to actually copy the data needed to initialize the SSL context >> into alternate file(s). When postmaster starts up, or when SIGHUP >> shows up those alternate files are upserted by the postmaster. >> be-secure-openssl.c needs also to be changed such as with EXEC_BACKEND >> the context needs to be loaded from those alternate files. At quick >> glance this seems doable. > > Still, this idea would be to use a set of alternate files in global/ > to set the context, basically something like > config_exec_ssl_cert_file, config_exec_ssl_key_file and > config_exec_ssl_ca_file. It does not seem to be necessary to > manipulate [read|write]_nondefault_variables() as the use of this > metadata should be made only when SSL context is initialized on > backend. Other thoughts welcome. Sorry for dropping this patch, but now I have started looking at it again. I started implementing your suggested solution, but realized that I do not like copying of the private key file. The private key might have been put by the DBA on another file system for security reasons and having PostgreSQL copy potentially sensitive data to somewhere under pg_data seems like a surprising behavior. Especially since this only happens on some platforms. I guess a possible solution would be to read the files into the postmaster (where we already have the private key today) and have OpenSSL read the keys from memory and re-implement something like SSL_CTX_use_certificate_chain_file() in our code, and similar things for the other functions which now take a path. This seems like a bit too much work to burden this patch with (and not obviously something we would want anyway) since the behavior is already different on Windows in the current code. Thoughts? I have attached a rebased version of the original patch which applies on current master. Andreas
Attachment
On Mon, Nov 23, 2015 at 12:29 PM, Andreas Karlsson <andreas@proxel.se> wrote: > On 08/26/2015 07:46 AM, Michael Paquier wrote: >> >> On Wed, Aug 26, 2015 at 12:24 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> >>> On Wed, Aug 26, 2015 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> >>>> [...] >>>> So I think the way to move this forward is to investigate how to hold >>>> the SSL config constant until SIGHUP in an EXEC_BACKEND build. If we >>>> find out that that's unreasonably difficult, maybe we'll decide that >>>> we can live without it; but I'd like to see the question investigated >>>> rather than ignored. >>> >>> >>> You have a point here. >>> >>> In EXEC_BACKEND, parameter updated via SIGHUP are only taken into >>> account by newly-started backends, right? >> >> >> Oops. I mistook with PGC_BACKEND here. Sorry for the noise. >> >>> Hence, a way to do what we >>> want is to actually copy the data needed to initialize the SSL context >>> into alternate file(s). When postmaster starts up, or when SIGHUP >>> shows up those alternate files are upserted by the postmaster. >>> be-secure-openssl.c needs also to be changed such as with EXEC_BACKEND >>> the context needs to be loaded from those alternate files. At quick >>> glance this seems doable. >> >> >> Still, this idea would be to use a set of alternate files in global/ >> to set the context, basically something like >> config_exec_ssl_cert_file, config_exec_ssl_key_file and >> config_exec_ssl_ca_file. It does not seem to be necessary to >> manipulate [read|write]_nondefault_variables() as the use of this >> metadata should be made only when SSL context is initialized on >> backend. Other thoughts welcome. > > I started implementing your suggested solution, but realized that I do not > like copying of the private key file. The private key might have been put by > the DBA on another file system for security reasons and having PostgreSQL > copy potentially sensitive data to somewhere under pg_data seems like a > surprising behavior. Especially since this only happens on some platforms. You are referring for example to Fedora/Ubuntu that use a symlink to point to those SSL files, right? Yes this approach may be a security concern in those cases... One idea may be that we actually encrypt this data > I guess a possible solution would be to read the files into the postmaster > (where we already have the private key today) and have OpenSSL read the keys > from memory and re-implement something like > SSL_CTX_use_certificate_chain_file() in our code, and similar things for the > other functions which now take a path. This seems like a bit too much work > to burden this patch with (and not obviously something we would want anyway) > since the behavior is already different on Windows in the current code. > > Thoughts? Reimplementing a twin of SSL_CTX_use_certificate_chain_file() would have benefits in this case, but that's really something to avoid. I may say something stupid here, but what if as you say we store the information of the certificate into a dedicated shared memory block when postmaster starts up, except that when we need to reload the keys we dump them into a temporary file with tmpfile or similar and then read it using SSL_CTX_use_certificate_chain_file(). For EXEC_BACKEND, the shared memory block will be reattached at startup using PGSharedMemoryReAttach() so it should have the data. For non-EXEC_BACKEND, the child processes will just inherit the shmem block with fork(). When SIGHUP is issued, all the processes unconditionally dump the data into a per-process tmp file and then reload it in the SSL context. -- Michael
On Sun, Nov 22, 2015 at 7:29 PM, Andreas Karlsson <andreas@proxel.se> wrote: > Sorry for dropping this patch, but now I have started looking at it again. Any chance of picking this up again soon, Andreas? I think it's an important project. I would like to review it. -- Peter Geoghegan
On 08/31/2016 11:34 PM, Peter Geoghegan wrote: > On Sun, Nov 22, 2015 at 7:29 PM, Andreas Karlsson <andreas@proxel.se> wrote: >> Sorry for dropping this patch, but now I have started looking at it again. > > Any chance of picking this up again soon, Andreas? I think it's an > important project. I would like to review it. I do not really have any good ideas for how to fix it for Windows, but if anyone would like to discuss solutions I am interested in working on this patch again. The alternatives as I see them now: 1) Serialize the certificates, key, and CRL and write them to the backend_var temp file and then deserialize everything in the backends. Sounds like you would need to write some code for every SSL library to support the serialization and deserialization, which I am not a fan of doing just for one platform since I worry about little used code paths. Additionally this would mean that we write a copy of the private key to potentially another file system than the one where the private key is stored, this sounds like a bad idea from a security point of view. 2) Copy all the SSL related files into the data directory at SIGHUP, before loading them. While this does not require any serialization of certificates it still has the problem of writing private keys to disk. 3) Leave my patch as it is now. This means the postmaster will reload certificates on SIGHUP while the backends will also load them when spawning. This means windows will continue to work the same as before my patch. Is there any other way to pass the current set of loaded certificates and keys from the postmaster to the backends on Windows? I guess you could use a pipe, but if so we should probably send all data on this pipe, not just the SSL stuff. I am leaning towards doing (3) but I know I am biased since it is less work and I do not care much for Windows. Andreas
On Sun, Sep 4, 2016 at 11:39 PM, Andreas Karlsson <andreas@proxel.se> wrote: > 1) Serialize the certificates, key, and CRL and write them to the > backend_var temp file and then deserialize everything in the backends. > > Sounds like you would need to write some code for every SSL library to > support the serialization and deserialization, which I am not a fan of doing > just for one platform since I worry about little used code paths. > Additionally this would mean that we write a copy of the private key to > potentially another file system than the one where the private key is > stored, this sounds like a bad idea from a security point of view. Yeah... This would result in something that is heavily SSL-dependent, which would be an additional maintenance pain when trying to support future OpenSSL versions. > 2) Copy all the SSL related files into the data directory at SIGHUP, before > loading them. While this does not require any serialization of certificates > it still has the problem of writing private keys to disk. You expressed enough concern about that upthread, copying private keys into PGDATA is a security concern. > 3) Leave my patch as it is now. This means the postmaster will reload > certificates on SIGHUP while the backends will also load them when spawning. > This means windows will continue to work the same as before my patch. > > Is there any other way to pass the current set of loaded certificates and > keys from the postmaster to the backends on Windows? I guess you could use a > pipe, but if so we should probably send all data on this pipe, not just the > SSL stuff. > > I am leaning towards doing (3) but I know I am biased since it is less work > and I do not care much for Windows. Seriously... The benefit of this feature is clear for a lot of people. And the implementation dedicated only to Windows would just result in a grotty thing anyway. So I'd say that at this point we could just push for 3) and facilitate the life of most with SSL configuration. The behavior across platforms needs to be properly documented for sure. -- Michael
On Wed, 7 Sep 2016 17:09:17 +0900 Michael Paquier <michael.paquier@gmail.com> wrote: > On Sun, Sep 4, 2016 at 11:39 PM, Andreas Karlsson <andreas@proxel.se> > wrote: > > 1) Serialize the certificates, key, and CRL and write them to the > > backend_var temp file and then deserialize everything in the > > backends. > > > > Sounds like you would need to write some code for every SSL library > > to support the serialization and deserialization, which I am not a > > fan of doing just for one platform since I worry about little used > > code paths. Additionally this would mean that we write a copy of > > the private key to potentially another file system than the one > > where the private key is stored, this sounds like a bad idea from a > > security point of view. > > Yeah... This would result in something that is heavily SSL-dependent, > which would be an additional maintenance pain when trying to support > future OpenSSL versions. OpenSSL has documented API for serializing/deserializing each and every cryptographic format it supports. And this API quite unlikely to change in future OpenSSL versions. Moreover, LibreSSL is compatible with this API as far as I know. Really, Apache does simular thing for ages - it starts as root, loads certificates and keys, serializes them in memory, then forks and drops privileges, and then uses these keys and certificates. There are two problems with this approach 1. You have to carefully design data structures to store serialized keys. Apache made a mistake there and didn't allow for future invention of new public key algorithms. So, in 2008 I had problems adding russian GOST cryptography there. 2. You keys and certificates might not be stored in the filesystem at all. They can live in some hardware cryptography module, which don't let private keys out, just provide some handle. (OpenSSL supports it via lodable engine modules, and Postgres allows to use engine modules since 8.2, so people can use it with postgres). Really OpenSSL/LibreSSL provide useful abstraction of memory BIO (Where BIO stands for basic input/output) which can be used to store serialized cryptographic data. And serializing/deserializing API is designed to work with BIO anyway.
I have attached a version of the patch rebased on top of the OpenSSL 1.1 changes. Andreas
Attachment
Hi, this is a first review of this patch. As a feature, I think this functionality is very welcome. Having to schedule a downtime in order to enable SSL or change the SSL certificate is a nuisance and it might make admins think twice, reducing security. The patch applies cleanly (modulo fuzz in the last hunk, but see below) to master as of 00a86856, it compiles cleanly when using --with-openssl and without that option and passes make check for both cases. It does not update the documentation, I think at least section 18.9 "Secure TCP/IP Connections with SSL" needs updating as it says: "The files server.key, server.crt, root.crt, and root.crl (or their configured alternative names) are only examined during server start; so you must restart the server for changes in them to take effect". I (so far) lightly tested the functionality, and I could not find any serious logic issues up till now. Changing or replacing the server certificate with an expired one would make psql no longer connect with sslmode=require. However see below for segfaults during testing. Some code comments on the patch: On Mon, Oct 31, 2016 at 10:40:18AM +0100, Andreas Karlsson wrote: > diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c > index 668f217..a1b582f 100644 > --- a/src/backend/libpq/be-secure-openssl.c > +++ b/src/backend/libpq/be-secure-openssl.c > @@ -77,12 +77,14 @@ static DH *generate_dh_parameters(int prime_len, int generator); [...] > + if ((context = initialize_context()) != NULL) > { [...] > + be_tls_destroy(); I am getting segfaults on reloading the configuration for this call. |Program received signal SIGSEGV, Segmentation fault. |X509_VERIFY_PARAM_free (param=0x21) at x509_vpm.c:105 |105 x509_vpm.c: Datei oder Verzeichnis nicht gefunden. |(gdb) bt full |#0 X509_VERIFY_PARAM_free (param=0x21) at x509_vpm.c:105 |No locals. |#1 0x00007f3b59745a09 in SSL_CTX_free (a=0x21) at ssl_lib.c:1955 | i = -1 |#2 0x00000000005ee314 in be_tls_destroy () at be-secure-openssl.c:197 |No locals. |#3 be_tls_init () at be-secure-openssl.c:181 |No locals. |#4 0x00000000005e3f55 in secure_initialize () at be-secure.c:74 |No locals. |#5 0x000000000065f377 in SIGHUP_handler (postgres_signal_arg=<optimized out>) at postmaster.c:2524 | save_errno = 4 | __func__ = "SIGHUP_handler" |#6 <signal handler called> |No locals. |#7 0x00007f3b58938873 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81 |No locals. |#8 0x000000000046e027 in ServerLoop () at postmaster.c:1672 | timeout = {tv_sec = 54, tv_usec = 568533} | rmask = {fds_bits = {56, 0 <repeats 15 times>}} | selres = <optimized out> | now = <optimized out> | readmask = {fds_bits = {56, 0 <repeats 15 times>}} | last_lockfile_recheck_time = 1478169442 | last_touch_time = 1478169442 | __func__ = "ServerLoop" |#9 0x000000000066263e in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x2405df0) | at postmaster.c:1316 | opt = <optimized out> | status = <optimized out> | userDoption = <optimized out> | listen_addr_saved = <optimized out> | i = <optimized out> | output_config_variable = <optimized out> | __func__ = "PostmasterMain" |#10 0x000000000046f6b7 in main (argc=3, argv=0x2405df0) at main.c:228 |No locals. What I'm doing is: |postgres=# SHOW ssl; | ssl |----- | on |(1 row) | |postgres=# ALTER SYSTEM SET ssl = off; |ALTER SYSTEM |postgres=# SELECT pg_reload_conf(); | pg_reload_conf |---------------- | t |(1 row) | |postgres=# SHOW ssl; | ssl |----- | off |(1 row) | |postgres=# ALTER SYSTEM SET ssl = on; |ALTER SYSTEM |postgres=# SELECT pg_reload_conf(); | pg_reload_conf |---------------- | t |(1 row) | |postgres=# SHOW ssl; |FATAL: terminating connection due to unexpected postmaster exit |server closed the connection unexpectedly | This probably means the server terminated abnormally | before or while processing the request. |The connection to the server was lost. Attempting reset: Failed. |!> The other one I got: |Program received signal SIGSEGV, Segmentation fault. |sk_pop_free (st=0x100007f00000002, func=0x7fdbac018ed0 <ASN1_OBJECT_free>) at stack.c:291 |291 stack.c: Datei oder Verzeichnis nicht gefunden. |(gdb) bt full |#0 sk_pop_free (st=0x100007f00000002, func=0x7fdbac018ed0 <ASN1_OBJECT_free>) at stack.c:291 | i = <optimized out> |#1 0x00007fdbac04324a in x509_verify_param_zero (param=0x10d4b40) at x509_vpm.c:85 |No locals. |#2 X509_VERIFY_PARAM_free (param=0x10d4b40) at x509_vpm.c:105 |No locals. |#3 0x00007fdbac33ca09 in SSL_CTX_free (a=0x100007f00000002) at ssl_lib.c:1955 | i = -1 |#4 0x00000000005ee84c in be_tls_destroy () at be-secure-openssl.c:197 |No locals. |#5 0x00000000005e3f65 in secure_destroy () at be-secure.c:87 |No locals. |#6 0x000000000065f395 in SIGHUP_handler (postgres_signal_arg=<optimized out>) at postmaster.c:2532 | save_errno = 4 | __func__ = "SIGHUP_handler" |#7 <signal handler called> |No locals. |#8 0x00007fdbab52f873 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81 |No locals. |#9 0x000000000046e027 in ServerLoop () at postmaster.c:1672 | timeout = {tv_sec = 54, tv_usec = 411921} | rmask = {fds_bits = {56, 0 <repeats 15 times>}} | selres = <optimized out> | now = <optimized out> | readmask = {fds_bits = {56, 0 <repeats 15 times>}} | last_lockfile_recheck_time = 1478182425 | last_touch_time = 1478182425 | __func__ = "ServerLoop" |#10 0x000000000066263e in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x108de00) | at postmaster.c:1316 | opt = <optimized out> | status = <optimized out> | userDoption = <optimized out> | listen_addr_saved = <optimized out> | i = <optimized out> | output_config_variable = <optimized out> | __func__ = "PostmasterMain" |#11 0x000000000046f6b7 in main (argc=4, argv=0x108de00) at main.c:228 |No locals. I ran pg_reload_conf() twice after setting ssl to off here: |postgres=# SHOW ssl; | ssl |----- | on |(1 row) | |postgres=# ALTER SYSTEM SET ssl = off; |ALTER SYSTEM |postgres=# SELECT pg_reload_conf(); | pg_reload_conf |---------------- | t |(1 row) | |postgres=# SHOW ssl; | ssl |----- | off |(1 row) | |postgres=# SELECT pg_reload_conf(); | pg_reload_conf |---------------- | t |(1 row) | |postgres=# SHOW ssl; |FATAL: terminating connection due to unexpected postmaster exit |server closed the connection unexpectedly | This probably means the server terminated abnormally | before or while processing the request. |The connection to the server was lost. Attempting reset: Failed. I set a breakpoint in be_tls_destroy() and dumped SSL_context in gdb for both pg_reload_conf() calls and the differences are: -$17 = {method = 0x7f4faf6bce80 <SSLv23_method_data.16018>, cipher_list = 0x298d3a0, - cipher_list_by_id = 0x298f140, cert_store = 0x298c4c0, sessions = 0x298c370, +$18 = {method = 0x7f4fae955b28 <main_arena+1288>, cipher_list = 0x7f4fae955b28 <main_arena+1288>, + cipher_list_by_id = 0x298bf40, cert_store = 0x298bf40, sessions = 0x298c370, [...] - sess_timeout = 0, sess_cache_full = 0, sess_hit = 0, sess_cb_hit = 0}, references = 1, + sess_timeout = 0, sess_cache_full = 0, sess_hit = 0, sess_cb_hit = 0}, references = 0, [...] - comp_methods = 0x2969920, info_callback = 0x0, client_CA = 0x298cb80, options = 56098820, mode = 2, + comp_methods = 0x0, info_callback = 0x0, client_CA = 0x298cb80, options = 56098820, mode = 2, Not sure whether this might be an issue with my openssl library (this is on Debian stable). > + SSL_context = context; > + /* Set flag to remember if CA store is successfully loaded */ > + ssl_loaded_verify_locations = !!ssl_ca_file[0]; I am not sure this '!!' operator is according to project policy, it seems to be not used elsewhere in the codebase except for imported or auto-generated code. At least there should be a comment here, methinks. [...] > + > + /* set up the allowed cipher list */ > + if (SSL_CTX_set_cipher_list(context, SSLCipherSuites) != 1) > + INIT_CONTEXT_ERROR((errmsg("could not set the cipher list (no valid ciphers available)"))) Missing semicolon at the end of the line. [...] > +static bool > +initialize_ecdh(SSL_CTX *context) > { > #ifndef OPENSSL_NO_ECDH > EC_KEY *ecdh; > int nid; > > nid = OBJ_sn2nid(SSLECDHCurve); > - if (!nid) > - ereport(FATAL, > + if (!nid) { > + ereport(LOG, Opening braces should be on the next line. > (errmsg("ECDH: unrecognized curve name: %s", SSLECDHCurve))); > + return false; > + } > > ecdh = EC_KEY_new_by_curve_name(nid); > - if (!ecdh) > - ereport(FATAL, > + if (!ecdh) { > + ereport(LOG, Same. > diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c > index 24add74..c997581 100644 > --- a/src/backend/postmaster/postmaster.c > +++ b/src/backend/postmaster/postmaster.c > @@ -241,6 +241,8 @@ bool enable_bonjour = false; > char *bonjour_name; > bool restart_after_crash = true; > > +static bool LoadedSSL = false; > + All the delarations above this one are global variables for GUCs (see postmaster.h, lines 16-31). So ISTM this static variable declaration should be introduced by a comment in order to logically seperate it from the previous ones, like the following static variables are: > /* PIDs of special child processes; 0 when not running */ > static pid_t StartupPID = 0, > BgWriterPID = 0, [...] > --- a/src/include/libpq/libpq.h > +++ b/src/include/libpq/libpq.h > @@ -85,6 +85,7 @@ extern int (*pq_putmessage_hook) (char msgtype, const char *s, size_t len); > extern int (*pq_flush_hook) (void); > > extern int secure_initialize(void); > +extern void secure_destroy(void); > extern bool secure_loaded_verify_locations(void); > extern void secure_destroy(void); > extern int secure_open_server(Port *port); > This hunk baffled me at first because two lines below your added secure_destroy() declaration, the same line is already present. I did some digging and it turns out we had a secure_destroy() in the ancient past, but its implementation got removed in 2008 in 4e8162865 as there were no (more) users of it, however, the declaration was kept on until now. So this hunk should be removed I guess. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Thanks for the review! I have fixed all your feedback in the attached patch. On 11/03/2016 04:24 PM, Michael Banck wrote: > It does not update the documentation, I think at least section 18.9 > "Secure TCP/IP Connections with SSL" needs updating as it says: "The > files server.key, server.crt, root.crt, and root.crl (or their > configured alternative names) are only examined during server start; so > you must restart the server for changes in them to take effect". Changed this. > However see below for segfaults during testing. Fixed, this was due to me not setting SSL_Context to NULL after freeing it. >> [...] > > I am not sure this '!!' operator is according to project policy, it > seems to be not used elsewhere in the codebase except for imported or > auto-generated code. At least there should be a comment here, methinks. I changed the code to compare with '\0' instead. > [...] > Missing semicolon at the end of the line. > [...] > Opening braces should be on the next line. > [...] > Same. Fixed. >> [...] > > All the delarations above this one are global variables for GUCs (see > postmaster.h, lines 16-31). So ISTM this static variable declaration > should be introduced by a comment in order to logically seperate it from > the previous ones, like the following static variables are: Fixed. >> [...] > > This hunk baffled me at first because two lines below your added > secure_destroy() declaration, the same line is already present. I did > some digging and it turns out we had a secure_destroy() in the ancient > past, but its implementation got removed in 2008 in 4e8162865 as there > were no (more) users of it, however, the declaration was kept on until > now. > > So this hunk should be removed I guess. Removed. Andreas
Attachment
Hi, On Mon, Nov 07, 2016 at 02:36:00AM +0100, Andreas Karlsson wrote: > Thanks for the review! I have fixed all your feedback in the attached > patch. Thanks! I couldn't find furhter faults in my testing. I guess the question what to do about this on Windows is possibly still open, but as I am not familiar with the Windows port at all I've marked it Ready for Committer for now. Thanks, Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
On Tue, Nov 8, 2016 at 9:22 PM, Michael Banck <michael.banck@credativ.de> wrote: > Thanks! I couldn't find furhter faults in my testing. I guess the > question what to do about this on Windows is possibly still open, but as > I am not familiar with the Windows port at all I've marked it Ready for > Committer for now. The conclusion on Windows per upthread is "do nothing", because there is no way to get a good balance between usability and security. To be honest, I am fine with that. -- Michael
On 11/08/2016 01:22 PM, Michael Banck wrote: > Thanks! I couldn't find furhter faults in my testing. I guess the > question what to do about this on Windows is possibly still open, but as > I am not familiar with the Windows port at all I've marked it Ready for > Committer for now. Thanks again for the review! Andreas
On Wed, Nov 9, 2016 at 3:48 AM, Andreas Karlsson <andreas@proxel.se> wrote: > On 11/08/2016 01:22 PM, Michael Banck wrote: >> >> Thanks! I couldn't find furhter faults in my testing. I guess the >> question what to do about this on Windows is possibly still open, but as >> I am not familiar with the Windows port at all I've marked it Ready for >> Committer for now. > > Thanks again for the review! It seems to me that this patch is missing something... To begin with, src/test/ssl/ServerSetup.pm should be patched so as the new SSL configuration is reloaded after pg_ctl reload, and not after an instance restart. That's straight-forward: --- a/src/test/ssl/ServerSetup.pm +++ b/src/test/ssl/ServerSetup.pm @@ -96,7 +96,7 @@ sub configure_test_server_for_ssl close HBA; } -# Change the configuration to use given server cert file, and restart +# Change the configuration to use given server cert file, and reload # the server so that the configuration takes effect. sub switch_server_cert { @@ -115,6 +115,6 @@ sub switch_server_cert print SSLCONF "ssl_crl_file='root+client.crl'\n"; close SSLCONF; - # Stop and restart server to reload the new config. - $node->restart; + # Reload the new configuration set. + $node->reload; } Once I did that, half of the tests are failing. And I would have expected all of them to work properly. -- Michael
Attachment
On 11/09/2016 06:54 AM, Michael Paquier wrote: > It seems to me that this patch is missing something... To begin with, > src/test/ssl/ServerSetup.pm should be patched so as the new SSL > configuration is reloaded after pg_ctl reload, and not after an > instance restart. That's straight-forward: > --- a/src/test/ssl/ServerSetup.pm > +++ b/src/test/ssl/ServerSetup.pm > @@ -96,7 +96,7 @@ sub configure_test_server_for_ssl > close HBA; > } > > -# Change the configuration to use given server cert file, and restart > +# Change the configuration to use given server cert file, and reload > # the server so that the configuration takes effect. > sub switch_server_cert > { > @@ -115,6 +115,6 @@ sub switch_server_cert > print SSLCONF "ssl_crl_file='root+client.crl'\n"; > close SSLCONF; > > - # Stop and restart server to reload the new config. > - $node->restart; > + # Reload the new configuration set. > + $node->reload; > } > > Once I did that, half of the tests are failing. And I would have > expected all of them to work properly. Those tests fail due to that listen_addresses cannot be changed on reload so none of the test cases can even connect to the database. When I hacked ServerSetup.pm to set the correct listen_address before starting all tests pass. It is a bit annoying that if pg_hba.conf contains hostssl then postgres will refuse to start. Maybe this is something we should also fix in this patch since now when we can enable SSL after starting it becomes more useful to not bail on hostssl. What do you think? I will look into writing a cleaner patch for ServerSetup.pm some time later this week. Andreas
On Wed, Nov 9, 2016 at 7:46 PM, Andreas Karlsson <andreas@proxel.se> wrote: > Those tests fail due to that listen_addresses cannot be changed on reload so > none of the test cases can even connect to the database. When I hacked > ServerSetup.pm to set the correct listen_address before starting all tests > pass. Hm... listen_addresses remain constant at 127.0.0.1 and setting up listen_addresses = '*' does not work either.. Perhaps I am missing something? > It is a bit annoying that if pg_hba.conf contains hostssl then postgres will > refuse to start. Maybe this is something we should also fix in this patch > since now when we can enable SSL after starting it becomes more useful to > not bail on hostssl. What do you think? I forgot that... There is the same problem today when updating postgresql.conf and restarting the server if there is an hostssl entry. Do you have in mind to relax things? It seems to be that the safest bet is to not reload parameters if ssl is switched from on to off and if pg_hba.conf has a hostssl entry, right? That complicates the code though. > I will look into writing a cleaner patch for ServerSetup.pm some time later > this week. Thanks. Making the restart/reload OS-dependent will be necessary. src/test/ssl can run on Windows. -- Michael
On 11/10/2016 07:16 AM, Michael Paquier wrote: > On Wed, Nov 9, 2016 at 7:46 PM, Andreas Karlsson <andreas@proxel.se> wrote: >> Those tests fail due to that listen_addresses cannot be changed on reload so >> none of the test cases can even connect to the database. When I hacked >> ServerSetup.pm to set the correct listen_address before starting all tests >> pass. > > Hm... listen_addresses remain constant at 127.0.0.1 and setting up > listen_addresses = '*' does not work either.. Perhaps I am missing > something? When PostgreSQL is started in the tests it by default only listens to a unix socket (except on Windows). It is the call to the restart function in the SSL tests which allows PostgreSQL to receive TCP connections. Fixing this in the SSL tests will require some refactoring. >> It is a bit annoying that if pg_hba.conf contains hostssl then postgres will >> refuse to start. Maybe this is something we should also fix in this patch >> since now when we can enable SSL after starting it becomes more useful to >> not bail on hostssl. What do you think? > > I forgot that... There is the same problem today when updating > postgresql.conf and restarting the server if there is an hostssl > entry. Do you have in mind to relax things? It seems to be that the > safest bet is to not reload parameters if ssl is switched from on to > off and if pg_hba.conf has a hostssl entry, right? That complicates > the code though. I personally think that it would be cleaner and easier to understand if we just do not fail on hostssl lines just because SSL is disabled. A warning should be enough. But I do not have any strong opinions here, and would be fine with leaving the code as-is. >> I will look into writing a cleaner patch for ServerSetup.pm some time later >> this week. > > Thanks. Making the restart/reload OS-dependent will be necessary. > src/test/ssl can run on Windows. I do not think that this will be an issue with the current design, but I do not have access to a Windows machine for testing. Andreas
Hi, Here is a new version of the patch with the only differences; 1) The SSL tests have been changed to use reload rather than restart 2) Rebased on master Please take a look. Andreas
Attachment
On 11/11/2016 07:40 PM, Andreas Karlsson wrote: > Hi, > > Here is a new version of the patch with the only differences; > > 1) The SSL tests have been changed to use reload rather than restart > > 2) Rebased on master And here with a fix to a comment. Andreas
Attachment
Hi, On Fri, Nov 11, 2016 at 07:42:05PM +0100, Andreas Karlsson wrote: > On 11/11/2016 07:40 PM, Andreas Karlsson wrote: > >Here is a new version of the patch with the only differences; > > > >1) The SSL tests have been changed to use reload rather than restart > > > >2) Rebased on master > > And here with a fix to a comment. Michael, as you brought up the issues with the SSL tests, do you plan to review the latest version (and add yourself as reviewer)? Michael -- Michael Banck Projektleiter / Senior Berater Tel: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
On Wed, Nov 23, 2016 at 5:48 AM, Michael Banck <michael.banck@credativ.de> wrote: > On Fri, Nov 11, 2016 at 07:42:05PM +0100, Andreas Karlsson wrote: >> On 11/11/2016 07:40 PM, Andreas Karlsson wrote: >> >Here is a new version of the patch with the only differences; >> > >> >1) The SSL tests have been changed to use reload rather than restart >> > >> >2) Rebased on master >> >> And here with a fix to a comment. > > Michael, as you brought up the issues with the SSL tests, do you plan to > review the latest version (and add yourself as reviewer)? Yes. I need two days. -- Michael
On Sat, Nov 12, 2016 at 3:42 AM, Andreas Karlsson <andreas@proxel.se> wrote: > On 11/11/2016 07:40 PM, Andreas Karlsson wrote: >> Here is a new version of the patch with the only differences; >> >> 1) The SSL tests have been changed to use reload rather than restart Did you check if the tests pass? I am getting a couple of failures like this one: psql: server certificate for "common-name.pg-ssltest.test" does not match host name "127.0.0.1" not ok 11 - sslrootcert=ssl/root+server_ca.crt sslmode=verify-full Attached are the logs of the run I did, and the same behavior shows for macOS and Linux. The shape of the tests look correct to me after review. Still, seeing failing tests with sslmode=verify-full is a problem that needs to be addressed. This may be pointing to an incorrect CA load handling, though I could not spot a problem when going through the code. >> 2) Rebased on master > > And here with a fix to a comment. config.sgml needs an update as it still mentions that SSL parameter require a restart when updated. I have done a couple of tests on Linux, switching ssl mode between on and off and testing connection attempts with sslmode. Things are proving to work as I would expect them to be, so basically that's nice: - switching to off with sslmode=require triggers an error: psql: server does not support SSL, but SSL was required - switching to on with sslmode=require connects with SSL. - switching to off with sslmode=prefer connects without SSL. - switching to on with sslmode=prefer connects with SSL. I have done as well a couple of tests with Windows, where switching ssl between on and off is proving to work properly for each new connection. There is no surprise here, and that's as documented in the patch. -- Michael
Attachment
On 11/24/2016 08:46 AM, Michael Paquier wrote: > On Sat, Nov 12, 2016 at 3:42 AM, Andreas Karlsson <andreas@proxel.se> wrote: >> On 11/11/2016 07:40 PM, Andreas Karlsson wrote: >>> Here is a new version of the patch with the only differences; >>> >>> 1) The SSL tests have been changed to use reload rather than restart > > Did you check if the tests pass? I am getting a couple of failures > like this one: > psql: server certificate for "common-name.pg-ssltest.test" does not > match host name "127.0.0.1" > not ok 11 - sslrootcert=ssl/root+server_ca.crt sslmode=verify-full > Attached are the logs of the run I did, and the same behavior shows > for macOS and Linux. The shape of the tests look correct to me after > review. Still, seeing failing tests with sslmode=verify-full is a > problem that needs to be addressed. This may be pointing to an > incorrect CA load handling, though I could not spot a problem when > going through the code. Thanks for finding this. I will look at this more once I get home, but the tests do not fail on my computer. I wonder what I do differently. What versions of Perl and OpenSSL do you run and how did you run the tests when the failed? I ran the tests by running "make check" inside "src/test/ssl". Andreas
On 11/24/2016 02:49 PM, Andreas Karlsson wrote: > Thanks for finding this. I will look at this more once I get home, but > the tests do not fail on my computer. I wonder what I do differently. > > What versions of Perl and OpenSSL do you run and how did you run the > tests when the failed? I ran the tests by running "make check" inside > "src/test/ssl". Never mind, I had not fetched the latest master. Once I had done so these tests were both broken in my rebased branch and in the master branch without any modifications. I guess I will have to bisect this once I get home. Inof for myself or anyone else who feels like bisecting this: the last known good commit is 67dc4ccbb2e1c27da823eced66d9217a5652cbb0. Andreas
On Thu, Nov 24, 2016 at 11:15 PM, Andreas Karlsson <andreas@proxel.se> wrote: > Never mind, I had not fetched the latest master. Once I had done so these > tests were both broken in my rebased branch and in the master branch without > any modifications. I guess I will have to bisect this once I get home. > > Inof for myself or anyone else who feels like bisecting this: the last known > good commit is 67dc4ccbb2e1c27da823eced66d9217a5652cbb0. Actually, it is a bit stupid of me to not move on with this patch as this is backend-only, and the commit causing the regression failures is 9a1d0af which is a frontend-only changes. So I have done a review of this patch after reverting 9a1d0af, and things are working honestly well. Looking at the latest patch at code-level, there is some refactoring to introduce initialize_context(). But it is actually not necessary (perhaps this is the remnant of a past version?) as be_tls_init() is its only caller. This patch makes hard to look at the diffs, and it makes future back-patching more complicated, so I would suggest simplifying the patch as much as possible. You could use for example a goto block for the error code path to free the context with SSL_CTX_free(), and set up ssl_loaded_verify_locations once the CA is loaded. The same applies to initialize_ecdh(). + if (secure_initialize() != 0) + ereport(FATAL, + (errmsg("could not load ssl context"))); + LoadedSSL = true; In case of a failure, a LOG message would have been already generated, so this duplicates the information. Worse, if log_min_messages is set to a level higher than LOG, users *lose* information on what has happened. I think that secure_initialize() should have an argument to define elevel and that this routine should be in charge of generating an error message. Having it return a status code is necessary, but you could cast secure_initialize() with (void) to show that we don't care about the status code in this case. There is no need to care about freeing the new context when the FATAL code path is used as process would just shut down. config.sgml needs to be updated to reflect that the SSL parameters are updated at server reload (mentioned already upthread, just re-mentioning it to group all the comments in one place). -- Michael
On Mon, Nov 28, 2016 at 2:01 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Nov 24, 2016 at 11:15 PM, Andreas Karlsson <andreas@proxel.se> wrote: >> Never mind, I had not fetched the latest master. Once I had done so these >> tests were both broken in my rebased branch and in the master branch without >> any modifications. I guess I will have to bisect this once I get home. >> >> Inof for myself or anyone else who feels like bisecting this: the last known >> good commit is 67dc4ccbb2e1c27da823eced66d9217a5652cbb0. > > Actually, it is a bit stupid of me to not move on with this patch as > this is backend-only, and the commit causing the regression failures > is 9a1d0af which is a frontend-only changes. So I have done a review > of this patch after reverting 9a1d0af, and things are working honestly > well. > > Looking at the latest patch at code-level, there is some refactoring > to introduce initialize_context(). But it is actually not necessary > (perhaps this is the remnant of a past version?) as be_tls_init() is > its only caller. This patch makes hard to look at the diffs, and it > makes future back-patching more complicated, so I would suggest > simplifying the patch as much as possible. You could use for example a > goto block for the error code path to free the context with > SSL_CTX_free(), and set up ssl_loaded_verify_locations once the CA is > loaded. The same applies to initialize_ecdh(). > > + if (secure_initialize() != 0) > + ereport(FATAL, > + (errmsg("could not load ssl context"))); > + LoadedSSL = true; > In case of a failure, a LOG message would have been already generated, > so this duplicates the information. Worse, if log_min_messages is set > to a level higher than LOG, users *lose* information on what has > happened. I think that secure_initialize() should have an argument to > define elevel and that this routine should be in charge of generating > an error message. Having it return a status code is necessary, but you > could cast secure_initialize() with (void) to show that we don't care > about the status code in this case. There is no need to care about > freeing the new context when the FATAL code path is used as process > would just shut down. > > config.sgml needs to be updated to reflect that the SSL parameters are > updated at server reload (mentioned already upthread, just > re-mentioning it to group all the comments in one place). As this patch could be really simplified this way, I am marking is as returned with feedback. -- Michael
On 11/30/2016 06:52 AM, Michael Paquier wrote: > On Mon, Nov 28, 2016 at 2:01 PM, Michael Paquier >> Looking at the latest patch at code-level, there is some refactoring >> to introduce initialize_context(). But it is actually not necessary >> (perhaps this is the remnant of a past version?) as be_tls_init() is >> its only caller. This patch makes hard to look at the diffs, and it >> makes future back-patching more complicated, so I would suggest >> simplifying the patch as much as possible. You could use for example a >> goto block for the error code path to free the context with >> SSL_CTX_free(), and set up ssl_loaded_verify_locations once the CA is >> loaded. The same applies to initialize_ecdh(). >> >> + if (secure_initialize() != 0) >> + ereport(FATAL, >> + (errmsg("could not load ssl context"))); >> + LoadedSSL = true; >> In case of a failure, a LOG message would have been already generated, >> so this duplicates the information. Worse, if log_min_messages is set >> to a level higher than LOG, users *lose* information on what has >> happened. I think that secure_initialize() should have an argument to >> define elevel and that this routine should be in charge of generating >> an error message. Having it return a status code is necessary, but you >> could cast secure_initialize() with (void) to show that we don't care >> about the status code in this case. There is no need to care about >> freeing the new context when the FATAL code path is used as process >> would just shut down. Thanks, this is a really good suggestion which made the diff much cleaner. I removed my new macro too now since I felt it mostly made the code more cryptic just to gain a few lines of code. >> config.sgml needs to be updated to reflect that the SSL parameters are >> updated at server reload (mentioned already upthread, just >> re-mentioning it to group all the comments in one place). Thanks, fixed this. > As this patch could be really simplified this way, I am marking is as > returned with feedback. I have attached a new version. The commitfest should technically have been closed by now, so do what you like with the status. I can always submit the patch to the next commitfest. Andreas
Attachment
On Fri, Dec 2, 2016 at 11:26 AM, Andreas Karlsson <andreas@proxel.se> wrote: > I have attached a new version. The commitfest should technically have been > closed by now, so do what you like with the status. I can always submit the > patch to the next commitfest. I have just moved it to the next CF. Will look at it in couple of days, that's more or less at the top of my TODO list. -- Michael
On Fri, Dec 2, 2016 at 1:02 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Dec 2, 2016 at 11:26 AM, Andreas Karlsson <andreas@proxel.se> wrote: >> I have attached a new version. The commitfest should technically have been >> closed by now, so do what you like with the status. I can always submit the >> patch to the next commitfest. > > I have just moved it to the next CF. Will look at it in couple of > days, that's more or less at the top of my TODO list. Thanks for the new version, it is far easier to check that there is no regression with the new code. /* Public interface *//* ------------------------------------------------------------*/ +/* Useless noise. +void +be_tls_destroy(void) +{ + SSL_CTX_free(SSL_context); + SSL_context = NULL;} Perhaps we had better leave immediately if SSL_context is already NULL? I have tested the error code paths by enforcing manually an error and I could not see any failures, still I am wondering if calling SSL_CTX_free(NULL) would be safe, particularly if there is a callback added in the future. + if (secure_initialize(false) != 0) + ereport(WARNING, + (errmsg("ssl context not reloaded"))); SSL should be upper-case here. One last thing that I think is missing in this patch is for users the possibility to check via SQL if the SSL context is actually loaded or not. As the context is reloaded after all the new values are available, with the current patch users may see that ssl is set to on but no context is loaded. So why not adding for example a read-only parameter that maps with SSLLoaded? Once those issues are addressed, I think that this will be ready for committer. -- Michael
On 12/04/2016 02:12 PM, Michael Paquier wrote: > One last thing that I think is missing in this patch is for users the > possibility to check via SQL if the SSL context is actually loaded or > not. As the context is reloaded after all the new values are > available, with the current patch users may see that ssl is set to on > but no context is loaded. So why not adding for example a read-only > parameter that maps with SSLLoaded? The other three issues are easy to fix, but this one is a bit trickier. Do you mean that we should add another GUC here which is read-only? Does this have a precedent in the code? Andreas
On Sun, Dec 4, 2016 at 11:11 PM, Andreas Karlsson <andreas@proxel.se> wrote: > On 12/04/2016 02:12 PM, Michael Paquier wrote: >> >> One last thing that I think is missing in this patch is for users the >> possibility to check via SQL if the SSL context is actually loaded or >> not. As the context is reloaded after all the new values are >> available, with the current patch users may see that ssl is set to on >> but no context is loaded. So why not adding for example a read-only >> parameter that maps with SSLLoaded? > > > The other three issues are easy to fix, but this one is a bit trickier. Do > you mean that we should add another GUC here which is read-only? Yes, that's what I meant. It is hard to track if the reloading has been effective or not. > Does this have a precedent in the code? data_checksums in guc.c is an example, it is marked with GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE and its value is updated with SetConfigOption() when the control file is read. The idea would be to do something like that with LoadedSSL. -- Michael
On 12/04/2016 03:20 PM, Michael Paquier wrote: > On Sun, Dec 4, 2016 at 11:11 PM, Andreas Karlsson <andreas@proxel.se> wrote: >> On 12/04/2016 02:12 PM, Michael Paquier wrote: >>> >>> One last thing that I think is missing in this patch is for users the >>> possibility to check via SQL if the SSL context is actually loaded or >>> not. As the context is reloaded after all the new values are >>> available, with the current patch users may see that ssl is set to on >>> but no context is loaded. So why not adding for example a read-only >>> parameter that maps with SSLLoaded? >> >> >> The other three issues are easy to fix, but this one is a bit trickier. Do >> you mean that we should add another GUC here which is read-only? > > Yes, that's what I meant. It is hard to track if the reloading has > been effective or not. > >> Does this have a precedent in the code? > > data_checksums in guc.c is an example, it is marked with > GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE and its value is updated with > SetConfigOption() when the control file is read. The idea would be to > do something like that with LoadedSSL. Thanks. I will be quite busy the upcoming couple of weeks so there will be a while until I can sit down with this. Feel free to contribute to the patch. Andreas
On Mon, Dec 5, 2016 at 9:50 AM, Andreas Karlsson <andreas@proxel.se> wrote: > On 12/04/2016 03:20 PM, Michael Paquier wrote: >> On Sun, Dec 4, 2016 at 11:11 PM, Andreas Karlsson <andreas@proxel.se> >> wrote: >>> On 12/04/2016 02:12 PM, Michael Paquier wrote: >>> Does this have a precedent in the code? >> >> >> data_checksums in guc.c is an example, it is marked with >> GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE and its value is updated with >> SetConfigOption() when the control file is read. The idea would be to >> do something like that with LoadedSSL. OK, here is attached what I had in mind as reload-ssl-v08-02.patch for reference. This applies on top of the main patch reload-ssl-v08-01.patch that is the same version as v7 with the issues I reported previously as addressed. LoadedSSL is mapped with a read-only GUC parameter that new sessions can query after connecting. The only use case where that would be useful would be when using sslmode=prefer to check whether the SSL context is loaded even if ssl has been switched from off to on. But let's be honest, pg_stat_ssl reports already this kind of information, making this patch at the end useless because LoadedSSL does not change for an already-spawned backend. > Thanks. I will be quite busy the upcoming couple of weeks so there will be a > while until I can sit down with this. Feel free to contribute to the patch. I am switching the patch as ready for committer. I have no more comments about this patch. Note to committer-san: please look only at v08-01. -- Michael
Attachment
On 12/5/16 12:17 AM, Michael Paquier wrote: > OK, here is attached what I had in mind as reload-ssl-v08-02.patch for > reference. This applies on top of the main patch > reload-ssl-v08-01.patch that is the same version as v7 with the issues > I reported previously as addressed. LoadedSSL is mapped with a > read-only GUC parameter that new sessions can query after connecting. > The only use case where that would be useful would be when using > sslmode=prefer to check whether the SSL context is loaded even if ssl > has been switched from off to on. But let's be honest, pg_stat_ssl > reports already this kind of information, making this patch at the end > useless because LoadedSSL does not change for an already-spawned > backend. Yeah, it seems that if you want to know whether you are using SSL, then we already have that. I don't see the need for this new read-only setting. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > Yeah, it seems that if you want to know whether you are using SSL, then > we already have that. I don't see the need for this new read-only setting. I concur --- there might be use for more reporting about SSL status, but that patch doesn't seem like quite the right thing. I've pushed the main patch with some small adjustments; one notable one that I made the EXEC_BACKEND reload path work like SIGHUP reload (press on without SSL if fail) not server start (abort if fail). As it stood, if you fat-fingered a change to your SSL files on Windows, the postmaster would keep running but all backends would die instantly, whether or not an SSL connection was being requested. That didn't seem helpful. I also went ahead and downgraded the "hostssl record with SSL turned off" case to a warning. Before we leave this area, though, there is a loose end that requires more thought. That is, what about passphrase-protected server keys? Our documentation suggests that if you have one, the server will demand the passphrase just once at server start and then all is good. I'm not sure if that's at all practical in modern usage, but in any case it's not going to be reasonable to put a passphrase in again at every SIGHUP. On Windows things are even worse; you'd have to give the passphrase again to every spawned backend. (But that was true already.) I can think of at least three things we might do about this: 1. Leave the code as it stands, and change the documentation to state that you cannot use a passphrase-protected server key, period. 2. Add a password callback function that would supply the passphrase when needed. The question is, where would it get that from? It'd be straightforward to supply it from a string GUC, but from a security POV it seems pretty silly to have the password in postgresql.conf. 3. Add a password callback function that just returns an empty string, thereby ensuring a clean failure if the server tries to load a passphrase-protected key. We'd still need to change the documentation but at least the behavior would be reasonably clean. I'm kinda leaning to the last choice; I don't want to leave things as they are, but actually making password-protected keys work in a useful way seems like a lot more effort than is justified. regards, tom lane
On Tue, Jan 3, 2017 at 4:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
-- Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> Yeah, it seems that if you want to know whether you are using SSL, then
> we already have that. I don't see the need for this new read-only setting.
I concur --- there might be use for more reporting about SSL status, but
that patch doesn't seem like quite the right thing.
I've pushed the main patch with some small adjustments; one notable one
that I made the EXEC_BACKEND reload path work like SIGHUP reload (press on
without SSL if fail) not server start (abort if fail). As it stood, if
you fat-fingered a change to your SSL files on Windows, the postmaster
would keep running but all backends would die instantly, whether or not
an SSL connection was being requested. That didn't seem helpful.
I also went ahead and downgraded the "hostssl record with SSL turned off"
case to a warning.
Before we leave this area, though, there is a loose end that requires
more thought. That is, what about passphrase-protected server keys?
Our documentation suggests that if you have one, the server will demand
the passphrase just once at server start and then all is good. I'm not
sure if that's at all practical in modern usage, but in any case it's
not going to be reasonable to put a passphrase in again at every SIGHUP.
On Windows things are even worse; you'd have to give the passphrase again
to every spawned backend. (But that was true already.)
I can think of at least three things we might do about this:
1. Leave the code as it stands, and change the documentation to state
that you cannot use a passphrase-protected server key, period.
2. Add a password callback function that would supply the passphrase
when needed. The question is, where would it get that from? It'd
be straightforward to supply it from a string GUC, but from a security
POV it seems pretty silly to have the password in postgresql.conf.
Yeah, that seems like a really bad idea. If you want to do that, then you might as well just remove the passphrase from the key.
3. Add a password callback function that just returns an empty string,
thereby ensuring a clean failure if the server tries to load a
passphrase-protected key. We'd still need to change the documentation
but at least the behavior would be reasonably clean.
I'm kinda leaning to the last choice; I don't want to leave things as they
are, but actually making password-protected keys work in a useful way
seems like a lot more effort than is justified.
The effort to deal with it may well be justified. IIRC, Greg Stark had some ideas of what he wanted to do with encryption keys (or maybe that was either somebody else or some other keys?), which also included getting the private key out of the general postmaster address space to protect it. But it *is* a bigger option, and in particular it is well out of scope of this patch.
Of the three options I agree 3 is probably the best.
Another option would be to use a callback to get the key value the first time, and then cache it so we can re-use it. That means we can make it work if the new key has the same password, but it also means we need to take care of protecting that passphrase. But I'm not sure that's any worse than the fact that we keep the private key around unlocked today?
That said, they passphrase should only be required if the key changes, not if the certificate changes, I believe. Do we take advantage of this today (sorry, haven't looked at the code)? Because the most common operation is probably the renewal of a certificate, which does not change the key, for example...
On 1/2/17 10:02 PM, Tom Lane wrote: > Before we leave this area, though, there is a loose end that requires > more thought. That is, what about passphrase-protected server keys? I don't have experience with this in practice, but my hunch would be that you can continue to use passphrases as before, but the new reload functionality is effectively not supported. That is, if you use passphrases and make a key change, you need to do a full restart. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 1/2/17 10:02 PM, Tom Lane wrote: >> Before we leave this area, though, there is a loose end that requires >> more thought. That is, what about passphrase-protected server keys? > I don't have experience with this in practice, but my hunch would be > that you can continue to use passphrases as before, but the new reload > functionality is effectively not supported. That is, if you use > passphrases and make a key change, you need to do a full restart. Um, no, that's not the situation. Since we reload the SSL_CTX each time we get SIGHUP, and no information is carried forward from the old SSL_CTX, a passphrase-protected server key causes the postmaster to freeze at SIGHUP until somebody types a new passphrase on its /dev/tty. (I've just verified this: it stops accepting connections or indeed doing anything at all.) In the likely event that /dev/tty no longer opens successfully, it would report failure of the SSL reload and stop accepting new SSL connections. Even if you thought the previous behavior was useful, this completely isn't. If there were a way to carry forward the decrypted key from the old SSL_CTX to the new one, we could perhaps preserve the old behavior. But I don't know of one, and even if there is one, it's likely very OpenSSL-specific --- do we want to assume that every TLS implementation will support that? regards, tom lane
Magnus Hagander <magnus@hagander.net> writes: > On Tue, Jan 3, 2017 at 4:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Before we leave this area, though, there is a loose end that requires >> more thought. That is, what about passphrase-protected server keys? >> ... >> 2. Add a password callback function that would supply the passphrase >> when needed. The question is, where would it get that from? It'd >> be straightforward to supply it from a string GUC, but from a security >> POV it seems pretty silly to have the password in postgresql.conf. > Yeah, that seems like a really bad idea. If you want to do that, then you > might as well just remove the passphrase from the key. Agreed. It's difficult to imagine a situation in which postgresql.conf could be considered more secure than server.key, and usually it'd be less so. >> 3. Add a password callback function that just returns an empty string, >> thereby ensuring a clean failure if the server tries to load a >> passphrase-protected key. We'd still need to change the documentation >> but at least the behavior would be reasonably clean. > Another option would be to use a callback to get the key value the first > time, and then cache it so we can re-use it. That means we can make it work > if the new key has the same password, but it also means we need to take > care of protecting that passphrase. But I'm not sure that's any worse than > the fact that we keep the private key around unlocked today? Yeah, I'm not terribly fussed about having the passphrase sitting in postmaster memory. But the above is work I don't care to do ATM. I think probably the right thing for now is to install a do-nothing callback, so that at least we don't have the issue of the postmaster freezing at SIGHUP. If someone feels like trying to revive support of passphrase-protected server keys, that would be a perfectly fine base to build on; they'd need a callback there anyway. > That said, they passphrase should only be required if the key changes, not > if the certificate changes, I believe. Do we take advantage of this today > (sorry, haven't looked at the code)? Because the most common operation is > probably the renewal of a certificate, which does not change the key, for > example... As I just explained to Peter, we don't have any way to exploit that given the design of loading a whole new SSL_CTX. regards, tom lane
On Tue, Jan 3, 2017 at 4:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Jan 3, 2017 at 4:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Before we leave this area, though, there is a loose end that requires
>> more thought. That is, what about passphrase-protected server keys?
>> ...
>> 2. Add a password callback function that would supply the passphrase
>> when needed. The question is, where would it get that from? It'd
>> be straightforward to supply it from a string GUC, but from a security
>> POV it seems pretty silly to have the password in postgresql.conf.
> Yeah, that seems like a really bad idea. If you want to do that, then you
> might as well just remove the passphrase from the key.
Agreed. It's difficult to imagine a situation in which postgresql.conf
could be considered more secure than server.key, and usually it'd be less
so.
>> 3. Add a password callback function that just returns an empty string,
>> thereby ensuring a clean failure if the server tries to load a
>> passphrase-protected key. We'd still need to change the documentation
>> but at least the behavior would be reasonably clean.
> Another option would be to use a callback to get the key value the first
> time, and then cache it so we can re-use it. That means we can make it work
> if the new key has the same password, but it also means we need to take
> care of protecting that passphrase. But I'm not sure that's any worse than
> the fact that we keep the private key around unlocked today?
Yeah, I'm not terribly fussed about having the passphrase sitting in
postmaster memory. But the above is work I don't care to do ATM.
I think probably the right thing for now is to install a do-nothing
callback, so that at least we don't have the issue of the postmaster
freezing at SIGHUP. If someone feels like trying to revive support
of passphrase-protected server keys, that would be a perfectly fine
base to build on; they'd need a callback there anyway.
Does it still support passphrase protected ones on startup, or did that get thrown out with the bathwater? I think that's definitely a separate thing and there are a nontrivial number of people who would be interested in a setup where they can use a passphrase to protect it initially, just not reload it.
> That said, they passphrase should only be required if the key changes, not
> if the certificate changes, I believe. Do we take advantage of this today
> (sorry, haven't looked at the code)? Because the most common operation is
> probably the renewal of a certificate, which does not change the key, for
> example...
As I just explained to Peter, we don't have any way to exploit that given
the design of loading a whole new SSL_CTX.
Bummer.
* Magnus Hagander (magnus@hagander.net) wrote: > On Tue, Jan 3, 2017 at 4:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > Magnus Hagander <magnus@hagander.net> writes: > > > On Tue, Jan 3, 2017 at 4:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >> Before we leave this area, though, there is a loose end that requires > > >> more thought. That is, what about passphrase-protected server keys? > > >> ... > > >> 2. Add a password callback function that would supply the passphrase > > >> when needed. The question is, where would it get that from? It'd > > >> be straightforward to supply it from a string GUC, but from a security > > >> POV it seems pretty silly to have the password in postgresql.conf. > > > > > Yeah, that seems like a really bad idea. If you want to do that, then you > > > might as well just remove the passphrase from the key. > > > > Agreed. It's difficult to imagine a situation in which postgresql.conf > > could be considered more secure than server.key, and usually it'd be less > > so. > > > > >> 3. Add a password callback function that just returns an empty string, > > >> thereby ensuring a clean failure if the server tries to load a > > >> passphrase-protected key. We'd still need to change the documentation > > >> but at least the behavior would be reasonably clean. > > > > > Another option would be to use a callback to get the key value the first > > > time, and then cache it so we can re-use it. That means we can make it > > work > > > if the new key has the same password, but it also means we need to take > > > care of protecting that passphrase. But I'm not sure that's any worse > > than > > > the fact that we keep the private key around unlocked today? > > > > Yeah, I'm not terribly fussed about having the passphrase sitting in > > postmaster memory. But the above is work I don't care to do ATM. > > > > I think probably the right thing for now is to install a do-nothing > > callback, so that at least we don't have the issue of the postmaster > > freezing at SIGHUP. If someone feels like trying to revive support > > of passphrase-protected server keys, that would be a perfectly fine > > base to build on; they'd need a callback there anyway. > > Does it still support passphrase protected ones on startup, or did that get > thrown out with the bathwater? I think that's definitely a separate thing > and there are a nontrivial number of people who would be interested in a > setup where they can use a passphrase to protect it initially, just not > reload it. +1 Thanks! Stephen
Magnus Hagander <magnus@hagander.net> writes: > On Tue, Jan 3, 2017 at 4:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think probably the right thing for now is to install a do-nothing >> callback, so that at least we don't have the issue of the postmaster >> freezing at SIGHUP. If someone feels like trying to revive support >> of passphrase-protected server keys, that would be a perfectly fine >> base to build on; they'd need a callback there anyway. > Does it still support passphrase protected ones on startup, or did that get > thrown out with the bathwater? It does not; what would be the point, if the key would be lost at SIGHUP? > I think that's definitely a separate thing > and there are a nontrivial number of people who would be interested in a > setup where they can use a passphrase to protect it initially, just not > reload it. If any of those number of people want to step up and design/implement a non-broken solution for passphrases, that'd be fine with me. But I would want to see something that's actually a credible solution, allowing the postmaster to be started as a normal daemon. And working on Windows. regards, tom lane
On Wed, Jan 4, 2017 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Jan 3, 2017 at 4:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think probably the right thing for now is to install a do-nothing
>> callback, so that at least we don't have the issue of the postmaster
>> freezing at SIGHUP. If someone feels like trying to revive support
>> of passphrase-protected server keys, that would be a perfectly fine
>> base to build on; they'd need a callback there anyway.
> Does it still support passphrase protected ones on startup, or did that get
> thrown out with the bathwater?
It does not; what would be the point, if the key would be lost at SIGHUP?
If we lost it, yes. But we could keep the old key around if it hasn't changed, thus behave just like we did in <= 9.6.
> I think that's definitely a separate thing
> and there are a nontrivial number of people who would be interested in a
> setup where they can use a passphrase to protect it initially, just not
> reload it.
If any of those number of people want to step up and design/implement
a non-broken solution for passphrases, that'd be fine with me. But
I would want to see something that's actually a credible solution,
allowing the postmaster to be started as a normal daemon. And working
on Windows.
Well, for all those people 9.6 worked significantly better... Because they could reload *other* config parameters without failure.
* Magnus Hagander (magnus@hagander.net) wrote: > On Wed, Jan 4, 2017 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Magnus Hagander <magnus@hagander.net> writes: > > > On Tue, Jan 3, 2017 at 4:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >> I think probably the right thing for now is to install a do-nothing > > >> callback, so that at least we don't have the issue of the postmaster > > >> freezing at SIGHUP. If someone feels like trying to revive support > > >> of passphrase-protected server keys, that would be a perfectly fine > > >> base to build on; they'd need a callback there anyway. > > > > > Does it still support passphrase protected ones on startup, or did that > > get > > > thrown out with the bathwater? > > > > It does not; what would be the point, if the key would be lost at SIGHUP? > > If we lost it, yes. But we could keep the old key around if it hasn't > changed, thus behave just like we did in <= 9.6. Either that, or throw the same warning that we can't reload the SSL bits if we had to ask for a passphrase at startup, again like how it worked in 9.6. > > > I think that's definitely a separate thing > > > and there are a nontrivial number of people who would be interested in a > > > setup where they can use a passphrase to protect it initially, just not > > > reload it. > > > > If any of those number of people want to step up and design/implement > > a non-broken solution for passphrases, that'd be fine with me. But > > I would want to see something that's actually a credible solution, > > allowing the postmaster to be started as a normal daemon. And working > > on Windows. > > Well, for all those people 9.6 worked significantly better... Because they > could reload *other* config parameters without failure. Indeed, this is important functionality that people are using. I don't agree with simply removing it because we want to make these options able to be changed on a reload, that's not a good trade-off. Thanks! Stephen
On 01/04/2017 03:48 PM, Magnus Hagander wrote: > On Wed, Jan 4, 2017 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us > It does not; what would be the point, if the key would be lost at > SIGHUP? > > If we lost it, yes. But we could keep the old key around if it hasn't > changed, thus behave just like we did in <= 9.6. That means storing the pass phrase in the memory of the postmaster, which does not sound like a terribly good idea to me, but I have never used keys with pass phrases for daemons so it might be a common solution which is acceptable by many. > If any of those number of people want to step up and design/implement > a non-broken solution for passphrases, that'd be fine with me. But > I would want to see something that's actually a credible solution, > allowing the postmaster to be started as a normal daemon. And working > on Windows. > > Well, for all those people 9.6 worked significantly better... Because > they could reload *other* config parameters without failure. A possible solution might be to only add the error throwing hook when loading certificates during SIGHUP (and at Windows) and to work as before on startup. Would that be an acceptable solution? I could write a patch for this if people are interested. Andreas
* Andreas Karlsson (andreas@proxel.se) wrote: > On 01/04/2017 03:48 PM, Magnus Hagander wrote: > >On Wed, Jan 4, 2017 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us > > It does not; what would be the point, if the key would be lost at > > SIGHUP? > > > >If we lost it, yes. But we could keep the old key around if it hasn't > >changed, thus behave just like we did in <= 9.6. > > That means storing the pass phrase in the memory of the postmaster, > which does not sound like a terribly good idea to me, but I have > never used keys with pass phrases for daemons so it might be a > common solution which is acceptable by many. I'm not sure I see that as a big deal- if you can access the postmaster's memory then you can just pull out the actual private key itself, no? There's a bigger issue here though, which has to do with what happens if the user actually does change the passphrase on the key and we then reload, what do we do then? > > If any of those number of people want to step up and design/implement > > a non-broken solution for passphrases, that'd be fine with me. But > > I would want to see something that's actually a credible solution, > > allowing the postmaster to be started as a normal daemon. And working > > on Windows. > > > >Well, for all those people 9.6 worked significantly better... Because > >they could reload *other* config parameters without failure. > > A possible solution might be to only add the error throwing hook > when loading certificates during SIGHUP (and at Windows) and to work > as before on startup. Would that be an acceptable solution? I could > write a patch for this if people are interested. I'm not sure I see how that's a solution..? Wouldn't that mean that a SIGHUP with an encrypted key would result in a failure? The solution, at least in my view, seems to be to say "sorry, we can't reload the SSL stuff if you used a passphrase to unlock the key on startup, you will have to perform a restart if you want the SSL bits to be changed." No, I've not looked at what that would require in terms of code. Thanks! Stephen
Stephen Frost <sfrost@snowman.net> writes: > Indeed, this is important functionality that people are using. Who exactly are these people, and why haven't they complained about how crappy the support is now? I'm *completely* unconvinced by the argument that the way it has worked in the past is an important feature that we have to preserve. It's an accident that it worked at all, and as far as I can tell it didn't work very well. Have you tried it? On my machine, it could not accept a passphrase at all unless I didn't detach the postmaster from the terminal, which is entirely silly as a production solution. In short, I reject the above argument 100%. I'm all for inventing a solution in which passphrases work usefully, but don't tell me that what we had was such a solution. regards, tom lane
On 01/04/2017 04:14 PM, Stephen Frost wrote: > * Andreas Karlsson (andreas@proxel.se) wrote: >> A possible solution might be to only add the error throwing hook >> when loading certificates during SIGHUP (and at Windows) and to work >> as before on startup. Would that be an acceptable solution? I could >> write a patch for this if people are interested. > > I'm not sure I see how that's a solution..? Wouldn't that mean that a > SIGHUP with an encrypted key would result in a failure? > > The solution, at least in my view, seems to be to say "sorry, we can't > reload the SSL stuff if you used a passphrase to unlock the key on > startup, you will have to perform a restart if you want the SSL bits to > be changed." Sorry, I was very unclear. I meant refusing the reload the SSL context if there is a pass phrase, but that the rest of the config will be reloaded just fine. This will lead to some log spam on every SIGHUP for people with a pass phrase but should otherwise work as before. Andreas
* Andreas Karlsson (andreas@proxel.se) wrote: > On 01/04/2017 04:14 PM, Stephen Frost wrote: > >* Andreas Karlsson (andreas@proxel.se) wrote: > >>A possible solution might be to only add the error throwing hook > >>when loading certificates during SIGHUP (and at Windows) and to work > >>as before on startup. Would that be an acceptable solution? I could > >>write a patch for this if people are interested. > > > >I'm not sure I see how that's a solution..? Wouldn't that mean that a > >SIGHUP with an encrypted key would result in a failure? > > > >The solution, at least in my view, seems to be to say "sorry, we can't > >reload the SSL stuff if you used a passphrase to unlock the key on > >startup, you will have to perform a restart if you want the SSL bits to > >be changed." > > Sorry, I was very unclear. I meant refusing the reload the SSL > context if there is a pass phrase, but that the rest of the config > will be reloaded just fine. This will lead to some log spam on every > SIGHUP for people with a pass phrase but should otherwise work as > before. Right, that sounds like it'd work for me, at least. Thanks! Stephen
Andreas Karlsson <andreas@proxel.se> writes: > Sorry, I was very unclear. I meant refusing the reload the SSL context > if there is a pass phrase, but that the rest of the config will be > reloaded just fine. This will lead to some log spam on every SIGHUP for > people with a pass phrase but should otherwise work as before. How will you know whether there's a pass phrase? regards, tom lane
On 1/4/17 10:26 AM, Tom Lane wrote: > Andreas Karlsson <andreas@proxel.se> writes: >> Sorry, I was very unclear. I meant refusing the reload the SSL context >> if there is a pass phrase, but that the rest of the config will be >> reloaded just fine. This will lead to some log spam on every SIGHUP for >> people with a pass phrase but should otherwise work as before. > > How will you know whether there's a pass phrase? One could register a password callback that remembers whether it was called. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 1/4/17 10:26 AM, Tom Lane wrote: >> How will you know whether there's a pass phrase? > One could register a password callback that remembers whether it was called. Hmm ... actually, we don't even need to work that hard. If we simply use the callback that's there now, but only during reloads not server start, then we get the desired behavior. Reloads will fail because the wrong passphrase was returned by the callback, and we'll keep the current SSL state. It would probably be worth tweaking things to minimize the amount of log spam that you get from that; but it would work, for values of "work" similar to what was there before. I still maintain that the existing solution for passphrases is useless, but in the interest of removing objections to the current patch, I'll go make that happen. regards, tom lane
On 1/4/17 10:57 AM, Tom Lane wrote: > I still maintain that the existing solution for passphrases is useless, > but in the interest of removing objections to the current patch, I'll > go make that happen. Sounds good. Looking around briefly (e.g., Apache, nginx), the standard approach appears to be a configuration setting that gets the password from an external program or file. (Although the default still appears to be to get from tty.) systemd has support for getting passwords to services without tty. So if someone is interested, there is some room for enhancement here. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote: > On 1/4/17 10:57 AM, Tom Lane wrote: > > I still maintain that the existing solution for passphrases is useless, > > but in the interest of removing objections to the current patch, I'll > > go make that happen. > > Sounds good. Agreed, thanks. > Looking around briefly (e.g., Apache, nginx), the standard approach > appears to be a configuration setting that gets the password from an > external program or file. (Although the default still appears to be to > get from tty.) Right, the MIT Kerberos daemon will definitely prompt for the passphrase for the master key on the terminal also. They might also have a way to get it from a program now, not sure, it's been a while, but it was a requirement from NIST 800-53 to not have unencrypted keys on the filesystem and I had to address that for the MIT Kerberos master key and the private keys for various SSL-using services. > systemd has support for getting passwords to services without tty. Oh, that's interesting, I wasn't aware of that. > So if someone is interested, there is some room for enhancement here. Agreed. Thanks! Stephen
On Wed, Jan 4, 2017 at 11:49 AM, Stephen Frost <sfrost@snowman.net> wrote: >> systemd has support for getting passwords to services without tty. > > Oh, that's interesting, I wasn't aware of that. > >> So if someone is interested, there is some room for enhancement here. > > Agreed. The first thing that pops into my head is that we could add a GUC ssl_cert_passphrase_command whose contents get executed as a shell command when we need a passphrase; that program is expected to emit the passphrase and nothing else on standard output and then exit(0). Blah blah logging blah blah failure handling. That's not trivial to implement if you want the postmaster to still be responsive while the command is running, but I think it could be done. (I'm not volunteering.) Of course, if there's some sort of commonly-used library out there for this sort of thing where we can just link against it and call whatever APIs it exposes, that might be a better alternative, or something to support in addition, but I don't really know whether there's any standardization in this area. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Of course, if there's some sort of commonly-used library out there for > this sort of thing where we can just link against it and call whatever > APIs it exposes, that might be a better alternative, or something to > support in addition, but I don't really know whether there's any > standardization in this area. I was wondering if we could make use of ssh-agent. But it seems to want to hold the private key itself, so that you have to communicate with it every time you need an operation done with the key. I'm not sure what the performance of that is like, and I am sure that the code would look a whole lot different from the path where we hold the key locally. It might be workable if OpenSSL already incorporates library routines for talking to ssh-agent, but I haven't looked. regards, tom lane