Thread: [PATCH] fix race condition in libpq (related to ssl connections)
Hi, I've found a race condition in libpq. It is about the initialization of the my_bio_methods static variable in fe-secure-openssl.c, which is not protected by any lock. The race condition may make the initialization of the connection fail, and as an additional weird consequence, it might cause openssl call close(0), so stdin of the client application gets closed. I've prepared a patch to protect the initialization of my_bio_methods from the race. This is my first patch submission to the postgresql project, so I hope I didn't miss anything. Any comments and suggestions are of course very welcome. I also prepared a testcase. In the testcase tarball, there is a patch that adds sleeps at the right positions to make the close(0) problem occur practically always. It also includes comments to explain how the race can end up calling close(0). Concerning the patch, it is only tested on Linux. I'm unsure about whether the simple initialization of the mutex would work nowadays also on Windows or whether the more complicated initialization also to be found for the ssl_config_mutex in the same source file needs to be used. Let me know whether I should adapt that. We discovered the problem with release 11.5, but the patch and the testcase are against the master branch. Regards, Willi -- ___________________________________________________ Dr. Willi Mann | Principal Software Engineer, Tech Lead PQL Celonis SE | Theresienstrasse 6 | 80333 Munich, Germany F: +4989416159679 w.mann@celonis.com | www.celonis.com | LinkedIn | Twitter | Xing AG Munich HRB 225439 | Management: Martin Klenk, Bastian Nominacher, Alexander Rinke
Attachment
Hi, > I've found a race condition in libpq. It is about the initialization of > the my_bio_methods static variable in fe-secure-openssl.c, which is not > protected by any lock. The race condition may make the initialization of > the connection fail, and as an additional weird consequence, it might > cause openssl call close(0), so stdin of the client application gets > closed. Thanks for the patch. Interestingly enough we have PQisthreadsafe() [1], but it's current implementation is: ``` /* libpq is thread-safe? */ int PQisthreadsafe(void) { return true; } ``` I wonder if we should just document that libpq is thread safe as of PG v??? and deprecate PQisthreadsafe() at some point. Currently the documentation gives an impression that the library may or may not be thread safe depending on the circumstances. > I've prepared a patch to protect the initialization of my_bio_methods > from the race. This is my first patch submission to the postgresql > project, so I hope I didn't miss anything. Any comments and suggestions > are of course very welcome. > > I also prepared a testcase. In the testcase tarball, there is a patch > that adds sleeps at the right positions to make the close(0) problem > occur practically always. It also includes comments to explain how the > race can end up calling close(0). > > Concerning the patch, it is only tested on Linux. I'm unsure about > whether the simple initialization of the mutex would work nowadays also > on Windows or whether the more complicated initialization also to be > found for the ssl_config_mutex in the same source file needs to be used. > Let me know whether I should adapt that. Please add the patch to the nearest open commit fest [2]. The patch will be automatically picked up by cfbot [3] and tested on different platforms. Also this way it will not be lost among other patches. The code looks OK but I would appreciate a second opinion from cfbot. Also maybe a few comments before my_BIO_methods_init_mutex and/or pthread_mutex_lock would be appropriate. Personally I am inclined to think that the automatic test in this particular case is redundant. [1]: https://www.postgresql.org/docs/current/libpq-threading.html [2]: https://commitfest.postgresql.org/ [3]: http://cfbot.cputube.org/ -- Best regards, Aleksander Alekseev
On Tue, Nov 21, 2023 at 12:14:16PM +0300, Aleksander Alekseev wrote: Thanks for the report, Willi, and the test case! Thanks Aleksander for the reply. > I wonder if we should just document that libpq is thread safe as of PG > v??? and deprecate PQisthreadsafe() at some point. Currently the > documentation gives an impression that the library may or may not be > thread safe depending on the circumstances. Because --disable-thread-safe has been removed recently in 68a4b58eca03. The routine could be marked as deprecated on top of saying that it always returns 1 for 17~. > Please add the patch to the nearest open commit fest [2]. The patch > will be automatically picked up by cfbot [3] and tested on different > platforms. Also this way it will not be lost among other patches. > > The code looks OK but I would appreciate a second opinion from cfbot. > Also maybe a few comments before my_BIO_methods_init_mutex and/or > pthread_mutex_lock would be appropriate. Personally I am inclined to > think that the automatic test in this particular case is redundant. I am not really convinced that we require a second mutex here, as there is always a concern with inter-locking changes. I may be missing something, of course, but BIO_s_socket() is just a pointer to a set of callbacks hidden in bss_sock.c with BIO_meth_new() and BIO_get_new_index() assigning some centralized data to handle the methods in a controlled way in OpenSSL. We only case about initializing once for the sake of libpq's threads, so wouldn't it be better to move my_BIO_s_socket() in pgtls_init() where the initialization of the BIO methods would be protected by ssl_config_mutex? That's basically what Willi means in his first message, no? -- Michael
Attachment
On Tue, Nov 21, 2023 at 12:14:16PM +0300, Aleksander Alekseev wrote: > Please add the patch to the nearest open commit fest [2]. The patch > will be automatically picked up by cfbot [3] and tested on different > platforms. Also this way it will not be lost among other patches. I have noticed that this was not tracked yet, so I have added an entry here: https://commitfest.postgresql.org/46/4670/ Willi, note that this requires a PostgreSQL community account, and it does not seem like you have one, or I would have added you as author ;) -- Michael
Attachment
On Wed, Nov 22, 2023 at 2:44 PM Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Nov 21, 2023 at 12:14:16PM +0300, Aleksander Alekseev wrote: > > I wonder if we should just document that libpq is thread safe as of PG > > v??? and deprecate PQisthreadsafe() at some point. Currently the > > documentation gives an impression that the library may or may not be > > thread safe depending on the circumstances. > > Because --disable-thread-safe has been removed recently in > 68a4b58eca03. The routine could be marked as deprecated on top of > saying that it always returns 1 for 17~. See also commit ce0b0fa3 "Doc: Adjust libpq docs about thread safety."
On Wed, Nov 22, 2023 at 02:58:15PM +1300, Thomas Munro wrote: > On Wed, Nov 22, 2023 at 2:44 PM Michael Paquier <michael@paquier.xyz> wrote: >> Because --disable-thread-safe has been removed recently in >> 68a4b58eca03. The routine could be marked as deprecated on top of >> saying that it always returns 1 for 17~. > > See also commit ce0b0fa3 "Doc: Adjust libpq docs about thread safety." Sure, I've noticed that in the docs, but declaring it as deprecated is a different topic, and we don't actually use this term in the docs for this routine, no? -- Michael
Attachment
On Wed, Nov 22, 2023 at 10:43:32AM +0900, Michael Paquier wrote: > I am not really convinced that we require a second mutex here, as > there is always a concern with inter-locking changes. I may be > missing something, of course, but BIO_s_socket() is just a pointer to > a set of callbacks hidden in bss_sock.c with BIO_meth_new() and > BIO_get_new_index() assigning some centralized data to handle the > methods in a controlled way in OpenSSL. I was looking at the origin of this one, and this is an issue coming down to 8bb14cdd33de that has removed the ssl_config_mutex taken in pgtls_open_client() when we called my_SSL_set_fd(). The commit has accidentally missed that path with the static BIO method where the mutex mattered. > We only case about > initializing once for the sake of libpq's threads, so wouldn't it be > better to move my_BIO_s_socket() in pgtls_init() where the > initialization of the BIO methods would be protected by > ssl_config_mutex? That's basically what Willi means in his first > message, no? I've looked at this idea, and finished by being unhappy with the error handling that we are currently assuming in my_SSL_set_fd() in the event of an error in the bio method setup, which would be most likely an OOM, so let's use ssl_config_mutex in my_BIO_s_socket(). Another thing is that I have minimized the manipulation of my_bio_methods in the setup routine. I've been also testing the risks of collusion, and it takes me quite a a few tries with hundred threads to reproduce the failure even without any forced sleep, so that seems really hard to reach. Please find attached a patch (still need to do more checks with older versions of OpenSSL). Any thoughts or comments? -- Michael
Attachment
On Fri, Nov 24, 2023 at 04:48:58PM +0900, Michael Paquier wrote: > I've looked at this idea, and finished by being unhappy with the error > handling that we are currently assuming in my_SSL_set_fd() in the > event of an error in the bio method setup, which would be most likely > an OOM, so let's use ssl_config_mutex in my_BIO_s_socket(). Another > thing is that I have minimized the manipulation of my_bio_methods in > the setup routine. I've spent more time on that today, and the patch I've posted on Friday had a small mistake in the non-HAVE_BIO_METH_NEW path when saving the BIO_METHODs causing the SSL tests to fail with older OpenSSL versions. I've fixed that and the patch was straight-forward, so applied it down to v12. I didn't use Willi's patch at the end, still credited him as author as his original patch is rather close to the result committed and it feels that he has spent a good deal of time on this issue. -- Michael