Re: libpq's multi-threaded SSL callback handling is busted - Mailing list pgsql-hackers
From | Jan Urbański |
---|---|
Subject | Re: libpq's multi-threaded SSL callback handling is busted |
Date | |
Msg-id | 87a90kmjyc.fsf@wulczer.org Whole thread Raw |
In response to | libpq's multi-threaded SSL callback handling is busted (Jan Urbański <wulczer@wulczer.org>) |
List | pgsql-hackers |
Jan Urbański writes: > I did some more digging on bug > http://www.postgresql.org/message-id/CAHUL3dpWYFnUgdgo95OHYDQ4kugdnBKPTjq0mNbTuBhCMG4xvQ@mail.gmail.com > which describes a deadlock when using libpq with SSL in a multi-threaded > environment with other threads doing SSL independently. > > [reproducing instructions] > > I posit we should remove all CRYPTO_set_*_callback functions and associated > cruft from libpq. > > I could submit a patch to get rid of the crazy CRYPTO_*_callback dance in > libpq, but at the very least this will require a warning in the release notes Attached is a patch doing just that. > I would very much like to have this change back-patched, since setting and > resetting the callback makes using libpq in a threaded OpenSSL-enabled app > arguably less safe than if it didn't use any locking. Also attached is a patch for 9.4 and all previous supported releases, which is the same thing, but adjusted for when we didn't have a separate fe-secure.c and fe-secure-openssl.c If committed, this change will warrant a notice in the release notes. I could try drafting it, if that'd be helpful. Cheers, Jan commit 62f7433f697d49ab005ad22822dc943661a4e48a Author: Jan Urbański <wulczer@wulczer.org> Date: Wed Feb 11 17:47:09 2015 +0100 Do not attempt to manage OpenSSL locking callbacks in libpq. According to OpenSSL documentation, threaded programs using SSL should register locking callbacks before starting work. In 6daf396879b6502c41146cdb1013568654f52ff6 libpq started doing that, registering its callbacks when the connection would switch to SSL mode. This would lead to overwriting any locking callback a properly written threaded SSL application might have already set, but only got uncovered as a bug in http://www.postgresql.org/message-id/48620925.6070806@pws.com.au where it turned out that unloading libpq would leave a dangling function pointer. Specifically, the PHP runtime would segfault after unloading libpq, as reported in https://bugs.php.net/bug.php?id=40926 Commit 4e816286533dd34c10b368487d4079595a3e1418 made libpq unset the callback after the last SSL connection was closed. This solved the segfault issue, but introduced a potential for deadlocks when one thread using libpq with SSL and another doing an unrelated SSL operation. T1 (libpq) T2 (other) start libpq connection add locking callback start SSL operation take lock finish libpq connection remove locking callback (!) release lock, noop since no callback This got reported in bug http://www.postgresql.org/message-id/CAHUL3dpWYFnUgdgo95OHYDQ4kugdnBKPTjq0mNbTuBhCMG4xvQ@mail.gmail.com The reality is that it's not libpq's responsibility to manage those OpenSSL callbacks. It's up to application code to set them up before using OpenSSL in threads and trying to handle them in libpq is just stomping on the ones that should already by in place by the time we start using shared OpenSSL structures. This commit gets rid of all OpenSSL callback handling from libpq. For well-behaved applications, this should increase reliability, since they will now have certainty that the callbacks they set up before attempting to use OpenSSL will be used throughout the program execution. Applications that relied on libpq to set up locking callbacks will need to be fixed. diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c index a32af34..5d0b132 100644 --- a/src/interfaces/libpq/fe-secure-openssl.c +++ b/src/interfaces/libpq/fe-secure-openssl.c @@ -67,7 +67,6 @@ static int verify_cb(int ok, X509_STORE_CTX *ctx); static int verify_peer_name_matches_certificate_name(PGconn *conn, ASN1_STRING *name, char **store_name); -static void destroy_ssl_system(void); static int initialize_SSL(PGconn *conn); static PostgresPollingStatusType open_client_SSL(PGconn *); static char *SSLerrmessage(void); @@ -90,8 +89,6 @@ static bool pq_init_crypto_lib = true; static SSL_CTX *SSL_context = NULL; #ifdef ENABLE_THREAD_SAFETY -static long ssl_open_connections = 0; - #ifndef WIN32 static pthread_mutex_t ssl_config_mutex = PTHREAD_MUTEX_INITIALIZER; #else @@ -112,16 +109,6 @@ static long win32_ssl_create_mutex = 0; void pgtls_init_library(bool do_ssl, int do_crypto) { -#ifdef ENABLE_THREAD_SAFETY - - /* - * Disallow changing the flags while we have open connections, else we'd - * get completely confused. - */ - if (ssl_open_connections != 0) - return; -#endif - pq_init_ssl_lib = do_ssl; pq_init_crypto_lib = do_crypto; } @@ -705,40 +692,6 @@ verify_peer_name_matches_certificate(PGconn *conn) return found_match && !got_error; } -#ifdef ENABLE_THREAD_SAFETY -/* - * Callback functions for OpenSSL internal locking - */ - -static unsigned long -pq_threadidcallback(void) -{ - /* - * This is not standards-compliant. pthread_self() returns pthread_t, and - * shouldn't be cast to unsigned long, but CRYPTO_set_id_callback requires - * it, so we have to do it. - */ - return (unsigned long) pthread_self(); -} - -static pthread_mutex_t *pq_lockarray; - -static void -pq_lockingcallback(int mode, int n, const char *file, int line) -{ - if (mode & CRYPTO_LOCK) - { - if (pthread_mutex_lock(&pq_lockarray[n])) - PGTHREAD_ERROR("failed to lock mutex"); - } - else - { - if (pthread_mutex_unlock(&pq_lockarray[n])) - PGTHREAD_ERROR("failed to unlock mutex"); - } -} -#endif /* ENABLE_THREAD_SAFETY */ - /* * Initialize SSL system, in particular creating the SSL_context object * that will be shared by all SSL-using connections in this process. @@ -776,41 +729,6 @@ pgtls_init(PGconn *conn) if (pthread_mutex_lock(&ssl_config_mutex)) return -1; - if (pq_init_crypto_lib) - { - /* - * If necessary, set up an array to hold locks for libcrypto. - * libcrypto will tell us how big to make this array. - */ - if (pq_lockarray == NULL) - { - int i; - - pq_lockarray = malloc(sizeof(pthread_mutex_t) * CRYPTO_num_locks()); - if (!pq_lockarray) - { - pthread_mutex_unlock(&ssl_config_mutex); - return -1; - } - for (i = 0; i < CRYPTO_num_locks(); i++) - { - if (pthread_mutex_init(&pq_lockarray[i], NULL)) - { - free(pq_lockarray); - pq_lockarray = NULL; - pthread_mutex_unlock(&ssl_config_mutex); - return -1; - } - } - } - - if (ssl_open_connections++ == 0) - { - /* These are only required for threaded libcrypto applications */ - CRYPTO_set_id_callback(pq_threadidcallback); - CRYPTO_set_locking_callback(pq_lockingcallback); - } - } #endif /* ENABLE_THREAD_SAFETY */ if (!SSL_context) @@ -862,48 +780,6 @@ pgtls_init(PGconn *conn) } /* - * This function is needed because if the libpq library is unloaded - * from the application, the callback functions will no longer exist when - * libcrypto is used by other parts of the system. For this reason, - * we unregister the callback functions when the last libpq - * connection is closed. (The same would apply for OpenSSL callbacks - * if we had any.) - * - * Callbacks are only set when we're compiled in threadsafe mode, so - * we only need to remove them in this case. - */ -static void -destroy_ssl_system(void) -{ -#ifdef ENABLE_THREAD_SAFETY - /* Mutex is created in initialize_ssl_system() */ - if (pthread_mutex_lock(&ssl_config_mutex)) - return; - - if (pq_init_crypto_lib && ssl_open_connections > 0) - --ssl_open_connections; - - if (pq_init_crypto_lib && ssl_open_connections == 0) - { - /* No connections left, unregister libcrypto callbacks */ - CRYPTO_set_locking_callback(NULL); - CRYPTO_set_id_callback(NULL); - - /* - * We don't free the lock array or the SSL_context. If we get another - * connection in this process, we will just re-use them with the - * existing mutexes. - * - * This means we leak a little memory on repeated load/unload of the - * library. - */ - } - - pthread_mutex_unlock(&ssl_config_mutex); -#endif -} - -/* * Initialize (potentially) per-connection SSL data, namely the * client certificate, private key, and trusted CA certs. * @@ -1399,17 +1275,8 @@ open_client_SSL(PGconn *conn) void pgtls_close(PGconn *conn) { - bool destroy_needed = false; - if (conn->ssl) { - /* - * We can't destroy everything SSL-related here due to the possible - * later calls to OpenSSL routines which may need our thread - * callbacks, so set a flag here and check at the end. - */ - destroy_needed = true; - SSL_shutdown(conn->ssl); SSL_free(conn->ssl); conn->ssl = NULL; @@ -1430,17 +1297,6 @@ pgtls_close(PGconn *conn) conn->engine = NULL; } #endif - - /* - * This will remove our SSL locking hooks, if this is the last SSL - * connection, which means we must wait to call it until after all SSL - * calls have been made, otherwise we can end up with a race condition and - * possible deadlocks. - * - * See comments above destroy_ssl_system(). - */ - if (destroy_needed) - destroy_ssl_system(); } commit 97847bf888e5b7a2c2aed2ef5e4eb92b0fa3c9e6 Author: Jan Urbański <wulczer@wulczer.org> Date: Wed Feb 11 17:47:09 2015 +0100 Do not attempt to manage OpenSSL locking callbacks in libpq. According to OpenSSL documentation, threaded programs using SSL should register locking callbacks before starting work. In 6daf396879b6502c41146cdb1013568654f52ff6 libpq started doing that, registering its callbacks when the connection would switch to SSL mode. This would lead to overwriting any locking callback a properly written threaded SSL application might have already set, but only got uncovered as a bug in http://www.postgresql.org/message-id/48620925.6070806@pws.com.au where it turned out that unloading libpq would leave a dangling function pointer. Specifically, the PHP runtime would segfault after unloading libpq, as reported in https://bugs.php.net/bug.php?id=40926 Commit 4e816286533dd34c10b368487d4079595a3e1418 made libpq unset the callback after the last SSL connection was closed. This solved the segfault issue, but introduced a potential for deadlocks when one thread using libpq with SSL and another doing an unrelated SSL operation. T1 (libpq) T2 (other) start libpq connection add locking callback start SSL operation take lock finish libpq connection remove locking callback (!) release lock, noop since no callback This got reported in bug http://www.postgresql.org/message-id/CAHUL3dpWYFnUgdgo95OHYDQ4kugdnBKPTjq0mNbTuBhCMG4xvQ@mail.gmail.com The reality is that it's not libpq's responsibility to manage those OpenSSL callbacks. It's up to application code to set them up before using OpenSSL in threads and trying to handle them in libpq is just stomping on the ones that should already by in place by the time we start using shared OpenSSL structures. This commit gets rid of all OpenSSL callback handling from libpq. For well-behaved applications, this should increase reliability, since they will now have certainty that the callbacks they set up before attempting to use OpenSSL will be used throughout the program execution. Applications that relied on libpq to set up locking callbacks will need to be fixed. diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c index 9ba3567..56367b8 100644 --- a/src/interfaces/libpq/fe-secure.c +++ b/src/interfaces/libpq/fe-secure.c @@ -82,9 +82,7 @@ static bool verify_peer_name_matches_certificate(PGconn *); static int verify_cb(int ok, X509_STORE_CTX *ctx); static int init_ssl_system(PGconn *conn); -static void destroy_ssl_system(void); static int initialize_SSL(PGconn *conn); -static void destroySSL(void); static PostgresPollingStatusType open_client_SSL(PGconn *); static void close_SSL(PGconn *); static char *SSLerrmessage(void); @@ -101,8 +99,6 @@ static bool pq_init_crypto_lib = true; static SSL_CTX *SSL_context = NULL; #ifdef ENABLE_THREAD_SAFETY -static long ssl_open_connections = 0; - #ifndef WIN32 static pthread_mutex_t ssl_config_mutex = PTHREAD_MUTEX_INITIALIZER; #else @@ -205,16 +201,6 @@ void PQinitOpenSSL(int do_ssl, int do_crypto) { #ifdef USE_SSL -#ifdef ENABLE_THREAD_SAFETY - - /* - * Disallow changing the flags while we have open connections, else we'd - * get completely confused. - */ - if (ssl_open_connections != 0) - return; -#endif - pq_init_ssl_lib = do_ssl; pq_init_crypto_lib = do_crypto; #endif @@ -236,17 +222,6 @@ pqsecure_initialize(PGconn *conn) } /* - * Destroy global context - */ -void -pqsecure_destroy(void) -{ -#ifdef USE_SSL - destroySSL(); -#endif -} - -/* * Begin or continue negotiating a secure session. */ PostgresPollingStatusType @@ -848,40 +823,6 @@ verify_peer_name_matches_certificate(PGconn *conn) return result; } -#ifdef ENABLE_THREAD_SAFETY -/* - * Callback functions for OpenSSL internal locking - */ - -static unsigned long -pq_threadidcallback(void) -{ - /* - * This is not standards-compliant. pthread_self() returns pthread_t, and - * shouldn't be cast to unsigned long, but CRYPTO_set_id_callback requires - * it, so we have to do it. - */ - return (unsigned long) pthread_self(); -} - -static pthread_mutex_t *pq_lockarray; - -static void -pq_lockingcallback(int mode, int n, const char *file, int line) -{ - if (mode & CRYPTO_LOCK) - { - if (pthread_mutex_lock(&pq_lockarray[n])) - PGTHREAD_ERROR("failed to lock mutex"); - } - else - { - if (pthread_mutex_unlock(&pq_lockarray[n])) - PGTHREAD_ERROR("failed to unlock mutex"); - } -} -#endif /* ENABLE_THREAD_SAFETY */ - /* * Initialize SSL system, in particular creating the SSL_context object * that will be shared by all SSL-using connections in this process. @@ -919,41 +860,6 @@ init_ssl_system(PGconn *conn) if (pthread_mutex_lock(&ssl_config_mutex)) return -1; - if (pq_init_crypto_lib) - { - /* - * If necessary, set up an array to hold locks for libcrypto. - * libcrypto will tell us how big to make this array. - */ - if (pq_lockarray == NULL) - { - int i; - - pq_lockarray = malloc(sizeof(pthread_mutex_t) * CRYPTO_num_locks()); - if (!pq_lockarray) - { - pthread_mutex_unlock(&ssl_config_mutex); - return -1; - } - for (i = 0; i < CRYPTO_num_locks(); i++) - { - if (pthread_mutex_init(&pq_lockarray[i], NULL)) - { - free(pq_lockarray); - pq_lockarray = NULL; - pthread_mutex_unlock(&ssl_config_mutex); - return -1; - } - } - } - - if (ssl_open_connections++ == 0) - { - /* These are only required for threaded libcrypto applications */ - CRYPTO_set_id_callback(pq_threadidcallback); - CRYPTO_set_locking_callback(pq_lockingcallback); - } - } #endif /* ENABLE_THREAD_SAFETY */ if (!SSL_context) @@ -1005,48 +911,6 @@ init_ssl_system(PGconn *conn) } /* - * This function is needed because if the libpq library is unloaded - * from the application, the callback functions will no longer exist when - * libcrypto is used by other parts of the system. For this reason, - * we unregister the callback functions when the last libpq - * connection is closed. (The same would apply for OpenSSL callbacks - * if we had any.) - * - * Callbacks are only set when we're compiled in threadsafe mode, so - * we only need to remove them in this case. - */ -static void -destroy_ssl_system(void) -{ -#ifdef ENABLE_THREAD_SAFETY - /* Mutex is created in initialize_ssl_system() */ - if (pthread_mutex_lock(&ssl_config_mutex)) - return; - - if (pq_init_crypto_lib && ssl_open_connections > 0) - --ssl_open_connections; - - if (pq_init_crypto_lib && ssl_open_connections == 0) - { - /* No connections left, unregister libcrypto callbacks */ - CRYPTO_set_locking_callback(NULL); - CRYPTO_set_id_callback(NULL); - - /* - * We don't free the lock array or the SSL_context. If we get another - * connection in this process, we will just re-use them with the - * existing mutexes. - * - * This means we leak a little memory on repeated load/unload of the - * library. - */ - } - - pthread_mutex_unlock(&ssl_config_mutex); -#endif -} - -/* * Initialize (potentially) per-connection SSL data, namely the * client certificate, private key, and trusted CA certs. * @@ -1451,12 +1315,6 @@ initialize_SSL(PGconn *conn) return 0; } -static void -destroySSL(void) -{ - destroy_ssl_system(); -} - /* * Attempt to negotiate SSL connection. */ @@ -1548,19 +1406,10 @@ open_client_SSL(PGconn *conn) static void close_SSL(PGconn *conn) { - bool destroy_needed = false; - if (conn->ssl) { DECLARE_SIGPIPE_INFO(spinfo); - /* - * We can't destroy everything SSL-related here due to the possible - * later calls to OpenSSL routines which may need our thread - * callbacks, so set a flag here and check at the end. - */ - destroy_needed = true; - DISABLE_SIGPIPE(conn, spinfo, (void) 0); SSL_shutdown(conn->ssl); SSL_free(conn->ssl); @@ -1584,17 +1433,6 @@ close_SSL(PGconn *conn) conn->engine = NULL; } #endif - - /* - * This will remove our SSL locking hooks, if this is the last SSL - * connection, which means we must wait to call it until after all SSL - * calls have been made, otherwise we can end up with a race condition and - * possible deadlocks. - * - * See comments above destroy_ssl_system(). - */ - if (destroy_needed) - pqsecure_destroy(); } /* diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h index 4aeb4fa..e10af9e 100644 --- a/src/interfaces/libpq/libpq-int.h +++ b/src/interfaces/libpq/libpq-int.h @@ -598,7 +598,6 @@ extern int pqWriteReady(PGconn *conn); /* === in fe-secure.c === */ extern int pqsecure_initialize(PGconn *); -extern void pqsecure_destroy(void); extern PostgresPollingStatusType pqsecure_open_client(PGconn *); extern void pqsecure_close(PGconn *); extern ssize_t pqsecure_read(PGconn *, void *ptr, size_t len);
pgsql-hackers by date: