Re: libpq's multi-threaded SSL callback handling is busted - Mailing list pgsql-hackers

From Jan Urbański
Subject Re: libpq's multi-threaded SSL callback handling is busted
Date
Msg-id 87a90kmjyc.fsf@wulczer.org
Whole thread Raw
In response to libpq's multi-threaded SSL callback handling is busted  (Jan Urbański <wulczer@wulczer.org>)
List pgsql-hackers
Jan Urbański writes:

> I did some more digging on bug
> http://www.postgresql.org/message-id/CAHUL3dpWYFnUgdgo95OHYDQ4kugdnBKPTjq0mNbTuBhCMG4xvQ@mail.gmail.com
> which describes a deadlock when using libpq with SSL in a multi-threaded
> environment with other threads doing SSL independently.
>
> [reproducing instructions]
>
> I posit we should remove all CRYPTO_set_*_callback functions and associated
> cruft from libpq.
>
> I could submit a patch to get rid of the crazy CRYPTO_*_callback dance in
> libpq, but at the very least this will require a warning in the release notes

Attached is a patch doing just that.

> I would very much like to have this change back-patched, since setting and
> resetting the callback makes using libpq in a threaded OpenSSL-enabled app
> arguably less safe than if it didn't use any locking.

Also attached is a patch for 9.4 and all previous supported releases, which is
the same thing, but adjusted for when we didn't have a separate fe-secure.c and
fe-secure-openssl.c

If committed, this change will warrant a notice in the release notes. I could
try drafting it, if that'd be helpful.

Cheers,
Jan
commit 62f7433f697d49ab005ad22822dc943661a4e48a
Author: Jan Urbański <wulczer@wulczer.org>
Date:   Wed Feb 11 17:47:09 2015 +0100

    Do not attempt to manage OpenSSL locking callbacks in libpq.

    According to OpenSSL documentation, threaded programs using SSL should register
    locking callbacks before starting work. In 6daf396879b6502c41146cdb1013568654f52ff6
    libpq started doing that, registering its callbacks when the connection would
    switch to SSL mode.

    This would lead to overwriting any locking callback a properly written threaded
    SSL application might have already set, but only got uncovered as a bug in
    http://www.postgresql.org/message-id/48620925.6070806@pws.com.au where it
    turned out that unloading libpq would leave a dangling function
    pointer. Specifically, the PHP runtime would segfault after unloading libpq, as
    reported in https://bugs.php.net/bug.php?id=40926

    Commit 4e816286533dd34c10b368487d4079595a3e1418 made libpq unset the callback
    after the last SSL connection was closed. This solved the segfault issue, but
    introduced a potential for deadlocks when one thread using libpq with SSL and
    another doing an unrelated SSL operation.

    T1 (libpq)                     T2 (other)

    start libpq connection
    add locking callback

                                   start SSL operation
                                   take lock

    finish libpq connection
    remove locking callback

                                   (!) release lock, noop since no callback

    This got reported in bug
    http://www.postgresql.org/message-id/CAHUL3dpWYFnUgdgo95OHYDQ4kugdnBKPTjq0mNbTuBhCMG4xvQ@mail.gmail.com

    The reality is that it's not libpq's responsibility to manage those OpenSSL
    callbacks. It's up to application code to set them up before using OpenSSL in
    threads and trying to handle them in libpq is just stomping on the ones that
    should already by in place by the time we start using shared OpenSSL
    structures.

    This commit gets rid of all OpenSSL callback handling from libpq. For
    well-behaved applications, this should increase reliability, since they will
    now have certainty that the callbacks they set up before attempting to use
    OpenSSL will be used throughout the program execution.

    Applications that relied on libpq to set up locking callbacks will need to be
    fixed.

diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index a32af34..5d0b132 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -67,7 +67,6 @@ static int    verify_cb(int ok, X509_STORE_CTX *ctx);
 static int    verify_peer_name_matches_certificate_name(PGconn *conn,
                                                   ASN1_STRING *name,
                                                   char **store_name);
-static void destroy_ssl_system(void);
 static int    initialize_SSL(PGconn *conn);
 static PostgresPollingStatusType open_client_SSL(PGconn *);
 static char *SSLerrmessage(void);
@@ -90,8 +89,6 @@ static bool pq_init_crypto_lib = true;
 static SSL_CTX *SSL_context = NULL;

 #ifdef ENABLE_THREAD_SAFETY
-static long ssl_open_connections = 0;
-
 #ifndef WIN32
 static pthread_mutex_t ssl_config_mutex = PTHREAD_MUTEX_INITIALIZER;
 #else
@@ -112,16 +109,6 @@ static long win32_ssl_create_mutex = 0;
 void
 pgtls_init_library(bool do_ssl, int do_crypto)
 {
-#ifdef ENABLE_THREAD_SAFETY
-
-    /*
-     * Disallow changing the flags while we have open connections, else we'd
-     * get completely confused.
-     */
-    if (ssl_open_connections != 0)
-        return;
-#endif
-
     pq_init_ssl_lib = do_ssl;
     pq_init_crypto_lib = do_crypto;
 }
@@ -705,40 +692,6 @@ verify_peer_name_matches_certificate(PGconn *conn)
     return found_match && !got_error;
 }

-#ifdef ENABLE_THREAD_SAFETY
-/*
- *    Callback functions for OpenSSL internal locking
- */
-
-static unsigned long
-pq_threadidcallback(void)
-{
-    /*
-     * This is not standards-compliant.  pthread_self() returns pthread_t, and
-     * shouldn't be cast to unsigned long, but CRYPTO_set_id_callback requires
-     * it, so we have to do it.
-     */
-    return (unsigned long) pthread_self();
-}
-
-static pthread_mutex_t *pq_lockarray;
-
-static void
-pq_lockingcallback(int mode, int n, const char *file, int line)
-{
-    if (mode & CRYPTO_LOCK)
-    {
-        if (pthread_mutex_lock(&pq_lockarray[n]))
-            PGTHREAD_ERROR("failed to lock mutex");
-    }
-    else
-    {
-        if (pthread_mutex_unlock(&pq_lockarray[n]))
-            PGTHREAD_ERROR("failed to unlock mutex");
-    }
-}
-#endif   /* ENABLE_THREAD_SAFETY */
-
 /*
  * Initialize SSL system, in particular creating the SSL_context object
  * that will be shared by all SSL-using connections in this process.
@@ -776,41 +729,6 @@ pgtls_init(PGconn *conn)
     if (pthread_mutex_lock(&ssl_config_mutex))
         return -1;

-    if (pq_init_crypto_lib)
-    {
-        /*
-         * If necessary, set up an array to hold locks for libcrypto.
-         * libcrypto will tell us how big to make this array.
-         */
-        if (pq_lockarray == NULL)
-        {
-            int            i;
-
-            pq_lockarray = malloc(sizeof(pthread_mutex_t) * CRYPTO_num_locks());
-            if (!pq_lockarray)
-            {
-                pthread_mutex_unlock(&ssl_config_mutex);
-                return -1;
-            }
-            for (i = 0; i < CRYPTO_num_locks(); i++)
-            {
-                if (pthread_mutex_init(&pq_lockarray[i], NULL))
-                {
-                    free(pq_lockarray);
-                    pq_lockarray = NULL;
-                    pthread_mutex_unlock(&ssl_config_mutex);
-                    return -1;
-                }
-            }
-        }
-
-        if (ssl_open_connections++ == 0)
-        {
-            /* These are only required for threaded libcrypto applications */
-            CRYPTO_set_id_callback(pq_threadidcallback);
-            CRYPTO_set_locking_callback(pq_lockingcallback);
-        }
-    }
 #endif   /* ENABLE_THREAD_SAFETY */

     if (!SSL_context)
@@ -862,48 +780,6 @@ pgtls_init(PGconn *conn)
 }

 /*
- *    This function is needed because if the libpq library is unloaded
- *    from the application, the callback functions will no longer exist when
- *    libcrypto is used by other parts of the system.  For this reason,
- *    we unregister the callback functions when the last libpq
- *    connection is closed.  (The same would apply for OpenSSL callbacks
- *    if we had any.)
- *
- *    Callbacks are only set when we're compiled in threadsafe mode, so
- *    we only need to remove them in this case.
- */
-static void
-destroy_ssl_system(void)
-{
-#ifdef ENABLE_THREAD_SAFETY
-    /* Mutex is created in initialize_ssl_system() */
-    if (pthread_mutex_lock(&ssl_config_mutex))
-        return;
-
-    if (pq_init_crypto_lib && ssl_open_connections > 0)
-        --ssl_open_connections;
-
-    if (pq_init_crypto_lib && ssl_open_connections == 0)
-    {
-        /* No connections left, unregister libcrypto callbacks */
-        CRYPTO_set_locking_callback(NULL);
-        CRYPTO_set_id_callback(NULL);
-
-        /*
-         * We don't free the lock array or the SSL_context. If we get another
-         * connection in this process, we will just re-use them with the
-         * existing mutexes.
-         *
-         * This means we leak a little memory on repeated load/unload of the
-         * library.
-         */
-    }
-
-    pthread_mutex_unlock(&ssl_config_mutex);
-#endif
-}
-
-/*
  *    Initialize (potentially) per-connection SSL data, namely the
  *    client certificate, private key, and trusted CA certs.
  *
@@ -1399,17 +1275,8 @@ open_client_SSL(PGconn *conn)
 void
 pgtls_close(PGconn *conn)
 {
-    bool        destroy_needed = false;
-
     if (conn->ssl)
     {
-        /*
-         * We can't destroy everything SSL-related here due to the possible
-         * later calls to OpenSSL routines which may need our thread
-         * callbacks, so set a flag here and check at the end.
-         */
-        destroy_needed = true;
-
         SSL_shutdown(conn->ssl);
         SSL_free(conn->ssl);
         conn->ssl = NULL;
@@ -1430,17 +1297,6 @@ pgtls_close(PGconn *conn)
         conn->engine = NULL;
     }
 #endif
-
-    /*
-     * This will remove our SSL locking hooks, if this is the last SSL
-     * connection, which means we must wait to call it until after all SSL
-     * calls have been made, otherwise we can end up with a race condition and
-     * possible deadlocks.
-     *
-     * See comments above destroy_ssl_system().
-     */
-    if (destroy_needed)
-        destroy_ssl_system();
 }


commit 97847bf888e5b7a2c2aed2ef5e4eb92b0fa3c9e6
Author: Jan Urbański <wulczer@wulczer.org>
Date:   Wed Feb 11 17:47:09 2015 +0100

    Do not attempt to manage OpenSSL locking callbacks in libpq.

    According to OpenSSL documentation, threaded programs using SSL should register
    locking callbacks before starting work. In 6daf396879b6502c41146cdb1013568654f52ff6
    libpq started doing that, registering its callbacks when the connection would
    switch to SSL mode.

    This would lead to overwriting any locking callback a properly written threaded
    SSL application might have already set, but only got uncovered as a bug in
    http://www.postgresql.org/message-id/48620925.6070806@pws.com.au where it
    turned out that unloading libpq would leave a dangling function
    pointer. Specifically, the PHP runtime would segfault after unloading libpq, as
    reported in https://bugs.php.net/bug.php?id=40926

    Commit 4e816286533dd34c10b368487d4079595a3e1418 made libpq unset the callback
    after the last SSL connection was closed. This solved the segfault issue, but
    introduced a potential for deadlocks when one thread using libpq with SSL and
    another doing an unrelated SSL operation.

    T1 (libpq)                     T2 (other)

    start libpq connection
    add locking callback

                                   start SSL operation
                                   take lock

    finish libpq connection
    remove locking callback

                                   (!) release lock, noop since no callback

    This got reported in bug
    http://www.postgresql.org/message-id/CAHUL3dpWYFnUgdgo95OHYDQ4kugdnBKPTjq0mNbTuBhCMG4xvQ@mail.gmail.com

    The reality is that it's not libpq's responsibility to manage those OpenSSL
    callbacks. It's up to application code to set them up before using OpenSSL in
    threads and trying to handle them in libpq is just stomping on the ones that
    should already by in place by the time we start using shared OpenSSL
    structures.

    This commit gets rid of all OpenSSL callback handling from libpq. For
    well-behaved applications, this should increase reliability, since they will
    now have certainty that the callbacks they set up before attempting to use
    OpenSSL will be used throughout the program execution.

    Applications that relied on libpq to set up locking callbacks will need to be
    fixed.

diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
index 9ba3567..56367b8 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -82,9 +82,7 @@
 static bool verify_peer_name_matches_certificate(PGconn *);
 static int    verify_cb(int ok, X509_STORE_CTX *ctx);
 static int    init_ssl_system(PGconn *conn);
-static void destroy_ssl_system(void);
 static int    initialize_SSL(PGconn *conn);
-static void destroySSL(void);
 static PostgresPollingStatusType open_client_SSL(PGconn *);
 static void close_SSL(PGconn *);
 static char *SSLerrmessage(void);
@@ -101,8 +99,6 @@ static bool pq_init_crypto_lib = true;
 static SSL_CTX *SSL_context = NULL;

 #ifdef ENABLE_THREAD_SAFETY
-static long ssl_open_connections = 0;
-
 #ifndef WIN32
 static pthread_mutex_t ssl_config_mutex = PTHREAD_MUTEX_INITIALIZER;
 #else
@@ -205,16 +201,6 @@ void
 PQinitOpenSSL(int do_ssl, int do_crypto)
 {
 #ifdef USE_SSL
-#ifdef ENABLE_THREAD_SAFETY
-
-    /*
-     * Disallow changing the flags while we have open connections, else we'd
-     * get completely confused.
-     */
-    if (ssl_open_connections != 0)
-        return;
-#endif
-
     pq_init_ssl_lib = do_ssl;
     pq_init_crypto_lib = do_crypto;
 #endif
@@ -236,17 +222,6 @@ pqsecure_initialize(PGconn *conn)
 }

 /*
- *    Destroy global context
- */
-void
-pqsecure_destroy(void)
-{
-#ifdef USE_SSL
-    destroySSL();
-#endif
-}
-
-/*
  *    Begin or continue negotiating a secure session.
  */
 PostgresPollingStatusType
@@ -848,40 +823,6 @@ verify_peer_name_matches_certificate(PGconn *conn)
     return result;
 }

-#ifdef ENABLE_THREAD_SAFETY
-/*
- *    Callback functions for OpenSSL internal locking
- */
-
-static unsigned long
-pq_threadidcallback(void)
-{
-    /*
-     * This is not standards-compliant.  pthread_self() returns pthread_t, and
-     * shouldn't be cast to unsigned long, but CRYPTO_set_id_callback requires
-     * it, so we have to do it.
-     */
-    return (unsigned long) pthread_self();
-}
-
-static pthread_mutex_t *pq_lockarray;
-
-static void
-pq_lockingcallback(int mode, int n, const char *file, int line)
-{
-    if (mode & CRYPTO_LOCK)
-    {
-        if (pthread_mutex_lock(&pq_lockarray[n]))
-            PGTHREAD_ERROR("failed to lock mutex");
-    }
-    else
-    {
-        if (pthread_mutex_unlock(&pq_lockarray[n]))
-            PGTHREAD_ERROR("failed to unlock mutex");
-    }
-}
-#endif   /* ENABLE_THREAD_SAFETY */
-
 /*
  * Initialize SSL system, in particular creating the SSL_context object
  * that will be shared by all SSL-using connections in this process.
@@ -919,41 +860,6 @@ init_ssl_system(PGconn *conn)
     if (pthread_mutex_lock(&ssl_config_mutex))
         return -1;

-    if (pq_init_crypto_lib)
-    {
-        /*
-         * If necessary, set up an array to hold locks for libcrypto.
-         * libcrypto will tell us how big to make this array.
-         */
-        if (pq_lockarray == NULL)
-        {
-            int            i;
-
-            pq_lockarray = malloc(sizeof(pthread_mutex_t) * CRYPTO_num_locks());
-            if (!pq_lockarray)
-            {
-                pthread_mutex_unlock(&ssl_config_mutex);
-                return -1;
-            }
-            for (i = 0; i < CRYPTO_num_locks(); i++)
-            {
-                if (pthread_mutex_init(&pq_lockarray[i], NULL))
-                {
-                    free(pq_lockarray);
-                    pq_lockarray = NULL;
-                    pthread_mutex_unlock(&ssl_config_mutex);
-                    return -1;
-                }
-            }
-        }
-
-        if (ssl_open_connections++ == 0)
-        {
-            /* These are only required for threaded libcrypto applications */
-            CRYPTO_set_id_callback(pq_threadidcallback);
-            CRYPTO_set_locking_callback(pq_lockingcallback);
-        }
-    }
 #endif   /* ENABLE_THREAD_SAFETY */

     if (!SSL_context)
@@ -1005,48 +911,6 @@ init_ssl_system(PGconn *conn)
 }

 /*
- *    This function is needed because if the libpq library is unloaded
- *    from the application, the callback functions will no longer exist when
- *    libcrypto is used by other parts of the system.  For this reason,
- *    we unregister the callback functions when the last libpq
- *    connection is closed.  (The same would apply for OpenSSL callbacks
- *    if we had any.)
- *
- *    Callbacks are only set when we're compiled in threadsafe mode, so
- *    we only need to remove them in this case.
- */
-static void
-destroy_ssl_system(void)
-{
-#ifdef ENABLE_THREAD_SAFETY
-    /* Mutex is created in initialize_ssl_system() */
-    if (pthread_mutex_lock(&ssl_config_mutex))
-        return;
-
-    if (pq_init_crypto_lib && ssl_open_connections > 0)
-        --ssl_open_connections;
-
-    if (pq_init_crypto_lib && ssl_open_connections == 0)
-    {
-        /* No connections left, unregister libcrypto callbacks */
-        CRYPTO_set_locking_callback(NULL);
-        CRYPTO_set_id_callback(NULL);
-
-        /*
-         * We don't free the lock array or the SSL_context. If we get another
-         * connection in this process, we will just re-use them with the
-         * existing mutexes.
-         *
-         * This means we leak a little memory on repeated load/unload of the
-         * library.
-         */
-    }
-
-    pthread_mutex_unlock(&ssl_config_mutex);
-#endif
-}
-
-/*
  *    Initialize (potentially) per-connection SSL data, namely the
  *    client certificate, private key, and trusted CA certs.
  *
@@ -1451,12 +1315,6 @@ initialize_SSL(PGconn *conn)
     return 0;
 }

-static void
-destroySSL(void)
-{
-    destroy_ssl_system();
-}
-
 /*
  *    Attempt to negotiate SSL connection.
  */
@@ -1548,19 +1406,10 @@ open_client_SSL(PGconn *conn)
 static void
 close_SSL(PGconn *conn)
 {
-    bool        destroy_needed = false;
-
     if (conn->ssl)
     {
         DECLARE_SIGPIPE_INFO(spinfo);

-        /*
-         * We can't destroy everything SSL-related here due to the possible
-         * later calls to OpenSSL routines which may need our thread
-         * callbacks, so set a flag here and check at the end.
-         */
-        destroy_needed = true;
-
         DISABLE_SIGPIPE(conn, spinfo, (void) 0);
         SSL_shutdown(conn->ssl);
         SSL_free(conn->ssl);
@@ -1584,17 +1433,6 @@ close_SSL(PGconn *conn)
         conn->engine = NULL;
     }
 #endif
-
-    /*
-     * This will remove our SSL locking hooks, if this is the last SSL
-     * connection, which means we must wait to call it until after all SSL
-     * calls have been made, otherwise we can end up with a race condition and
-     * possible deadlocks.
-     *
-     * See comments above destroy_ssl_system().
-     */
-    if (destroy_needed)
-        pqsecure_destroy();
 }

 /*
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 4aeb4fa..e10af9e 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -598,7 +598,6 @@ extern int    pqWriteReady(PGconn *conn);
 /* === in fe-secure.c === */

 extern int    pqsecure_initialize(PGconn *);
-extern void pqsecure_destroy(void);
 extern PostgresPollingStatusType pqsecure_open_client(PGconn *);
 extern void pqsecure_close(PGconn *);
 extern ssize_t pqsecure_read(PGconn *, void *ptr, size_t len);

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: reducing our reliance on MD5
Next
From: Thom Brown
Date:
Subject: Standby receiving part of missing WAL segment