Thread: libpq's multi-threaded SSL callback handling is busted

libpq's multi-threaded SSL callback handling is busted

From
Jan Urbański
Date:
I did some more digging on bug
http://www.postgresql.org/message-id/CAHUL3dpWYFnUgdgo95OHYDQ4kugdnBKPTjq0mNbTuBhCMG4xvQ@mail.gmail.com
which describes a deadlock when using libpq with SSL in a multi-threaded
environment with other threads doing SSL independently.

Attached is a reproducing Python script in my experience is faster at showing
the problem. Run it with

python -u pg_lock.py

As Heikki correctly diagnosed, the problem is with libpq unsetting the OpenSSL
locking callback while another thread is holding one of the locks. The other
thread then never releases the lock and the next time anyone tries to take it,
the application deadlocks.

The exact way it goes down is like this:



T1 (libpq)                     T2 (Python)

start libpq connection
init ssl system
add locking callback

                               start ssl operation
                               take lock

finish libpq connection
destroy ssl system
remove locking callback

                               (!) release lock, noop since no callback



And the next time any thread tries to take the lock, it deadlocks.

We added unsetting the locking callback in
4e816286533dd34c10b368487d4079595a3e1418 due to this bug report:
http://www.postgresql.org/message-id/48620925.6070806@pws.com.au

Indeed, commenting out the CRYPTO_set_locking_callback(NULL) call in
fe-secure-openssl.c gets rid of the deadlock. However, it makes php segfault
with the (attached) reproduction script from the original 2008 bug report. If
your php.ini loads both the pgsql and curl extensions, reproduce the segfault with:

php -f pg_segfault.php

The most difficult part about fixing this bug is to determine *who's at
fault*. I now lean towards the opinion that we shouldn't be messing with
OpenSSL callbacks *at all*.

First of all, the current behaviour is crazy. We're setting and unsetting the
locking callback every time a connection is made/closed, which is not how
OpenSSL is supposed to be used. The *application* using libpq should set a
callback before it starts threads, it's no business of the library's.

The old behaviour was slightly less insane (set callbacks first time we're
engaging OpenSSL code, never touch them again). The real sane solution is to
leave it up to the application.

I posit we should remove all CRYPTO_set_*_callback functions and associated
cruft from libpq. This unfortunately means that multi-threaded applications
using libpq and SSL will break if they haven't been setting their own callbacks
(if they have, well, tough luck! libpq will just stomp over them the first time
it connects to Postgres, but at least *some* callbacks are left present after
that).

However, AFAICS if your app is not in C, then runtimes already handle that for
you (as they should).

Python:

https://hg.python.org/cpython/file/dc820b44ce21/Modules/_ssl.c#l4284

PHP:

https://github.com/php/php-src/blob/master/ext/curl/interface.c#L1235

Note that the PHP pgsql extension doesn't set the OpenSSL callbacks, because
libpq was setting them on its own. If we remove the callback handling from
libpq, PHP will need to add them. By the way, the MySQL extension for PHP also
does not set those callbacks.

Let me reiterate: I now believe the callbacks should be set by the application,
libraries should not touch them, since they don't know what will they be
stomping on. If the application is run through a VM like Python or PHP, it's
the VM that should make sure the callbacks are set.

I could submit a patch to get rid of the crazy CRYPTO_*_callback dance in
libpq, but at the very least this will require a warning in the release notes
about how you can't assume that libpq will take care of making sure your app is
multi-threaded safe when using OpenSSL. I also don't know how far that's
back-patcheable.

I would very much like to have this change back-patched, since setting and
resetting the callback makes using libpq in a threaded OpenSSL-enabled app
arguably less safe than if it didn't use any locking. If the app is written
correctly, it will have set locking callbacks before starting. Then libpq will
happily stomp on them. If the app hasn't set callbacks, it wasn't written
correctly in the first place and it will get segfaults instead of deadlocks.

Thanks,
Jan
#!/usr/bin/env python
import sys
import time
import threading
import urllib

import psycopg2


DB_HOST = '127.0.0.1'
DB_USER = 'postgres'
DB_NAME = 'postgres'

HTTPS_URL = 'https://google.com'
NUM_HTTPS_THREADS = 10


def connect():
    conn = psycopg2.connect(
        host=DB_HOST, user=DB_USER,
        database=DB_NAME, sslmode='require')
    return conn


class Worker(threading.Thread):

    def __init__(self):
        super(Worker, self).__init__()
        self._stop = threading.Event()

    def stop(self):
        self._stop.set()

    def stopped(self):
        return self._stop.isSet()

    def run(self):
        i = 0
        while not self.stopped():
            i += 1
            self.do_work(i)


class Libpq(Worker):

    def do_work(self, i):
        conn = connect()
        cur = conn.cursor()
        cur.execute('SELECT 1')
        cur.close()
        conn.close()
        if not i % 50:
            sys.stdout.write('+')


class HTTPS(Worker):

    def do_work(self, i):
        urllib.urlopen(HTTPS_URL).read()
        if not i % 50:
            sys.stdout.write('=')


def main():
    # make sure libpq has set its locking callbacks
    conn = connect()
    cur = conn.cursor()
    cur.execute('SELECT 1')
    cur.close()
    conn.close()

    threads = [Libpq()]
    threads += [HTTPS() for _ in range(NUM_HTTPS_THREADS)]

    print 'starting worker threads'

    try:
        for t in threads:
            t.start()
        while True:
            time.sleep(1000)
    except KeyboardInterrupt:
        for t in threads:
            t.stop()

    for t in threads:
        t.join()


if __name__ == "__main__":
    sys.exit(main())

Attachment

Re: libpq's multi-threaded SSL callback handling is busted

From
Jan Urbański
Date:
Jan Urbański writes:

> I did some more digging on bug
> http://www.postgresql.org/message-id/CAHUL3dpWYFnUgdgo95OHYDQ4kugdnBKPTjq0mNbTuBhCMG4xvQ@mail.gmail.com
> which describes a deadlock when using libpq with SSL in a multi-threaded
> environment with other threads doing SSL independently.
>
> [reproducing instructions]
>
> I posit we should remove all CRYPTO_set_*_callback functions and associated
> cruft from libpq.
>
> I could submit a patch to get rid of the crazy CRYPTO_*_callback dance in
> libpq, but at the very least this will require a warning in the release notes

Attached is a patch doing just that.

> I would very much like to have this change back-patched, since setting and
> resetting the callback makes using libpq in a threaded OpenSSL-enabled app
> arguably less safe than if it didn't use any locking.

Also attached is a patch for 9.4 and all previous supported releases, which is
the same thing, but adjusted for when we didn't have a separate fe-secure.c and
fe-secure-openssl.c

If committed, this change will warrant a notice in the release notes. I could
try drafting it, if that'd be helpful.

Cheers,
Jan
commit 62f7433f697d49ab005ad22822dc943661a4e48a
Author: Jan Urbański <wulczer@wulczer.org>
Date:   Wed Feb 11 17:47:09 2015 +0100

    Do not attempt to manage OpenSSL locking callbacks in libpq.

    According to OpenSSL documentation, threaded programs using SSL should register
    locking callbacks before starting work. In 6daf396879b6502c41146cdb1013568654f52ff6
    libpq started doing that, registering its callbacks when the connection would
    switch to SSL mode.

    This would lead to overwriting any locking callback a properly written threaded
    SSL application might have already set, but only got uncovered as a bug in
    http://www.postgresql.org/message-id/48620925.6070806@pws.com.au where it
    turned out that unloading libpq would leave a dangling function
    pointer. Specifically, the PHP runtime would segfault after unloading libpq, as
    reported in https://bugs.php.net/bug.php?id=40926

    Commit 4e816286533dd34c10b368487d4079595a3e1418 made libpq unset the callback
    after the last SSL connection was closed. This solved the segfault issue, but
    introduced a potential for deadlocks when one thread using libpq with SSL and
    another doing an unrelated SSL operation.

    T1 (libpq)                     T2 (other)

    start libpq connection
    add locking callback

                                   start SSL operation
                                   take lock

    finish libpq connection
    remove locking callback

                                   (!) release lock, noop since no callback

    This got reported in bug
    http://www.postgresql.org/message-id/CAHUL3dpWYFnUgdgo95OHYDQ4kugdnBKPTjq0mNbTuBhCMG4xvQ@mail.gmail.com

    The reality is that it's not libpq's responsibility to manage those OpenSSL
    callbacks. It's up to application code to set them up before using OpenSSL in
    threads and trying to handle them in libpq is just stomping on the ones that
    should already by in place by the time we start using shared OpenSSL
    structures.

    This commit gets rid of all OpenSSL callback handling from libpq. For
    well-behaved applications, this should increase reliability, since they will
    now have certainty that the callbacks they set up before attempting to use
    OpenSSL will be used throughout the program execution.

    Applications that relied on libpq to set up locking callbacks will need to be
    fixed.

diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index a32af34..5d0b132 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -67,7 +67,6 @@ static int    verify_cb(int ok, X509_STORE_CTX *ctx);
 static int    verify_peer_name_matches_certificate_name(PGconn *conn,
                                                   ASN1_STRING *name,
                                                   char **store_name);
-static void destroy_ssl_system(void);
 static int    initialize_SSL(PGconn *conn);
 static PostgresPollingStatusType open_client_SSL(PGconn *);
 static char *SSLerrmessage(void);
@@ -90,8 +89,6 @@ static bool pq_init_crypto_lib = true;
 static SSL_CTX *SSL_context = NULL;

 #ifdef ENABLE_THREAD_SAFETY
-static long ssl_open_connections = 0;
-
 #ifndef WIN32
 static pthread_mutex_t ssl_config_mutex = PTHREAD_MUTEX_INITIALIZER;
 #else
@@ -112,16 +109,6 @@ static long win32_ssl_create_mutex = 0;
 void
 pgtls_init_library(bool do_ssl, int do_crypto)
 {
-#ifdef ENABLE_THREAD_SAFETY
-
-    /*
-     * Disallow changing the flags while we have open connections, else we'd
-     * get completely confused.
-     */
-    if (ssl_open_connections != 0)
-        return;
-#endif
-
     pq_init_ssl_lib = do_ssl;
     pq_init_crypto_lib = do_crypto;
 }
@@ -705,40 +692,6 @@ verify_peer_name_matches_certificate(PGconn *conn)
     return found_match && !got_error;
 }

-#ifdef ENABLE_THREAD_SAFETY
-/*
- *    Callback functions for OpenSSL internal locking
- */
-
-static unsigned long
-pq_threadidcallback(void)
-{
-    /*
-     * This is not standards-compliant.  pthread_self() returns pthread_t, and
-     * shouldn't be cast to unsigned long, but CRYPTO_set_id_callback requires
-     * it, so we have to do it.
-     */
-    return (unsigned long) pthread_self();
-}
-
-static pthread_mutex_t *pq_lockarray;
-
-static void
-pq_lockingcallback(int mode, int n, const char *file, int line)
-{
-    if (mode & CRYPTO_LOCK)
-    {
-        if (pthread_mutex_lock(&pq_lockarray[n]))
-            PGTHREAD_ERROR("failed to lock mutex");
-    }
-    else
-    {
-        if (pthread_mutex_unlock(&pq_lockarray[n]))
-            PGTHREAD_ERROR("failed to unlock mutex");
-    }
-}
-#endif   /* ENABLE_THREAD_SAFETY */
-
 /*
  * Initialize SSL system, in particular creating the SSL_context object
  * that will be shared by all SSL-using connections in this process.
@@ -776,41 +729,6 @@ pgtls_init(PGconn *conn)
     if (pthread_mutex_lock(&ssl_config_mutex))
         return -1;

-    if (pq_init_crypto_lib)
-    {
-        /*
-         * If necessary, set up an array to hold locks for libcrypto.
-         * libcrypto will tell us how big to make this array.
-         */
-        if (pq_lockarray == NULL)
-        {
-            int            i;
-
-            pq_lockarray = malloc(sizeof(pthread_mutex_t) * CRYPTO_num_locks());
-            if (!pq_lockarray)
-            {
-                pthread_mutex_unlock(&ssl_config_mutex);
-                return -1;
-            }
-            for (i = 0; i < CRYPTO_num_locks(); i++)
-            {
-                if (pthread_mutex_init(&pq_lockarray[i], NULL))
-                {
-                    free(pq_lockarray);
-                    pq_lockarray = NULL;
-                    pthread_mutex_unlock(&ssl_config_mutex);
-                    return -1;
-                }
-            }
-        }
-
-        if (ssl_open_connections++ == 0)
-        {
-            /* These are only required for threaded libcrypto applications */
-            CRYPTO_set_id_callback(pq_threadidcallback);
-            CRYPTO_set_locking_callback(pq_lockingcallback);
-        }
-    }
 #endif   /* ENABLE_THREAD_SAFETY */

     if (!SSL_context)
@@ -862,48 +780,6 @@ pgtls_init(PGconn *conn)
 }

 /*
- *    This function is needed because if the libpq library is unloaded
- *    from the application, the callback functions will no longer exist when
- *    libcrypto is used by other parts of the system.  For this reason,
- *    we unregister the callback functions when the last libpq
- *    connection is closed.  (The same would apply for OpenSSL callbacks
- *    if we had any.)
- *
- *    Callbacks are only set when we're compiled in threadsafe mode, so
- *    we only need to remove them in this case.
- */
-static void
-destroy_ssl_system(void)
-{
-#ifdef ENABLE_THREAD_SAFETY
-    /* Mutex is created in initialize_ssl_system() */
-    if (pthread_mutex_lock(&ssl_config_mutex))
-        return;
-
-    if (pq_init_crypto_lib && ssl_open_connections > 0)
-        --ssl_open_connections;
-
-    if (pq_init_crypto_lib && ssl_open_connections == 0)
-    {
-        /* No connections left, unregister libcrypto callbacks */
-        CRYPTO_set_locking_callback(NULL);
-        CRYPTO_set_id_callback(NULL);
-
-        /*
-         * We don't free the lock array or the SSL_context. If we get another
-         * connection in this process, we will just re-use them with the
-         * existing mutexes.
-         *
-         * This means we leak a little memory on repeated load/unload of the
-         * library.
-         */
-    }
-
-    pthread_mutex_unlock(&ssl_config_mutex);
-#endif
-}
-
-/*
  *    Initialize (potentially) per-connection SSL data, namely the
  *    client certificate, private key, and trusted CA certs.
  *
@@ -1399,17 +1275,8 @@ open_client_SSL(PGconn *conn)
 void
 pgtls_close(PGconn *conn)
 {
-    bool        destroy_needed = false;
-
     if (conn->ssl)
     {
-        /*
-         * We can't destroy everything SSL-related here due to the possible
-         * later calls to OpenSSL routines which may need our thread
-         * callbacks, so set a flag here and check at the end.
-         */
-        destroy_needed = true;
-
         SSL_shutdown(conn->ssl);
         SSL_free(conn->ssl);
         conn->ssl = NULL;
@@ -1430,17 +1297,6 @@ pgtls_close(PGconn *conn)
         conn->engine = NULL;
     }
 #endif
-
-    /*
-     * This will remove our SSL locking hooks, if this is the last SSL
-     * connection, which means we must wait to call it until after all SSL
-     * calls have been made, otherwise we can end up with a race condition and
-     * possible deadlocks.
-     *
-     * See comments above destroy_ssl_system().
-     */
-    if (destroy_needed)
-        destroy_ssl_system();
 }


commit 97847bf888e5b7a2c2aed2ef5e4eb92b0fa3c9e6
Author: Jan Urbański <wulczer@wulczer.org>
Date:   Wed Feb 11 17:47:09 2015 +0100

    Do not attempt to manage OpenSSL locking callbacks in libpq.

    According to OpenSSL documentation, threaded programs using SSL should register
    locking callbacks before starting work. In 6daf396879b6502c41146cdb1013568654f52ff6
    libpq started doing that, registering its callbacks when the connection would
    switch to SSL mode.

    This would lead to overwriting any locking callback a properly written threaded
    SSL application might have already set, but only got uncovered as a bug in
    http://www.postgresql.org/message-id/48620925.6070806@pws.com.au where it
    turned out that unloading libpq would leave a dangling function
    pointer. Specifically, the PHP runtime would segfault after unloading libpq, as
    reported in https://bugs.php.net/bug.php?id=40926

    Commit 4e816286533dd34c10b368487d4079595a3e1418 made libpq unset the callback
    after the last SSL connection was closed. This solved the segfault issue, but
    introduced a potential for deadlocks when one thread using libpq with SSL and
    another doing an unrelated SSL operation.

    T1 (libpq)                     T2 (other)

    start libpq connection
    add locking callback

                                   start SSL operation
                                   take lock

    finish libpq connection
    remove locking callback

                                   (!) release lock, noop since no callback

    This got reported in bug
    http://www.postgresql.org/message-id/CAHUL3dpWYFnUgdgo95OHYDQ4kugdnBKPTjq0mNbTuBhCMG4xvQ@mail.gmail.com

    The reality is that it's not libpq's responsibility to manage those OpenSSL
    callbacks. It's up to application code to set them up before using OpenSSL in
    threads and trying to handle them in libpq is just stomping on the ones that
    should already by in place by the time we start using shared OpenSSL
    structures.

    This commit gets rid of all OpenSSL callback handling from libpq. For
    well-behaved applications, this should increase reliability, since they will
    now have certainty that the callbacks they set up before attempting to use
    OpenSSL will be used throughout the program execution.

    Applications that relied on libpq to set up locking callbacks will need to be
    fixed.

diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
index 9ba3567..56367b8 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -82,9 +82,7 @@
 static bool verify_peer_name_matches_certificate(PGconn *);
 static int    verify_cb(int ok, X509_STORE_CTX *ctx);
 static int    init_ssl_system(PGconn *conn);
-static void destroy_ssl_system(void);
 static int    initialize_SSL(PGconn *conn);
-static void destroySSL(void);
 static PostgresPollingStatusType open_client_SSL(PGconn *);
 static void close_SSL(PGconn *);
 static char *SSLerrmessage(void);
@@ -101,8 +99,6 @@ static bool pq_init_crypto_lib = true;
 static SSL_CTX *SSL_context = NULL;

 #ifdef ENABLE_THREAD_SAFETY
-static long ssl_open_connections = 0;
-
 #ifndef WIN32
 static pthread_mutex_t ssl_config_mutex = PTHREAD_MUTEX_INITIALIZER;
 #else
@@ -205,16 +201,6 @@ void
 PQinitOpenSSL(int do_ssl, int do_crypto)
 {
 #ifdef USE_SSL
-#ifdef ENABLE_THREAD_SAFETY
-
-    /*
-     * Disallow changing the flags while we have open connections, else we'd
-     * get completely confused.
-     */
-    if (ssl_open_connections != 0)
-        return;
-#endif
-
     pq_init_ssl_lib = do_ssl;
     pq_init_crypto_lib = do_crypto;
 #endif
@@ -236,17 +222,6 @@ pqsecure_initialize(PGconn *conn)
 }

 /*
- *    Destroy global context
- */
-void
-pqsecure_destroy(void)
-{
-#ifdef USE_SSL
-    destroySSL();
-#endif
-}
-
-/*
  *    Begin or continue negotiating a secure session.
  */
 PostgresPollingStatusType
@@ -848,40 +823,6 @@ verify_peer_name_matches_certificate(PGconn *conn)
     return result;
 }

-#ifdef ENABLE_THREAD_SAFETY
-/*
- *    Callback functions for OpenSSL internal locking
- */
-
-static unsigned long
-pq_threadidcallback(void)
-{
-    /*
-     * This is not standards-compliant.  pthread_self() returns pthread_t, and
-     * shouldn't be cast to unsigned long, but CRYPTO_set_id_callback requires
-     * it, so we have to do it.
-     */
-    return (unsigned long) pthread_self();
-}
-
-static pthread_mutex_t *pq_lockarray;
-
-static void
-pq_lockingcallback(int mode, int n, const char *file, int line)
-{
-    if (mode & CRYPTO_LOCK)
-    {
-        if (pthread_mutex_lock(&pq_lockarray[n]))
-            PGTHREAD_ERROR("failed to lock mutex");
-    }
-    else
-    {
-        if (pthread_mutex_unlock(&pq_lockarray[n]))
-            PGTHREAD_ERROR("failed to unlock mutex");
-    }
-}
-#endif   /* ENABLE_THREAD_SAFETY */
-
 /*
  * Initialize SSL system, in particular creating the SSL_context object
  * that will be shared by all SSL-using connections in this process.
@@ -919,41 +860,6 @@ init_ssl_system(PGconn *conn)
     if (pthread_mutex_lock(&ssl_config_mutex))
         return -1;

-    if (pq_init_crypto_lib)
-    {
-        /*
-         * If necessary, set up an array to hold locks for libcrypto.
-         * libcrypto will tell us how big to make this array.
-         */
-        if (pq_lockarray == NULL)
-        {
-            int            i;
-
-            pq_lockarray = malloc(sizeof(pthread_mutex_t) * CRYPTO_num_locks());
-            if (!pq_lockarray)
-            {
-                pthread_mutex_unlock(&ssl_config_mutex);
-                return -1;
-            }
-            for (i = 0; i < CRYPTO_num_locks(); i++)
-            {
-                if (pthread_mutex_init(&pq_lockarray[i], NULL))
-                {
-                    free(pq_lockarray);
-                    pq_lockarray = NULL;
-                    pthread_mutex_unlock(&ssl_config_mutex);
-                    return -1;
-                }
-            }
-        }
-
-        if (ssl_open_connections++ == 0)
-        {
-            /* These are only required for threaded libcrypto applications */
-            CRYPTO_set_id_callback(pq_threadidcallback);
-            CRYPTO_set_locking_callback(pq_lockingcallback);
-        }
-    }
 #endif   /* ENABLE_THREAD_SAFETY */

     if (!SSL_context)
@@ -1005,48 +911,6 @@ init_ssl_system(PGconn *conn)
 }

 /*
- *    This function is needed because if the libpq library is unloaded
- *    from the application, the callback functions will no longer exist when
- *    libcrypto is used by other parts of the system.  For this reason,
- *    we unregister the callback functions when the last libpq
- *    connection is closed.  (The same would apply for OpenSSL callbacks
- *    if we had any.)
- *
- *    Callbacks are only set when we're compiled in threadsafe mode, so
- *    we only need to remove them in this case.
- */
-static void
-destroy_ssl_system(void)
-{
-#ifdef ENABLE_THREAD_SAFETY
-    /* Mutex is created in initialize_ssl_system() */
-    if (pthread_mutex_lock(&ssl_config_mutex))
-        return;
-
-    if (pq_init_crypto_lib && ssl_open_connections > 0)
-        --ssl_open_connections;
-
-    if (pq_init_crypto_lib && ssl_open_connections == 0)
-    {
-        /* No connections left, unregister libcrypto callbacks */
-        CRYPTO_set_locking_callback(NULL);
-        CRYPTO_set_id_callback(NULL);
-
-        /*
-         * We don't free the lock array or the SSL_context. If we get another
-         * connection in this process, we will just re-use them with the
-         * existing mutexes.
-         *
-         * This means we leak a little memory on repeated load/unload of the
-         * library.
-         */
-    }
-
-    pthread_mutex_unlock(&ssl_config_mutex);
-#endif
-}
-
-/*
  *    Initialize (potentially) per-connection SSL data, namely the
  *    client certificate, private key, and trusted CA certs.
  *
@@ -1451,12 +1315,6 @@ initialize_SSL(PGconn *conn)
     return 0;
 }

-static void
-destroySSL(void)
-{
-    destroy_ssl_system();
-}
-
 /*
  *    Attempt to negotiate SSL connection.
  */
@@ -1548,19 +1406,10 @@ open_client_SSL(PGconn *conn)
 static void
 close_SSL(PGconn *conn)
 {
-    bool        destroy_needed = false;
-
     if (conn->ssl)
     {
         DECLARE_SIGPIPE_INFO(spinfo);

-        /*
-         * We can't destroy everything SSL-related here due to the possible
-         * later calls to OpenSSL routines which may need our thread
-         * callbacks, so set a flag here and check at the end.
-         */
-        destroy_needed = true;
-
         DISABLE_SIGPIPE(conn, spinfo, (void) 0);
         SSL_shutdown(conn->ssl);
         SSL_free(conn->ssl);
@@ -1584,17 +1433,6 @@ close_SSL(PGconn *conn)
         conn->engine = NULL;
     }
 #endif
-
-    /*
-     * This will remove our SSL locking hooks, if this is the last SSL
-     * connection, which means we must wait to call it until after all SSL
-     * calls have been made, otherwise we can end up with a race condition and
-     * possible deadlocks.
-     *
-     * See comments above destroy_ssl_system().
-     */
-    if (destroy_needed)
-        pqsecure_destroy();
 }

 /*
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 4aeb4fa..e10af9e 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -598,7 +598,6 @@ extern int    pqWriteReady(PGconn *conn);
 /* === in fe-secure.c === */

 extern int    pqsecure_initialize(PGconn *);
-extern void pqsecure_destroy(void);
 extern PostgresPollingStatusType pqsecure_open_client(PGconn *);
 extern void pqsecure_close(PGconn *);
 extern ssize_t pqsecure_read(PGconn *, void *ptr, size_t len);

Re: libpq's multi-threaded SSL callback handling is busted

From
Andres Freund
Date:
On 2015-02-09 18:17:14 +0100, Jan Urbański wrote:
> We added unsetting the locking callback in
> 4e816286533dd34c10b368487d4079595a3e1418 due to this bug report:
> http://www.postgresql.org/message-id/48620925.6070806@pws.com.au
> 
> Indeed, commenting out the CRYPTO_set_locking_callback(NULL) call in
> fe-secure-openssl.c gets rid of the deadlock. However, it makes php segfault
> with the (attached) reproduction script from the original 2008 bug report. If
> your php.ini loads both the pgsql and curl extensions, reproduce the segfault with:
> 
> php -f pg_segfault.php
> 
> The most difficult part about fixing this bug is to determine *who's at
> fault*. I now lean towards the opinion that we shouldn't be messing with
> OpenSSL callbacks *at all*.
> 
> First of all, the current behaviour is crazy. We're setting and unsetting the
> locking callback every time a connection is made/closed, which is not how
> OpenSSL is supposed to be used. The *application* using libpq should set a
> callback before it starts threads, it's no business of the library's.

I don't buy this argument at all. Delivering a libpq that's not
threadsafe in some circumstances is a really bad idea.

> The old behaviour was slightly less insane (set callbacks first time we're
> engaging OpenSSL code, never touch them again). The real sane solution is to
> leave it up to the application.

The real solution would be for openssl to do it itself.

> I posit we should remove all CRYPTO_set_*_callback functions and associated
> cruft from libpq. This unfortunately means that multi-threaded applications
> using libpq and SSL will break if they haven't been setting their own callbacks
> (if they have, well, tough luck! libpq will just stomp over them the first time
> it connects to Postgres, but at least *some* callbacks are left present after
> that).

I think there's no chance in hell we can do that. Breaking a noticeable
amount of libpq users in a minor upgrade is imo a cure *FAR* worse than
the cure.

> Note that the PHP pgsql extension doesn't set the OpenSSL callbacks, because
> libpq was setting them on its own. If we remove the callback handling from
> libpq, PHP will need to add them. By the way, the MySQL extension for PHP also
> does not set those callbacks.

I think this shows pretty clearly how insane it would be to change this
in a minor release. Do you really expect people to update libpq and php
in unison. After a minor release?

Note that we *already* provide applications with the ability to set the
callbacks themselves and prevent us from doing so: PQinitSSL(false).

> Let me reiterate: I now believe the callbacks should be set by the application,
> libraries should not touch them, since they don't know what will they be
> stomping on. If the application is run through a VM like Python or PHP, it's
> the VM that should make sure the callbacks are set.

I fail to see why it's any better to do it in the VM. That relies on
either always loading the VM's openssl module, even if you don't
actually need it because the library you use deals with openssl. It
prevents other implementations of openssl in the VM.

What I think we should do is the following:

* Emphasize the fact that it's important to use PQinitSSL(false) if you did things yourself.
* If there's already callbacks set: Remember that fact and don't overwrite. In the next major version: warn.
* Assert or something if the callback when unregistering isn't the same as it was when registering. That's pretty much
guaranteedto cause issues.
 

> I would very much like to have this change back-patched, since setting and
> resetting the callback makes using libpq in a threaded OpenSSL-enabled app
> arguably less safe than if it didn't use any locking.

Meh^2

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: libpq's multi-threaded SSL callback handling is busted

From
Jan Urbański
Date:
Andres Freund writes:

> On 2015-02-09 18:17:14 +0100, Jan Urbański wrote:
>> First of all, the current behaviour is crazy. We're setting and unsetting the
>> locking callback every time a connection is made/closed, which is not how
>> OpenSSL is supposed to be used. The *application* using libpq should set a
>> callback before it starts threads, it's no business of the library's.
>
> I don't buy this argument at all. Delivering a libpq that's not
> threadsafe in some circumstances is a really bad idea.

I knew this would be a hard sell :( What I know is that the current situation
is not good and leaving it as-is causes real grief.

>> The old behaviour was slightly less insane (set callbacks first time we're
>> engaging OpenSSL code, never touch them again). The real sane solution is to
>> leave it up to the application.
>
> The real solution would be for openssl to do it itself.

I think that the OpenSSL API is the real culprit there, requiring threads to
register a callback is what's causing all the issues, but I don't think this
will get ever changed.

>> Note that the PHP pgsql extension doesn't set the OpenSSL callbacks, because
>> libpq was setting them on its own. If we remove the callback handling from
>> libpq, PHP will need to add them. By the way, the MySQL extension for PHP also
>> does not set those callbacks.
>
> I think this shows pretty clearly how insane it would be to change this
> in a minor release. Do you really expect people to update libpq and php
> in unison. After a minor release?

Well, I haven't found reports of threaded PHP+MySQL+SSL programs crashing, and
the MySQL extension also doesn't care about the callbacks. I think it's because
it's both because it's a rare combination, or because almost everyone loads the
cURL extension, which *does* set up callbacks. Like I said, it's not a good
situation.

> Note that we *already* provide applications with the ability to set the
> callbacks themselves and prevent us from doing so: PQinitSSL(false).

Ah, I only now saw that with PQinitOpenSSL(true, false) you can work around the
problem, if you set up your own callbacks first. That seems to at least provide
a way to make libpq not do the callbacks dance in released versions. Thank you.

>> Let me reiterate: I now believe the callbacks should be set by the application,
>> libraries should not touch them, since they don't know what will they be
>> stomping on. If the application is run through a VM like Python or PHP, it's
>> the VM that should make sure the callbacks are set.
>
> I fail to see why it's any better to do it in the VM. That relies on
> either always loading the VM's openssl module, even if you don't
> actually need it because the library you use deals with openssl. It
> prevents other implementations of openssl in the VM.

The callbacks are a thing that's owned by the application, so libraries have no
business in setting them up. The way OpenSSL's API is specified (very
unfortunately IMHO) is that before you use OpenSSL from threads, you need to
set the callbacks. If you don't control your application's startup (like when
you're a script executed through a VM), you assume the VM took care of it at
startup. If you're a library, you assume the user took care of it, as they
should.

Now I know that a lot of apps get this wrong. Python almost does the right
thing, but you're right, it only sets up callbacks if you load the 'ssl'
module. It still feels like setting them up in library code is stomping on
things not owned by the library - like libperl setting up signal handlers,
which caused problems in Postgres. They're resources not owned by the library!

> What I think we should do is the following:
>
> * Emphasize the fact that it's important to use PQinitSSL(false) if you
>   did things yourself.

PQinitOpenSSL(true, false) if anything. The reason that function got introduced
is precisely because PQinitSSL(false) isn't exactly right, see
http://www.postgresql.org/message-id/49820D7D.7030902@esilo.com

That doesn't solve the problem of the Python deadlock, where you're not at
leisure to call a C function at the beginning of your module.

> * If there's already callbacks set: Remember that fact and don't
>   overwrite. In the next major version: warn.

So yeah, that was my initial approach - check if callbacks are set, don't do
the dance if they are. It felt like a crutch, though, and racy at that. There's
no atomic way to test-and-set those callbacks. The window for racyness is
small, though.

> * Assert or something if the callback when unregistering isn't the same
>   as it was when registering. That's pretty much guaranteed to cause
>   issues.

So let me try another proposal and see if it doesn't set alarm bells off.
 * When opening a connection, if there's a callback set, don't overwrite it   (small race window).
 * When closing a connection, if the callback set is not   pq_lockingcallback/pq_threadidcallback, don't NULL it (small
racewindow)
 

Asserts in frontend code are impossible, right? There's no way to warn, either.

That would be a ~8 line patch. Does it feel back-patcheable?

Cheers,
Jan



Re: libpq's multi-threaded SSL callback handling is busted

From
Andres Freund
Date:
On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:
> 
> Andres Freund writes:
> 
> > On 2015-02-09 18:17:14 +0100, Jan Urbański wrote:
> >> First of all, the current behaviour is crazy. We're setting and unsetting the
> >> locking callback every time a connection is made/closed, which is not how
> >> OpenSSL is supposed to be used. The *application* using libpq should set a
> >> callback before it starts threads, it's no business of the library's.
> >
> > I don't buy this argument at all. Delivering a libpq that's not
> > threadsafe in some circumstances is a really bad idea.
> 
> I knew this would be a hard sell :( What I know is that the current situation
> is not good and leaving it as-is causes real grief.

It certainly causes less grief than just breaking working
applications. While really shitty the current situation works in a large
number of scenarios. It's not that common to have several users of
openssl in an application *and* unload libpq.

> > I fail to see why it's any better to do it in the VM. That relies on
> > either always loading the VM's openssl module, even if you don't
> > actually need it because the library you use deals with openssl. It
> > prevents other implementations of openssl in the VM.
> 
> The callbacks are a thing that's owned by the application, so libraries have no
> business in setting them up. The way OpenSSL's API is specified (very
> unfortunately IMHO) is that before you use OpenSSL from threads, you need to
> set the callbacks. If you don't control your application's startup (like when
> you're a script executed through a VM), you assume the VM took care of it at
> startup. If you're a library, you assume the user took care of it, as they
> should.

That's a bogus argument. The VM cannot setup up every library in the
world in a correct way. It'd be obviously be completely insane to load
the ssl module in every library just because some part of some random
application might need it. In fact, the ssl library for python does
pretty much the same thing as libpq does (although it's slightly more
careful). It's not the VM at all.

> > What I think we should do is the following:
> >
> > * Emphasize the fact that it's important to use PQinitSSL(false) if you
> >   did things yourself.
> 
> PQinitOpenSSL(true, false) if anything. The reason that function got introduced
> is precisely because PQinitSSL(false) isn't exactly right, see
> http://www.postgresql.org/message-id/49820D7D.7030902@esilo.com

Well, that really depends on what the application is actually using...

> That doesn't solve the problem of the Python deadlock, where you're not at
> leisure to call a C function at the beginning of your module.

We could just never unload the hooks...

> > * If there's already callbacks set: Remember that fact and don't
> >   overwrite. In the next major version: warn.
> 
> So yeah, that was my initial approach - check if callbacks are set, don't do
> the dance if they are. It felt like a crutch, though, and racy at that. There's
> no atomic way to test-and-set those callbacks. The window for racyness is
> small, though.

If you do that check during library initialization instead of every
connection it shouldn't be racy - if that part is run in a multithreaded
fashion you're doing something crazy.

> > * Assert or something if the callback when unregistering isn't the same
> >   as it was when registering. That's pretty much guaranteed to cause
> >   issues.
> 
> So let me try another proposal and see if it doesn't set alarm bells off.
> 
>   * When opening a connection, if there's a callback set, don't overwrite it
>     (small race window).

>   * When closing a connection, if the callback set is not
>     pq_lockingcallback/pq_threadidcallback, don't NULL it (small race window)

If we do the tests once during initialization there shouldn't be a
race....

> Asserts in frontend code are impossible, right? There's no way to warn, either.

You can write to stderr...

> That would be a ~8 line patch. Does it feel back-patcheable?

I think we first need to find out what we all can agree on before
deciding about that.

Greetings,

Andres Freund



Re: libpq's multi-threaded SSL callback handling is busted

From
Jan Urbański
Date:
Andres Freund writes:

> On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:
>> That doesn't solve the problem of the Python deadlock, where you're not at
>> leisure to call a C function at the beginning of your module.
>
> We could just never unload the hooks...

That's what we did before 4e816286533dd34c10b368487d4079595a3e1418 :) And it
got changed after http://www.postgresql.org/message-id/48620925.6070806@pws.com.au

>
>> > * If there's already callbacks set: Remember that fact and don't
>> >   overwrite. In the next major version: warn.
>>
>> So yeah, that was my initial approach - check if callbacks are set, don't do
>> the dance if they are. It felt like a crutch, though, and racy at that. There's
>> no atomic way to test-and-set those callbacks. The window for racyness is
>> small, though.
>
> If you do that check during library initialization instead of every
> connection it shouldn't be racy - if that part is run in a multithreaded
> fashion you're doing something crazy.

Yes, that's true. The problem is that there's no real libpq initialisation
function. The docs say that:

"If your application initializes libssl and/or libcrypto libraries and libpq is
built with SSL support, you should call PQinitOpenSSL"

So most apps will just not bother. The moment you know you'll need SSL is only
when you get an 'S' message from the server...

Cheers,
Jan



Re: libpq's multi-threaded SSL callback handling is busted

From
Jan Urbański
Date:
Jan Urbański writes:

> Andres Freund writes:
>
>> On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:
>>> That doesn't solve the problem of the Python deadlock, where you're not at
>>> leisure to call a C function at the beginning of your module.
>>
>> We could just never unload the hooks...
>
> That's what we did before 4e816286533dd34c10b368487d4079595a3e1418 :) And it
> got changed after http://www.postgresql.org/message-id/48620925.6070806@pws.com.au
>
>>
>>> > * If there's already callbacks set: Remember that fact and don't
>>> >   overwrite. In the next major version: warn.
>>>
>>> So yeah, that was my initial approach - check if callbacks are set, don't do
>>> the dance if they are. It felt like a crutch, though, and racy at that. There's
>>> no atomic way to test-and-set those callbacks. The window for racyness is
>>> small, though.
>>
>> If you do that check during library initialization instead of every
>> connection it shouldn't be racy - if that part is run in a multithreaded
>> fashion you're doing something crazy.
>
> Yes, that's true. The problem is that there's no real libpq initialisation
> function. The docs say that:
>
> "If your application initializes libssl and/or libcrypto libraries and libpq is
> built with SSL support, you should call PQinitOpenSSL"
>
> So most apps will just not bother. The moment you know you'll need SSL is only
> when you get an 'S' message from the server...

For the sake of discussion, here's a patch to prevent stomping on
previously-set callbacks, racy as it looks.

FWIW, it does fix the Python deadlock and doesn't cause the PHP segfault...

J
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index a32af34..54312a5 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -710,6 +710,9 @@ verify_peer_name_matches_certificate(PGconn *conn)
  *    Callback functions for OpenSSL internal locking
  */

+#if OPENSSL_VERSION_NUMBER < 0x10000000
+/* OpenSSL 1.0.0 deprecates the CRYPTO_set_id_callback function and provides a
+ * default implementation, so there's no need for our own. */
 static unsigned long
 pq_threadidcallback(void)
 {
@@ -720,6 +723,7 @@ pq_threadidcallback(void)
      */
     return (unsigned long) pthread_self();
 }
+#endif   /* OPENSSL_VERSION_NUMBER < 0x10000000 */

 static pthread_mutex_t *pq_lockarray;

@@ -806,9 +810,14 @@ pgtls_init(PGconn *conn)

         if (ssl_open_connections++ == 0)
         {
-            /* These are only required for threaded libcrypto applications */
-            CRYPTO_set_id_callback(pq_threadidcallback);
-            CRYPTO_set_locking_callback(pq_lockingcallback);
+            /* These are only required for threaded libcrypto applications, but
+             * make sure we don't stomp on them if they're already set. */
+#if OPENSSL_VERSION_NUMBER < 0x10000000
+            if (CRYPTO_get_id_callback() == NULL)
+                CRYPTO_set_id_callback(pq_threadidcallback);
+#endif
+            if (CRYPTO_get_locking_callback() == NULL)
+                CRYPTO_set_locking_callback(pq_lockingcallback);
         }
     }
 #endif   /* ENABLE_THREAD_SAFETY */
@@ -885,9 +894,14 @@ destroy_ssl_system(void)

     if (pq_init_crypto_lib && ssl_open_connections == 0)
     {
-        /* No connections left, unregister libcrypto callbacks */
-        CRYPTO_set_locking_callback(NULL);
-        CRYPTO_set_id_callback(NULL);
+        /* No connections left, unregister libcrypto callbacks, if no one
+         * registered different ones in the meantime. */
+#if OPENSSL_VERSION_NUMBER < 0x10000000
+        if (CRYPTO_get_id_callback() == pq_threadidcallback)
+            CRYPTO_set_id_callback(NULL);
+#endif
+        if (CRYPTO_get_locking_callback() == pq_lockingcallback)
+            CRYPTO_set_locking_callback(NULL);

         /*
          * We don't free the lock array or the SSL_context. If we get another

Re: libpq's multi-threaded SSL callback handling is busted

From
Peter Eisentraut
Date:
On 2/12/15 7:28 AM, Jan Urbański wrote:
>>>>> * If there's already callbacks set: Remember that fact and don't
>>>>>   overwrite. In the next major version: warn.
>>>>
>>>> So yeah, that was my initial approach - check if callbacks are set, don't do
>>>> the dance if they are. It felt like a crutch, though, and racy at that. There's
>>>> no atomic way to test-and-set those callbacks. The window for racyness is
>>>> small, though.
>>>
>>> If you do that check during library initialization instead of every
>>> connection it shouldn't be racy - if that part is run in a multithreaded
>>> fashion you're doing something crazy.
>>
>> Yes, that's true. The problem is that there's no real libpq initialisation
>> function. The docs say that:
>>
>> "If your application initializes libssl and/or libcrypto libraries and libpq is
>> built with SSL support, you should call PQinitOpenSSL"
>>
>> So most apps will just not bother. The moment you know you'll need SSL is only
>> when you get an 'S' message from the server...
> 
> For the sake of discussion, here's a patch to prevent stomping on
> previously-set callbacks, racy as it looks.
> 
> FWIW, it does fix the Python deadlock and doesn't cause the PHP segfault...

I don't think this patch would actually fix the problem that was
described after the original bug report
(http://www.postgresql.org/message-id/5436991B.5020708@vmware.com),
namely that another thread acquires a lock while the libpq callbacks are
set and then cannot release the lock if libpq has been shut down in the
meantime.

The only way to fix that is to never unset the callbacks.  But we don't
want that or can't do that for other reasons.

I think the only way out is to declare that if there are multiple
threads and other threads might be using OpenSSL not through libpq, then
the callbacks need to be managed outside of libpq.

In environments like PHP or Python this would require some coordination
work across modules somehow, but I don't see a way around that.




Re: libpq's multi-threaded SSL callback handling is busted

From
Jan Urbański
Date:
Peter Eisentraut writes:

> On 2/12/15 7:28 AM, Jan Urbański wrote:
>> For the sake of discussion, here's a patch to prevent stomping on
>> previously-set callbacks, racy as it looks.
>>
>> FWIW, it does fix the Python deadlock and doesn't cause the PHP segfault...
>
> I don't think this patch would actually fix the problem that was
> described after the original bug report
> (http://www.postgresql.org/message-id/5436991B.5020708@vmware.com),
> namely that another thread acquires a lock while the libpq callbacks are
> set and then cannot release the lock if libpq has been shut down in the
> meantime.

I did test both the Python and the PHP repro scripts and the patch fixed both
the deadlock and the segfault.

What happens is that Python (for instance) stops over the callback
unconditionally. So when libpq gets unloaded, it sees that the currently set
callback is no the one it originally set and refrains from NULLing it.

There's a small race window there, to be sure, but it's a lot better than what
we have now.

Cheers,
Jan



Re: libpq's multi-threaded SSL callback handling is busted

From
Peter Eisentraut
Date:
On 4/2/15 4:32 AM, Jan Urbański wrote:
> 
> Peter Eisentraut writes:
> 
>> On 2/12/15 7:28 AM, Jan Urbański wrote:
>>> For the sake of discussion, here's a patch to prevent stomping on
>>> previously-set callbacks, racy as it looks.
>>>
>>> FWIW, it does fix the Python deadlock and doesn't cause the PHP segfault...
>>
>> I don't think this patch would actually fix the problem that was
>> described after the original bug report
>> (http://www.postgresql.org/message-id/5436991B.5020708@vmware.com),
>> namely that another thread acquires a lock while the libpq callbacks are
>> set and then cannot release the lock if libpq has been shut down in the
>> meantime.
> 
> I did test both the Python and the PHP repro scripts and the patch fixed both
> the deadlock and the segfault.
> 
> What happens is that Python (for instance) stops over the callback
> unconditionally. So when libpq gets unloaded, it sees that the currently set
> callback is no the one it originally set and refrains from NULLing it.

So this works because Python is just as broken as libpq right now.  What
happens if Python is fixed as well?  Then we'll have the problem I
described above.




Re: libpq's multi-threaded SSL callback handling is busted

From
Jan Urbański
Date:
Peter Eisentraut writes:
> On 4/2/15 4:32 AM, Jan Urbański wrote:
>> 
>> Peter Eisentraut writes:
>>> I don't think this patch would actually fix the problem that was
>>> described after the original bug report
>>> (http://www.postgresql.org/message-id/5436991B.5020708@vmware.com),
>>> namely that another thread acquires a lock while the libpq callbacks are
>>> set and then cannot release the lock if libpq has been shut down in the
>>> meantime.
>> 
>> I did test both the Python and the PHP repro scripts and the patch fixed both
>> the deadlock and the segfault.
>> 
>> What happens is that Python (for instance) stops over the callback
>> unconditionally. So when libpq gets unloaded, it sees that the currently set
>> callback is no the one it originally set and refrains from NULLing it.
>
> So this works because Python is just as broken as libpq right now.  What
> happens if Python is fixed as well?  Then we'll have the problem I
> described above.

Well, not really, libpq sets and unsets the callbacks every time an SSL
connection is opened and closed. Python sets the callbacks once when the ssl
module is imported and never touches them again.

Arguably, it should set them even earlier, but it's still saner than
flip-flopping them. AFAIK you can't "unload" Python, so setting the callback
and keeping it there is a sound strategy.

The change I'm proposing is not to set the callback in libpq if one is already
set and not remove it if the one that's set is not libpq's. That's as sane as
it gets.

With multiple libraries wanting to use OpenSSL in threaded code, the mechanism
seems to be "last one wins". It doesn't matter *who's* callbacks are used, as
long as they're there and they stay there and that's Python's stance. This
doesn't work if you want to be able to unload the library, so the next best
thing is not touching the callback if someone else set his in the meantime.

What's broken is OpenSSL for offering such a bad way of dealing with
concurrency.

To reiterate: I recognise that not handling the callbacks is not the right
answer. But not stomping on callbacks others might have set sounds like a
minimal and safe improvement.

Cheers,
Jan



Re: libpq's multi-threaded SSL callback handling is busted

From
Peter Eisentraut
Date:
On 4/3/15 7:44 AM, Jan Urbański wrote:
> To reiterate: I recognise that not handling the callbacks is not the right
> answer. But not stomping on callbacks others might have set sounds like a
> minimal and safe improvement.

I think your patch is okay in that it is not a good idea to overwrite or
unset someone else's callbacks.  But we shouldn't mistake that for
fixing the underlying problem.  The only reason this patch appears to
fix the presented test cases is because other OpenSSL users are also
misbehaving and/or the OpenSSL interfaces are so stupid that they cannot
be worked with reasonably.





Re: libpq's multi-threaded SSL callback handling is busted

From
Jan Urbański
Date:
Peter Eisentraut writes:

> On 4/3/15 7:44 AM, Jan Urbański wrote:
>> To reiterate: I recognise that not handling the callbacks is not the right
>> answer. But not stomping on callbacks others might have set sounds like a
>> minimal and safe improvement.
>
> I think your patch is okay in that it is not a good idea to overwrite or
> unset someone else's callbacks.  But we shouldn't mistake that for
> fixing the underlying problem.  The only reason this patch appears to
> fix the presented test cases is because other OpenSSL users are also
> misbehaving and/or the OpenSSL interfaces are so stupid that they cannot
> be worked with reasonably.

Yeah, the underlying problem is OpenSSL's idea of handling threads by limiting
itself to providing a function pointer. That's what we have to work with,
sadly.

Faced by such madness, libpq should try to do the sanest possible thing, at
least then if it breaks it's not our fault.

Cheers,
Jan



Re: libpq's multi-threaded SSL callback handling is busted

From
Peter Eisentraut
Date:
On 2/12/15 7:28 AM, Jan Urbański wrote:
> +#if OPENSSL_VERSION_NUMBER < 0x10000000
> +/* OpenSSL 1.0.0 deprecates the CRYPTO_set_id_callback function and provides a
> + * default implementation, so there's no need for our own. */

I have some additional concerns about this.  It is true that OpenSSL
1.0.0 deprecates CRYPTO_set_id_callback(), but it replaces it with
CRYPTO_THREADID_set_callback().  There is no indication that you don't
need to set a callback anymore.  The man page
(https://www.openssl.org/docs/crypto/threads.html) still says you need
to set two callbacks, and points to the new interface.

It is true that there is a fallback implementation for some platforms,
but there is no indication that one is invited to rely on those.  Let's
keep in mind that libpq is potentially used on obscure platforms, so I'd
rather stick with the documented approaches.

I suggest you remove this part from your patch and submit a separate
patch for consideration if you want to.




Re: libpq's multi-threaded SSL callback handling is busted

From
Jan Urbański
Date:
Peter Eisentraut writes:

> On 2/12/15 7:28 AM, Jan Urbański wrote:
>> +#if OPENSSL_VERSION_NUMBER < 0x10000000
>> +/* OpenSSL 1.0.0 deprecates the CRYPTO_set_id_callback function and provides a
>> + * default implementation, so there's no need for our own. */
>
> I have some additional concerns about this.  It is true that OpenSSL
> 1.0.0 deprecates CRYPTO_set_id_callback(), but it replaces it with
> CRYPTO_THREADID_set_callback().  There is no indication that you don't
> need to set a callback anymore.  The man page
> (https://www.openssl.org/docs/crypto/threads.html) still says you need
> to set two callbacks, and points to the new interface.

Truly, no good deed can ever go unpunished.

I spent around an hour tracking down why setting both callbacks
for OpenSSL >= 1.0.0 brought back the PHP segfaults. Turns out, in OpenSSL
there's *no way* to unregister a callback registered with
CRYPTO_THREADID_set_callback()!

Observe: https://github.com/openssl/openssl/blob/35a1cc90bc1795e8893c11e442790ee7f659fffb/crypto/thr_id.c#L174-L180

Once you set a callback, game over - unloading your library will cause a
segfault. I cannot express how joyful I felt when I discovered it.

> I suggest you remove this part from your patch and submit a separate
> patch for consideration if you want to.

At this point I will propose an even simpler patch (attached). I gave up on
trying to use the more modern CRYPTO_THREADID_* functions.

Right now, HEAD libpq won't compile against OpenSSL compiled with
OPENSSL_NO_DEPRECATED, which I think is fine, given the lack of complaints. So
let's just keep using the older, saner functions and ignore the THREADID crap.

By the way, might I take the opportunity to breach the subject of backpatching
this change, should it get accepted?

Cheers,
Jan


Attachment

Re: libpq's multi-threaded SSL callback handling is busted

From
Peter Eisentraut
Date:
On 4/9/15 3:54 PM, Jan Urbański wrote:
> At this point I will propose an even simpler patch (attached). I gave up on
> trying to use the more modern CRYPTO_THREADID_* functions.
> 
> Right now, HEAD libpq won't compile against OpenSSL compiled with
> OPENSSL_NO_DEPRECATED, which I think is fine, given the lack of complaints. So
> let's just keep using the older, saner functions and ignore the THREADID crap.

Committed.

> By the way, might I take the opportunity to breach the subject of backpatching
> this change, should it get accepted?

I'll do that if another committer will speak out in favor of it.




Re: libpq's multi-threaded SSL callback handling is busted

From
Jan Urbański
Date:
Peter Eisentraut writes:

> On 4/9/15 3:54 PM, Jan Urbański wrote:
>> At this point I will propose an even simpler patch (attached). I gave up on
>> trying to use the more modern CRYPTO_THREADID_* functions.
>> 
>> Right now, HEAD libpq won't compile against OpenSSL compiled with
>> OPENSSL_NO_DEPRECATED, which I think is fine, given the lack of complaints. So
>> let's just keep using the older, saner functions and ignore the THREADID crap.
>
> Committed.

Thank you!

And thanks for the discussion, I hope my frustration did not come across as
brusqueness.

Cheers,
Jan