Re: GSSENC'ed connection stalls while reconnection attempts. - Mailing list pgsql-hackers

From Tom Lane
Subject Re: GSSENC'ed connection stalls while reconnection attempts.
Date
Msg-id 2101368.1594396870@sss.pgh.pa.us
Whole thread Raw
In response to GSSENC'ed connection stalls while reconnection attempts.  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: GSSENC'ed connection stalls while reconnection attempts.  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: GSSENC'ed connection stalls while reconnection attempts.  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> If psql connected using GSSAPI auth and server restarted, reconnection
> sequence stalls and won't return.

Yeah, reproduced here.  (I wonder if there's any reasonable way to
exercise this scenario in src/test/kerberos/.)

> I found that psql(libpq) sends startup packet via gss
> encryption. conn->gssenc should be reset when encryption state is
> freed.

Actually, it looks to me like the GSS support was wedged in by somebody
who was paying no attention to how SSL is managed, or else we forgot
to pay attention to GSS the last time we rearranged SSL support.  It's
completely broken for the multiple-host-addresses scenario as well,
because try_gss is being set and cleared in the wrong places altogether.
conn->gcred is not being handled correctly either I think --- we need
to make sure that it's dropped in pqDropConnection.

The attached patch makes this all act more like the way SSL is handled,
and for me it resolves the reconnection problem.

> The reason that psql doesn't notice the error is pqPacketSend returns
> STATUS_OK when write error occurred.  That behavior contradicts to the
> comment of the function. The function is used only while making
> connection so it's ok to error out immediately by write failure.  I
> think other usage of pqFlush while making a connection should report
> write failure the same way.

I'm disinclined to mess with that, because (a) I don't think it's the
actual source of the problem, and (b) it would affect way more than
just GSS mode.

> Finally, It's user-friendly if psql shows error message for error on
> reset attempts. (This perhaps should be arguable.)

Meh, that's changing fairly longstanding behavior that I don't think
we've had many complaints about.

            regards, tom lane

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27c9bb46ee..7bee9dd201 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -477,6 +477,11 @@ pqDropConnection(PGconn *conn, bool flushInput)
     {
         OM_uint32    min_s;

+        if (conn->gcred != GSS_C_NO_CREDENTIAL)
+        {
+            gss_release_cred(&min_s, &conn->gcred);
+            conn->gcred = GSS_C_NO_CREDENTIAL;
+        }
         if (conn->gctx)
             gss_delete_sec_context(&min_s, &conn->gctx, GSS_C_NO_BUFFER);
         if (conn->gtarg_nam)
@@ -496,6 +501,7 @@ pqDropConnection(PGconn *conn, bool flushInput)
             free(conn->gss_ResultBuffer);
             conn->gss_ResultBuffer = NULL;
         }
+        conn->gssenc = false;
     }
 #endif
 #ifdef ENABLE_SSPI
@@ -2027,11 +2033,6 @@ connectDBStart(PGconn *conn)
      */
     resetPQExpBuffer(&conn->errorMessage);

-#ifdef ENABLE_GSS
-    if (conn->gssencmode[0] == 'd') /* "disable" */
-        conn->try_gss = false;
-#endif
-
     /*
      * Set up to try to connect to the first host.  (Setting whichhost = -1 is
      * a bit of a cheat, but PQconnectPoll will advance it to 0 before
@@ -2468,6 +2469,9 @@ keep_going:                        /* We will come back to here until there is
         conn->allow_ssl_try = (conn->sslmode[0] != 'd');    /* "disable" */
         conn->wait_ssl_try = (conn->sslmode[0] == 'a'); /* "allow" */
 #endif
+#ifdef ENABLE_GSS
+        conn->try_gss = (conn->gssencmode[0] != 'd');    /* "disable" */
+#endif

         reset_connection_state_machine = false;
         need_new_connection = true;
@@ -3349,12 +3353,8 @@ keep_going:                        /* We will come back to here until there is
                      */
                     if (conn->gssenc && conn->gssencmode[0] == 'p')
                     {
-                        OM_uint32    minor;
-
                         /* postmaster expects us to drop the connection */
                         conn->try_gss = false;
-                        conn->gssenc = false;
-                        gss_delete_sec_context(&minor, &conn->gctx, NULL);
                         pqDropConnection(conn, true);
                         conn->status = CONNECTION_NEEDED;
                         goto keep_going;
@@ -3906,9 +3906,6 @@ makeEmptyPGconn(void)
     conn->verbosity = PQERRORS_DEFAULT;
     conn->show_context = PQSHOW_CONTEXT_ERRORS;
     conn->sock = PGINVALID_SOCKET;
-#ifdef ENABLE_GSS
-    conn->try_gss = true;
-#endif

     /*
      * We try to send at least 8K at a time, which is the usual size of pipe
@@ -4065,22 +4062,6 @@ freePGconn(PGconn *conn)
         free(conn->gsslib);
     if (conn->connip)
         free(conn->connip);
-#ifdef ENABLE_GSS
-    if (conn->gcred != GSS_C_NO_CREDENTIAL)
-    {
-        OM_uint32    minor;
-
-        gss_release_cred(&minor, &conn->gcred);
-        conn->gcred = GSS_C_NO_CREDENTIAL;
-    }
-    if (conn->gctx)
-    {
-        OM_uint32    minor;
-
-        gss_delete_sec_context(&minor, &conn->gctx, GSS_C_NO_BUFFER);
-        conn->gctx = NULL;
-    }
-#endif
     /* Note that conn->Pfdebug is not ours to close or free */
     if (conn->last_query)
         free(conn->last_query);

pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: expose parallel leader in CSV and log_line_prefix
Next
From: Justin Pryzby
Date:
Subject: Re: expose parallel leader in CSV and log_line_prefix