Thread: BUG #18907: SSL error: bad length failure during transfer data in pipeline mode with libpq

The following bug has been logged on the website:

Bug reference:      18907
Logged by:          Dorjpalam Batbaatar
Email address:      htgn.dbat.95@gmail.com
PostgreSQL version: 16.4
Operating system:   AlmaLinux 9
Description:

When using libpq to transfer large amounts of data to the server in pipeline
mode (registering with COPY), an error "SSL error: bad length"
sometimes occurs. The call that most often triggers the error is libpq's
PQsendQueryParams(). The PostgreSQL version is 16.4.
I looked into this, and the cause seems to be that OpenSSL's SSL_write()
is not retried when it should be.
According to the OpenSSL documentation for SSL_write(), if SSL_get_error()
returns SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE, the call must be
repeated with the same data:
https://docs.openssl.org/3.0/man3/SSL_write/#warnings
In libpq's message-sending function pqPutMsgEnd(PGconn *conn), if not all
data has been sent and the connection is in non-blocking mode, the function
simply returns. However, in the functions behind libpq's exported API (e.g.
PQsendQueryGuts(), called by PQsendQueryParams()), pqPutMsgEnd() is called
multiple times, so I think the data being sent changes between attempts.
In that situation SSL_write() needs to be retried with the same data, and
the error seems to occur because the data has changed.
As a test, I changed pqSendSome() to retry when pqsecure_write() returns 0,
and the program then ran in pipeline mode without errors. pqSendSome() is
called from pqPutMsgEnd(PGconn *conn), and pqSendSome() in turn calls
pqsecure_write(), which is where SSL_write() is performed.
Below is the patch I tried.
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 488f7d6e55..bbafb189c9 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -914,22 +914,43 @@ pqSendSome(PGconn *conn, int len)
                         * Note that errors here don't result in write_failed becoming
                         * set.
                         */
-                       if (pqReadData(conn) < 0)
+                       if (sent > 0)
                        {
-                               result = -1;    /* error message already set up */
-                               break;
-                       }
+                               if (pqReadData(conn) < 0)
+                               {
+                                       result = -1;    /* error message already set up */
+                                       break;
+                               }
-                       if (pqIsnonblocking(conn))
-                       {
-                               result = 1;
-                               break;
-                       }
+                               if (pqIsnonblocking(conn))
+                               {
+                                       result = 1;
+                                       break;
+                               }
-                       if (pqWait(true, true, conn))
+                               if (pqWait(true, true, conn))
+                               {
+                                       result = -1;
+                                       break;
+                               }
+                       }
+                       else
                        {
-                               result = -1;
-                               break;
+                               /*
+                                * When sent is 0, retry the write.  Before writing again,
+                                * read any responses that have arrived from the server.
+                                */
+                               if (pqWait(true, true, conn))
+                               {
+                                       result = -1;
+                                       break;
+                               }
+
+                               if (pqReadData(conn) < 0)
+                               {
+                                       result = -1;    /* error message already set up */
+                                       break;
+                               }
                        }
                }
        }


PG Bug reporting form <noreply@postgresql.org> writes:
> When using libpq to transfer large amounts of data to the server in pipeline
> mode (registering with COPY), an error "SSL error: bad length"
> sometimes occurs.

Could you provide a self-contained test case demonstrating such
failures?  This is not the kind of code that we like to change
on the basis of undocumented claims.

            regards, tom lane



On Tue, Apr 29, 2025 at 11:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Could you provide a self-contained test case demonstrating such
> failures?  This is not the kind of code that we like to change
> on the basis of undocumented claims.

Agreed -- but also, let us know if the answer is "no, I can't", or if
you get stuck and need some additional collaboration. These corner
cases can be really nasty to track down and record.

Thanks,
--Jacob