Thread: BUG #18907: SSL error: bad length failure during transfer data in pipeline mode with libpq
BUG #18907: SSL error: bad length failure during transfer data in pipeline mode with libpq
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18907
Logged by:          Dorjpalam Batbaatar
Email address:      htgn.dbat.95@gmail.com
PostgreSQL version: 16.4
Operating system:   AlmaLinux 9
Description:

When using libpq to transfer large amounts of data to the server in pipeline
mode (registering with COPY), an error "SSL error: bad length" sometimes
occurs. The call that most often triggers the error is libpq's
PQsendQueryParams(). PostgreSQL is version 16.4.

I looked into this, and it seems that the cause is that OpenSSL's SSL_write()
is not being retried when it should be. According to the OpenSSL documentation
for SSL_write(), if the return value of SSL_get_error() is SSL_ERROR_WANT_READ
or SSL_ERROR_WANT_WRITE, SSL_write() must be called again with the same data.

https://docs.openssl.org/3.0/man3/SSL_write/#warnings

In libpq's message sending function pqPutMsgEnd(PGconn *conn), if not all data
has been sent and the connection is in non-blocking mode, the function just
returns. However, libpq's exported API functions (e.g. PQsendQueryGuts(),
called by PQsendQueryParams()) call pqPutMsgEnd() multiple times, so I think
the data being sent changes between calls. In the situation above the write
needs to be retried with the same data, but it seems that the error occurs
because the send data has changed.

As a test, I tried retrying in pqSendSome() when pqsecure_write() returned 0,
and it ran in pipeline mode without errors. pqSendSome() is called from
pqPutMsgEnd(PGconn *conn), and it in turn calls pqsecure_write(), which
performs the SSL_write().

Below is the patch I tried.

diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 488f7d6e55..bbafb189c9 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -914,22 +914,43 @@ pqSendSome(PGconn *conn, int len)
 			 * Note that errors here don't result in write_failed becoming
 			 * set.
 			 */
-			if (pqReadData(conn) < 0)
+			if (sent > 0)
 			{
-				result = -1;	/* error message already set up */
-				break;
-			}
+				if (pqReadData(conn) < 0)
+				{
+					result = -1;	/* error message already set up */
+					break;
+				}
 
-			if (pqIsnonblocking(conn))
-			{
-				result = 1;
-				break;
-			}
+				if (pqIsnonblocking(conn))
+				{
+					result = 1;
+					break;
+				}
 
-			if (pqWait(true, true, conn))
+				if (pqWait(true, true, conn))
+				{
+					result = -1;
+					break;
+				}
+			}
+			else
 			{
-				result = -1;
-				break;
+				/*
+				 * When sent is 0 retry for write. Before write again read
+				 * which arrived responses from the server
+				 */
+				if (pqWait(true, true, conn))
+				{
+					result = -1;
+					break;
+				}
+
+				if (pqReadData(conn) < 0)
+				{
+					result = -1;	/* error message already set up */
+					break;
+				}
 			}
 		}
 	}
Re: BUG #18907: SSL error: bad length failure during transfer data in pipeline mode with libpq
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> When using libpq to transfer large amounts of data to the server in pipeline
> mode (registering with COPY), an error "SSL error: bad length"
> sometimes occurs.

Could you provide a self-contained test case demonstrating such failures?
This is not the kind of code that we like to change on the basis of
undocumented claims.

regards, tom lane
Re: BUG #18907: SSL error: bad length failure during transfer data in pipeline mode with libpq
From
Jacob Champion
Date:
On Tue, Apr 29, 2025 at 11:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Could you provide a self-contained test case demonstrating such
> failures? This is not the kind of code that we like to change
> on the basis of undocumented claims.

Agreed -- but also, let us know if the answer is "no, I can't", or if
you get stuck and need some additional collaboration. These corner
cases can be really nasty to track down and record.

Thanks,
--Jacob