BUG #17948: libpq seems to misbehave in a pipelining corner case - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17948: libpq seems to misbehave in a pipelining corner case
Date
Msg-id 17948-fcace7557e449957@postgresql.org
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17948
Logged by:          Ivan Trofimov
Email address:      i.trofimow@yandex.ru
PostgreSQL version: 15.3
Operating system:   Ubuntu 20.04
Description:

As far as I understand, libpq maintains an invariant: if a
PQpipelineSync call succeeds, then either
PQresultStatus will eventually return PGRES_PIPELINE_SYNC, or the connection
will end up in the CONNECTION_BAD state.
 
This is highlighted in several places in the docs:
1. "PGRES_PIPELINE_SYNC is reported exactly once for each PQpipelineSync at
the corresponding point in the pipeline."
2. "From the client's perspective, after PQresultStatus returns
PGRES_FATAL_ERROR, the pipeline is flagged as aborted. 
PQresultStatus will report a PGRES_PIPELINE_ABORTED result for each
remaining queued operation in an aborted pipeline. 
The result for PQpipelineSync is reported as PGRES_PIPELINE_SYNC to signal
the end of the aborted pipeline and 
resumption of normal result processing. The client must process results with
PQgetResult during error recovery."
 
So I expect code like this
 
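if (!PQpipelineSync(conn)) exit(1);
 
while (PQstatus(conn) != CONNECTION_BAD) {
   PGresult *res = PQgetResult(conn);
   /* PQresultStatus(NULL) returns PGRES_FATAL_ERROR, so a NULL res is safe here */
   ExecStatusType status = PQresultStatus(res);
   PQclear(res);   /* PQclear(NULL) is a no-op */
   if (status == PGRES_PIPELINE_SYNC) {
       break;
   }
}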
 
to eventually exit the loop.
 
However, if instead of the expected ReadyForQuery response the server sends an
error and closes the connection (say, the backend is terminated by an
administrator, or a proxy is in place and the upstream PG server is dead),
this loop gets stuck busy-looping.
 
My understanding is that the transitions in the pipelining state machine go
like this:
1. Initial PQgetResult call gets here

https://github.com/postgres/postgres/blob/a817edbf6f302c376f5c0012d19a0474b6bdea88/src/interfaces/libpq/fe-exec.c#L2081
2. parseInput gets the error the server sent

https://github.com/postgres/postgres/blob/a817edbf6f302c376f5c0012d19a0474b6bdea88/src/interfaces/libpq/fe-protocol3.c#L216

and switches the state into PGASYNC_READY
3. PQgetResult continues and here

https://github.com/postgres/postgres/blob/a817edbf6f302c376f5c0012d19a0474b6bdea88/src/interfaces/libpq/fe-exec.c#L2126
advances the queue, so the Sync entry is gone, and switches into
PGASYNC_PIPELINE_IDLE later.
4. Next PQgetResult call gets here

https://github.com/postgres/postgres/blob/a817edbf6f302c376f5c0012d19a0474b6bdea88/src/interfaces/libpq/fe-exec.c#LL2110C4-L2110C26,
and pqPipelineProcessQueue switches into PGASYNC_IDLE here

https://github.com/postgres/postgres/blob/a817edbf6f302c376f5c0012d19a0474b6bdea88/src/interfaces/libpq/fe-exec.c#L3084
 
Now we are stuck: libpq considers the connection to be in the CONNECTION_OK
state (because no reads on the half-closed socket have been issued),
asyncStatus is PGASYNC_IDLE, and the pipeline is in the PQ_PIPELINE_ABORTED
state.

The expected PGRES_PIPELINE_SYNC never came and never will (because
asyncStatus is PGASYNC_IDLE, PQgetResult returns NULL right away).
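
To make the sequence in steps 1 to 4 easier to follow, here is a small
self-contained toy model of it. This is only a sketch of my reading of the
transitions, not libpq code: ToyConn, toy_get_result, toy_wait_for_sync and
the enum names are all made up for illustration and merely echo the libpq
state names mentioned above. It does not link against libpq, and the call
cap is there only so that the demonstration terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for libpq's internal states (illustration only). */
typedef enum { ASYNC_BUSY, ASYNC_READY, ASYNC_PIPELINE_IDLE, ASYNC_IDLE } AsyncStatus;
typedef enum { RES_NONE, RES_FATAL_ERROR, RES_PIPELINE_SYNC } ResultKind;

typedef struct {
    AsyncStatus async_status;
    int  queued_cmds;       /* command queue entries left; here only the Sync */
    bool pipeline_aborted;
    bool connection_ok;     /* stays true: nothing reads the half-closed socket */
} ToyConn;

/* State right after a successful PQpipelineSync-like call. */
static ToyConn toy_conn_after_sync(void)
{
    ToyConn c = { ASYNC_BUSY, 1, false, true };
    return c;
}

/* One PQgetResult-like call, following steps 1 to 4 of the report. */
static ResultKind toy_get_result(ToyConn *c)
{
    switch (c->async_status) {
    case ASYNC_BUSY:
    case ASYNC_READY:
        /* Steps 1-3: an error arrives where ReadyForQuery was expected;
         * the error is handed out and the queue entry (the Sync) is dropped. */
        c->pipeline_aborted = true;
        if (c->queued_cmds > 0)
            c->queued_cmds--;
        c->async_status = ASYNC_PIPELINE_IDLE;
        return RES_FATAL_ERROR;
    case ASYNC_PIPELINE_IDLE:
        /* Step 4: the queue is empty, so fall back to plain idle. */
        c->async_status = ASYNC_IDLE;
        return RES_NONE;
    default:
        /* ASYNC_IDLE: "NULL" right away, on every later call. */
        return RES_NONE;
    }
}

/* The loop from the report, capped at max_calls. Returns true if a
 * sync result was seen, false if the cap was hit first. */
static bool toy_wait_for_sync(ToyConn *c, int max_calls)
{
    assert(max_calls >= 0);
    for (int i = 0; i < max_calls && c->connection_ok; i++) {
        if (toy_get_result(c) == RES_PIPELINE_SYNC)
            return true;
    }
    return false;
}
```

In this model, calling toy_wait_for_sync on a connection from
toy_conn_after_sync never observes a sync result regardless of the cap,
and the connection ends up idle, aborted, and still marked OK, which is
the stuck position described above.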

I am able to reproduce this behavior reliably via
https://pastebin.com/raw/4j3v3QzC,
linking against libpq5=15.3 and running the program against PostgreSQL
15.2.

Who is at fault here: is it libpq, or am I misunderstanding or misusing it?

