Thread: Regression in pipeline mode in libpq 14.5
Hello,

I believe that pipeline mode was broken in libpq 14.5, likely after the refactoring performed to solve the problem of the unexpected Close messages sent on PQexecQuery [1].

The psycopg 3.1 test suite hangs when running with libpq 14.5 (reported at [2]). I have written a script to reproduce the issue, which can be executed running:

```
git clone -b fix-350 git@github.com:psycopg/psycopg.git
cd psycopg
python3 -m venv .venv
source .venv/bin/activate
pip install -e ./psycopg
PSYCOPG_IMPL=debug python ./test-350.py
```

The script prints on stderr all the libpq calls and the be-fe trace. You can find attached the two logs obtained running the script with libpq 14.4 and 14.5. Differences can be seen online in [3].

The script runs, in Python:

```
with conn.cursor() as cur:
    with conn.pipeline() as p:
        cur.execute("SELECT 1")
```

The execute() runs an implicit BEGIN, which is also executed in pipeline mode. Exiting the pipeline() block causes a Sync. So we expect 3 results in the pipeline (a COMMAND_OK after BEGIN, a TUPLES_OK after SELECT, a PIPELINE_SYNC).

At a glance I see the following behaviours in 14.5 which seem errors:

- the result of the SELECT (TUPLES_OK) is lost.
- later, a PQisBusy() returns 1, but the following epoll() call blocks and times out, nothing is received from the network.

Happy to know if we need to do something different to accommodate changes in 14.5, however these seem regressions to me.

Thank you very much

-- Daniele

[1] https://www.postgresql.org/message-id/CA%2Bmi_8bvD0_CW3sumgwPvWdNzXY32itoG_16tDYRu_1S2gV2iw%40mail.gmail.com
[2] https://github.com/psycopg/psycopg/issues/350
[3] https://www.diffchecker.com/oe0yA6lu
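To make the expectation in the message above concrete, here is a minimal sketch in plain Python, with no libpq involved. `Status` and `expected_statuses` are hypothetical names introduced only for illustration, and the rule used to decide whether a statement returns rows is a deliberate simplification; the status names mirror libpq's `PGRES_*` result statuses.

```python
from enum import Enum


class Status(Enum):
    # Names mirror libpq's PGRES_COMMAND_OK, PGRES_TUPLES_OK, PGRES_PIPELINE_SYNC.
    COMMAND_OK = "COMMAND_OK"
    TUPLES_OK = "TUPLES_OK"
    PIPELINE_SYNC = "PIPELINE_SYNC"


def expected_statuses(statements):
    """Hypothetical helper: statuses a client would expect for statements
    queued in one pipeline and terminated by a single Sync.

    Assumption: a statement whose first keyword is SELECT returns rows,
    anything else is a plain command. Real SQL needs a smarter rule.
    """
    statuses = []
    for sql in statements:
        words = sql.split()
        keyword = words[0].upper() if words else ""
        if keyword == "SELECT":
            statuses.append(Status.TUPLES_OK)
        else:
            statuses.append(Status.COMMAND_OK)
    # Leaving the pipeline block sends a Sync, which yields one final result.
    statuses.append(Status.PIPELINE_SYNC)
    return statuses


# The implicit BEGIN plus the SELECT from the reproduction script.
print([s.name for s in expected_statuses(["BEGIN", "SELECT 1"])])
# prints ['COMMAND_OK', 'TUPLES_OK', 'PIPELINE_SYNC']
```

A client can compare such a list with the statuses it actually collects; a missing TUPLES_OK entry is the first symptom reported above.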
On 2022-Aug-14, Daniele Varrazzo wrote:

> The execute() runs an implicit BEGIN, which is also executed in
> pipeline mode. Exiting the pipeline() block causes a Sync. So we
> expect 3 results in the pipeline (a COMMAND_OK after BEGIN, a
> TUPLES_OK after SELECT, a PIPELINE_SYNC).

Hmm, it seems (judging only from comparing your two traces) that the problem stems from the newly added hack to handle CloseComplete. I'll have a look later in the week.

-- 
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)
On Mon, 15 Aug 2022 at 17:24, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> Hmm, it seems (judging only from comparing your two traces) that the
> problem stems from the newly added hack to handle CloseComplete. I'll
> have a look later in the week.

We worked around the problem in psycopg by dropping every use of `PQsendQuery()` and only using `PQsendQueryParams()`, for internal queries too. So this is no longer a blocker for our 3.1 release. I will try to perform periodic test runs against Postgres master in order to catch future breakages before a Postgres release.

Please find attached a smaller test to reproduce the issue. It's written in Python and uses psycopg master branch, but it only uses libpq calls, so it can be easily converted to C or whatever is useful to add to your test suite. In order to run:

```
python3 -m venv venv
source venv/bin/activate
pip install "git+https://github.com/psycopg/psycopg.git@e5079184#subdirectory=psycopg&egg=psycopg"
python test-pipeline-bug.py
```

The script will succeed running with libpq 14.4 and fail running libpq 14.5. The difference in the traces is similar to what was attached upthread.

Best regards

-- Daniele
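The workaround described above can be sketched roughly as follows. This is not the attached test script: `queue_statements`, `collect_until_sync` and `demo` are hypothetical helpers, the `dbname=test` connection string is a placeholder, and the code assumes a connection object shaped like psycopg's low-level `psycopg.pq.PGconn` (`send_query_params()`, `pipeline_sync()`, `get_result()`) used in blocking mode.

```python
PGRES_PIPELINE_SYNC = 10  # numeric value of PGRES_PIPELINE_SYNC in libpq's ExecStatusType


def queue_statements(pgconn, statements):
    """Queue every statement with PQsendQueryParams (never PQsendQuery),
    then terminate the batch with a Sync."""
    for sql in statements:
        # No parameters to bind: pass None, but still use the extended protocol.
        pgconn.send_query_params(sql, None)
    pgconn.pipeline_sync()


def collect_until_sync(pgconn, max_nulls=2):
    """Read results until the PIPELINE_SYNC result shows up.

    In pipeline mode PQgetResult returns NULL (None here) between the
    results of consecutive statements, so single None values are skipped.
    Several in a row are treated as "nothing more is coming".
    """
    statuses = []
    nulls = 0
    while True:
        res = pgconn.get_result()
        if res is None:
            nulls += 1
            if nulls >= max_nulls:
                raise RuntimeError("results ended before PIPELINE_SYNC")
            continue
        nulls = 0
        statuses.append(int(res.status))
        if statuses[-1] == PGRES_PIPELINE_SYNC:
            return statuses


def demo(conninfo=b"dbname=test"):
    """Run BEGIN and SELECT 1 in a pipeline. Needs psycopg and a reachable
    server; the default conninfo is only a placeholder."""
    from psycopg import pq

    pgconn = pq.PGconn.connect(conninfo)
    try:
        pgconn.enter_pipeline_mode()
        queue_statements(pgconn, [b"BEGIN", b"SELECT 1"])
        return collect_until_sync(pgconn)
    finally:
        pgconn.finish()
```

Calling `demo()` against a live server should return three statuses if the expectation stated at the top of the thread holds. Because both helpers only rely on method names, they can also be exercised with a stub connection in a test suite.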
On 2022-Aug-14, Daniele Varrazzo wrote:

> I believe that pipeline mode was broken in libpq 14.5, likely after
> the refactoring performed to solve the problem of the unexpected Close
> messages sent on PQexecQuery [1].

So I've spent a lot of time trying to understand what is going on here, and my impression is that this stuff is thoroughly broken, and I don't know how to fix it. So I propose to rip it out -- specifically: make it an error to call PQsendQuery when in pipeline mode. PQsendQueryParams can be used instead, and all is well.

The problem is that the CloseComplete message remains a mess, and the hack I added made things worse, or maybe it just moved the mess elsewhere.

More specifically, I propose to remove its handling from 15 and master; but leave it in place in 14, to avoid breaking things in a minor release if somebody is already using it and they haven't run into this particular bug. This should be OK for psycopg, since Daniele said he already stopped using PQsendQuery in pipeline mode.

PS: it's quite likely that there *is* a way to fix it, if we're OK with more coupling between fe-exec (PQgetResult) and fe-protocol3 (parseInput3). But we probably don't want that and I don't want to spend more time figuring out exactly how.

-- 
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
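As a rough client-side illustration of the rule proposed above (this is not the libpq patch, which would live in C inside libpq itself): `send_query_checked` and `SimpleQueryInPipelineError` are hypothetical names, and the sketch assumes a connection object exposing `pipeline_status` and `send_query()` the way psycopg's low-level `psycopg.pq.PGconn` does.

```python
PQ_PIPELINE_OFF = 0  # value of PQ_PIPELINE_OFF in libpq's PGpipelineStatus


class SimpleQueryInPipelineError(RuntimeError):
    """Hypothetical error for a PQsendQuery attempt while in pipeline mode."""


def send_query_checked(pgconn, command):
    """Send `command` with PQsendQuery only when pipeline mode is off.

    Mirrors the proposal at the application level: in pipeline mode the
    caller is told to switch to PQsendQueryParams instead.
    """
    if pgconn.pipeline_status != PQ_PIPELINE_OFF:
        raise SimpleQueryInPipelineError(
            "PQsendQuery is not allowed in pipeline mode; "
            "use PQsendQueryParams instead"
        )
    pgconn.send_query(command)
```

Such a guard lets an application that must run against several libpq versions fail early and uniformly, instead of depending on how a given version treats PQsendQuery in a pipeline.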
So it'd be as in the attached.

In writing this, I also noticed that the extended query protocol emulation I wrote for PQsendQuery had a bug, so the traces that result by using PQsendQueryParams instead have a small difference.

-- 
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"You don't solve a bad join with SELECT DISTINCT" #CupsOfFail
https://twitter.com/connor_mc_d/status/1431240081726115845