Re: Libpq single-row mode slowness - Mailing list pgsql-hackers
From | Daniele Varrazzo |
---|---|
Subject | Re: Libpq single-row mode slowness |
Date | |
Msg-id | CA+mi_8Yfs_knZmPKFjKa_WdgYUzUBp-=xChTzhTf70n8DAGdMQ@mail.gmail.com |
In response to | Re: Libpq single-row mode slowness (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Sun, 1 May 2022 at 23:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The usual expectation is that you call PQconsumeInput to get rid of
> a read-ready condition on the socket. If you don't have a poll() or
> select() or the like in the loop, you might be wasting a lot of
> pointless recvfrom calls. You definitely don't need to call
> PQconsumeInput if PQisBusy is already saying that a result is available,
> and in single-row mode it's likely that several results can be consumed
> per recvfrom call.

This makes sense and, with some refactoring of our fetch loop, the overhead of using single-row mode is now down to about 3x, likely caused by the greater overhead in Python calls.

Please note that the insight you gave in your answer seems to contradict the documentation. Some excerpts of https://www.postgresql.org/docs/current/libpq-async.html:

"""
PQconsumeInput: "After calling PQconsumeInput, the application can check PQisBusy and/or PQnotifies to see if their state has changed"

PQisBusy: "will not itself attempt to read data from the server; therefore PQconsumeInput must be invoked first, or the busy state will never end."

...

A typical application [will use select()]. When the main loop detects input ready, it should call PQconsumeInput to read the input. It can then call PQisBusy, followed by PQgetResult if PQisBusy returns false (0).
"""

All these indications give the impression that there is a sort of mandatory order: PQconsumeInput must be called first, then PQisBusy. As a consequence, the core of our function to fetch a single result was implemented as:

```
def fetch(pgconn):
    while True:
        pgconn.consume_input()
        if not pgconn.is_busy():
            break
        yield Wait.R
    return pgconn.get_result()
```

(Where the `yield Wait.R` suspends this execution to call into select() or whatever waiting policy the program is using.)

Your remarks suggest that PQisBusy() can be called before PQconsumeInput(), and that the latter doesn't need to be called if not busy.
As such I have modified the loop to be:

```
def fetch(pgconn):
    if pgconn.is_busy():
        yield Wait.R
        while True:
            pgconn.consume_input()
            if not pgconn.is_busy():
                break
            yield Wait.R
    return pgconn.get_result()
```

which seems to work well: tests don't show regressions and single-row mode doesn't waste recvfrom() calls anymore.

Is this new fetching pattern the expected way to interact with libpq? If so, should we improve the documentation to suggest that there are reasons to call PQisBusy before PQconsumeInput? Especially in the single-row mode docs page, which makes no relevant mention of the use of these functions.

Thank you very much for your help, really appreciated.

-- Daniele
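
As a rough illustration of why checking is_busy() first saves reads, here is a minimal sketch that runs both loops against a fake connection. Everything except the two fetch loops is hypothetical and not part of libpq or of any real wrapper: `FakeConn`, `run`, `count_reads` and the names `fetch_old`/`fetch_new` are made up for this example, and the fake assumes each socket read delivers a fixed batch of rows, which a real server does not guarantee.

```python
from enum import Enum


class Wait(Enum):
    R = "read"


class FakeConn:
    """Hypothetical stand-in for a connection in single-row mode.

    Assumption: every simulated socket read delivers up to
    `rows_per_read` results at once.
    """

    def __init__(self, total_rows, rows_per_read):
        self.pending = total_rows   # rows the fake server has not sent yet
        self.rows_per_read = rows_per_read
        self.buffered = 0           # rows already read, ready for get_result()
        self.reads = 0              # simulated recvfrom() calls
        self.returned = 0

    def consume_input(self):
        # Counts as one read even when nothing is left to receive.
        self.reads += 1
        n = min(self.rows_per_read, self.pending)
        self.pending -= n
        self.buffered += n

    def is_busy(self):
        # Busy only if no result is buffered and more data is expected.
        return self.buffered == 0 and self.pending > 0

    def get_result(self):
        if self.buffered == 0:
            return None             # end of the result stream
        self.buffered -= 1
        self.returned += 1
        return self.returned


def fetch_old(pgconn):
    # Original loop: always reads before checking the busy state.
    while True:
        pgconn.consume_input()
        if not pgconn.is_busy():
            break
        yield Wait.R
    return pgconn.get_result()


def fetch_new(pgconn):
    # Modified loop: reads only when no result is already available.
    if pgconn.is_busy():
        yield Wait.R
        while True:
            pgconn.consume_input()
            if not pgconn.is_busy():
                break
            yield Wait.R
    return pgconn.get_result()


def run(gen):
    """Drive a fetch generator, pretending every wait returns at once."""
    try:
        while True:
            next(gen)  # a real driver would select() on the socket here
    except StopIteration as exc:
        return exc.value


def count_reads(fetch, total_rows=100, rows_per_read=10):
    """Fetch every row and return (rows fetched, simulated reads)."""
    conn = FakeConn(total_rows, rows_per_read)
    rows = 0
    while run(fetch(conn)) is not None:
        rows += 1
    return rows, conn.reads


print(count_reads(fetch_old))  # (100, 101)
print(count_reads(fetch_new))  # (100, 10)
```

In this toy model the original loop performs one read per fetched row, plus one at the end of the stream, while the modified loop reads once per batch. The real ratio depends on how many rows actually arrive per recvfrom() call.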