Hello,
a Psycopg 3 user has tested what boils down pretty much to a
"generate_series(100K)" and has reported a 100x difference between
fetching it normally and in single-row mode. I have repeated the test
myself and I have found a 50x difference (the OP doesn't specify which
platform is using, mine is Linux).
Of course calling PQconsumeInput 100K times has some overhead compared
to calling it only once. However, I would like to know if this level
of overhead is expected, or if instead anyone smells some wasted
cycles.
According to some profiling, the largest part of the time is spent
inside a libc function I don't know the symbol of, called by
pqReadData(). Details and pretty graphs are available at
https://github.com/psycopg/psycopg/issues/286
The operations we perform, for every row, are PQconsumeInput,
PQisBusy, PQgetResult. Every PQconsumeInput results in a recvfrom()
syscall, of which the first one returns the whole recordset, the
following ones return EAGAIN. There are two extra cycles: one to get
the TUPLES_OK result, one to get the end-of-stream NULL. It seems the
documented usage pattern, but I'm not sure if I'm not misreading it,
especially in the light of this libpq grumble [1].
[1] https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-misc.c#L681
Our connection is in non-blocking mode and we see the need for waiting
(using epoll here) only on the first call. The resulting strace of the
entire query process (of two rows) is:
21:36:53.659529 sendto(3, "P\0\0\0>\0select 'test' a, t b from "...,
108, MSG_NOSIGNAL, NULL, 0) = 108
21:36:53.660236 recvfrom(3, 0x1f6a870, 16384, 0, NULL, NULL) = -1
EAGAIN (Resource temporarily unavailable)
21:36:53.660589 epoll_create1(EPOLL_CLOEXEC) = 4
21:36:53.660848 epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLONESHOT,
{u32=3, u64=3}}) = 0
21:36:53.661099 epoll_wait(4, [{EPOLLIN, {u32=3, u64=3}}], 1023, 100000) = 1
21:36:53.661941 recvfrom(3,
"1\0\0\0\0042\0\0\0\4T\0\0\0.\0\2a\0\0\0\0\0\0\0\0\0\0\31\377\377\377"...,
16384, 0, NULL, NULL) = 117
21:36:53.662506 close(4) = 0
21:36:53.662830 recvfrom(3, 0x1f6a898, 16344, 0, NULL, NULL) = -1
EAGAIN (Resource temporarily unavailable)
21:36:53.663162 recvfrom(3, 0x1f6a884, 16364, 0, NULL, NULL) = -1
EAGAIN (Resource temporarily unavailable)
21:36:53.663359 recvfrom(3, 0x1f6a876, 16378, 0, NULL, NULL) = -1
EAGAIN (Resource temporarily unavailable)
The test is on a localhost connection with sslmode disabled using libpq 14.2.
Is this the correct usage? Any insight is welcome.
Thank you very much!
-- Daniele