Thread: LibPQ: PQresultMemorySize as proxy to transfered bytes
Hi,
To measure throughput, I'm timing exec time spent in LibPQ,
against the size of the result-set in bytes, as reported by PQresultMemorySize().
EXEC: 7x ( 130,867 rows, 54,921,532 bytes) in 0.305s (171.8 MB/s)
EXEC: 8x ( 180,079 rows, 95,876,047 bytes) in 0.493s (185.5 MB/s)
EXEC: 9x ( 224,253 rows, 371,663,836 bytes) in 2.986s (118.7 MB/s)
The problem is that I have only a 1GbE network link, so theoretically,
max-throughput should be around 125MB/s, which the 1st two runs exceed.
These 3 runs access the same schema, doing full scans of a few "real data" tables,
and the last one accesses more/larger bytea columns. These are plain SELECTs in
binary mode using normal sync execution (no cursors, COPY, single-row mode, PIPELINE, etc...)
Obviously from these results, I now realize PQresultMemorySize() returns
something larger than what went across the network. Can someone explain
how so? And if there's a better proxy to programmatically know the network
traffic exchanged on the connection's socket, that's cross-platform? Obviously
libpq itself knows, but I don't see any way to access that info.
Perhaps tracing might? But will that incur overhead?
I'd appreciate any insight. Thanks, --DD
Dominique Devienne <ddevienne@gmail.com> writes: > Obviously from these results, I now realize PQresultMemorySize() returns > something larger than what went across the network. Can someone explain > how so? That number is the total space actually requested from malloc() for the PGresult object. But we request space in blocks (typically 2KB each), so there's some overhead due to fields not exactly filling a block, unused space in the last block, etc. If you're testing with very small result sets (say a couple hundred KB) then the value could be pretty far off what went across the network. A larger result set should show less percentage error. regards, tom lane
I wrote: > That number is the total space actually requested from malloc() for > the PGresult object. But we request space in blocks (typically 2KB > each), so there's some overhead due to fields not exactly filling > a block, unused space in the last block, etc. If you're testing > with very small result sets (say a couple hundred KB) Sigh, obviously I meant a couple hundred *bytes* there. -ENOCAFFEINE regards, tom lane
On Wed, Jun 28, 2023 at 1:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> That number is the total space actually requested from malloc() for
> the PGresult object. But we request space in blocks (typically 2KB
> each), so there's some overhead due to fields not exactly filling
> a block, unused space in the last block, etc. If you're testing
> with very small result sets (say a couple hundred KB)
Sigh, obviously I meant a couple hundred *bytes* there. -ENOCAFFEINE
Thanks Tom. --DD
PS: I was hoping for answers to my other questions too,
but I guess that means there are no good answers to those.
On 2023-Jun-28, Dominique Devienne wrote: > And if there's a better proxy to programmatically know the network > traffic exchanged on the connection's socket, that's cross-platform? > Obviously > libpq itself knows, but I don't see any way to access that info. > > Perhaps tracing might? But will that incur overhead? Maybe you can use PQtrace()[1], yeah. It will indicate, for each message transmitted, its size in bytes. Naturally there will be some overhead in writing the trace file. Also, the format of the output file has some issues[2] that may make it difficult to machine-parse. But it might be good enough for you ... or you might find yourself motivated to fix these problems. [1] https://www.postgresql.org/docs/current/libpq-control.html#LIBPQ-PQTRACE [2] https://www.postgr.es/m/CAFCRh-8OPoe%3D0j9v4wth7qU-x4jvjVc8DoOLV66qdnC9Do0ymw%40mail.gmail.com -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Doing what he did amounts to sticking his fingers under the hood of the implementation; if he gets his fingers burnt, it's his problem." (Tom Lane)