Re: Flushing large data immediately in pqcomm - Mailing list pgsql-hackers

From: Melih Mutlu
Subject: Re: Flushing large data immediately in pqcomm
Msg-id: CAGPVpCTfzhiOCWPwpRpvV6EZU0egJix4jNObp_OkhfZESdPbFQ@mail.gmail.com
In response to: Re: Flushing large data immediately in pqcomm (Heikki Linnakangas <hlinnaka@iki.fi>)
List: pgsql-hackers
Hi Heikki,
Heikki Linnakangas <hlinnaka@iki.fi> wrote on Mon, 29 Jan 2024 at 19:12:
> > Proposed change modifies socket_putmessage to send any data larger than
> > 8K immediately without copying it into the send buffer. Assuming that
> > the send buffer would be flushed anyway due to reaching its limit, the
> > patch just gets rid of the copy part which seems unnecessary and sends
> > data without waiting.
>
> If there's already some data in PqSendBuffer, I wonder if it would be
> better to fill it up with data, flush it, and then send the rest of the
> data directly, instead of flushing the partial data first. I'm afraid
> that you'll make a tiny call to secure_write(), followed by a large one,
> then a tiny one again, and so forth. Especially when socket_putmessage
> itself writes the msgtype and len, which are tiny, before the payload.
I agree that I could do better there and avoid flushing twice, once for PqSendBuffer and once for the input data. PqSendBuffer always has at least some data, however tiny, since msgtype and len are written into it first.
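Something like this is roughly what I have in mind (just a sketch to illustrate, not the actual patch: internal_putbytes_large is a made-up name and the retry/error handling the real flush code does is elided; PqSendBuffer, PqSendPointer, PqSendBufferSize, internal_flush() and secure_write() are the existing pqcomm.c machinery):

/*
 * Sketch: write "len" bytes, topping up PqSendBuffer first so that its
 * pending contents leave in a single flush, then sending the remainder
 * of a large payload directly without copying it into the buffer.
 */
static int
internal_putbytes_large(const char *s, size_t len)
{
    /* Fill whatever room is left so the pending bytes go out in one call. */
    if (PqSendPointer > 0)
    {
        size_t  avail = PqSendBufferSize - PqSendPointer;

        if (avail > len)
            avail = len;
        memcpy(PqSendBuffer + PqSendPointer, s, avail);
        PqSendPointer += avail;
        s += avail;
        len -= avail;
    }

    if (internal_flush())
        return EOF;

    /* Send the rest of the payload directly, skipping the copy. */
    while (len > 0)
    {
        ssize_t n = secure_write(MyProcPort, (void *) s, len);

        if (n <= 0)
            return EOF;     /* EINTR/busy-retry handling elided */
        s += n;
        len -= n;
    }
    return 0;
}

That way there is exactly one flush of the old buffer contents, and the bulk of the payload never takes the memcpy detour.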
> Perhaps we should invent a new pq_putmessage() function that would take
> an input buffer with 5 bytes of space reserved before the payload.
> pq_putmessage() could then fill in the msgtype and len bytes in the
> input buffer and send that directly. (Not wedded to that particular API,
> but something that would have the same effect)
I thought about doing this. The reason I didn't is that such a change would require adjusting the input buffers wherever pq_putmessage is called, and I did not want to touch that many different places. Then again, there may not be that many such call sites; I'm not sure.
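For the record, here is roughly how I understood that API suggestion (again only a sketch: the name pq_putmessage_reserved and its exact shape are my invention, not a settled design, and it leans on the internal_putbytes_large sketch above; pg_hton32() is the existing byte-swap helper):

/*
 * Sketch: the caller reserves 5 bytes in front of the payload; we fill
 * in the msgtype byte and the 4-byte length there and push header plus
 * payload to the socket as one contiguous write, with no extra copy.
 */
int
pq_putmessage_reserved(char msgtype, char *buf, size_t payload_len)
{
    uint32  n32;

    /* buf points at the 5 reserved bytes; payload starts at buf + 5 */
    buf[0] = msgtype;
    n32 = pg_hton32((uint32) (payload_len + 4));    /* len counts itself */
    memcpy(buf + 1, &n32, 4);

    /* Header and payload leave the backend together. */
    return internal_putbytes_large(buf, payload_len + 5);
}

Every caller would then have to build its message with those 5 bytes reserved up front, which is exactly the adjustment I was worried about.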
> > This change affects places where pq_putmessage is used such as
> > pg_basebackup, COPY TO, walsender etc.
> >
> > I did some experiments to see how the patch performs.
> > Firstly, I loaded ~5GB of data into a table [1], then ran "COPY test TO
> > STDOUT". Here are perf results of both the patch and HEAD
> > ...
> > The patch brings a ~5% gain in socket_putmessage.
> >
> > [1]
> > CREATE TABLE test(id int, name text, time TIMESTAMP);
> > INSERT INTO test (id, name, time) SELECT i AS id, repeat('dummy', 100)
> > AS name, NOW() AS time FROM generate_series(1, 100000000) AS i;
>
> I'm surprised by these results, because each row in that table is < 600
> bytes. PqSendBufferSize is 8kB, so the optimization shouldn't kick in in
> that test. Am I missing something?
You're absolutely right, I made a silly mistake there. I also think the way I did the perf analysis does not make much sense, even when a row of the table is greater than 8kB.
Here are some quick timing results taken after making sure the patch's optimization actually triggers. I need to think more about how to profile this with perf; I hope to share proper results soon.
I just added a few more zeros [1], so each row now carries a ~50kB text field and comfortably exceeds the 8kB buffer, and ran [2] (hopefully measuring the right thing):
HEAD:
real 2m48,938s
user 0m9,226s
sys 1m35,342s
Patch:
real 2m40,690s
user 0m8,492s
sys 1m31,001s
[1]
INSERT INTO test (id, name, time) SELECT i AS id, repeat('dummy', 10000) AS name, NOW() AS time FROM generate_series(1, 1000000) AS i;
[2]
rm /tmp/dummy && echo 3 | sudo tee /proc/sys/vm/drop_caches && time psql -d postgres -c "COPY test TO STDOUT;" > /tmp/dummy
Thanks,
Melih Mutlu
Microsoft