Thread: refactor the CopyOneRowTo

refactor the CopyOneRowTo

From
jian he
Date:
original CopyOneRowTo:
https://git.postgresql.org/cgit/postgresql.git/tree/src/backend/commands/copyto.c#n922
I change it to:
-----------------------
if (!cstate->opts.binary)
{
foreach_int(attnum, cstate->attnumlist)
{
Datum value = slot->tts_values[attnum - 1];
bool isnull = slot->tts_isnull[attnum - 1];

if (need_delim)
CopySendChar(cstate, cstate->opts.delim[0]);
need_delim = true;

if (isnull)
CopySendString(cstate, cstate->opts.null_print_client);
else
{
string = OutputFunctionCall(&out_functions[attnum - 1],
value);
if (cstate->opts.csv_mode)
CopyAttributeOutCSV(cstate, string,
cstate->opts.force_quote_flags[attnum - 1]);
else
CopyAttributeOutText(cstate, string);
}
}
}
else
{
foreach_int(attnum, cstate->attnumlist)
{
Datum value = slot->tts_values[attnum - 1];
bool isnull = slot->tts_isnull[attnum - 1];
bytea    *outputbytes;

if (isnull)
CopySendInt32(cstate, -1);
else
{
outputbytes = SendFunctionCall(&out_functions[attnum - 1],
value);
CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
CopySendData(cstate, VARDATA(outputbytes),
VARSIZE(outputbytes) - VARHDRSZ);
}
}
}


overall less "if else" logic,
also copy format don't change during copy, we don't need to check
binary or nor for each datum value.

Attachment

Re: refactor the CopyOneRowTo

From
Ilia Evdokimov
Date:
Hi, hackers

I'm sure this refactoring is useful because it's more readable when 
datum value is binary or not.

However, I can see a little improvement. We can declare variable 'bytea 
*outputbytes' in 'else' because variable is used nowhere except this place.


Regards,

Ilia Evdokimov,

Tantor Labs LLC.




Re: refactor the CopyOneRowTo

From
Junwang Zhao
Date:
On Fri, Jul 5, 2024 at 12:26 AM jian he <jian.universality@gmail.com> wrote:
>
> original CopyOneRowTo:
> https://git.postgresql.org/cgit/postgresql.git/tree/src/backend/commands/copyto.c#n922
> I change it to:
> -----------------------
> if (!cstate->opts.binary)
> {
> foreach_int(attnum, cstate->attnumlist)
> {
> Datum value = slot->tts_values[attnum - 1];
> bool isnull = slot->tts_isnull[attnum - 1];
>
> if (need_delim)
> CopySendChar(cstate, cstate->opts.delim[0]);
> need_delim = true;
>
> if (isnull)
> CopySendString(cstate, cstate->opts.null_print_client);
> else
> {
> string = OutputFunctionCall(&out_functions[attnum - 1],
> value);
> if (cstate->opts.csv_mode)
> CopyAttributeOutCSV(cstate, string,
> cstate->opts.force_quote_flags[attnum - 1]);
> else
> CopyAttributeOutText(cstate, string);
> }
> }
> }
> else
> {
> foreach_int(attnum, cstate->attnumlist)
> {
> Datum value = slot->tts_values[attnum - 1];
> bool isnull = slot->tts_isnull[attnum - 1];
> bytea    *outputbytes;
>
> if (isnull)
> CopySendInt32(cstate, -1);
> else
> {
> outputbytes = SendFunctionCall(&out_functions[attnum - 1],
> value);
> CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
> CopySendData(cstate, VARDATA(outputbytes),
> VARSIZE(outputbytes) - VARHDRSZ);
> }
> }
> }
>
>
> overall less "if else" logic,
> also copy format don't change during copy, we don't need to check
> binary or nor for each datum value.

I believe what you proposed is included in the patch 0002
attached in [1], but you use foreach_int, which I think is
an improvement.

[1] https://www.postgresql.org/message-id/20240724.173059.909782980111496972.kou%40clear-code.com


--
Regards
Junwang Zhao



Re: refactor the CopyOneRowTo

From
Heikki Linnakangas
Date:
On 31/07/2024 16:30, Junwang Zhao wrote:
> On Fri, Jul 5, 2024 at 12:26 AM jian he <jian.universality@gmail.com> wrote:
>> overall less "if else" logic,
>> also copy format don't change during copy, we don't need to check
>> binary or nor for each datum value.

Committed, thanks.

For the archives: this code is in a very hot path during COPY TO, so I 
did some quick micro-benchmarking on my laptop. I used this:

COPY (select 

0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10

from generate_series(1, 1_000_000) g) TO '/dev/null';

and

COPY (select 0 from generate_series(1, 1_000_000) g) TO '/dev/null';

to check the impact with few attributes and with many attributes. I 
repeated that a few times with \timing with and without the patch, and 
eyeballed the result. I also used 'perf' to check the fraction of CPU 
time spent in CopyOneRowTo. Conclusion: the patch has no performance impact.

-- 
Heikki Linnakangas
Neon (https://neon.tech)