Thread: refactor the CopyOneRowTo
original CopyOneRowTo: https://git.postgresql.org/cgit/postgresql.git/tree/src/backend/commands/copyto.c#n922 I change it to: ----------------------- if (!cstate->opts.binary) { foreach_int(attnum, cstate->attnumlist) { Datum value = slot->tts_values[attnum - 1]; bool isnull = slot->tts_isnull[attnum - 1]; if (need_delim) CopySendChar(cstate, cstate->opts.delim[0]); need_delim = true; if (isnull) CopySendString(cstate, cstate->opts.null_print_client); else { string = OutputFunctionCall(&out_functions[attnum - 1], value); if (cstate->opts.csv_mode) CopyAttributeOutCSV(cstate, string, cstate->opts.force_quote_flags[attnum - 1]); else CopyAttributeOutText(cstate, string); } } } else { foreach_int(attnum, cstate->attnumlist) { Datum value = slot->tts_values[attnum - 1]; bool isnull = slot->tts_isnull[attnum - 1]; bytea *outputbytes; if (isnull) CopySendInt32(cstate, -1); else { outputbytes = SendFunctionCall(&out_functions[attnum - 1], value); CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); CopySendData(cstate, VARDATA(outputbytes), VARSIZE(outputbytes) - VARHDRSZ); } } } overall less "if else" logic, also copy format don't change during copy, we don't need to check binary or nor for each datum value.
Attachment
Hi, hackers I'm sure this refactoring is useful because it's more readable when datum value is binary or not. However, I can see a little improvement. We can declare variable 'bytea *outputbytes' in 'else' because variable is used nowhere except this place. Regards, Ilia Evdokimov, Tantor Labs LLC.
On Fri, Jul 5, 2024 at 12:26 AM jian he <jian.universality@gmail.com> wrote: > > original CopyOneRowTo: > https://git.postgresql.org/cgit/postgresql.git/tree/src/backend/commands/copyto.c#n922 > I change it to: > ----------------------- > if (!cstate->opts.binary) > { > foreach_int(attnum, cstate->attnumlist) > { > Datum value = slot->tts_values[attnum - 1]; > bool isnull = slot->tts_isnull[attnum - 1]; > > if (need_delim) > CopySendChar(cstate, cstate->opts.delim[0]); > need_delim = true; > > if (isnull) > CopySendString(cstate, cstate->opts.null_print_client); > else > { > string = OutputFunctionCall(&out_functions[attnum - 1], > value); > if (cstate->opts.csv_mode) > CopyAttributeOutCSV(cstate, string, > cstate->opts.force_quote_flags[attnum - 1]); > else > CopyAttributeOutText(cstate, string); > } > } > } > else > { > foreach_int(attnum, cstate->attnumlist) > { > Datum value = slot->tts_values[attnum - 1]; > bool isnull = slot->tts_isnull[attnum - 1]; > bytea *outputbytes; > > if (isnull) > CopySendInt32(cstate, -1); > else > { > outputbytes = SendFunctionCall(&out_functions[attnum - 1], > value); > CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); > CopySendData(cstate, VARDATA(outputbytes), > VARSIZE(outputbytes) - VARHDRSZ); > } > } > } > > > overall less "if else" logic, > also copy format don't change during copy, we don't need to check > binary or nor for each datum value. I believe what you proposed is included in the patch 0002 attached in [1], but you use foreach_int, which I think is an improvement. [1] https://www.postgresql.org/message-id/20240724.173059.909782980111496972.kou%40clear-code.com -- Regards Junwang Zhao
On 31/07/2024 16:30, Junwang Zhao wrote: > On Fri, Jul 5, 2024 at 12:26 AM jian he <jian.universality@gmail.com> wrote: >> overall less "if else" logic, >> also copy format don't change during copy, we don't need to check >> binary or nor for each datum value. Committed, thanks. For the archives: this code is in a very hot path during COPY TO, so I did some quick micro-benchmarking on my laptop. I used this: COPY (select 0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9,10 from generate_series(1, 1_000_000) g) TO '/dev/null'; and COPY (select 0 from generate_series(1, 1_000_000) g) TO '/dev/null'; to check the impact with few attributes and with many attributes. I repeated that a few times with \timing with and without the patch, and eyeballed the result. I also used 'perf' to check the fraction of CPU time spent in CopyOneRowTo. Conclusion: the patch has no performance impact. -- Heikki Linnakangas Neon (https://neon.tech)