Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Sutou Kouhei |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | 20250814.153654.91221343186154858.kou@clear-code.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
Hi, In <CAD21AoBa0Wm3C2H12jaqkvLidP2zEhsC+gf=3w7XiA4LQnvx0g@mail.gmail.com> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 28 Jul 2025 22:19:36 -0700, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > The fields in 1 are mostly static fields, and the fields in 2 and 3 > are likely to be accessed in hot functions during COPY FROM. Would it > be a good idea to restructure these fields so that we can hide the > fields in 1 from callback functions and having the fields in 3 in a > separate format-specific struct that can be accessed via an opaque > pointer? But could the latter change potentially cause performance > overheads? Yes. It changes memory layout (1 continuous memory chunk -> 2 separated memory chunks) and introduces indirect member accesses (x->y -> x->z->y). They may not have performance impact but we need to measure it if we want to use this approach. BTW, how about the following approach? copyapi.h: typedef struct CopyToStateData { /* public members */ /* ... */ } CopyToStateData; copyto.c: typedef struct CopyToStateInternalData { CopyToStateData parent; /* private members */ /* ... */ } CopyToStateInternalData; We export CopyToStateData only with public members. We don't export CopyToStateInternalData that has members only for built-in formats. CopyToStateInternalData has the same memory layout as CopyToStateData. So we can use CopyToStateInternalData as CopyToStateData. We use CopyToStateData not CopyToStateInternalData in public API. We cast CopyToStateData to CopyToStateInternalData when we need to use private members: static void CopySendData(CopyToState cstate, const void *databuf, int datasize) { CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate; appendBinaryStringInfo(cstate_internal->fe_msgbuf, databuf, datasize); } It's still direct member access. With this approach, we can keep the same memory layout (1 continuous memory chunk) and direct member access. I think that this approach doesn't have performance impact. See the attached patch for PoC of this approach. Drawback: This approach always allocates CopyToStateInternalData not CopyToStateData. So we need to allocate needless memories for extensions. But this will prevent performance regression of built-in formats. Is it acceptable? Thanks, -- kou From 4f1ee841677774cdc36091066020674d300714db Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Thu, 14 Aug 2025 15:21:50 +0900 Subject: [PATCH] Split CopyToStateData to CopyToStateData and CopyToStateInternalData --- src/backend/commands/copyto.c | 133 +++++++++++++++++---------------- src/include/commands/copyapi.h | 21 ++++++ 2 files changed, 91 insertions(+), 63 deletions(-) diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 67b94b91cae..8c58cb18a4d 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -21,7 +21,6 @@ #include "access/tableam.h" #include "commands/copyapi.h" #include "commands/progress.h" -#include "executor/execdesc.h" #include "executor/executor.h" #include "executor/tuptable.h" #include "libpq/libpq.h" @@ -62,40 +61,27 @@ typedef enum CopyDest * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true * when we have to do it the hard way. */ -typedef struct CopyToStateData +typedef struct CopyToStateInternalData { - /* format-specific routines */ - const CopyToRoutine *routine; + struct CopyToStateData parent; /* low-level state data */ CopyDest copy_dest; /* type of copy source/destination */ FILE *copy_file; /* used if copy_dest == COPY_FILE */ StringInfo fe_msgbuf; /* used for all dests during COPY TO */ - int file_encoding; /* file or remote side's character encoding */ bool need_transcoding; /* file encoding diff from server? */ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */ - /* parameters from the COPY command */ - Relation rel; /* relation to copy to */ - QueryDesc *queryDesc; /* executable query to copy from */ - List *attnumlist; /* integer list of attnums to copy */ - char *filename; /* filename, or NULL for STDOUT */ - bool is_program; /* is 'filename' a program to popen? */ - copy_data_dest_cb data_dest_cb; /* function for writing data */ - - CopyFormatOptions opts; - Node *whereClause; /* WHERE condition (or NULL) */ - /* * Working state */ MemoryContext copycontext; /* per-copy execution context */ - FmgrInfo *out_functions; /* lookup info for output functions */ MemoryContext rowcontext; /* per-row evaluation context */ uint64 bytes_processed; /* number of bytes processed so far */ -} CopyToStateData; +} CopyToStateInternalData; +typedef struct CopyToStateInternalData *CopyToStateInternal; /* DestReceiver for COPY (query) TO */ typedef struct @@ -189,11 +175,13 @@ CopyToGetRoutine(const CopyFormatOptions *opts) static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc) { + CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate; + /* * For non-binary copy, we need to convert null_print to file encoding, * because it will be sent directly with CopySendString. */ - if (cstate->need_transcoding) + if (cstate_internal->need_transcoding) cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, cstate->opts.null_print_len, cstate->file_encoding); @@ -390,6 +378,7 @@ CopyToBinaryEnd(CopyToState cstate) static void SendCopyBegin(CopyToState cstate) { + CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate; StringInfoData buf; int natts = list_length(cstate->attnumlist); int16 format = (cstate->opts.binary ? 1 : 0); @@ -401,14 +390,14 @@ SendCopyBegin(CopyToState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_dest = COPY_FRONTEND; + cstate_internal->copy_dest = COPY_FRONTEND; } static void SendCopyEnd(CopyToState cstate) { /* Shouldn't have any unsent data */ - Assert(cstate->fe_msgbuf->len == 0); + Assert(((CopyToStateInternal) cstate)->fe_msgbuf->len == 0); /* Send Copy Done message */ pq_putemptymessage(PqMsg_CopyDone); } @@ -426,32 +415,39 @@ SendCopyEnd(CopyToState cstate) static void CopySendData(CopyToState cstate, const void *databuf, int datasize) { - appendBinaryStringInfo(cstate->fe_msgbuf, databuf, datasize); + CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate; + + appendBinaryStringInfo(cstate_internal->fe_msgbuf, databuf, datasize); } static void CopySendString(CopyToState cstate, const char *str) { - appendBinaryStringInfo(cstate->fe_msgbuf, str, strlen(str)); + CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate; + + appendBinaryStringInfo(cstate_internal->fe_msgbuf, str, strlen(str)); } static void CopySendChar(CopyToState cstate, char c) { - appendStringInfoCharMacro(cstate->fe_msgbuf, c); + CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate; + + appendStringInfoCharMacro(cstate_internal->fe_msgbuf, c); } static void CopySendEndOfRow(CopyToState cstate) { - StringInfo fe_msgbuf = cstate->fe_msgbuf; + CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate; + StringInfo fe_msgbuf = cstate_internal->fe_msgbuf; - switch (cstate->copy_dest) + switch (cstate_internal->copy_dest) { case COPY_FILE: if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, - cstate->copy_file) != 1 || - ferror(cstate->copy_file)) + cstate_internal->copy_file) != 1 || + ferror(cstate_internal->copy_file)) { if (cstate->is_program) { @@ -492,8 +488,8 @@ CopySendEndOfRow(CopyToState cstate) } /* Update the progress */ - cstate->bytes_processed += fe_msgbuf->len; - pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, cstate->bytes_processed); + cstate_internal->bytes_processed += fe_msgbuf->len; + pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, cstate_internal->bytes_processed); resetStringInfo(fe_msgbuf); } @@ -505,7 +501,9 @@ CopySendEndOfRow(CopyToState cstate) static inline void CopySendTextLikeEndOfRow(CopyToState cstate) { - switch (cstate->copy_dest) + CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate; + + switch (cstate_internal->copy_dest) { case COPY_FILE: /* Default line termination depends on platform */ @@ -561,11 +559,12 @@ CopySendInt16(CopyToState cstate, int16 val) static void ClosePipeToProgram(CopyToState cstate) { + CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate; int pclose_rc; Assert(cstate->is_program); - pclose_rc = ClosePipeStream(cstate->copy_file); + pclose_rc = ClosePipeStream(cstate_internal->copy_file); if (pclose_rc == -1) ereport(ERROR, (errcode_for_file_access(), @@ -586,13 +585,15 @@ ClosePipeToProgram(CopyToState cstate) static void EndCopy(CopyToState cstate) { + CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate; + if (cstate->is_program) { ClosePipeToProgram(cstate); } else { - if (cstate->filename != NULL && FreeFile(cstate->copy_file)) + if (cstate->filename != NULL && FreeFile(cstate_internal->copy_file)) ereport(ERROR, (errcode_for_file_access(), errmsg("could not close file \"%s\": %m", @@ -601,7 +602,7 @@ EndCopy(CopyToState cstate) pgstat_progress_end_command(); - MemoryContextDelete(cstate->copycontext); + MemoryContextDelete(cstate_internal->copycontext); pfree(cstate); } @@ -631,6 +632,7 @@ BeginCopyTo(ParseState *pstate, List *options) { CopyToState cstate; + CopyToStateInternal cstate_internal; bool pipe = (filename == NULL && data_dest_cb == NULL); TupleDesc tupDesc; int num_phys_attrs; @@ -687,17 +689,18 @@ BeginCopyTo(ParseState *pstate, /* Allocate workspace and zero all fields */ - cstate = (CopyToStateData *) palloc0(sizeof(CopyToStateData)); + cstate_internal = (CopyToStateInternal) palloc0(sizeof(CopyToStateInternalData)); + cstate = (CopyToState) cstate_internal; /* * We allocate everything used by a cstate in a new memory context. This * avoids memory leaks during repeated use of COPY in a query. */ - cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext, - "COPY", - ALLOCSET_DEFAULT_SIZES); + cstate_internal->copycontext = AllocSetContextCreate(CurrentMemoryContext, + "COPY", + ALLOCSET_DEFAULT_SIZES); - oldcontext = MemoryContextSwitchTo(cstate->copycontext); + oldcontext = MemoryContextSwitchTo(cstate_internal->copycontext); /* Extract options from the statement node tree */ ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options); @@ -895,19 +898,19 @@ BeginCopyTo(ParseState *pstate, */ if (cstate->file_encoding == GetDatabaseEncoding() || cstate->file_encoding == PG_SQL_ASCII) - cstate->need_transcoding = false; + cstate_internal->need_transcoding = false; else - cstate->need_transcoding = true; + cstate_internal->need_transcoding = true; /* See Multibyte encoding comment above */ - cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding); + cstate_internal->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding); - cstate->copy_dest = COPY_FILE; /* default */ + cstate_internal->copy_dest = COPY_FILE; /* default */ if (data_dest_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_dest = COPY_CALLBACK; + cstate_internal->copy_dest = COPY_CALLBACK; cstate->data_dest_cb = data_dest_cb; } else if (pipe) @@ -916,7 +919,7 @@ BeginCopyTo(ParseState *pstate, Assert(!is_program); /* the grammar does not allow this */ if (whereToSendOutput != DestRemote) - cstate->copy_file = stdout; + cstate_internal->copy_file = stdout; } else { @@ -926,8 +929,8 @@ BeginCopyTo(ParseState *pstate, if (is_program) { progress_vals[1] = PROGRESS_COPY_TYPE_PROGRAM; - cstate->copy_file = OpenPipeStream(cstate->filename, PG_BINARY_W); - if (cstate->copy_file == NULL) + cstate_internal->copy_file = OpenPipeStream(cstate->filename, PG_BINARY_W); + if (cstate_internal->copy_file == NULL) ereport(ERROR, (errcode_for_file_access(), errmsg("could not execute command \"%s\": %m", @@ -952,14 +955,14 @@ BeginCopyTo(ParseState *pstate, oumask = umask(S_IWGRP | S_IWOTH); PG_TRY(); { - cstate->copy_file = AllocateFile(cstate->filename, PG_BINARY_W); + cstate_internal->copy_file = AllocateFile(cstate->filename, PG_BINARY_W); } PG_FINALLY(); { umask(oumask); } PG_END_TRY(); - if (cstate->copy_file == NULL) + if (cstate_internal->copy_file == NULL) { /* copy errno because ereport subfunctions might change it */ int save_errno = errno; @@ -973,7 +976,7 @@ BeginCopyTo(ParseState *pstate, "You may want a client-side facility such as psql's \\copy.") : 0)); } - if (fstat(fileno(cstate->copy_file), &st)) + if (fstat(fileno(cstate_internal->copy_file), &st)) ereport(ERROR, (errcode_for_file_access(), errmsg("could not stat file \"%s\": %m", @@ -991,7 +994,7 @@ BeginCopyTo(ParseState *pstate, cstate->rel ? RelationGetRelid(cstate->rel) : InvalidOid); pgstat_progress_update_multi_param(2, progress_cols, progress_vals); - cstate->bytes_processed = 0; + cstate_internal->bytes_processed = 0; MemoryContextSwitchTo(oldcontext); @@ -1025,6 +1028,7 @@ EndCopyTo(CopyToState cstate) uint64 DoCopyTo(CopyToState cstate) { + CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate; bool pipe = (cstate->filename == NULL && cstate->data_dest_cb == NULL); bool fe_copy = (pipe && whereToSendOutput == DestRemote); TupleDesc tupDesc; @@ -1043,7 +1047,7 @@ DoCopyTo(CopyToState cstate) cstate->opts.null_print_client = cstate->opts.null_print; /* default */ /* We use fe_msgbuf as a per-row buffer regardless of copy_dest */ - cstate->fe_msgbuf = makeStringInfo(); + cstate_internal->fe_msgbuf = makeStringInfo(); /* Get info about the columns we need to process. */ cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo)); @@ -1062,9 +1066,9 @@ DoCopyTo(CopyToState cstate) * datatype output routines, and should be faster than retail pfree's * anyway. (We don't need a whole econtext as CopyFrom does.) */ - cstate->rowcontext = AllocSetContextCreate(CurrentMemoryContext, - "COPY TO", - ALLOCSET_DEFAULT_SIZES); + cstate_internal->rowcontext = AllocSetContextCreate(CurrentMemoryContext, + "COPY TO", + ALLOCSET_DEFAULT_SIZES); cstate->routine->CopyToStart(cstate, tupDesc); @@ -1107,7 +1111,7 @@ DoCopyTo(CopyToState cstate) cstate->routine->CopyToEnd(cstate); - MemoryContextDelete(cstate->rowcontext); + MemoryContextDelete(cstate_internal->rowcontext); if (fe_copy) SendCopyEnd(cstate); @@ -1121,10 +1125,11 @@ DoCopyTo(CopyToState cstate) static inline void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) { + CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate; MemoryContext oldcontext; - MemoryContextReset(cstate->rowcontext); - oldcontext = MemoryContextSwitchTo(cstate->rowcontext); + MemoryContextReset(cstate_internal->rowcontext); + oldcontext = MemoryContextSwitchTo(cstate_internal->rowcontext); /* Make sure the tuple is fully deconstructed */ slot_getallattrs(slot); @@ -1146,12 +1151,13 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) static void CopyAttributeOutText(CopyToState cstate, const char *string) { + CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate; const char *ptr; const char *start; char c; char delimc = cstate->opts.delim[0]; - if (cstate->need_transcoding) + if (cstate_internal->need_transcoding) ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding); else ptr = string; @@ -1170,7 +1176,7 @@ CopyAttributeOutText(CopyToState cstate, const char *string) * it's worth making two copies of it to get the IS_HIGHBIT_SET() test out * of the normal safe-encoding path. */ - if (cstate->encoding_embeds_ascii) + if (cstate_internal->encoding_embeds_ascii) { start = ptr; while ((c = *ptr) != '\0') @@ -1300,6 +1306,7 @@ static void CopyAttributeOutCSV(CopyToState cstate, const char *string, bool use_quote) { + CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate; const char *ptr; const char *start; char c; @@ -1312,7 +1319,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string, if (!use_quote && strcmp(string, cstate->opts.null_print) == 0) use_quote = true; - if (cstate->need_transcoding) + if (cstate_internal->need_transcoding) ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding); else ptr = string; @@ -1342,7 +1349,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string, use_quote = true; break; } - if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii) + if (IS_HIGHBIT_SET(c) && cstate_internal->encoding_embeds_ascii) tptr += pg_encoding_mblen(cstate->file_encoding, tptr); else tptr++; @@ -1366,7 +1373,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string, CopySendChar(cstate, escapec); start = ptr; /* we include char in next run */ } - if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii) + if (IS_HIGHBIT_SET(c) && cstate_internal->encoding_embeds_ascii) ptr += pg_encoding_mblen(cstate->file_encoding, ptr); else ptr++; diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 2a2d2f9876b..b818e13cd1b 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -15,6 +15,7 @@ #define COPYAPI_H #include "commands/copy.h" +#include "executor/execdesc.h" /* * API structure for a COPY TO format implementation. Note this must be @@ -54,6 +55,26 @@ typedef struct CopyToRoutine void (*CopyToEnd) (CopyToState cstate); } CopyToRoutine; +typedef struct CopyToStateData +{ + /* format-specific routines */ + const CopyToRoutine *routine; + + /* parameters from the COPY command */ + Relation rel; /* relation to copy to */ + QueryDesc *queryDesc; /* executable query to copy from */ + List *attnumlist; /* integer list of attnums to copy */ + char *filename; /* filename, or NULL for STDOUT */ + bool is_program; /* is 'filename' a program to popen? */ + copy_data_dest_cb data_dest_cb; /* function for writing data */ + + int file_encoding; /* file or remote side's character encoding */ + + CopyFormatOptions opts; + + FmgrInfo *out_functions; /* lookup info for output functions */ +} CopyToStateData; + /* * API structure for a COPY FROM format implementation. Note this must be * allocated in a server-lifetime manner, typically as a static const struct. -- 2.50.0
pgsql-hackers by date: