Thread: Split copy.c
While looking at the parallel copy patches, it started to annoy me how large copy.c is. It confuses my little head. (Ok, it's annoyed me many times in the past, but I haven't done anything about it.) There isn't much common code between COPY FROM and COPY TO, so I propose that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin that's much nicer. Usually when I'm looking at COPY, I'm specifically looking at COPY FROM or COPY TO. There is symmetry between the two, like SendCopyBegin() and ReceiveCopyBegin(), but more often I drill into the details of either COPY FROM or TO. And when I need to look at those symmetric cases, I want to open the two functions side-by-side anyway, so they might as well be in different files. There is some code duplication now. BeginCopy() was called by both BeginCopyFrom() and BeginCopyTo(). I copied the common parts of it to BeginCopyFrom and BeginCopyTo(), but some of the code was only applicable to FROM or TO. I also split/duplicated the CopyStateData struct into CopyFromStateData and CopyToStateData. Many of the fields were common, but many were not, and I think some duplication is nicer than a struct where you use some fields and others are unused. I put the common formatting options into a new CopyFormatOptions struct. Moving code around always makes backpatching bug fixes harder, but I think we should bit the bullet. If the parallel COPY work is committed, it's going to move things around a lot anyway. - Heikki
Attachment
On 2020-11-02 10:03, Heikki Linnakangas wrote: > While looking at the parallel copy patches, it started to annoy me how > large copy.c is. It confuses my little head. (Ok, it's annoyed me many > times in the past, but I haven't done anything about it.) > [0001-Split-copy.c-into-...o.c-and-copyfrom.c.patch] There seems to be an oversight of contrib/file_fdw. (debian 10, gcc 8.3.0) After: ./configure --prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.split_copy --bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.split_copy/bin.fast --libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.split_copy/lib.fast --with-pgport=6973 --quiet --enable-depend --with-libxml --with-libxslt --with-zlib --with-openssl --enable-tap-tests --with-extra-version=_split_copy_1102_90d8 ... these errors+warnings from contrib/file_fdw: -- [2020.11.02 10:31:53 split_copy/1] make contrib file_fdw.c:108:2: error: unknown type name ‘CopyState’ CopyState cstate; /* COPY execution state */ ^~~~~~~~~ file_fdw.c: In function ‘fileBeginForeignScan’: file_fdw.c:658:2: error: unknown type name ‘CopyState’; did you mean ‘CopyToState’? CopyState cstate; ^~~~~~~~~ CopyToState file_fdw.c:680:10: warning: passing argument 3 of ‘BeginCopyFrom’ from incompatible pointer type [-Wincompatible-pointer-types] filename, ^~~~~~~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:60:76: note: expected ‘Node *’ {aka ‘struct Node *’} but argument is of type ‘char *’ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause, ~~~~~~^~~~~~~~~~~ file_fdw.c:681:10: error: incompatible type for argument 4 of ‘BeginCopyFrom’ is_program, ^~~~~~~~~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:61:23: note: expected ‘const char *’ but argument is of type ‘_Bool’ const char *filename, ~~~~~~~~~~~~^~~~~~~~ In file included from ../../src/include/access/tupdesc.h:19, from ../../src/include/access/htup_details.h:19, from file_fdw.c:18: ../../src/include/nodes/pg_list.h:65:19: warning: passing argument 6 of ‘BeginCopyFrom’ from incompatible pointer type [-Wincompatible-pointer-types] #define NIL ((List *) NULL) ~^~~~~~~~~~~~~~ file_fdw.c:683:10: note: in expansion of macro ‘NIL’ NIL, ^~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:62:48: note: expected ‘copy_data_source_cb’ {aka ‘int (*)(void *, int, int)’} but argument is of type ‘List *’ {aka ‘struct List *’} bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options); ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~ file_fdw.c:678:11: error: too few arguments to function ‘BeginCopyFrom’ cstate = BeginCopyFrom(NULL, ^~~~~~~~~~~~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:60:22: note: declared here extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause, ^~~~~~~~~~~~~ file_fdw.c: In function ‘fileIterateForeignScan’: file_fdw.c:714:20: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] errcallback.arg = (void *) festate->cstate; ^ file_fdw.c:731:30: warning: passing argument 1 of ‘NextCopyFrom’ makes pointer from integer without a cast [-Wint-conversion] found = NextCopyFrom(festate->cstate, NULL, ~~~~~~~^~~~~~~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:64:40: note: expected ‘CopyFromState’ {aka ‘struct CopyFromStateData *’} but argument is of type ‘int’ extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext, ~~~~~~~~~~~~~~^~~~~~ file_fdw.c: In function ‘fileReScanForeignScan’: file_fdw.c:751:21: warning: passing argument 1 of ‘EndCopyFrom’ makes pointer from integer without a cast [-Wint-conversion] EndCopyFrom(festate->cstate); ~~~~~~~^~~~~~~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:63:39: note: expected ‘CopyFromState’ {aka ‘struct CopyFromStateData *’} but argument is of type ‘int’ extern void EndCopyFrom(CopyFromState cstate); ~~~~~~~~~~~~~~^~~~~~ file_fdw.c:755:17: warning: passing argument 3 of ‘BeginCopyFrom’ from incompatible pointer type [-Wincompatible-pointer-types] festate->filename, ~~~~~~~^~~~~~~~~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:60:76: note: expected ‘Node *’ {aka ‘struct Node *’} but argument is of type ‘char *’ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause, ~~~~~~^~~~~~~~~~~ file_fdw.c:756:17: error: incompatible type for argument 4 of ‘BeginCopyFrom’ festate->is_program, ~~~~~~~^~~~~~~~~~~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:61:23: note: expected ‘const char *’ but argument is of type ‘_Bool’ const char *filename, ~~~~~~~~~~~~^~~~~~~~ In file included from ../../src/include/access/tupdesc.h:19, from ../../src/include/access/htup_details.h:19, from file_fdw.c:18: ../../src/include/nodes/pg_list.h:65:19: warning: passing argument 6 of ‘BeginCopyFrom’ from incompatible pointer type [-Wincompatible-pointer-types] #define NIL ((List *) NULL) ~^~~~~~~~~~~~~~ file_fdw.c:758:10: note: in expansion of macro ‘NIL’ NIL, ^~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:62:48: note: expected ‘copy_data_source_cb’ {aka ‘int (*)(void *, int, int)’} but argument is of type ‘List *’ {aka ‘struct List *’} bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options); ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~ file_fdw.c:753:20: error: too few arguments to function ‘BeginCopyFrom’ festate->cstate = BeginCopyFrom(NULL, ^~~~~~~~~~~~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:60:22: note: declared here extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause, ^~~~~~~~~~~~~ file_fdw.c: In function ‘fileEndForeignScan’: file_fdw.c:773:22: warning: passing argument 1 of ‘EndCopyFrom’ makes pointer from integer without a cast [-Wint-conversion] EndCopyFrom(festate->cstate); ~~~~~~~^~~~~~~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:63:39: note: expected ‘CopyFromState’ {aka ‘struct CopyFromStateData *’} but argument is of type ‘int’ extern void EndCopyFrom(CopyFromState cstate); ~~~~~~~~~~~~~~^~~~~~ file_fdw.c: In function ‘file_acquire_sample_rows’: file_fdw.c: In function ‘file_acquire_sample_rows’: file_fdw.c:1110:2: error: unknown type name ‘CopyState’; did you mean ‘CopyToState’? CopyState cstate; ^~~~~~~~~ CopyToState file_fdw.c:1128:39: warning: passing argument 3 of ‘BeginCopyFrom’ from incompatible pointer type [-Wincompatible-pointer-types] cstate = BeginCopyFrom(NULL, onerel, filename, is_program, NULL, NIL, ^~~~~~~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:60:76: note: expected ‘Node *’ {aka ‘struct Node *’} but argument is of type ‘char *’ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause, ~~~~~~^~~~~~~~~~~ file_fdw.c:1128:49: error: incompatible type for argument 4 of ‘BeginCopyFrom’ cstate = BeginCopyFrom(NULL, onerel, filename, is_program, NULL, NIL, ^~~~~~~~~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:61:23: note: expected ‘const char *’ but argument is of type ‘_Bool’ const char *filename, ~~~~~~~~~~~~^~~~~~~~ In file included from ../../src/include/access/tupdesc.h:19, from ../../src/include/access/htup_details.h:19, from file_fdw.c:18: ../../src/include/nodes/pg_list.h:65:19: warning: passing argument 6 of ‘BeginCopyFrom’ from incompatible pointer type [-Wincompatible-pointer-types] #define NIL ((List *) NULL) ~^~~~~~~~~~~~~~ file_fdw.c:1128:67: note: in expansion of macro ‘NIL’ cstate = BeginCopyFrom(NULL, onerel, filename, is_program, NULL, NIL, ^~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:62:48: note: expected ‘copy_data_source_cb’ {aka ‘int (*)(void *, int, int)’} but argument is of type ‘List *’ {aka ‘struct List *’} bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options); ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~ file_fdw.c:1128:11: error: too few arguments to function ‘BeginCopyFrom’ cstate = BeginCopyFrom(NULL, onerel, filename, is_program, NULL, NIL, ^~~~~~~~~~~~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:60:22: note: declared here extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause, ^~~~~~~~~~~~~ file_fdw.c:1144:20: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] errcallback.arg = (void *) cstate; ^ file_fdw.c:1159:24: warning: passing argument 1 of ‘NextCopyFrom’ makes pointer from integer without a cast [-Wint-conversion] found = NextCopyFrom(cstate, NULL, values, nulls); ^~~~~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:64:40: note: expected ‘CopyFromState’ {aka ‘struct CopyFromStateData *’} but argument is of type ‘int’ extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext, ~~~~~~~~~~~~~~^~~~~~ file_fdw.c:1211:14: warning: passing argument 1 of ‘EndCopyFrom’ makes pointer from integer without a cast [-Wint-conversion] EndCopyFrom(cstate); ^~~~~~ In file included from file_fdw.c:24: ../../src/include/commands/copy.h:63:39: note: expected ‘CopyFromState’ {aka ‘struct CopyFromStateData *’} but argument is of type ‘int’ extern void EndCopyFrom(CopyFromState cstate); ~~~~~~~~~~~~~~^~~~~~ make[1]: *** [../../src/Makefile.global:921: file_fdw.o] Error 1 make: *** [Makefile:95: all-file_fdw-recurse] Error 2 -- contrib make returned 2 - abort
On 02/11/2020 11:36, Erik Rijkers wrote: > On 2020-11-02 10:03, Heikki Linnakangas wrote: >> While looking at the parallel copy patches, it started to annoy me how >> large copy.c is. It confuses my little head. (Ok, it's annoyed me many >> times in the past, but I haven't done anything about it.) > >> [0001-Split-copy.c-into-...o.c-and-copyfrom.c.patch] > > There seems to be an oversight of contrib/file_fdw. (debian 10, gcc > 8.3.0) Ah yeah, I missed updating file_fdw. Here's a new patch. - Heikki
Attachment
On 2020-11-02 12:19, Heikki Linnakangas wrote: > On 02/11/2020 11:36, Erik Rijkers wrote: >> On 2020-11-02 10:03, Heikki Linnakangas wrote: >>> While looking at the parallel copy patches, it started to annoy me >>> how >>> large copy.c is. It confuses my little head. (Ok, it's annoyed me >>> many >>> times in the past, but I haven't done anything about it.) >> >>> [0001-Split-copy.c-into-...o.c-and-copyfrom.c.patch] >> >> There seems to be an oversight of contrib/file_fdw. (debian 10, gcc >> 8.3.0) > > Ah yeah, I missed updating file_fdw. Here's a new patch. Something still not quite right in the compile-with-assert: -- [2020.11.02 12:49:12 split_copy/0] make core: make --quiet -j 4 (speed 0=debug_assertions speed 1=fast) In file included from ../../../src/include/postgres.h:46, from copyto.c:15: copyto.c: In function ‘BeginCopyTo’: copyto.c:477:11: error: ‘is_from’ undeclared (first use in this function); did you mean ‘is_program’? Assert(!is_from); ^~~~~~~ ../../../src/include/c.h:790:9: note: in definition of macro ‘Assert’ if (!(condition)) \ ^~~~~~~~~ copyto.c:477:11: note: each undeclared identifier is reported only once for each function it appears in Assert(!is_from); ^~~~~~~ ../../../src/include/c.h:790:9: note: in definition of macro ‘Assert’ if (!(condition)) \ ^~~~~~~~~ make[3]: *** [../../../src/Makefile.global:921: copyto.o] Error 1 make[2]: *** [common.mk:39: commands-recursive] Error 2 make[2]: *** Waiting for unfinished jobs.... make[1]: *** [Makefile:42: all-backend-recurse] Error 2 make: *** [GNUmakefile:11: all-src-recurse] Error 2
On Mon, Nov 2, 2020 at 2:33 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > While looking at the parallel copy patches, it started to annoy me how > large copy.c is. It confuses my little head. (Ok, it's annoyed me many > times in the past, but I haven't done anything about it.) > +1 for having copy from & copy to functionality in separate files. This is present in both copyfrom.c & copyto.c, can it be removed from one place & moved to a common header file? static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0"; CopyDest was changed to: typedef enum CopySource { COPY_FILE, /* from file (or a piped program) */ COPY_OLD_FE, /* from frontend (2.0 protocol) */ COPY_NEW_FE, /* from frontend (3.0 protocol) */ COPY_CALLBACK /* from callback function */ } CopySource; typedef enum CopyDest { COPY_FILE, /* to file (or a piped program) */ COPY_OLD_FE, /* to frontend (2.0 protocol) */ COPY_NEW_FE, /* to frontend (3.0 protocol) */ } CopyDest; Should we have one enum or both are required, if both are required we could think of naming like COPY_TO_FILE, COPY_FROM_FILE, it will make it more clearer. There is one warning while applying the v2 patch: Applying: Split copy.c into copyto.c and copyfrom.c. /home/vignesh/postgres/postgres/.git/rebase-apply/patch:909: trailing whitespace. warning: 1 line adds whitespace errors. There is one compilation error, may be this Assert is not required: copyto.c: In function ‘BeginCopyTo’: copyto.c:477:11: error: ‘is_from’ undeclared (first use in this function) Assert(!is_from); Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Hi, On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote: > While looking at the parallel copy patches, it started to annoy me how large > copy.c is. Agreed. > There isn't much common code between COPY FROM and COPY TO, so I propose > that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin > that's much nicer. Not quite convinced that's the right split - or perhaps there's just more potential. My feeling is that splitting out all the DML related code would make the code considerably easier to read. Regards, Andres
On 02/11/2020 19:23, Andres Freund wrote: > On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote: >> There isn't much common code between COPY FROM and COPY TO, so I propose >> that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin >> that's much nicer. > > Not quite convinced that's the right split - or perhaps there's just > more potential. My feeling is that splitting out all the DML related > code would make the code considerably easier to read. What do you mean by DML related code? - Heikki
On 2020-11-02 19:43:38 +0200, Heikki Linnakangas wrote: > On 02/11/2020 19:23, Andres Freund wrote: > > On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote: > > > There isn't much common code between COPY FROM and COPY TO, so I propose > > > that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin > > > that's much nicer. > > > > Not quite convinced that's the right split - or perhaps there's just > > more potential. My feeling is that splitting out all the DML related > > code would make the code considerably easier to read. > > What do you mean by DML related code? Basically all the insertion related code (e.g CopyMultiInsert*, lots of code in CopyFrom()) and perhaps also the type input invocations. - Andres
On Tue, 3 Nov 2020 at 07:35, Andres Freund <andres@anarazel.de> wrote: > > On 2020-11-02 19:43:38 +0200, Heikki Linnakangas wrote: > > On 02/11/2020 19:23, Andres Freund wrote: > > > On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote: > > > > There isn't much common code between COPY FROM and COPY TO, so I propose > > > > that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin > > > > that's much nicer. > > > > > > Not quite convinced that's the right split - or perhaps there's just > > > more potential. My feeling is that splitting out all the DML related > > > code would make the code considerably easier to read. > > > > What do you mean by DML related code? > > Basically all the insertion related code (e.g CopyMultiInsert*, lots of > code in CopyFrom()) and perhaps also the type input invocations. I quite like the fact that those are static and inline-able. I very much imagine there'd be a performance hit if we moved them out to another .c file and made them extern. Some of those functions can be quite hot when copying into a partitioned table. David
On 03/11/2020 04:15, David Rowley wrote: > On Tue, 3 Nov 2020 at 07:35, Andres Freund <andres@anarazel.de> wrote: >> >> On 2020-11-02 19:43:38 +0200, Heikki Linnakangas wrote: >>> On 02/11/2020 19:23, Andres Freund wrote: >>>> On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote: >>>>> There isn't much common code between COPY FROM and COPY TO, so I propose >>>>> that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin >>>>> that's much nicer. >>>> >>>> Not quite convinced that's the right split - or perhaps there's just >>>> more potential. My feeling is that splitting out all the DML related >>>> code would make the code considerably easier to read. >>> >>> What do you mean by DML related code? >> >> Basically all the insertion related code (e.g CopyMultiInsert*, lots of >> code in CopyFrom()) and perhaps also the type input invocations. Hmm. COPY FROM consists of two parts: 1. Parse the input text/CSV/binary format into Datums. 2. Store the Datums in the table. They're somewhat split already. If you want to only do the parsing part, you can use the BeginCopyFrom() and NextCopyFrom() functions to get Datums, without storing them to a table. file_fdw uses that. Yeah, that might indeed be another good split point. Attached is quick prototype of that. I tried to avoid doing other changes as part of this split, but some further refactoring might be good. Like extracting the state for the input parsing from CopyFromStateData into a separate struct. With these patches: $ wc -l src/backend/commands/copy*.c 782 src/backend/commands/copy.c 1641 src/backend/commands/copyfrom.c 1646 src/backend/commands/copyfromparse.c 1363 src/backend/commands/copyto.c 5432 total > I quite like the fact that those are static and inline-able. I very > much imagine there'd be a performance hit if we moved them out to > another .c file and made them extern. Some of those functions can be > quite hot when copying into a partitioned table. Agreed. - Heikki
Attachment
On 2020-11-03 08:38, Heikki Linnakangas wrote: > [v3-0001-Split-copy.c-into-copyto.c-and-copyfrom.c.patch] > [v3-0002-Split-copyfrom.c-further-into-copyfrom.c-and-copy.patch] The patches apply ok, but I get these errors: In file included from ../../../src/include/postgres.h:46, from copyto.c:15: copyto.c: In function ‘BeginCopyTo’: copyto.c:477:11: error: ‘is_from’ undeclared (first use in this function); did you mean ‘is_program’? Assert(!is_from); ^~~~~~~ ../../../src/include/c.h:790:9: note: in definition of macro ‘Assert’ if (!(condition)) \ ^~~~~~~~~ copyto.c:477:11: note: each undeclared identifier is reported only once for each function it appears in Assert(!is_from); ^~~~~~~ ../../../src/include/c.h:790:9: note: in definition of macro ‘Assert’ if (!(condition)) \ ^~~~~~~~~
On 03/11/2020 10:46, Erikjan Rijkers wrote: > On 2020-11-03 08:38, Heikki Linnakangas wrote: > >> [v3-0001-Split-copy.c-into-copyto.c-and-copyfrom.c.patch] >> [v3-0002-Split-copyfrom.c-further-into-copyfrom.c-and-copy.patch] > > The patches apply ok, but I get these errors: Here's a fixed version. There was a stray Assert that didn't make sense anymore. - Heikki
Attachment
Hey Heikki, On Tue, Nov 3, 2020 at 1:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: Thanks for working on this refactor. LGTM! I had a few minor comments: 1. We should rename the CopyFormatOptions *cstate param in ProcessCopyOptions() to CopyFormatOptions *options and List *options to List *raw_options IMO, to make it more readable. 2. We need to update the header comment for Copy{From,To}StateData. It is currently the old comment from CopyStateData. 3. Can we add docs for the missing fields in the header comment for BeginCopyFrom()? 4. > /* > * Working state for COPY TO/FROM > */ > MemoryContext copycontext; /* per-copy execution context */ Comment needs to be updated for the COPY operation. 5. > I also split/duplicated the CopyStateData struct into CopyFromStateData > and CopyToStateData. Many of the fields were common, but many were not, > and I think some duplication is nicer than a struct where you use some > fields and others are unused. I put the common formatting options into a > new CopyFormatOptions struct. Would we be better off if we sub-struct CopyState <- Copy{From,To}State? Like this: typedef struct Copy{From|To}StateData { CopyState cs; // Fields specific to COPY FROM/TO follow.. } 6. > /* create workspace for CopyReadAttributes results */ > if (!cstate->opts.binary) Can we replace this if with an else? Regards, Soumyadeep (VMware)
On Tue, Nov 03, 2020 at 03:15:27PM +1300, David Rowley wrote: > On Tue, 3 Nov 2020 at 07:35, Andres Freund <andres@anarazel.de> wrote: > > > > On 2020-11-02 19:43:38 +0200, Heikki Linnakangas wrote: > > > On 02/11/2020 19:23, Andres Freund wrote: > > > > On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote: > > > > > There isn't much common code between COPY FROM and COPY TO, so I propose > > > > > that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin > > > > > that's much nicer. > > > > > > > > Not quite convinced that's the right split - or perhaps there's just > > > > more potential. My feeling is that splitting out all the DML related > > > > code would make the code considerably easier to read. > > > > > > What do you mean by DML related code? > > > > Basically all the insertion related code (e.g CopyMultiInsert*, lots of > > code in CopyFrom()) and perhaps also the type input invocations. > > I quite like the fact that those are static and inline-able. I very > much imagine there'd be a performance hit if we moved them out to > another .c file and made them extern. Some of those functions can be > quite hot when copying into a partitioned table. For another patch [0], I moved into copy.h: +typedef struct CopyMultiInsertBuffer +typedef struct CopyMultiInsertInfo +CopyMultiInsertBufferInit(ResultRelInfo *rri) +CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo, +CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo) +CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo, +CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo, +CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri, That's an experimental part 0002 of my patch in response to Simon's suggestion. Maybe your response will be that variants of those interfaces should be added to nodeModifyTable.[ch] instead of moving them. Currently I'm passing (void*)mtstate as cstate - if there were a generic interface, that would be a void *state or so. [0] https://commitfest.postgresql.org/30/2553/ should INSERT SELECT use a BulkInsertState? (and multi_insert) -- Justin
On 16/11/2020 04:28, Justin Pryzby wrote: > On Tue, Nov 03, 2020 at 03:15:27PM +1300, David Rowley wrote: >> On Tue, 3 Nov 2020 at 07:35, Andres Freund <andres@anarazel.de> wrote: >>> >>> On 2020-11-02 19:43:38 +0200, Heikki Linnakangas wrote: >>>> On 02/11/2020 19:23, Andres Freund wrote: >>>>> On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote: >>>>>> There isn't much common code between COPY FROM and COPY TO, so I propose >>>>>> that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin >>>>>> that's much nicer. >>>>> >>>>> Not quite convinced that's the right split - or perhaps there's just >>>>> more potential. My feeling is that splitting out all the DML related >>>>> code would make the code considerably easier to read. >>>> >>>> What do you mean by DML related code? >>> >>> Basically all the insertion related code (e.g CopyMultiInsert*, lots of >>> code in CopyFrom()) and perhaps also the type input invocations. >> >> I quite like the fact that those are static and inline-able. I very >> much imagine there'd be a performance hit if we moved them out to >> another .c file and made them extern. Some of those functions can be >> quite hot when copying into a partitioned table. > > For another patch [0], I moved into copy.h: > +typedef struct CopyMultiInsertBuffer > +typedef struct CopyMultiInsertInfo > +CopyMultiInsertBufferInit(ResultRelInfo *rri) > +CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo, > +CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo) > +CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo, > +CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo, > +CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri, > > That's an experimental part 0002 of my patch in response to Simon's suggestion. > Maybe your response will be that variants of those interfaces should be added > to nodeModifyTable.[ch] instead of moving them. Nice. I don't think that affects this patch too much. I would suggest renaming renaming the functions and structs to remove the "Copy"-prefix. COPY uses them, but so does INSERT with the patch. > Currently I'm passing (void*)mtstate as cstate - if there were a > generic interface, that would be a void *state or so. The functions only need cstate/mtstate to set the line number, for the error callback, and to access the transition_capture field. You could add a field for transition_capture in CopyMultiInsertInfo. For the line number, you could add a line number field in CopyMultiInsertInfo, set that in CopyMultiInsertBufferFlush() instead of cstate->cur_lineno, and teach CopyFromErrorCallback() to get the line number from there. - Heikki
Thanks for feedback, attached is a new patch version. On 11/11/2020 21:49, Soumyadeep Chakraborty wrote: > On Tue, Nov 3, 2020 at 1:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> I also split/duplicated the CopyStateData struct into CopyFromStateData >> and CopyToStateData. Many of the fields were common, but many were not, >> and I think some duplication is nicer than a struct where you use some >> fields and others are unused. I put the common formatting options into a >> new CopyFormatOptions struct. > > Would we be better off if we sub-struct CopyState <- Copy{From,To}State? > Like this: > typedef struct Copy{From|To}StateData > { > CopyState cs; > // Fields specific to COPY FROM/TO follow.. > } Hmm. I don't think that would be better. There isn't actually that much in common between CopyFromStateData and CopyToStateData, and a little bit of duplication seems better. > 6. > >> /* create workspace for CopyReadAttributes results */ >> if (!cstate->opts.binary) > > Can we replace this if with an else? Seems better as it is IMO, the if- and else-branches are not really related to each other, even though they both happen to be conditioned on cstate->opts.binary. Fixed all the other things you listed, fixed a bug in setting 'file_encoding', and trimmed down the #includes. - Heikki
Attachment
On Tue, Nov 17, 2020 at 2:38 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > Thanks for feedback, attached is a new patch version. > > On 11/11/2020 21:49, Soumyadeep Chakraborty wrote: > > On Tue, Nov 3, 2020 at 1:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >> I also split/duplicated the CopyStateData struct into CopyFromStateData > >> and CopyToStateData. Many of the fields were common, but many were not, > >> and I think some duplication is nicer than a struct where you use some > >> fields and others are unused. I put the common formatting options into a > >> new CopyFormatOptions struct. > > > > Would we be better off if we sub-struct CopyState <- Copy{From,To}State? > > Like this: > > typedef struct Copy{From|To}StateData > > { > > CopyState cs; > > // Fields specific to COPY FROM/TO follow.. > > } > > Hmm. I don't think that would be better. There isn't actually that much > in common between CopyFromStateData and CopyToStateData, and a little > bit of duplication seems better. > Fair. > > 6. > > > >> /* create workspace for CopyReadAttributes results */ > >> if (!cstate->opts.binary) > > > > Can we replace this if with an else? > > Seems better as it is IMO, the if- and else-branches are not really > related to each other, even though they both happen to be conditioned on > cstate->opts.binary. Fair. > Fixed all the other things you listed, fixed a bug in setting > 'file_encoding', and trimmed down the #includes. > Thanks! LGTM! Marking as Ready for Committer. Regards, Soumyadeep (VMware)
On 18/11/2020 08:21, Soumyadeep Chakraborty wrote: > On Tue, Nov 17, 2020 at 2:38 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> Fixed all the other things you listed, fixed a bug in setting >> 'file_encoding', and trimmed down the #includes. >> > Thanks! LGTM! Marking as Ready for Committer. Pushed, thanks for the review! - Heikki