Thread: Split copy.c

Split copy.c

From
Heikki Linnakangas
Date:
While looking at the parallel copy patches, it started to annoy me how 
large copy.c is. It confuses my little head. (Ok, it's annoyed me many 
times in the past, but I haven't done anything about it.)

There isn't much common code between COPY FROM and COPY TO, so I propose 
that we split copy.c into two: copyfrom.c and copyto.c. See attached. I 
thin that's much nicer.

Usually when I'm looking at COPY, I'm specifically looking at COPY FROM 
or COPY TO. There is symmetry between the two, like SendCopyBegin() and 
ReceiveCopyBegin(), but more often I drill into the details of either 
COPY FROM or TO. And when I need to look at those symmetric cases, I 
want to open the two functions side-by-side anyway, so they might as 
well be in different files.

There is some code duplication now. BeginCopy() was called by both 
BeginCopyFrom() and BeginCopyTo(). I copied the common parts of it to 
BeginCopyFrom and BeginCopyTo(), but some of the code was only 
applicable to FROM or TO.

I also split/duplicated the CopyStateData struct into CopyFromStateData 
and CopyToStateData. Many of the fields were common, but many were not, 
and I think some duplication is nicer than a struct where you use some 
fields and others are unused. I put the common formatting options into a 
new CopyFormatOptions struct.

Moving code around always makes backpatching bug fixes harder, but I 
think we should bit the bullet. If the parallel COPY work is committed, 
it's going to move things around a lot anyway.

- Heikki

Attachment

Re: Split copy.c

From
Erik Rijkers
Date:
On 2020-11-02 10:03, Heikki Linnakangas wrote:
> While looking at the parallel copy patches, it started to annoy me how
> large copy.c is. It confuses my little head. (Ok, it's annoyed me many
> times in the past, but I haven't done anything about it.)

> [0001-Split-copy.c-into-...o.c-and-copyfrom.c.patch]

There seems to be an oversight of contrib/file_fdw. (debian 10, gcc 
8.3.0)

After:

./configure  
--prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.split_copy 
--bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.split_copy/bin.fast 
--libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.split_copy/lib.fast 
--with-pgport=6973 --quiet --enable-depend    --with-libxml 
--with-libxslt --with-zlib --with-openssl  --enable-tap-tests  
--with-extra-version=_split_copy_1102_90d8


... these errors+warnings from contrib/file_fdw:


-- [2020.11.02 10:31:53 split_copy/1] make contrib
file_fdw.c:108:2: error: unknown type name ‘CopyState’
   CopyState cstate;   /* COPY execution state */
   ^~~~~~~~~
file_fdw.c: In function ‘fileBeginForeignScan’:
file_fdw.c:658:2: error: unknown type name ‘CopyState’; did you mean 
‘CopyToState’?
   CopyState cstate;
   ^~~~~~~~~
   CopyToState
file_fdw.c:680:10: warning: passing argument 3 of ‘BeginCopyFrom’ from 
incompatible pointer type [-Wincompatible-pointer-types]
           filename,
           ^~~~~~~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:60:76: note: expected ‘Node *’ {aka 
‘struct Node *’} but argument is of type ‘char *’
  extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, 
Node *whereClause,
                                                                       
~~~~~~^~~~~~~~~~~
file_fdw.c:681:10: error: incompatible type for argument 4 of 
‘BeginCopyFrom’
           is_program,
           ^~~~~~~~~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:61:23: note: expected ‘const char *’ 
but argument is of type ‘_Bool’
            const char *filename,
            ~~~~~~~~~~~~^~~~~~~~
In file included from ../../src/include/access/tupdesc.h:19,
                  from ../../src/include/access/htup_details.h:19,
                  from file_fdw.c:18:
../../src/include/nodes/pg_list.h:65:19: warning: passing argument 6 of 
‘BeginCopyFrom’ from incompatible pointer type 
[-Wincompatible-pointer-types]
  #define NIL      ((List *) NULL)
                   ~^~~~~~~~~~~~~~
file_fdw.c:683:10: note: in expansion of macro ‘NIL’
           NIL,
           ^~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:62:48: note: expected 
‘copy_data_source_cb’ {aka ‘int (*)(void *, int,  int)’} but argument is 
of type ‘List *’ {aka ‘struct List *’}
            bool is_program, copy_data_source_cb data_source_cb, List 
*attnamelist, List *options);
                             ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
file_fdw.c:678:11: error: too few arguments to function ‘BeginCopyFrom’
   cstate = BeginCopyFrom(NULL,
            ^~~~~~~~~~~~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:60:22: note: declared here
  extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, 
Node *whereClause,
                       ^~~~~~~~~~~~~
file_fdw.c: In function ‘fileIterateForeignScan’:
file_fdw.c:714:20: warning: cast to pointer from integer of different 
size [-Wint-to-pointer-cast]
   errcallback.arg = (void *) festate->cstate;
                     ^
file_fdw.c:731:30: warning: passing argument 1 of ‘NextCopyFrom’ makes 
pointer from integer without a cast [-Wint-conversion]
   found = NextCopyFrom(festate->cstate, NULL,
                        ~~~~~~~^~~~~~~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:64:40: note: expected ‘CopyFromState’ 
{aka ‘struct CopyFromStateData *’} but argument is of type ‘int’
  extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,

                           ~~~~~~~~~~~~~~^~~~~~
file_fdw.c: In function ‘fileReScanForeignScan’:
file_fdw.c:751:21: warning: passing argument 1 of ‘EndCopyFrom’ makes 
pointer from integer without a cast [-Wint-conversion]
   EndCopyFrom(festate->cstate);
               ~~~~~~~^~~~~~~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:63:39: note: expected ‘CopyFromState’ 
{aka ‘struct CopyFromStateData *’} but argument is of type ‘int’
  extern void EndCopyFrom(CopyFromState cstate);
                          ~~~~~~~~~~~~~~^~~~~~
file_fdw.c:755:17: warning: passing argument 3 of ‘BeginCopyFrom’ from 
incompatible pointer type [-Wincompatible-pointer-types]
           festate->filename,
           ~~~~~~~^~~~~~~~~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:60:76: note: expected ‘Node *’ {aka 
‘struct Node *’} but argument is of type ‘char *’
  extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, 
Node *whereClause,
                                                                       
~~~~~~^~~~~~~~~~~
file_fdw.c:756:17: error: incompatible type for argument 4 of 
‘BeginCopyFrom’
           festate->is_program,
           ~~~~~~~^~~~~~~~~~~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:61:23: note: expected ‘const char *’ 
but argument is of type ‘_Bool’
            const char *filename,
            ~~~~~~~~~~~~^~~~~~~~
In file included from ../../src/include/access/tupdesc.h:19,
                  from ../../src/include/access/htup_details.h:19,
                  from file_fdw.c:18:
../../src/include/nodes/pg_list.h:65:19: warning: passing argument 6 of 
‘BeginCopyFrom’ from incompatible pointer type 
[-Wincompatible-pointer-types]
  #define NIL      ((List *) NULL)
                   ~^~~~~~~~~~~~~~
file_fdw.c:758:10: note: in expansion of macro ‘NIL’
           NIL,
           ^~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:62:48: note: expected 
‘copy_data_source_cb’ {aka ‘int (*)(void *, int,  int)’} but argument is 
of type ‘List *’ {aka ‘struct List *’}
            bool is_program, copy_data_source_cb data_source_cb, List 
*attnamelist, List *options);
                             ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
file_fdw.c:753:20: error: too few arguments to function ‘BeginCopyFrom’
   festate->cstate = BeginCopyFrom(NULL,
                     ^~~~~~~~~~~~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:60:22: note: declared here
  extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, 
Node *whereClause,
                       ^~~~~~~~~~~~~
file_fdw.c: In function ‘fileEndForeignScan’:
file_fdw.c:773:22: warning: passing argument 1 of ‘EndCopyFrom’ makes 
pointer from integer without a cast [-Wint-conversion]
    EndCopyFrom(festate->cstate);
                ~~~~~~~^~~~~~~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:63:39: note: expected ‘CopyFromState’ 
{aka ‘struct CopyFromStateData *’} but argument is of type ‘int’
  extern void EndCopyFrom(CopyFromState cstate);
                          ~~~~~~~~~~~~~~^~~~~~
file_fdw.c: In function ‘file_acquire_sample_rows’:


file_fdw.c: In function ‘file_acquire_sample_rows’:
file_fdw.c:1110:2: error: unknown type name ‘CopyState’; did you mean 
‘CopyToState’?
   CopyState cstate;
   ^~~~~~~~~
   CopyToState
file_fdw.c:1128:39: warning: passing argument 3 of ‘BeginCopyFrom’ from 
incompatible pointer type [-Wincompatible-pointer-types]
   cstate = BeginCopyFrom(NULL, onerel, filename, is_program, NULL, NIL,
                                        ^~~~~~~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:60:76: note: expected ‘Node *’ {aka 
‘struct Node *’} but argument is of type ‘char *’
  extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel,
Node *whereClause,
                                                                       
~~~~~~^~~~~~~~~~~
file_fdw.c:1128:49: error: incompatible type for argument 4 of 
‘BeginCopyFrom’
   cstate = BeginCopyFrom(NULL, onerel, filename, is_program, NULL, NIL,
                                                  ^~~~~~~~~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:61:23: note: expected ‘const char *’ 
but argument is of type ‘_Bool’
            const char *filename,
            ~~~~~~~~~~~~^~~~~~~~
In file included from ../../src/include/access/tupdesc.h:19,
                  from ../../src/include/access/htup_details.h:19,
                  from file_fdw.c:18:
../../src/include/nodes/pg_list.h:65:19: warning: passing argument 6 of 
‘BeginCopyFrom’ from incompatible pointer type 
[-Wincompatible-pointer-types]
  #define NIL      ((List *) NULL)
                   ~^~~~~~~~~~~~~~
file_fdw.c:1128:67: note: in expansion of macro ‘NIL’
   cstate = BeginCopyFrom(NULL, onerel, filename, is_program, NULL, NIL,
                                                                    ^~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:62:48: note: expected 
‘copy_data_source_cb’ {aka ‘int (*)(void *, int,  int)’} but argument is 
of type ‘List *’ {aka ‘struct List *’}
            bool is_program, copy_data_source_cb data_source_cb, List 
*attnamelist, List *options);
                             ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
file_fdw.c:1128:11: error: too few arguments to function ‘BeginCopyFrom’
   cstate = BeginCopyFrom(NULL, onerel, filename, is_program, NULL, NIL,
            ^~~~~~~~~~~~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:60:22: note: declared here
  extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, 
Node *whereClause,
                       ^~~~~~~~~~~~~
file_fdw.c:1144:20: warning: cast to pointer from integer of different 
size [-Wint-to-pointer-cast]
   errcallback.arg = (void *) cstate;
                     ^
file_fdw.c:1159:24: warning: passing argument 1 of ‘NextCopyFrom’ makes 
pointer from integer without a cast [-Wint-conversion]
    found = NextCopyFrom(cstate, NULL, values, nulls);
                         ^~~~~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:64:40: note: expected ‘CopyFromState’ 
{aka ‘struct CopyFromStateData *’} but argument is of type ‘int’
  extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
                           ~~~~~~~~~~~~~~^~~~~~
file_fdw.c:1211:14: warning: passing argument 1 of ‘EndCopyFrom’ makes 
pointer from integer without a cast [-Wint-conversion]
   EndCopyFrom(cstate);
               ^~~~~~
In file included from file_fdw.c:24:
../../src/include/commands/copy.h:63:39: note: expected ‘CopyFromState’ 
{aka ‘struct CopyFromStateData *’} but argument is of type ‘int’
  extern void EndCopyFrom(CopyFromState cstate);
                          ~~~~~~~~~~~~~~^~~~~~
make[1]: *** [../../src/Makefile.global:921: file_fdw.o] Error 1
make: *** [Makefile:95: all-file_fdw-recurse] Error 2
-- contrib make returned 2 - abort





Re: Split copy.c

From
Heikki Linnakangas
Date:
On 02/11/2020 11:36, Erik Rijkers wrote:
> On 2020-11-02 10:03, Heikki Linnakangas wrote:
>> While looking at the parallel copy patches, it started to annoy me how
>> large copy.c is. It confuses my little head. (Ok, it's annoyed me many
>> times in the past, but I haven't done anything about it.)
> 
>> [0001-Split-copy.c-into-...o.c-and-copyfrom.c.patch]
> 
> There seems to be an oversight of contrib/file_fdw. (debian 10, gcc
> 8.3.0)

Ah yeah, I missed updating file_fdw. Here's a new patch.

- Heikki

Attachment

Re: Split copy.c

From
Erik Rijkers
Date:
On 2020-11-02 12:19, Heikki Linnakangas wrote:
> On 02/11/2020 11:36, Erik Rijkers wrote:
>> On 2020-11-02 10:03, Heikki Linnakangas wrote:
>>> While looking at the parallel copy patches, it started to annoy me 
>>> how
>>> large copy.c is. It confuses my little head. (Ok, it's annoyed me 
>>> many
>>> times in the past, but I haven't done anything about it.)
>> 
>>> [0001-Split-copy.c-into-...o.c-and-copyfrom.c.patch]
>> 
>> There seems to be an oversight of contrib/file_fdw. (debian 10, gcc
>> 8.3.0)
> 
> Ah yeah, I missed updating file_fdw. Here's a new patch.

Something still not quite right in the compile-with-assert:

-- [2020.11.02 12:49:12 split_copy/0] make core: make --quiet -j 4   
(speed 0=debug_assertions  speed 1=fast)


In file included from ../../../src/include/postgres.h:46,
                  from copyto.c:15:
copyto.c: In function ‘BeginCopyTo’:
copyto.c:477:11: error: ‘is_from’ undeclared (first use in this 
function); did you mean ‘is_program’?
    Assert(!is_from);
            ^~~~~~~
../../../src/include/c.h:790:9: note: in definition of macro ‘Assert’
    if (!(condition)) \
          ^~~~~~~~~
copyto.c:477:11: note: each undeclared identifier is reported only once 
for each function it appears in
    Assert(!is_from);
            ^~~~~~~
../../../src/include/c.h:790:9: note: in definition of macro ‘Assert’
    if (!(condition)) \
          ^~~~~~~~~
make[3]: *** [../../../src/Makefile.global:921: copyto.o] Error 1
make[2]: *** [common.mk:39: commands-recursive] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile:42: all-backend-recurse] Error 2
make: *** [GNUmakefile:11: all-src-recurse] Error 2






Re: Split copy.c

From
vignesh C
Date:
On Mon, Nov 2, 2020 at 2:33 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> While looking at the parallel copy patches, it started to annoy me how
> large copy.c is. It confuses my little head. (Ok, it's annoyed me many
> times in the past, but I haven't done anything about it.)
>

+1 for having copy from & copy to functionality in separate files.

This is present in both copyfrom.c & copyto.c, can it be removed from
one place & moved to a common header file?
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";

CopyDest was changed to:
typedef enum CopySource
{
COPY_FILE, /* from file (or a piped program) */
COPY_OLD_FE, /* from frontend (2.0 protocol) */
COPY_NEW_FE, /* from frontend (3.0 protocol) */
COPY_CALLBACK /* from callback function */
} CopySource;

typedef enum CopyDest
{
COPY_FILE, /* to file (or a piped program) */
COPY_OLD_FE, /* to frontend (2.0 protocol) */
COPY_NEW_FE, /* to frontend (3.0 protocol) */
} CopyDest;

Should we have one enum or both are required, if both are required we
could think of naming like COPY_TO_FILE, COPY_FROM_FILE, it will make
it more clearer.

There is one warning while applying the v2 patch:
Applying: Split copy.c into copyto.c and copyfrom.c.
/home/vignesh/postgres/postgres/.git/rebase-apply/patch:909: trailing
whitespace.
warning: 1 line adds whitespace errors.

There is one compilation error, may be this Assert is not required:
copyto.c: In function ‘BeginCopyTo’:
copyto.c:477:11: error: ‘is_from’ undeclared (first use in this function)
   Assert(!is_from);

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Split copy.c

From
Andres Freund
Date:
Hi,

On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote:
> While looking at the parallel copy patches, it started to annoy me how large
> copy.c is.

Agreed.


> There isn't much common code between COPY FROM and COPY TO, so I propose
> that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin
> that's much nicer.

Not quite convinced that's the right split - or perhaps there's just
more potential. My feeling is that splitting out all the DML related
code would make the code considerably easier to read.


Regards,

Andres



Re: Split copy.c

From
Heikki Linnakangas
Date:
On 02/11/2020 19:23, Andres Freund wrote:
> On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote:
>> There isn't much common code between COPY FROM and COPY TO, so I propose
>> that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin
>> that's much nicer.
> 
> Not quite convinced that's the right split - or perhaps there's just
> more potential. My feeling is that splitting out all the DML related
> code would make the code considerably easier to read.

What do you mean by DML related code?

- Heikki



Re: Split copy.c

From
Andres Freund
Date:
On 2020-11-02 19:43:38 +0200, Heikki Linnakangas wrote:
> On 02/11/2020 19:23, Andres Freund wrote:
> > On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote:
> > > There isn't much common code between COPY FROM and COPY TO, so I propose
> > > that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin
> > > that's much nicer.
> > 
> > Not quite convinced that's the right split - or perhaps there's just
> > more potential. My feeling is that splitting out all the DML related
> > code would make the code considerably easier to read.
> 
> What do you mean by DML related code?

Basically all the insertion related code (e.g CopyMultiInsert*, lots of
code in CopyFrom()) and perhaps also the type input invocations.

- Andres



Re: Split copy.c

From
David Rowley
Date:
On Tue, 3 Nov 2020 at 07:35, Andres Freund <andres@anarazel.de> wrote:
>
> On 2020-11-02 19:43:38 +0200, Heikki Linnakangas wrote:
> > On 02/11/2020 19:23, Andres Freund wrote:
> > > On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote:
> > > > There isn't much common code between COPY FROM and COPY TO, so I propose
> > > > that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin
> > > > that's much nicer.
> > >
> > > Not quite convinced that's the right split - or perhaps there's just
> > > more potential. My feeling is that splitting out all the DML related
> > > code would make the code considerably easier to read.
> >
> > What do you mean by DML related code?
>
> Basically all the insertion related code (e.g CopyMultiInsert*, lots of
> code in CopyFrom()) and perhaps also the type input invocations.

I quite like the fact that those are static and inline-able.  I very
much imagine there'd be a performance hit if we moved them out to
another .c file and made them extern.  Some of those functions can be
quite hot when copying into a partitioned table.

David



Re: Split copy.c

From
Heikki Linnakangas
Date:
On 03/11/2020 04:15, David Rowley wrote:
> On Tue, 3 Nov 2020 at 07:35, Andres Freund <andres@anarazel.de> wrote:
>>
>> On 2020-11-02 19:43:38 +0200, Heikki Linnakangas wrote:
>>> On 02/11/2020 19:23, Andres Freund wrote:
>>>> On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote:
>>>>> There isn't much common code between COPY FROM and COPY TO, so I propose
>>>>> that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin
>>>>> that's much nicer.
>>>>
>>>> Not quite convinced that's the right split - or perhaps there's just
>>>> more potential. My feeling is that splitting out all the DML related
>>>> code would make the code considerably easier to read.
>>>
>>> What do you mean by DML related code?
>>
>> Basically all the insertion related code (e.g CopyMultiInsert*, lots of
>> code in CopyFrom()) and perhaps also the type input invocations.

Hmm. COPY FROM consists of two parts:

1. Parse the input text/CSV/binary format into Datums.

2. Store the Datums in the table.

They're somewhat split already. If you want to only do the parsing part, 
you can use the BeginCopyFrom() and NextCopyFrom() functions to get 
Datums, without storing them to a table. file_fdw uses that.

Yeah, that might indeed be another good split point. Attached is quick 
prototype of that. I tried to avoid doing other changes as part of this 
split, but some further refactoring might be good. Like extracting the 
state for the input parsing from CopyFromStateData into a separate struct.

With these patches:

$ wc -l src/backend/commands/copy*.c
    782 src/backend/commands/copy.c
   1641 src/backend/commands/copyfrom.c
   1646 src/backend/commands/copyfromparse.c
   1363 src/backend/commands/copyto.c
   5432 total

> I quite like the fact that those are static and inline-able.  I very
> much imagine there'd be a performance hit if we moved them out to
> another .c file and made them extern.  Some of those functions can be
> quite hot when copying into a partitioned table.

Agreed.

- Heikki

Attachment

Re: Split copy.c

From
Erikjan Rijkers
Date:
On 2020-11-03 08:38, Heikki Linnakangas wrote:

> [v3-0001-Split-copy.c-into-copyto.c-and-copyfrom.c.patch]
> [v3-0002-Split-copyfrom.c-further-into-copyfrom.c-and-copy.patch]

The patches apply ok, but I get these errors:

In file included from ../../../src/include/postgres.h:46,
                  from copyto.c:15:
copyto.c: In function ‘BeginCopyTo’:
copyto.c:477:11: error: ‘is_from’ undeclared (first use in this 
function); did you mean ‘is_program’?
    Assert(!is_from);
            ^~~~~~~
../../../src/include/c.h:790:9: note: in definition of macro ‘Assert’
    if (!(condition)) \
          ^~~~~~~~~
copyto.c:477:11: note: each undeclared identifier is reported only once 
for each function it appears in
    Assert(!is_from);
            ^~~~~~~
../../../src/include/c.h:790:9: note: in definition of macro ‘Assert’
    if (!(condition)) \
          ^~~~~~~~~



Re: Split copy.c

From
Heikki Linnakangas
Date:
On 03/11/2020 10:46, Erikjan Rijkers wrote:
> On 2020-11-03 08:38, Heikki Linnakangas wrote:
> 
>> [v3-0001-Split-copy.c-into-copyto.c-and-copyfrom.c.patch]
>> [v3-0002-Split-copyfrom.c-further-into-copyfrom.c-and-copy.patch]
> 
> The patches apply ok, but I get these errors:

Here's a fixed version. There was a stray Assert that didn't make sense 
anymore.

- Heikki

Attachment

Re: Split copy.c

From
Soumyadeep Chakraborty
Date:
Hey Heikki,

On Tue, Nov 3, 2020 at 1:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Thanks for working on this refactor. LGTM! I had a few minor comments:

1. We should rename the CopyFormatOptions *cstate param in
ProcessCopyOptions() to CopyFormatOptions *options and List *options to
List *raw_options IMO, to make it more readable.

2. We need to update the header comment for Copy{From,To}StateData. It is
currently the old comment from CopyStateData.

3. Can we add docs for the missing fields in the header comment for
BeginCopyFrom()?

4.

> /*
>  * Working state for COPY TO/FROM
>  */
> MemoryContext copycontext; /* per-copy execution context */

Comment needs to be updated for the COPY operation.

5.

> I also split/duplicated the CopyStateData struct into CopyFromStateData
> and CopyToStateData. Many of the fields were common, but many were not,
> and I think some duplication is nicer than a struct where you use some
> fields and others are unused. I put the common formatting options into a
> new CopyFormatOptions struct.

Would we be better off if we sub-struct CopyState <- Copy{From,To}State?
Like this:
typedef struct Copy{From|To}StateData
{
CopyState cs;
// Fields specific to COPY FROM/TO follow..
}

6.

> /* create workspace for CopyReadAttributes results */
> if (!cstate->opts.binary)

Can we replace this if with an else?

Regards,
Soumyadeep (VMware)



Re: Split copy.c

From
Justin Pryzby
Date:
On Tue, Nov 03, 2020 at 03:15:27PM +1300, David Rowley wrote:
> On Tue, 3 Nov 2020 at 07:35, Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2020-11-02 19:43:38 +0200, Heikki Linnakangas wrote:
> > > On 02/11/2020 19:23, Andres Freund wrote:
> > > > On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote:
> > > > > There isn't much common code between COPY FROM and COPY TO, so I propose
> > > > > that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin
> > > > > that's much nicer.
> > > >
> > > > Not quite convinced that's the right split - or perhaps there's just
> > > > more potential. My feeling is that splitting out all the DML related
> > > > code would make the code considerably easier to read.
> > >
> > > What do you mean by DML related code?
> >
> > Basically all the insertion related code (e.g CopyMultiInsert*, lots of
> > code in CopyFrom()) and perhaps also the type input invocations.
> 
> I quite like the fact that those are static and inline-able.  I very
> much imagine there'd be a performance hit if we moved them out to
> another .c file and made them extern.  Some of those functions can be
> quite hot when copying into a partitioned table.

For another patch [0], I moved into copy.h:
+typedef struct CopyMultiInsertBuffer
+typedef struct CopyMultiInsertInfo
+CopyMultiInsertBufferInit(ResultRelInfo *rri)
+CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
+CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo)
+CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
+CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
+CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,

That's an experimental part 0002 of my patch in response to Simon's suggestion.
Maybe your response will be that variants of those interfaces should be added
to nodeModifyTable.[ch] instead of moving them.  Currently I'm passing
(void*)mtstate as cstate - if there were a generic interface, that would be a
void *state or so.

[0] https://commitfest.postgresql.org/30/2553/
    should INSERT SELECT use a BulkInsertState? (and multi_insert)

-- 
Justin



Re: Split copy.c

From
Heikki Linnakangas
Date:
On 16/11/2020 04:28, Justin Pryzby wrote:
> On Tue, Nov 03, 2020 at 03:15:27PM +1300, David Rowley wrote:
>> On Tue, 3 Nov 2020 at 07:35, Andres Freund <andres@anarazel.de> wrote:
>>>
>>> On 2020-11-02 19:43:38 +0200, Heikki Linnakangas wrote:
>>>> On 02/11/2020 19:23, Andres Freund wrote:
>>>>> On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote:
>>>>>> There isn't much common code between COPY FROM and COPY TO, so I propose
>>>>>> that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin
>>>>>> that's much nicer.
>>>>>
>>>>> Not quite convinced that's the right split - or perhaps there's just
>>>>> more potential. My feeling is that splitting out all the DML related
>>>>> code would make the code considerably easier to read.
>>>>
>>>> What do you mean by DML related code?
>>>
>>> Basically all the insertion related code (e.g CopyMultiInsert*, lots of
>>> code in CopyFrom()) and perhaps also the type input invocations.
>>
>> I quite like the fact that those are static and inline-able.  I very
>> much imagine there'd be a performance hit if we moved them out to
>> another .c file and made them extern.  Some of those functions can be
>> quite hot when copying into a partitioned table.
> 
> For another patch [0], I moved into copy.h:
> +typedef struct CopyMultiInsertBuffer
> +typedef struct CopyMultiInsertInfo
> +CopyMultiInsertBufferInit(ResultRelInfo *rri)
> +CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
> +CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo)
> +CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
> +CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
> +CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
> 
> That's an experimental part 0002 of my patch in response to Simon's suggestion.
> Maybe your response will be that variants of those interfaces should be added
> to nodeModifyTable.[ch] instead of moving them.

Nice. I don't think that affects this patch too much.

I would suggest renaming renaming the functions and structs to remove 
the "Copy"-prefix. COPY uses them, but so does INSERT with the patch.

> Currently I'm passing (void*)mtstate as cstate - if there were a
> generic interface, that would be a void *state or so.
The functions only need cstate/mtstate to set the line number, for the 
error callback, and to access the transition_capture field. You could 
add a field for transition_capture in CopyMultiInsertInfo. For the line 
number, you could add a line number field in CopyMultiInsertInfo, set 
that in CopyMultiInsertBufferFlush() instead of cstate->cur_lineno, and 
teach CopyFromErrorCallback() to get the line number from there.

- Heikki



Re: Split copy.

From
Heikki Linnakangas
Date:
Thanks for feedback, attached is a new patch version.

On 11/11/2020 21:49, Soumyadeep Chakraborty wrote:
> On Tue, Nov 3, 2020 at 1:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I also split/duplicated the CopyStateData struct into CopyFromStateData
>> and CopyToStateData. Many of the fields were common, but many were not,
>> and I think some duplication is nicer than a struct where you use some
>> fields and others are unused. I put the common formatting options into a
>> new CopyFormatOptions struct.
> 
> Would we be better off if we sub-struct CopyState <- Copy{From,To}State?
> Like this:
> typedef struct Copy{From|To}StateData
> {
> CopyState cs;
> // Fields specific to COPY FROM/TO follow..
> }

Hmm. I don't think that would be better. There isn't actually that much 
in common between CopyFromStateData and CopyToStateData, and a little 
bit of duplication seems better.

> 6.
> 
>> /* create workspace for CopyReadAttributes results */
>> if (!cstate->opts.binary)
> 
> Can we replace this if with an else?

Seems better as it is IMO, the if- and else-branches are not really 
related to each other, even though they both happen to be conditioned on 
cstate->opts.binary.

Fixed all the other things you listed, fixed a bug in setting 
'file_encoding', and trimmed down the #includes.

- Heikki

Attachment

Re: Split copy.

From
Soumyadeep Chakraborty
Date:
On Tue, Nov 17, 2020 at 2:38 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Thanks for feedback, attached is a new patch version.
>
> On 11/11/2020 21:49, Soumyadeep Chakraborty wrote:
> > On Tue, Nov 3, 2020 at 1:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >> I also split/duplicated the CopyStateData struct into CopyFromStateData
> >> and CopyToStateData. Many of the fields were common, but many were not,
> >> and I think some duplication is nicer than a struct where you use some
> >> fields and others are unused. I put the common formatting options into a
> >> new CopyFormatOptions struct.
> >
> > Would we be better off if we sub-struct CopyState <- Copy{From,To}State?
> > Like this:
> > typedef struct Copy{From|To}StateData
> > {
> > CopyState cs;
> > // Fields specific to COPY FROM/TO follow..
> > }
>
> Hmm. I don't think that would be better. There isn't actually that much
> in common between CopyFromStateData and CopyToStateData, and a little
> bit of duplication seems better.
>

Fair.

> > 6.
> >
> >> /* create workspace for CopyReadAttributes results */
> >> if (!cstate->opts.binary)
> >
> > Can we replace this if with an else?
>
> Seems better as it is IMO, the if- and else-branches are not really
> related to each other, even though they both happen to be conditioned on
> cstate->opts.binary.

Fair.

> Fixed all the other things you listed, fixed a bug in setting
> 'file_encoding', and trimmed down the #includes.
>

Thanks! LGTM! Marking as Ready for Committer.

Regards,
Soumyadeep (VMware)



Re: Split copy.c

From
Heikki Linnakangas
Date:
On 18/11/2020 08:21, Soumyadeep Chakraborty wrote:
> On Tue, Nov 17, 2020 at 2:38 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Fixed all the other things you listed, fixed a bug in setting
>> 'file_encoding', and trimmed down the #includes.
>>
> Thanks! LGTM! Marking as Ready for Committer.

Pushed, thanks for the review!

- Heikki