Re: Split copy.c - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Split copy.c
Date
Msg-id bd19fa6e-d0f4-9009-44cd-6e885a4c7236@iki.fi
Whole thread Raw
In response to Re: Split copy.c  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On 16/11/2020 04:28, Justin Pryzby wrote:
> On Tue, Nov 03, 2020 at 03:15:27PM +1300, David Rowley wrote:
>> On Tue, 3 Nov 2020 at 07:35, Andres Freund <andres@anarazel.de> wrote:
>>>
>>> On 2020-11-02 19:43:38 +0200, Heikki Linnakangas wrote:
>>>> On 02/11/2020 19:23, Andres Freund wrote:
>>>>> On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote:
>>>>>> There isn't much common code between COPY FROM and COPY TO, so I propose
>>>>>> that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin
>>>>>> that's much nicer.
>>>>>
>>>>> Not quite convinced that's the right split - or perhaps there's just
>>>>> more potential. My feeling is that splitting out all the DML related
>>>>> code would make the code considerably easier to read.
>>>>
>>>> What do you mean by DML related code?
>>>
>>> Basically all the insertion related code (e.g CopyMultiInsert*, lots of
>>> code in CopyFrom()) and perhaps also the type input invocations.
>>
>> I quite like the fact that those are static and inline-able.  I very
>> much imagine there'd be a performance hit if we moved them out to
>> another .c file and made them extern.  Some of those functions can be
>> quite hot when copying into a partitioned table.
> 
> For another patch [0], I moved into copy.h:
> +typedef struct CopyMultiInsertBuffer
> +typedef struct CopyMultiInsertInfo
> +CopyMultiInsertBufferInit(ResultRelInfo *rri)
> +CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
> +CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo)
> +CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
> +CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
> +CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
> 
> That's an experimental part 0002 of my patch in response to Simon's suggestion.
> Maybe your response will be that variants of those interfaces should be added
> to nodeModifyTable.[ch] instead of moving them.

Nice. I don't think that affects this patch too much.

I would suggest renaming renaming the functions and structs to remove 
the "Copy"-prefix. COPY uses them, but so does INSERT with the patch.

> Currently I'm passing (void*)mtstate as cstate - if there were a
> generic interface, that would be a void *state or so.
The functions only need cstate/mtstate to set the line number, for the 
error callback, and to access the transition_capture field. You could 
add a field for transition_capture in CopyMultiInsertInfo. For the line 
number, you could add a line number field in CopyMultiInsertInfo, set 
that in CopyMultiInsertBufferFlush() instead of cstate->cur_lineno, and 
teach CopyFromErrorCallback() to get the line number from there.

- Heikki



pgsql-hackers by date:

Previous
From: a.pervushina@postgrespro.ru
Date:
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed
Next
From: Magnus Hagander
Date:
Subject: Re: Online verification of checksums