Re: Split copy.c - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Split copy.c
Date
Msg-id 21d9ed60-1269-c1f2-7f89-6bd722d535e2@iki.fi
Whole thread Raw
In response to Re: Split copy.c  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Split copy.c  (Erikjan Rijkers <er@xs4all.nl>)
List pgsql-hackers
On 03/11/2020 04:15, David Rowley wrote:
> On Tue, 3 Nov 2020 at 07:35, Andres Freund <andres@anarazel.de> wrote:
>>
>> On 2020-11-02 19:43:38 +0200, Heikki Linnakangas wrote:
>>> On 02/11/2020 19:23, Andres Freund wrote:
>>>> On 2020-11-02 11:03:29 +0200, Heikki Linnakangas wrote:
>>>>> There isn't much common code between COPY FROM and COPY TO, so I propose
>>>>> that we split copy.c into two: copyfrom.c and copyto.c. See attached. I thin
>>>>> that's much nicer.
>>>>
>>>> Not quite convinced that's the right split - or perhaps there's just
>>>> more potential. My feeling is that splitting out all the DML related
>>>> code would make the code considerably easier to read.
>>>
>>> What do you mean by DML related code?
>>
>> Basically all the insertion related code (e.g CopyMultiInsert*, lots of
>> code in CopyFrom()) and perhaps also the type input invocations.

Hmm. COPY FROM consists of two parts:

1. Parse the input text/CSV/binary format into Datums.

2. Store the Datums in the table.

They're somewhat split already. If you want to only do the parsing part, 
you can use the BeginCopyFrom() and NextCopyFrom() functions to get 
Datums, without storing them to a table. file_fdw uses that.

Yeah, that might indeed be another good split point. Attached is quick 
prototype of that. I tried to avoid doing other changes as part of this 
split, but some further refactoring might be good. Like extracting the 
state for the input parsing from CopyFromStateData into a separate struct.

With these patches:

$ wc -l src/backend/commands/copy*.c
    782 src/backend/commands/copy.c
   1641 src/backend/commands/copyfrom.c
   1646 src/backend/commands/copyfromparse.c
   1363 src/backend/commands/copyto.c
   5432 total

> I quite like the fact that those are static and inline-able.  I very
> much imagine there'd be a performance hit if we moved them out to
> another .c file and made them extern.  Some of those functions can be
> quite hot when copying into a partitioned table.

Agreed.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Jinbao Chen
Date:
Subject: Re: Add table AM 'tid_visible'
Next
From: Thomas Munro
Date:
Subject: Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)