Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id CAD21AoBW5dEv=Gd2iF_BYNZGEsF=3KTG7fpq=vP5qwpC1CAOeA@mail.gmail.com
Whole thread Raw
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Responses Re: Make COPY format extendable: Extract COPY TO format implementations
List pgsql-hackers
On Sun, Nov 24, 2024 at 6:06 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoBNfKDbJnu-zONNpG820ZXYC0fuTSLrJ-UdRqU4qp2wog@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Nov 2024 13:01:06 -0800,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> @@ -1237,7 +1219,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
> >>  /*
> >>   * CopyReadLineText - inner loop of CopyReadLine for text mode
> >>   */
> >> -static pg_attribute_always_inline bool
> >> +static bool
> >>  CopyReadLineText(CopyFromState cstate, bool is_csv)
> >>
> >> Is this an intentional change?
> >> CopyReadLineText() has "bool in_csv".
> >
> > Yes, I'm not sure it's really necessary to make it inline since the
> > benchmark results don't show much difference. Probably this is because
> > the function has 'is_csv' in some 'if' branches but the compiler
> > cannot optimize out the whole 'if' branches as most 'if' branches
> > check 'is_csv' and other variables.
>
> I see. If explicit "inline" isn't related to performance, we
> don't need explicit "inline".
>
> > I've attached the v25 patches that squashed the minor changes I made
> > in v24 and incorporated all comments I got so far. I think these two
> > patches are in good shape. Could you rebase remaining patches on top
> > of them so that we can see the big picture of this feature?
>
> OK. I'll work on it.
>
> > Regarding exposing the structs such as CopyToStateData, v22-0004 patch
> > moves most of all copy-related structs to copyapi.h from copyto.c,
> > copyfrom_internal.h, and copy.h, which seems odd to me. I think we can
> > expose CopyToStateData (and related structs) in a new file
> > copyto_internal.h and keep other structs in the original header files.
>
> Custom COPY format extensions need to use
> CopyToStateData/CopyFromStateData. For example,
> CopyToStateData::rel is used to retrieve table schema. If we
> move CopyToStateData to copyto_internal.h not copyapi.h,
> custom COPY format extensions need to include
> copyto_internal.h. I feel that it's strange that extensions
> need to use internal headers.
>
> What is your real concern? If you don't want to export
> CopyToStateData/CopyFromStateData entirely, we can provide
> accessors only for some members of them.

I'm not against exposing CopyToStateData and CopyFromStateData. My
concern is that if we move all copy-related structs to copyapi.h,
other copy-related files would need to include copyapi.h even if the
file is not related to copy format APIs. IMO copyapi.h should have
only copy-format-API-related variables structs such as CopyFromRoutine
and CopyToRoutine and functions that custom COPY format extension can
utilize to access data source and destination, such as CopyGetData().

When it comes to CopyToStateData and CopyFromStateData, I feel that
they have mixed fields of common fields (e.g., rel, num_errors, and
transition_capture) and format-specific fields (e.g., input_buf,
line_buf, and eol_type). While it makes sense to me that custom copy
format extensions can access the common fields, it seems odd to me
that they can access text-and-csv-format-specific fields such as
input_buf. We might want to sort out these fields but it could be a
huge task.

Also, I realized that CopyFromTextLikeOneRow() does input function
calls and handle soft errors based on ON_ERROR and LOG_VERBOSITY
options. I think these should be done in the core side.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Improve the error message for logical replication of regular column to generated column.
Next
From: "Andrey M. Borodin"
Date:
Subject: Re: Amcheck verification of GiST and GIN