Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | CAD21AoBW5dEv=Gd2iF_BYNZGEsF=3KTG7fpq=vP5qwpC1CAOeA@mail.gmail.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Sutou Kouhei <kou@clear-code.com>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
|
List | pgsql-hackers |
On Sun, Nov 24, 2024 at 6:06 PM Sutou Kouhei <kou@clear-code.com> wrote: > > Hi, > > In <CAD21AoBNfKDbJnu-zONNpG820ZXYC0fuTSLrJ-UdRqU4qp2wog@mail.gmail.com> > "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Nov 2024 13:01:06 -0800, > Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > >> @@ -1237,7 +1219,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv) > >> /* > >> * CopyReadLineText - inner loop of CopyReadLine for text mode > >> */ > >> -static pg_attribute_always_inline bool > >> +static bool > >> CopyReadLineText(CopyFromState cstate, bool is_csv) > >> > >> Is this an intentional change? > >> CopyReadLineText() has "bool in_csv". > > > > Yes, I'm not sure it's really necessary to make it inline since the > > benchmark results don't show much difference. Probably this is because > > the function has 'is_csv' in some 'if' branches but the compiler > > cannot optimize out the whole 'if' branches as most 'if' branches > > check 'is_csv' and other variables. > > I see. If explicit "inline" isn't related to performance, we > don't need explicit "inline". > > > I've attached the v25 patches that squashed the minor changes I made > > in v24 and incorporated all comments I got so far. I think these two > > patches are in good shape. Could you rebase remaining patches on top > > of them so that we can see the big picture of this feature? > > OK. I'll work on it. > > > Regarding exposing the structs such as CopyToStateData, v22-0004 patch > > moves most of all copy-related structs to copyapi.h from copyto.c, > > copyfrom_internal.h, and copy.h, which seems odd to me. I think we can > > expose CopyToStateData (and related structs) in a new file > > copyto_internal.h and keep other structs in the original header files. > > Custom COPY format extensions need to use > CopyToStateData/CopyFromStateData. For example, > CopyToStateData::rel is used to retrieve table schema. If we > move CopyToStateData to copyto_internal.h not copyapi.h, > custom COPY format extensions need to include > copyto_internal.h. I feel that it's strange that extensions > need to use internal headers. > > What is your real concern? If you don't want to export > CopyToStateData/CopyFromStateData entirely, we can provide > accessors only for some members of them. I'm not against exposing CopyToStateData and CopyFromStateData. My concern is that if we move all copy-related structs to copyapi.h, other copy-related files would need to include copyapi.h even if the file is not related to copy format APIs. IMO copyapi.h should have only copy-format-API-related variables structs such as CopyFromRoutine and CopyToRoutine and functions that custom COPY format extension can utilize to access data source and destination, such as CopyGetData(). When it comes to CopyToStateData and CopyFromStateData, I feel that they have mixed fields of common fields (e.g., rel, num_errors, and transition_capture) and format-specific fields (e.g., input_buf, line_buf, and eol_type). While it makes sense to me that custom copy format extensions can access the common fields, it seems odd to me that they can access text-and-csv-format-specific fields such as input_buf. We might want to sort out these fields but it could be a huge task. Also, I realized that CopyFromTextLikeOneRow() does input function calls and handle soft errors based on ON_ERROR and LOG_VERBOSITY options. I think these should be done in the core side. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: