Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Sutou Kouhei |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | 20250714.173803.865595983884510428.kou@clear-code.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
Hi, In <CAD21AoB0Z3gkOGALK3pXYmGTWATVvgDAmn-yXGp2mX64S-YrSw@mail.gmail.com> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 14 Jul 2025 03:28:16 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I've reviewed the 0001 and 0002 patches. The API implemented in the > 0002 patch looks good to me, but I'm concerned about the capsulation > of copy state data. With the v42 patches, we pass the whole > CopyToStateData to the extension codes, but most of the fields in > CopyToStateData are internal working state data that shouldn't be > exposed to extensions. I think we need to sort out which fields are > exposed or not. That way, it would be safer and we would be able to > avoid exposing copyto_internal.h and extensions would not need to > include copyfrom_internal.h. FYI: We discussed this so far. For example: https://www.postgresql.org/message-id/flat/CAD21AoD%3DUapH4Wh06G6H5XAzPJ0iJg9YcW8r7E2UEJkZ8QsosA%40mail.gmail.com > I think we can move CopyToState to copy.h and we don't > need to have set/get functions for its fields. https://www.postgresql.org/message-id/flat/CAD21AoBpWFU4k-_bwrTq0AkFSAdwQqhAsSW188STmu9HxLJ0nQ%40mail.gmail.com > > What does "private" mean here? I thought that it means that > > "PostgreSQL itself can use it". But it seems that you mean > > that "PostgreSQL itself and custom format extensions can use > > it but other extensions can't use it". > > > > I'm not familiar with "_internal.h" in PostgreSQL but is > > "_internal.h" for the latter "private" mean? > > My understanding is that we don't strictly prohibit _internal.h from > being included in out of core files. For example, file_fdw.c includes > copyfrom_internal.h in order to access some fields of CopyFromState. In general, I agree that we should export only needed information. How about adding accessors instead of splitting Copy{From,To}State to Copy{From,To}ExecutionData? If we use the accessors approach, we can export only needed information step by step without breaking ABI. The built-in formats can keep using Copy{From,To}State directly with the accessors approach. We can avoid any performance regression of the built-in formats. If we split Copy{From,To}State to Copy{From,To}ExecutionData, performance may be changed. > I've implemented a draft patch for that idea. In the 0001 patch, I > moved fields that are related to internal working state from > CopyToStateData to CopyToExectuionData. COPY routine APIs pass a > pointer of CopyToStateData but extensions can access only fields > except for CopyToExectuionData. In the 0002 patch, I've implemented > the registration API and some related APIs based on your v42 patch. > I've made similar changes to COPY FROM codes too. > > The patch is a very PoC phase and we would need to scrutinize the > fields that should or should not be exposed. Feedback is very welcome. Based on our sample extensions [1][2], the following fields may be minimal. I added "(*)" marks that exist in Copy{From,To}StateDate in your patch. Other fields exist in Copy{From,To}ExecutionData. We need to export them to extensions. We can hide fields in Copy{From,To}StateData not listed here. FROM: - attnumlist (*) - bytes_processed - cur_attname - escontext - in_functions (*) - input_buf - input_reached_eof - line_buf - opts (*) - raw_buf - raw_buf_index - raw_buf_len - rel (*) - typioparams (*) TO: - attnumlist (*) - fe_msgbuf - opts (*) [1] https://github.com/kou/pg-copy-arrow/ [2] https://github.com/MasahikoSawada/pg_copy_jsonlines/ Thanks, -- kou
pgsql-hackers by date: