Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Sutou Kouhei
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id 20250714.173803.865595983884510428.kou@clear-code.com
Whole thread Raw
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Hi,

In <CAD21AoB0Z3gkOGALK3pXYmGTWATVvgDAmn-yXGp2mX64S-YrSw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 14 Jul 2025 03:28:16 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> I've reviewed the 0001 and 0002 patches. The API implemented in the
> 0002 patch looks good to me, but I'm concerned about the capsulation
> of copy state data. With the v42 patches, we pass the whole
> CopyToStateData to the extension codes, but most of the fields in
> CopyToStateData are internal working state data that shouldn't be
> exposed to extensions. I think we need to sort out which fields are
> exposed or not. That way, it would be safer and we would be able to
> avoid exposing copyto_internal.h and extensions would not need to
> include copyfrom_internal.h.

FYI: We discussed this so far. For example:

https://www.postgresql.org/message-id/flat/CAD21AoD%3DUapH4Wh06G6H5XAzPJ0iJg9YcW8r7E2UEJkZ8QsosA%40mail.gmail.com

> I think we can move CopyToState to copy.h and we don't
> need to have set/get functions for its fields.

https://www.postgresql.org/message-id/flat/CAD21AoBpWFU4k-_bwrTq0AkFSAdwQqhAsSW188STmu9HxLJ0nQ%40mail.gmail.com

> > What does "private" mean here? I thought that it means that
> > "PostgreSQL itself can use it". But it seems that you mean
> > that "PostgreSQL itself and custom format extensions can use
> > it but other extensions can't use it".
> >
> > I'm not familiar with "_internal.h" in PostgreSQL but is
> > "_internal.h" for the latter "private" mean?
>
> My understanding is that we don't strictly prohibit _internal.h from
> being included in out of core files. For example, file_fdw.c includes
> copyfrom_internal.h in order to access some fields of CopyFromState.


In general, I agree that we should export only needed
information.

How about adding accessors instead of splitting
Copy{From,To}State to Copy{From,To}ExecutionData? If we use
the accessors approach, we can export only needed
information step by step without breaking ABI.

The built-in formats can keep using Copy{From,To}State
directly with the accessors approach. We can avoid any
performance regression of the built-in formats. If we split
Copy{From,To}State to Copy{From,To}ExecutionData,
performance may be changed.


> I've implemented a draft patch for that idea. In the 0001 patch, I
> moved fields that are related to internal working state from
> CopyToStateData to CopyToExectuionData. COPY routine APIs pass a
> pointer of CopyToStateData but extensions can access only fields
> except for CopyToExectuionData. In the 0002 patch, I've implemented
> the registration API and some related APIs based on your v42 patch.
> I've made similar changes to COPY FROM codes too.
> 
> The patch is a very PoC phase and we would need to scrutinize the
> fields that should or should not be exposed. Feedback is very welcome.

Based on our sample extensions [1][2], the following fields
may be minimal. I added "(*)" marks that exist in
Copy{From,To}StateDate in your patch. Other fields exist in
Copy{From,To}ExecutionData. We need to export them to
extensions. We can hide fields in Copy{From,To}StateData not
listed here.

FROM:

- attnumlist (*)
- bytes_processed
- cur_attname
- escontext
- in_functions (*)
- input_buf
- input_reached_eof
- line_buf
- opts (*)
- raw_buf
- raw_buf_index
- raw_buf_len
- rel (*)
- typioparams (*)

TO:

- attnumlist (*)
- fe_msgbuf
- opts (*)

[1] https://github.com/kou/pg-copy-arrow/
[2] https://github.com/MasahikoSawada/pg_copy_jsonlines/


Thanks,
-- 
kou



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Next
From: Dmitry Koval
Date:
Subject: Re: Add SPLIT PARTITION/MERGE PARTITIONS commands