Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id CAD21AoB0Z3gkOGALK3pXYmGTWATVvgDAmn-yXGp2mX64S-YrSw@mail.gmail.com
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Responses Re: Make COPY format extendable: Extract COPY TO format implementations
List pgsql-hackers
On Mon, Jun 30, 2025 at 3:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jun 25, 2025 at 4:35 PM Sutou Kouhei <kou@clear-code.com> wrote:
> >
> > Hi,
> >
> > In <CAD21AoC19fV5Ujs-1r24MNU+hwTQUeZMEnaJDjSFwHLMMdFi0Q@mail.gmail.com>
> >   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 25 Jun 2025 00:48:46 +0900,
> >   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > >> >> It's natural to add more related APIs with this
> > >> >> approach. The single registration API provides one feature
> > >> >> by one operation. If we use the RegisterCopyRoutine() for
> > >> >> FROM and TO formats API, it's not natural that we add more
> > >> >> related APIs. In this case, some APIs may provide multiple
> > >> >> features by one operation and other APIs may provide single
> > >> >> feature by one operation. Developers may be confused with
> > >> >> the API. For example, developers may think "what does mean
> > >> >> NULL here?" or "can we use NULL here?" for
> > >> >> "RegisterCopyRoutine("new-format", NewFormatFromRoutine,
> > >> >> NULL)".
> > >> >
> > >> > We can document it in the comment for the registration function.
> > >>
> > >> I think that API that can be understandable without the
> > >> additional note is better API than API that needs some
> > >> notes.
> > >
> > > I don't see much difference in this case.
> >
> > OK. It seems that we can't agree on which API is better.
> >
> > I've implemented your idea as the v42 patch set. Can we
> > proceed this proposal with this approach? What is the next
> > step?
>
> I'll review the patches.

I've reviewed the 0001 and 0002 patches. The API implemented in the
0002 patch looks good to me, but I'm concerned about the encapsulation
of the copy state data. With the v42 patches, we pass the whole
CopyToStateData to extension code, but most of the fields in
CopyToStateData are internal working state that shouldn't be exposed
to extensions. I think we need to sort out which fields should be
exposed and which should not. That way, it would be safer, we would be
able to avoid exposing copyto_internal.h, and extensions would not
need to include copyfrom_internal.h.

I've implemented a draft patch for that idea. In the 0001 patch, I
moved the fields related to internal working state from
CopyToStateData to CopyToExectuionData. The COPY routine APIs still
pass a pointer to CopyToStateData, but extensions can access only the
fields outside CopyToExectuionData. In the 0002 patch, I've
implemented the registration API and some related APIs based on your
v42 patch. I've made similar changes to the COPY FROM code too.
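
To illustrate the split described above, here is a minimal sketch of
one way to hide working state behind an incomplete type. It is not
taken from the attached patches: the field names (file_encoding,
opaque, edata, bytes_processed) and the functions my_format_start()
and run_copy_to_sketch() are hypothetical, and the struct name is
spelled as in this message.

```c
#include <assert.h>
#include <stddef.h>

/* ---- What a public header might expose (hypothetical layout) ---- */

/* Incomplete type: extensions can hold the pointer but not dereference it. */
typedef struct CopyToExectuionData CopyToExectuionData;

typedef struct CopyToStateData
{
    /* Fields assumed safe to expose; the names are illustrative only. */
    int         file_encoding;
    void       *opaque;         /* private area for the format extension */

    /* Internal working state, hidden behind the incomplete type. */
    CopyToExectuionData *edata;
} CopyToStateData;

typedef CopyToStateData *CopyToState;

/* ---- Extension side: sees only the declarations above ---- */

/*
 * Hypothetical callback. It can read the public fields, but an access
 * such as cstate->edata->bytes_processed would not compile at this
 * point because the struct is still incomplete.
 */
static int
my_format_start(CopyToState cstate)
{
    return cstate->file_encoding;
}

/* ---- Core side: what an internal header might define ---- */

struct CopyToExectuionData
{
    size_t      bytes_processed;    /* illustrative internal counter */
};

/* Core-side helper: builds the state and runs the extension callback. */
int
run_copy_to_sketch(int encoding)
{
    CopyToExectuionData edata = {0};
    CopyToStateData cstate = {0};
    int         seen;

    cstate.file_encoding = encoding;
    cstate.edata = &edata;

    seen = my_format_start(&cstate);

    /* Only core code touches the internal state. */
    cstate.edata->bytes_processed += 1;
    assert(cstate.edata->bytes_processed == 1);

    return seen;
}
```

With a layout like this, the callback signature stays the same
(it still receives a pointer to CopyToStateData), while the internal
struct definition can live in a header that extensions never include.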

The patch is still at a very early PoC stage, and we would need to
scrutinize which fields should or should not be exposed. Feedback is
very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
