Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Junwang Zhao
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id CAEG8a3LUBcvjwqgt6AijJmg67YN_b_NZ4Kzoxc_dH4rpAq0pKg@mail.gmail.com
Whole thread Raw
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Responses Re: Make COPY format extendable: Extract COPY TO format implementations
List pgsql-hackers
Hi,

On Mon, Nov 25, 2024 at 2:02 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <20241125.110620.313152541320718947.kou@clear-code.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 25 Nov 2024 11:06:20 +0900 (JST),
>   Sutou Kouhei <kou@clear-code.com> wrote:
>
> >> I've attached the v25 patches that squashed the minor changes I made
> >> in v24 and incorporated all comments I got so far. I think these two
> >> patches are in good shape. Could you rebase remaining patches on top
> >> of them so that we can see the big picture of this feature?
> >
> > OK. I'll work on it.
>
> I've attached the v26 patch set:
>
> 0001: It's same as 0001 in the v25 patch set.
> 0002: It's same as 0002 in the v25 patch set.
> 0003: It's same as 0003 in the v23 patch set.
>       This parses the "format" option and adds support for
>       custom TO handler.
> 0004: It's same as 0004 in the v23 patch set.
>       This exports CopyToStateData. But the following
>       enums/structs/functions aren't moved to copyapi.h from
>       copy.h:
>       * CopyHeaderChoice
>       * CopyOnErrorChoice
>       * CopyLogVerbosityChoice
>       * CopyFormatOptions
>       * copy_data_dest_cb()
> 0005: It's same as 0005 in the v23 patch set.
>       This adds missing APIs to implement custom TO handler
>       as an extension.
> 0006: It's same as 0008 in the v23 patch set.
>       This adds support for custom FROM handler.
> 0007: It's same as 0009 in the v23 patch set.
>       This exports CopyFromStateData.
> 0008: It's same as 0010 in the v23 patch set.
>       This adds missing APIs to implement custom FROM handler
>       as an extension.
>
> I've also updated https://github.com/kou/pg-copy-arrow for
> the current API.
>
> I think that we can merge only 0001/0002 as the first
> step. Because they don't change the current behavior and
> they improve performance. We can merge other patches after
> that.
>
> >> Regarding exposing the structs such as CopyToStateData, v22-0004 patch
> >> moves most of all copy-related structs to copyapi.h from copyto.c,
> >> copyfrom_internal.h, and copy.h, which seems odd to me. I think we can
> >> expose CopyToStateData (and related structs) in a new file
> >> copyto_internal.h and keep other structs in the original header files.
> >
> > Custom COPY format extensions need to use
> > CopyToStateData/CopyFromStateData. For example,
> > CopyToStateData::rel is used to retrieve table schema. If we
> > move CopyToStateData to copyto_internal.h not copyapi.h,
> > custom COPY format extensions need to include
> > copyto_internal.h. I feel that it's strange that extensions
> > need to use internal headers.
> >
> > What is your real concern? If you don't want to export
> > CopyToStateData/CopyFromStateData entirely, we can provide
> > accessors only for some members of them.
>
> The v26 patch set still exports
> CopyToStateData/CopyFromStateData in copyapi.h not
> copy{to,from}_internal.h. But I didn't move the following
> enums/structs/functions:
>
> * CopyHeaderChoice
> * CopyOnErrorChoice
> * CopyLogVerbosityChoice
> * CopyFormatOptions
> * copy_data_dest_cb()
>
> What do you think about this approach?
>
>
> Thanks,
> --
> kou

I just gave this another round of benchmarking tests. I'd like to
share the number,
since COPY TO has some performance drawbacks, I test only COPY TO. I
use the run.sh Tomas provided earlier but use pgbench with a custom script, you
can find it here[0].

I tested 3 branches:

1. the master branch
2. all v26 patch sets applied
3. Emitting JSON to file using COPY TO v13 patch set[1], this add some
if branch in CopyOneRowTo, so I was expecting this slower than master

2 can be about -3%~+3% compared to 1, but what surprised me is that 3
is always better than 1 & 2.

I reviewed the patch set of 3 and I don't see any magic.

You can see the detailed results here[2], I can not upload files so I
just shared the google doc link, ping me if you can not open the link.

[0]: https://github.com/pghacking/scripts/tree/main/extensible_copy
[1]: https://www.postgresql.org/message-id/CACJufxH8J0uD-inukxAmd3TVwt-b-y7d7hLGSBdEdLXFGJLyDA%40mail.gmail.com
[2]: https://docs.google.com/spreadsheets/d/1wJPXZF4LHe34X9IU1pLG7rI9sCkSy2dEkdj7w7avTqM/edit?usp=sharing


--
Regards
Junwang Zhao



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Conflict detection for update_deleted in logical replication
Next
From: Kirill Reshke
Date:
Subject: Re: Add Pipelining support in psql