Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Junwang Zhao |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | CAEG8a3LUBcvjwqgt6AijJmg67YN_b_NZ4Kzoxc_dH4rpAq0pKg@mail.gmail.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Sutou Kouhei <kou@clear-code.com>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
|
List | pgsql-hackers |
Hi, On Mon, Nov 25, 2024 at 2:02 PM Sutou Kouhei <kou@clear-code.com> wrote: > > Hi, > > In <20241125.110620.313152541320718947.kou@clear-code.com> > "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 25 Nov 2024 11:06:20 +0900 (JST), > Sutou Kouhei <kou@clear-code.com> wrote: > > >> I've attached the v25 patches that squashed the minor changes I made > >> in v24 and incorporated all comments I got so far. I think these two > >> patches are in good shape. Could you rebase remaining patches on top > >> of them so that we can see the big picture of this feature? > > > > OK. I'll work on it. > > I've attached the v26 patch set: > > 0001: It's same as 0001 in the v25 patch set. > 0002: It's same as 0002 in the v25 patch set. > 0003: It's same as 0003 in the v23 patch set. > This parses the "format" option and adds support for > custom TO handler. > 0004: It's same as 0004 in the v23 patch set. > This exports CopyToStateData. But the following > enums/structs/functions aren't moved to copyapi.h from > copy.h: > * CopyHeaderChoice > * CopyOnErrorChoice > * CopyLogVerbosityChoice > * CopyFormatOptions > * copy_data_dest_cb() > 0005: It's same as 0005 in the v23 patch set. > This adds missing APIs to implement custom TO handler > as an extension. > 0006: It's same as 0008 in the v23 patch set. > This adds support for custom FROM handler. > 0007: It's same as 0009 in the v23 patch set. > This exports CopyFromStateData. > 0008: It's same as 0010 in the v23 patch set. > This adds missing APIs to implement custom FROM handler > as an extension. > > I've also updated https://github.com/kou/pg-copy-arrow for > the current API. > > I think that we can merge only 0001/0002 as the first > step. Because they don't change the current behavior and > they improve performance. We can merge other patches after > that. > > >> Regarding exposing the structs such as CopyToStateData, v22-0004 patch > >> moves most of all copy-related structs to copyapi.h from copyto.c, > >> copyfrom_internal.h, and copy.h, which seems odd to me. I think we can > >> expose CopyToStateData (and related structs) in a new file > >> copyto_internal.h and keep other structs in the original header files. > > > > Custom COPY format extensions need to use > > CopyToStateData/CopyFromStateData. For example, > > CopyToStateData::rel is used to retrieve table schema. If we > > move CopyToStateData to copyto_internal.h not copyapi.h, > > custom COPY format extensions need to include > > copyto_internal.h. I feel that it's strange that extensions > > need to use internal headers. > > > > What is your real concern? If you don't want to export > > CopyToStateData/CopyFromStateData entirely, we can provide > > accessors only for some members of them. > > The v26 patch set still exports > CopyToStateData/CopyFromStateData in copyapi.h not > copy{to,from}_internal.h. But I didn't move the following > enums/structs/functions: > > * CopyHeaderChoice > * CopyOnErrorChoice > * CopyLogVerbosityChoice > * CopyFormatOptions > * copy_data_dest_cb() > > What do you think about this approach? > > > Thanks, > -- > kou I just gave this another round of benchmarking tests. I'd like to share the number, since COPY TO has some performance drawbacks, I test only COPY TO. I use the run.sh Tomas provided earlier but use pgbench with a custom script, you can find it here[0]. I tested 3 branches: 1. the master branch 2. all v26 patch sets applied 3. Emitting JSON to file using COPY TO v13 patch set[1], this add some if branch in CopyOneRowTo, so I was expecting this slower than master 2 can be about -3%~+3% compared to 1, but what surprised me is that 3 is always better than 1 & 2. I reviewed the patch set of 3 and I don't see any magic. You can see the detailed results here[2], I can not upload files so I just shared the google doc link, ping me if you can not open the link. [0]: https://github.com/pghacking/scripts/tree/main/extensible_copy [1]: https://www.postgresql.org/message-id/CACJufxH8J0uD-inukxAmd3TVwt-b-y7d7hLGSBdEdLXFGJLyDA%40mail.gmail.com [2]: https://docs.google.com/spreadsheets/d/1wJPXZF4LHe34X9IU1pLG7rI9sCkSy2dEkdj7w7avTqM/edit?usp=sharing -- Regards Junwang Zhao
pgsql-hackers by date: