Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | CAD21AoBuEqcz2_+dpA3WTiDUF=FgudPBKwM+nvH+qHT-k4p5mA@mail.gmail.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Sutou Kouhei <kou@clear-code.com>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
|
List | pgsql-hackers |
On Fri, Apr 25, 2025 at 5:45 AM Sutou Kouhei <kou@clear-code.com> wrote: > > Hi, > > I've updated the patch set. See the attached v40 patch set. > > In <CAD21AoAXzwPC7jjPMTcT80hnzmPa2SUJkiqdYHweEY8sZscEMA@mail.gmail.com> > "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 23 Apr 2025 23:44:55 -0700, > Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > >> Are the followings correct? > >> > >> 1. Move invalid input patterns in > >> src/test/modules/test_copy_format/sql/invalid.sql to > >> src/test/regress/sql/copy.sql as much as possible. > >> 2. Create > >> src/test/modules/test_copy_format/sql/test_copy_format.sql > >> and move all contents in existing *.sql to the file. > >> 3. Add comments what the tests expect to > >> src/test/modules/test_copy_format/sql/test_copy_format.sql. > >> 4. Remove CopyFormatOptions::{binary,csv_mode}. > > > > Agreed with the above items. > > Done except 1. because 1. is removed by 3. in the following > list: > > ---- > >> There are 3 unconfirmed suggested changes for tests in: > >> https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com > >> > >> Here are my opinions for them: > >> > >> > 1.: There is no difference between single-quoting and > >> > double-quoting here. Because the information what quote > >> > was used for the given FORMAT value isn't remained > >> > here. Should we update gram.y? > >> > > >> > 2.: I don't have a strong opinion for it. If nobody objects > >> > it, I'll remove them. > >> > > >> > 3.: I don't have a strong opinion for it. If nobody objects > >> > it, I'll remove them. > ---- > > 0005 is added for 4. Could you squash 0004 ("Use copy > handler for bult-in formats") and 0005 ("Remove > CopyFormatOptions::{binary,csv_mode}") if needed when you > push? > > >> 6. Use handler OID for detecting the default built-in format > >> instead of comparing the given format as string. > > Done. > > >> 7. Update documentation. > > Could someone help this? 0007 is the draft commit for this. > > >> There are 3 unconfirmed suggested changes for tests in: > >> https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com > >> > >> Here are my opinions for them: > >> > >> > 1.: There is no difference between single-quoting and > >> > double-quoting here. Because the information what quote > >> > was used for the given FORMAT value isn't remained > >> > here. Should we update gram.y? > >> > > >> > 2.: I don't have a strong opinion for it. If nobody objects > >> > it, I'll remove them. > >> > > >> > 3.: I don't have a strong opinion for it. If nobody objects > >> > it, I'll remove them. > >> > >> Is the 1. required for "ready for merge"? If so, is there > >> any suggestion? I don't have a strong opinion for it. > >> > >> If there are no more opinions for 2. and 3., I'll remove > >> them. > > > > Agreed. > > 1.: I didn't do anything. Because there is no suggestion. > > 2., 3.: Done. Thank you for updating the patches. One of the primary considerations we need to address is the treatment of the specified format name. The current patch set utilizes built-in formats (namely 'csv', 'text', and 'binary') when the format name is either unqualified or explicitly specified with 'pg_catalog' as the schema. In all other cases, we search for custom format handler functions based on the search_path. To be frank, I have reservations about this interface design, as the dependence of the specified custom format name on the search_path could potentially confuse users. In light of these concerns, I've been contemplating alternative interface designs. One promising approach would involve registering custom copy formats via a C function during module loading (specifically, in _PG_init()). This method would require extension authors to invoke a registration function, say RegisterCustomCopyFormat(), in _PG_init() as follows: JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines", &JsonLinesCopyToRoutine, &JsonLinesCopyFromRoutine); The registration function would validate the format name and store it in TopMemoryContext. It would then return a unique identifier that can be used subsequently to reference the custom copy format extension. Custom copy format modules could be loaded through shared_preload_libraries, session_preload_libraries, or the LOAD command. Extensions could register their own options within this framework, for example: RegisterCustomCopyFormatOption(JsonLinesFormatId, "custom_option", custom_option_handler); This approach offers several advantages: it would eliminate the search_path issue, provide greater flexibility, and potentially simplify the overall interface for users and developers alike. We might be able to provide a view showing the registered custom COPY format in the future. Also, these interfaces align with other customizable functionalities such as custom rmgr, custom lwlock, custom waitevent, and custom EXPLAIN option etc. Feedback is very welcome. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: