Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
| From | Sutou Kouhei | 
|---|---|
| Subject | Re: Make COPY format extendable: Extract COPY TO format implementations | 
| Date | |
| Msg-id | 20250503.111958.55503535810706028.kou@clear-code.com Whole thread Raw | 
| In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Masahiko Sawada <sawada.mshk@gmail.com>) | 
| Responses | Re: Make COPY format extendable: Extract COPY TO format implementations | 
| List | pgsql-hackers | 
Hi,
In <CAD21AoBuEqcz2_+dpA3WTiDUF=FgudPBKwM+nvH+qHT-k4p5mA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 1 May 2025 12:15:30 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> One of the primary considerations we need to address is the treatment
> of the specified format name. The current patch set utilizes built-in
> formats (namely 'csv', 'text', and 'binary') when the format name is
> either unqualified or explicitly specified with 'pg_catalog' as the
> schema. In all other cases, we search for custom format handler
> functions based on the search_path. To be frank, I have reservations
> about this interface design, as the dependence of the specified custom
> format name on the search_path could potentially confuse users.
How about requiring schema for all custom formats?
Valid:
  COPY ... TO ... (FORMAT 'text');
  COPY ... TO ... (FORMAT 'my_schema.jsonlines');
Invalid:
  COPY ... TO ... (FORMAT 'jsonlines'); -- no schema
  COPY ... TO ... (FORMAT 'pg_catalog.text'); -- needless schema
If we require "schema" for all custom formats, we don't need
to depend on search_path.
> In light of these concerns, I've been contemplating alternative
> interface designs. One promising approach would involve registering
> custom copy formats via a C function during module loading
> (specifically, in _PG_init()). This method would require extension
> authors to invoke a registration function, say
> RegisterCustomCopyFormat(), in _PG_init() as follows:
> 
> JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines",
>                                              &JsonLinesCopyToRoutine,
>                                              &JsonLinesCopyFromRoutine);
> 
> The registration function would validate the format name and store it
> in TopMemoryContext. It would then return a unique identifier that can
> be used subsequently to reference the custom copy format extension.
I don't object the suggested interface because I don't have
a strong opinion how to implement this feature.
Why do we need to assign a unique ID? For performance? For
RegisterCustomCopyFormatOption()?
I think that we don't need to use it so much in COPY. We
don't need to use format name and assigned ID after we
retrieve a corresponding Copy{To,From}Routine. Because all
needed information are in Copy{To,From}Routine.
>          Extensions could register their own options within this
> framework, for example:
> 
> RegisterCustomCopyFormatOption(JsonLinesFormatId,
>     "custom_option",
>     custom_option_handler);
Can we defer to discuss how to add support for custom
options while we focus on the first implementation? Earlier
patch sets with the current approach had custom options
support but it's removed in the first implementation.
(BTW, I think that it's not a good API because we want COPY
FROM only options and COPY TO only options something like
"compression level".)
> This approach offers several advantages: it would eliminate the
> search_path issue, provide greater flexibility, and potentially
> simplify the overall interface for users and developers alike.
What contributes to the "flexibility"? Developers can call
multiple Register* functions in _PG_Init(), right?
Thanks,
-- 
kou
		
	pgsql-hackers by date: