Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Sutou Kouhei
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id 20250503.111958.55503535810706028.kou@clear-code.com
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Make COPY format extendable: Extract COPY TO format implementations
List pgsql-hackers
Hi,

In <CAD21AoBuEqcz2_+dpA3WTiDUF=FgudPBKwM+nvH+qHT-k4p5mA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 1 May 2025 12:15:30 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> One of the primary considerations we need to address is the treatment
> of the specified format name. The current patch set utilizes built-in
> formats (namely 'csv', 'text', and 'binary') when the format name is
> either unqualified or explicitly specified with 'pg_catalog' as the
> schema. In all other cases, we search for custom format handler
> functions based on the search_path. To be frank, I have reservations
> about this interface design, as the dependence of the specified custom
> format name on the search_path could potentially confuse users.

How about requiring a schema for all custom formats?

Valid:

  COPY ... TO ... (FORMAT 'text');
  COPY ... TO ... (FORMAT 'my_schema.jsonlines');

Invalid:

  COPY ... TO ... (FORMAT 'jsonlines'); -- no schema
  COPY ... TO ... (FORMAT 'pg_catalog.text'); -- needless schema

If we require a schema for all custom formats, we don't need
to depend on search_path.
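
As a rough illustration of that rule (not code from the patch set),
the check could look something like the sketch below. All names in
it (FormatKind, is_builtin_format(), classify_format_name()) are
hypothetical. It also makes two simplifying assumptions: it treats
any 'pg_catalog.'-qualified name as invalid, and it splits on the
first '.' without handling quoted identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a FORMAT value. */
typedef enum FormatKind
{
    FORMAT_BUILTIN,             /* unqualified 'text', 'csv', 'binary' */
    FORMAT_CUSTOM,              /* 'schema.name' */
    FORMAT_INVALID              /* anything else */
} FormatKind;

static bool
is_builtin_format(const char *name)
{
    return strcmp(name, "text") == 0 ||
           strcmp(name, "csv") == 0 ||
           strcmp(name, "binary") == 0;
}

/*
 * Classify a format name without consulting search_path.
 * Assumption: no quoted identifiers; the first '.' separates
 * the schema from the format name.
 */
static FormatKind
classify_format_name(const char *name)
{
    const char *dot;
    size_t      schema_len;

    assert(name != NULL);

    dot = strchr(name, '.');
    if (dot == NULL)
    {
        /* Unqualified names are reserved for built-in formats. */
        return is_builtin_format(name) ? FORMAT_BUILTIN : FORMAT_INVALID;
    }

    schema_len = (size_t) (dot - name);

    /* Reject an empty schema ('.x') or an empty format name ('x.'). */
    if (schema_len == 0 || dot[1] == '\0')
        return FORMAT_INVALID;

    /* Assumption: built-ins must not be schema-qualified. */
    if (schema_len == strlen("pg_catalog") &&
        strncmp(name, "pg_catalog", schema_len) == 0)
        return FORMAT_INVALID;

    return FORMAT_CUSTOM;
}
```

With this sketch, 'text' is classified as built-in,
'my_schema.jsonlines' as custom, and both 'jsonlines' and
'pg_catalog.text' as invalid, matching the examples above.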

> In light of these concerns, I've been contemplating alternative
> interface designs. One promising approach would involve registering
> custom copy formats via a C function during module loading
> (specifically, in _PG_init()). This method would require extension
> authors to invoke a registration function, say
> RegisterCustomCopyFormat(), in _PG_init() as follows:
> 
> JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines",
>                                              &JsonLinesCopyToRoutine,
>                                              &JsonLinesCopyFromRoutine);
> 
> The registration function would validate the format name and store it
> in TopMemoryContext. It would then return a unique identifier that can
> be used subsequently to reference the custom copy format extension.

I don't object to the suggested interface because I don't have
a strong opinion on how to implement this feature.

Why do we need to assign a unique ID? For performance? For
RegisterCustomCopyFormatOption()?

I don't think we need to use it much in COPY. We don't need
the format name or the assigned ID after we retrieve the
corresponding Copy{To,From}Routine, because all the needed
information is in Copy{To,From}Routine.
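
To make that point concrete, here is a minimal sketch (not the
proposed RegisterCustomCopyFormat() and not code from the patch
set) of a name-keyed registry whose lookup hands back the routine
directly. Every name in it is hypothetical, and
CopyToRoutineSketch is only a one-callback stand-in for the real
routine struct. Once the lookup has returned, the caller needs
neither the name nor any ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CUSTOM_FORMATS 16

/* Hypothetical stand-in for a COPY TO routine: one callback only. */
typedef struct CopyToRoutineSketch
{
    int         (*one_row) (const char *row);
} CopyToRoutineSketch;

typedef struct FormatEntry
{
    const char *name;           /* caller keeps this string alive */
    const CopyToRoutineSketch *to_routine;
} FormatEntry;

static FormatEntry format_registry[MAX_CUSTOM_FORMATS];
static size_t n_formats = 0;

/* Register a format by name; returns false if full or duplicate. */
static bool
register_copy_format_sketch(const char *name,
                            const CopyToRoutineSketch *to_routine)
{
    assert(name != NULL && to_routine != NULL);

    if (n_formats >= MAX_CUSTOM_FORMATS)
        return false;
    for (size_t i = 0; i < n_formats; i++)
    {
        if (strcmp(format_registry[i].name, name) == 0)
            return false;
    }
    format_registry[n_formats].name = name;
    format_registry[n_formats].to_routine = to_routine;
    n_formats++;
    return true;
}

/* Look up by name; the result carries everything COPY needs. */
static const CopyToRoutineSketch *
lookup_copy_to_routine_sketch(const char *name)
{
    assert(name != NULL);

    for (size_t i = 0; i < n_formats; i++)
    {
        if (strcmp(format_registry[i].name, name) == 0)
            return format_registry[i].to_routine;
    }
    return NULL;
}

/* Toy callback: bytes a row would occupy including a newline. */
static int
jsonlines_one_row(const char *row)
{
    return (int) strlen(row) + 1;
}

static const CopyToRoutineSketch jsonlines_to_routine = {
    jsonlines_one_row
};
```

In this sketch the registration function returns only success or
failure. An ID would matter only if some later call, such as an
option registration, had to refer back to the format.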

>          Extensions could register their own options within this
> framework, for example:
> 
> RegisterCustomCopyFormatOption(JsonLinesFormatId,
>     "custom_option",
>     custom_option_handler);

Can we defer the discussion of how to support custom options
while we focus on the first implementation? Earlier patch
sets with the current approach had custom option support,
but it was removed for the first implementation.

(BTW, I don't think it's a good API because we want COPY
FROM-only options and COPY TO-only options, such as
"compression level".)

> This approach offers several advantages: it would eliminate the
> search_path issue, provide greater flexibility, and potentially
> simplify the overall interface for users and developers alike.

What contributes to the "flexibility"? Developers can call
multiple Register* functions in _PG_init(), right?


Thanks,
-- 
kou


