Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id CAD21AoBrSTmPyDai_QVR-XOe7PL722Dazm70A+FpvGy2hfSV9g@mail.gmail.com
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
List pgsql-hackers
On Fri, May 9, 2025 at 2:41 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAKFQuwaRDXANaL+QcT6LZRAem4rwkSwv9v+viv_mcR+Rex3quA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sat, 3 May 2025 22:27:36 -0700,
>   "David G. Johnston" <david.g.johnston@gmail.com> wrote:
>
> > In any case, I’m doubtful either of us can make a convincing enough
> > argument to sway the other fully.  Both options are plausible, IMO.  Others
> > need to chime in.
>
> I may misunderstand but here is the current summary, right?

Thank you for summarizing the discussion.

>
> Proposed approaches to register custom COPY formats:
> a. Create a function that has the same name of custom COPY
>    format
> b. Call a register function from _PG_init()
>
> FYI: I proposed c. approach that uses a. but it always
> requires schema name for format name in other e-mail.

With approach (c), do you mean that we would require users to change all
FORMAT option values, e.g. from 'text' to 'pg_catalog.text', after the
upgrade? Or would the built-in formats be exempt?

>
> Users can register the same format name:
> a. Yes
>    * Users can distinct the same format name by schema name
>    * If format name doesn't have schema name, the used
>      format depends on search_path
>      * Pros:
>        * Using schema for it is consistent with other
>          PostgreSQL mechanisms
>        * Custom format never conflict with built-in
>          format. For example, an extension register "xml" and
>          PostgreSQL adds "xml" later, they are never
>          conflicted because PostgreSQL's "xml" is registered
>          to pg_catalog.
>      * Cons: Different format may be used with the same
>        input. For example, "jsonlines" may choose
>        "jsonlines" implemented by extension X or implemented
>        by extension Y when search_path is different.
> b. No
>    * Users can use "${schema}.${name}" for format name
>      that mimics PostgreSQL's builtin schema (but it's just
>      a string)
>
>
> Built-in formats (text/csv/binary) should be able to
> overwritten by extensions:
> a. (The current patch is no but David's answer is) Yes
>    * Pros: Users can use drop-in replacement faster
>      implementation without changing input
>    * Cons: Users may overwrite them accidentally.
>      It may break pg_dump result.
>      (This is called as "backward incompatibility.")
> b. No

The summary matches my understanding. I think the second point is
important. If we go with a tablesample-like API, I agree with David's
point that all FORMAT values, including the built-in formats, should
depend on the search_path value. While that provides a user experience
similar to other database objects, there is a possibility that a
COPY with a built-in format could behave differently on v19 than on v18
or earlier, depending on the search_path value.
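
To illustrate the search_path concern, here is a toy model in Python.
It is not PostgreSQL code: the FORMATS table, the "ext_x" schema, and
resolve_format() are all made up for illustration. The only thing it
assumes is that format lookup would follow the usual rule that
pg_catalog is searched first unless it is listed explicitly in
search_path.

```python
# Toy model only: not PostgreSQL code. All names here are hypothetical.
FORMATS = {
    ("pg_catalog", "text"): "built-in text",
    ("pg_catalog", "csv"): "built-in csv",
    ("pg_catalog", "binary"): "built-in binary",
    ("ext_x", "text"): "extension X text",  # an extension shadowing "text"
}


def resolve_format(name, search_path):
    """Return the handler that `name` resolves to, or None if not found."""
    # A schema-qualified name bypasses search_path entirely.
    if "." in name:
        schema, base = name.split(".", 1)
        return FORMATS.get((schema, base))

    # Mimic the usual rule: pg_catalog is implicitly searched first
    # unless it appears explicitly in search_path.
    path = list(search_path)
    if "pg_catalog" not in path:
        path.insert(0, "pg_catalog")

    for schema in path:
        handler = FORMATS.get((schema, name))
        if handler is not None:
            return handler
    return None
```

In this model, FORMAT 'text' keeps resolving to the built-in
implementation under a default-like search_path, but a search_path that
lists "ext_x" before "pg_catalog" picks the extension's implementation
for the same COPY command. Only 'pg_catalog.text' is stable regardless
of search_path.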

> Are there any missing or wrong items?

I think approach (b) provides more flexibility than (a) in terms
of API design, because with (a) we need to do everything based on one
handler function and callbacks.

> If we can summarize
> the current discussion here correctly, others will be able
> to chime in this discussion. (At least I can do it.)

+1

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


