On Fri, May 9, 2025 at 2:41 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAKFQuwaRDXANaL+QcT6LZRAem4rwkSwv9v+viv_mcR+Rex3quA@mail.gmail.com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sat, 3 May 2025 22:27:36 -0700,
> "David G. Johnston" <david.g.johnston@gmail.com> wrote:
>
> > In any case, I’m doubtful either of us can make a convincing enough
> > argument to sway the other fully. Both options are plausible, IMO. Others
> > need to chime in.
>
> I may misunderstand but here is the current summary, right?
Thank you for summarizing the discussion.
>
> Proposed approaches to register custom COPY formats:
> a. Create a function that has the same name of custom COPY
> format
> b. Call a register function from _PG_init()
>
> FYI: I proposed c. approach that uses a. but it always
> requires schema name for format name in other e-mail.
With approach (c), do you mean that users would have to change all
FORMAT option values, e.g. from 'text' to 'pg_catalog.text', after the
upgrade? Or would the built-in formats be exempt?
>
> Users can register the same format name:
> a. Yes
> * Users can distinct the same format name by schema name
> * If format name doesn't have schema name, the used
> format depends on search_path
> * Pros:
> * Using schema for it is consistent with other
> PostgreSQL mechanisms
> * Custom format never conflict with built-in
> format. For example, an extension register "xml" and
> PostgreSQL adds "xml" later, they are never
> conflicted because PostgreSQL's "xml" is registered
> to pg_catalog.
> * Cons: Different format may be used with the same
> input. For example, "jsonlines" may choose
> "jsonlines" implemented by extension X or implemented
> by extension Y when search_path is different.
> b. No
> * Users can use "${schema}.${name}" for format name
> that mimics PostgreSQL's builtin schema (but it's just
> a string)
>
>
> Built-in formats (text/csv/binary) should be able to
> overwritten by extensions:
> a. (The current patch is no but David's answer is) Yes
> * Pros: Users can use drop-in replacement faster
> implementation without changing input
> * Cons: Users may overwrite them accidentally.
> It may break pg_dump result.
> (This is called as "backward incompatibility.")
> b. No
The summary matches my understanding. I think the second point is
important. If we go with a tablesample-like API, I agree with David's
point that all FORMAT values, including the built-in formats, should
depend on the search_path value. While that gives a user experience
similar to other database objects, a COPY with a built-in format could
behave differently on v19 than on v18 or earlier, depending on the
search_path value.
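To make the concern concrete, here is a toy model in Python (not
PostgreSQL code; the "ext" schema, the FORMATS table, and
resolve_format() are all made up for illustration) of how an
unqualified FORMAT name could resolve differently depending on
search_path:

```python
# Toy model only: maps schema name -> {format name -> implementation label}.
# "ext" is a hypothetical extension schema that ships its own "text" format.
FORMATS = {
    "pg_catalog": {
        "text": "built-in text",
        "csv": "built-in csv",
        "binary": "built-in binary",
    },
    "ext": {"text": "ext text", "jsonlines": "ext jsonlines"},
}


def resolve_format(name, search_path):
    """Resolve a possibly schema-qualified format name against search_path."""
    if "." in name:
        schema, fmt = name.split(".", 1)
        return FORMATS[schema][fmt]
    # Assumption: as for other objects, pg_catalog is searched first
    # unless it is listed explicitly in search_path.
    path = list(search_path)
    if "pg_catalog" not in path:
        path.insert(0, "pg_catalog")
    for schema in path:
        if name in FORMATS.get(schema, {}):
            return FORMATS[schema][name]
    raise LookupError(f"COPY format {name!r} not found")


print(resolve_format("text", ["public"]))                        # built-in text
print(resolve_format("text", ["ext", "pg_catalog"]))             # ext text
print(resolve_format("pg_catalog.text", ["ext", "pg_catalog"]))  # built-in text
```

In this model only the second call changes behavior: the same
FORMAT 'text' picks the extension's implementation once "ext" is
placed ahead of pg_catalog in search_path.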
> Are there any missing or wrong items?
I think approach (b) provides more flexibility than (a) in terms of
API design, since with (a) we need to do everything through a single
handler function and its callbacks.
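As a rough illustration of what I mean (a toy Python model, not the
proposed C API; register_copy_format(), supports(), and their
arguments are hypothetical), a register function called at load time
could take separate TO and FROM routines and reject duplicate names
up front:

```python
# Toy model only; every name here is hypothetical, not part of any patch.
_copy_formats = {}


def register_copy_format(name, to_routine=None, from_routine=None):
    """Register a format once; a format may support TO, FROM, or both."""
    if to_routine is None and from_routine is None:
        raise ValueError("at least one of to_routine/from_routine is required")
    if name in _copy_formats:
        # With approach (b) the same name cannot be registered twice.
        raise ValueError(f"COPY format {name!r} is already registered")
    _copy_formats[name] = {"to": to_routine, "from": from_routine}


def supports(name, direction):
    """Return True if the registered format implements the given direction."""
    entry = _copy_formats.get(name)
    return entry is not None and entry[direction] is not None


# Stand-in for what an extension would do from _PG_init().
register_copy_format("jsonlines", to_routine=lambda row: str(row))
print(supports("jsonlines", "to"))    # True
print(supports("jsonlines", "from"))  # False
```

The point is only that the signature of the register function is
ours to design, whereas with (a) the handler function's return value
has to carry everything.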
> If we can summarize
> the current discussion here correctly, others will be able
> to chime in this discussion. (At least I can do it.)
+1
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com