Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Sutou Kouhei
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id 20250624.161009.195478431761017966.kou@clear-code.com
Whole thread Raw
In response to Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Responses Re: Make COPY format extendable: Extract COPY TO format implementations
List pgsql-hackers
Hi,

In <CAD21AoC8-d=GF-hOvGqUyq2xFg=QGpYfCiWJbcp4wcn0UidrPw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 24 Jun 2025 15:24:23 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

>> It's natural to add more related APIs with this
>> approach. The single registration API provides one feature
>> by one operation. If we use the RegisterCopyRoutine() for
>> FROM and TO formats API, it's not natural that we add more
>> related APIs. In this case, some APIs may provide multiple
>> features by one operation and other APIs may provide single
>> feature by one operation. Developers may be confused with
>> the API. For example, developers may think "what does mean
>> NULL here?" or "can we use NULL here?" for
>> "RegisterCopyRoutine("new-format", NewFormatFromRoutine,
>> NULL)".
> 
> We can document it in the comment for the registration function.

I think that API that can be understandable without the
additional note is better API than API that needs some
notes.

Why do you suggest the RegisterCopyRoutine("new-format",
NewFormatFromRoutine, NewFormatToRoutine) API? You want to
remove the duplicated codes in
RegisterCopy{From,To}Routine(), right? I think that we can
do it by creating a convenient function that has the
duplicated codes extracted from
RegisterCopy{From,To}Routine() and
RegisterExtensionExplainOption().

BTW, what do you think about my answer (one feature by one
operation API is more extendable API) for your question
(extendability and API compatibility)?

> Suppose that extension-A implements only CopyToRoutine for the
> custom-format-X with the format name 'myformat' and extension-B
> implements only CopyFromRoutine for the custom-format-Y with the same
> name, if users load both extension-A and extension-B, it seems to me
> that extension-A registers the custom-format-X format as 'myformat'
> only with CopyToRoutine, and extension-B overwrites the 'myformat'
> registration by adding custom-format-Y's CopyFromRoutine. However, if
> users register extension-C that implements both routines with the
> format name 'myformat', they can register neither extension-A nor
> extension-B, which seems to me that we don't allow overwriting the
> registration in this case.

Do you assume that users use extension-A, extension-B and
extension-C without reading their documentation? If users
read their documentation before users use them, users can
know all of them use the same format name 'myformat' and
which extension provides Copy{From,To}Routine.

In this case, these users (who don't read documentation)
will be confused with the RegisterCopyRoutine("new-format",
NewFormatFromRoutine, NewFormatToRoutine) API too. Do we
really need to care about this case?

> I think the core issue appears to be the internal management of custom
> format entries but the current patch does enable registration
> overwriting in the former case (extension-A and extension-B case).

This is the same behavior as existing custom EXPLAIN option
implementation. Should we use different behavior here?

>> >>    FYI: RegisterCopy{From,To}Routine() uses the same logic
>> >>    as RegisterExtensionExplainOption().
>> >
>> > I'm concerned that the patch has duplicated logics for the
>> > registration of COPY FROM and COPY TO.
>>
>> We can implement a convenient routine that can be used for
>> RegisterExtensionExplainOption() and
>> RegisterCopy{From,To}Routine() if it's needed.
> 
> I meant there are duplicated codes in COPY FROM and COPY TO. For
> instance, RegisterCopyFromRoutine() and RegisterCopyToRoutine() have
> the same logic.

Yes, I understand it. I wanted to say that we can remove the
duplicated codes by introducing a RegisterSomething()
function that can be used by
RegisterExtensionExplainOption() and
RegisterCopy{From,To}Routine():

void
RegisterSomething(...)
{
  /* Common codes in RegisterExtensionExplainOption() and
     RegisterCopy{From,To}Routine()
     ...
   */
}

void
RegisterExtensionExplainOption(...)
{
  RegisterSomething(...);
}

void
RegisterCopyFromRoutine(...)
{
  RegisterSomething(...);
}

void
RegisterCopyToRoutine(...)
{
  RegisterSomething(...);
}

You think that this approach can't remove the duplicated
codes, right?

>> > InitPostgres() is not a correct function as it's a process
>> > initialization function. Probably we don't necessarily need to
>> > register the built-in formats in the same way as custom formats. A
>> > simple solution would be to have separate arrays for built-in formats
>> > and custom formats and have the GetCopy[To|From]Routine() search both
>> > arrays (built-in array first).
>>
>> We had a discussion that we should dog-food APIs:
>>
>>
https://www.postgresql.org/message-id/flat/CAKFQuwaCHhrS%2BRE4p_OO6d7WEskd9b86-2cYcvChNkrP%2B7PJ7A%40mail.gmail.com#e6d1cdd04dac53eafe34b784ac47b68b
>>
>> > We should (and usually do) dog-food APIs when reasonable
>> > and this situation seems quite reasonable.
>>
>> In this case, we don't need to dog-food APIs, right?
> 
> Yes, I think so.

OK. I don't have a strong opinion for it. If nobody objects
it, I'll do it when I update the patch set.


Thanks,
-- 
kou



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Improve doc on parallel stream changes for Stream Abort message
Next
From: Nazir Bilal Yavuz
Date:
Subject: Re: Simplify VM counters in vacuum code