Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Sutou Kouhei
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id 20250624.141150.380751236499958086.kou@clear-code.com
Whole thread Raw
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Hi,

In <CAD21AoA57owo6qYTPTxOtCjDmcuj1tGL1aGs95cvEnoLQvwF0A@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 24 Jun 2025 11:59:17 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

>> 1. This provides 2 registration APIs
>>    (RegisterCopy{From,To}Routine(name, routine)) instead of
>>    1 registration API (RegisterCopyFormat(name,
>>    from_routine, to_routine)).
>>
>>    It's for simple implementation and easy to extend without
>>    breaking APIs in the future. (And some formats may
>>    provide only FROM routine or TO routine.)
>>
>>    Is this design acceptable?
> 
> With the single registration API idea, we can register the custom
> format routine that supports only FROM routine using the API like:
> 
> RegisterCopyRoutine("new-format", NewFormatFromRoutine, NULL);
> 
> Compared to this approach, what points do you think having separate
> two registration APIs is preferable in terms of extendability and API
> compatibility?

It's natural to add more related APIs with this
approach. The single registration API provides one feature
by one operation. If we use the RegisterCopyRoutine() for
FROM and TO formats API, it's not natural that we add more
related APIs. In this case, some APIs may provide multiple
features by one operation and other APIs may provide single
feature by one operation. Developers may be confused with
the API. For example, developers may think "what does mean
NULL here?" or "can we use NULL here?" for
"RegisterCopyRoutine("new-format", NewFormatFromRoutine,
NULL)".


>                I think it would be rather confusing for example if
> each COPY TO routine and COPY FROM routine is registered by different
> extensions with the same format name.

Hmm, I don't think so. Who is confused by the case? DBA?
Users who use COPY? Why is it confused?

Do you assume the case that the same name is used for
different format? For example, "json" is used for JSON Lines
for COPY FROM and and normal JSON for COPY TO by different
extensions.

>>    FYI: RegisterCopy{From,To}Routine() uses the same logic
>>    as RegisterExtensionExplainOption().
> 
> I'm concerned that the patch has duplicated logics for the
> registration of COPY FROM and COPY TO.

We can implement a convenient routine that can be used for
RegisterExtensionExplainOption() and
RegisterCopy{From,To}Routine() if it's needed.

>> 3. I want to register the built-in COPY {FROM,TO} routines
>>    in the PostgreSQL initialization phase. Where should we
>>    do it? In 0002, it's done in InitPostgres() but I'm not
>>    sure whether it's a suitable location or not.
> 
> InitPostgres() is not a correct function as it's a process
> initialization function. Probably we don't necessarily need to
> register the built-in formats in the same way as custom formats. A
> simple solution would be to have separate arrays for built-in formats
> and custom formats and have the GetCopy[To|From]Routine() search both
> arrays (built-in array first).

We had a discussion that we should dog-food APIs:


https://www.postgresql.org/message-id/flat/CAKFQuwaCHhrS%2BRE4p_OO6d7WEskd9b86-2cYcvChNkrP%2B7PJ7A%40mail.gmail.com#e6d1cdd04dac53eafe34b784ac47b68b

> We should (and usually do) dog-food APIs when reasonable
> and this situation seems quite reasonable.

In this case, we don't need to dog-food APIs, right?

>> 4. 0002 adds CopyFormatOptions::routine as union:
>>
>>    @@ -87,9 +91,14 @@ typedef struct CopyFormatOptions
>>            CopyLogVerbosityChoice log_verbosity;   /* verbosity of logged messages */
>>            int64           reject_limit;   /* maximum tolerable number of errors */
>>            List       *convert_select; /* list of column names (can be NIL) */
>>    +       union
>>    +       {
>>    +               const struct CopyFromRoutine *from; /* for COPY FROM */
>>    +               const struct CopyToRoutine *to; /* for COPY TO */
>>    +       }                       routine;                /* routine to process the specified format */
>>     } CopyFormatOptions;
>>
>>    Because one of Copy{From,To}Routine is only needed at
>>    once. Is this union usage strange in PostgreSQL?
> 
> I think we can live with having two fields as there are other options
> that are used only in either COPY FROM or COPY TO.

OK.


Thanks,
-- 
kou



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Minor patch; missing comment update in worker.c
Next
From: Amit Kapila
Date:
Subject: Re: Improve doc on parallel stream changes for Stream Abort message