On Fri, May 2, 2025 at 10:36 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Thursday, May 1, 2025, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>
>> In light of these concerns, I've been contemplating alternative
>> interface designs. One promising approach would involve registering
>> custom copy formats via a C function during module loading
>> (specifically, in _PG_init()). This method would require extension
>> authors to invoke a registration function, say
>> RegisterCustomCopyFormat(), in _PG_init() as follows:
>>
>> JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines",
>> &JsonLinesCopyToRoutine,
>> &JsonLinesCopyFromRoutine);
>>
>> The registration function would validate the format name and store it
>> in TopMemoryContext. It would then return a unique identifier that can
>> be used subsequently to reference the custom copy format extension.
>
>
> How does this fix the search_path concern? Are query writers supposed to
> put JsonLinesFormatId into their queries? Or are you just prohibiting a
> DBA from ever installing an extension that wants to register a format
> name that is already registered so that no namespace is ever required?

In this case, users specify "jsonlines", the name passed as the first
argument to the registration function, in the COPY FORMAT option.
JsonLinesFormatId is reserved for internal use, such as module
processing and monitoring. Any attempt to load another custom COPY
format module named 'jsonlines' will result in an error.

> ISTM accommodating a namespace for formats is required just like we do
> for virtually every other named object in the system. At least, if we
> want to play nice with extension authors. It doesn’t have to be within
> the existing pg_proc scope, we can create a new scope if desired, but
> abolishing it seems unwise.
>
> It would be more consistent with established policy if we didn’t make
> exceptions for text/csv/binary - if the DBA permits a text format to
> exist in a different schema and that schema appears first in the
> search_path, unqualified references to text would resolve to the
> non-core handler. We already protect ourselves with safe search_paths.
> This is really no different than if someone wanted to implement a now()
> function and people are putting pg_catalog in front of existing usage.
> It’s the DBA’s problem, not ours.

I'm concerned about allowing multiple 'text' format implementations
with identical names within the database, as this could lead to
considerable confusion. When users specify 'text', it would be more
logical to guarantee that the built-in 'text' format is consistently
used. This principle aligns with other customizable components, such
as custom resource managers, wait events, lightweight locks, and
custom scans. These components maintain their built-in data/types and
explicitly prevent the registration of duplicate names.
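
To make the policy above concrete, here is a minimal standalone sketch of
a name registry that reserves the built-in names and refuses duplicates.
It is only an illustration and not the proposed patch: the function names,
the fixed-size array, and the -1 return value are all assumptions made for
this example. A real implementation would presumably raise an error via
ereport() and keep its data in TopMemoryContext instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limits and sentinel, chosen only for this sketch. */
#define MAX_CUSTOM_COPY_FORMATS 16
#define INVALID_COPY_FORMAT_ID (-1)

/* Built-in format names that a custom format may never shadow. */
static const char *const builtin_copy_formats[] = {"text", "csv", "binary"};

/* Registered custom format names; the array index doubles as the format ID. */
static const char *custom_copy_formats[MAX_CUSTOM_COPY_FORMATS];
static int n_custom_copy_formats = 0;

/* Return the ID of a registered custom format, or INVALID_COPY_FORMAT_ID. */
int
lookup_custom_copy_format(const char *name)
{
    for (int i = 0; i < n_custom_copy_formats; i++)
    {
        if (strcmp(custom_copy_formats[i], name) == 0)
            return i;
    }
    return INVALID_COPY_FORMAT_ID;
}

/* Return 1 if name is one of the built-in formats, 0 otherwise. */
static int
is_builtin_copy_format(const char *name)
{
    size_t n = sizeof(builtin_copy_formats) / sizeof(builtin_copy_formats[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(builtin_copy_formats[i], name) == 0)
            return 1;
    }
    return 0;
}

/*
 * Register a custom format name and return its ID.
 *
 * Returns INVALID_COPY_FORMAT_ID when the name is empty, collides with a
 * built-in format, is already registered, or the table is full. The name
 * pointer is stored as-is, so the caller must keep the string alive
 * (a string literal is fine).
 */
int
register_custom_copy_format(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return INVALID_COPY_FORMAT_ID;
    if (is_builtin_copy_format(name))
        return INVALID_COPY_FORMAT_ID;
    if (lookup_custom_copy_format(name) != INVALID_COPY_FORMAT_ID)
        return INVALID_COPY_FORMAT_ID;
    if (n_custom_copy_formats >= MAX_CUSTOM_COPY_FORMATS)
        return INVALID_COPY_FORMAT_ID;

    custom_copy_formats[n_custom_copy_formats] = name;
    return n_custom_copy_formats++;
}
```

With this sketch, the first registration of "jsonlines" succeeds and
returns an ID, a second registration of the same name is rejected, and so
is any attempt to register "text", "csv", or "binary".
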
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com