Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id CAKFQuwY9ZNN-vgW8JnL=GV7UFVPWFzqeqVqEQN81NXQF+pAb7w@mail.gmail.com
Whole thread Raw
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Make COPY format extendable: Extract COPY TO format implementations
List pgsql-hackers
On Friday, March 21, 2025, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Mar 21, 2025 at 5:32 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Tue, Mar 18, 2025 at 7:56 PM Sutou Kouhei <kou@clear-code.com> wrote:
>>
>> Hi,
>>
>> In <CAD21AoDU=bYRDDY8MzCXAfg4h9XTeTBdM-wVJaO1t4UcseCpuA@mail.gmail.com>
>>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 17 Mar 2025 13:50:03 -0700,
>>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> > I think that built-in formats also need to have their handler
>> > functions. This seems to be a conventional way for customizable
>> > features such as tablesample and access methods, and we can simplify
>> > this function.
>>
>> OK. 0008 in the attached v37 patch set does it.
>>
>
> tl/dr;
>
> We need to exclude from our SQL function search any function that doesn't declare copy_handler as its return type.
> ("function text must return type copy_handler" is not an acceptable error message)
>
> We need to accept identifiers in FORMAT and parse the optional catalog, schema, and object name portions.
> (Restrict our function search to the named schema if provided.)
>
> Detail:
>
> Fun thing...(not sure how much of this is covered above: I do see, but didn't scour, the security discussion):
>
> -- create some poison
> create function public.text(internal) returns boolean language c as '/home/davidj/gotya/gotya', 'gotit';
> CREATE FUNCTION
>
> -- inject it
> postgres=# set search_path to public,pg_catalog;
> SET
>
> -- watch it die
> postgres=# copy (select 1) to stdout (format text);
> ERROR:  function text must return type copy_handler
> LINE 1: copy (select 1) to stdout (format text);
>
> I'm especially concerned about extensions here.
>
> We shouldn't be locating any SQL function that doesn't have a copy_handler return type.  Unfortunately, LookupFuncName seems incapable of doing what we want here.  I suggest we create a new lookup routine where we can specify the return argument type as a required element.  That would cleanly mitigate the denial-of-service attack/accident vector demonstrated above (the text returning function should have zero impact on how this feature behaves).  If someone does create a handler SQL function without using copy_handler return type we'd end up showing "COPY format 'david' not recognized" - a developer should be able to figure out they didn't put a correct return type on their handler function and that is why the system did not register it.

Just to be clear, the patch checks the function's return type before calling it:

       funcargtypes[0] = INTERNALOID;
       handlerOid = LookupFuncName(list_make1(makeString(format)), 1,

funcargtypes, true);
       if (!OidIsValid(handlerOid))
               ereport(ERROR,
                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                                errmsg("COPY format \"%s\" not
recognized", format),
                                parser_errposition(pstate, defel->location)));

       /* check that handler has correct return type */
       if (get_func_rettype(handlerOid) != COPY_HANDLEROID)
               ereport(ERROR,
                               (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                                errmsg("function %s must return type %s",
                                               format, "copy_handler"),
                                parser_errposition(pstate, defel->location)));

So would changing the error message to like "COPY format 'text' not
recognized" untangle your concern?

In my example above copy should not fail at all.  The text function created in public that returns Boolean would never be seen and the real one in pg_catalog would then be found and behave as expected.
 

FYI the same is true for TABLESAMPLE; it invokes a function with the
specified method name and checks the returned Node type:

=# select * from pg_class tablesample text (0);
ERROR:  function text must return type tsm_handler

Then this would benefit from the new function I suggest creating since it apparently has the same, IMO, bug.

David J.

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Next
From: Michael Paquier
Date:
Subject: Re: Proposal - Allow extensions to set a Plan Identifier