Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id CAD21AoCjK=NC62Z+VUip5-rc_rZz-6h9Dqacxo5G5PdK4kKKwQ@mail.gmail.com
Whole thread Raw
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: Make COPY format extendable: Extract COPY TO format implementations
List pgsql-hackers
On Fri, Mar 21, 2025 at 5:32 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Tue, Mar 18, 2025 at 7:56 PM Sutou Kouhei <kou@clear-code.com> wrote:
>>
>> Hi,
>>
>> In <CAD21AoDU=bYRDDY8MzCXAfg4h9XTeTBdM-wVJaO1t4UcseCpuA@mail.gmail.com>
>>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 17 Mar 2025 13:50:03 -0700,
>>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> > I think that built-in formats also need to have their handler
>> > functions. This seems to be a conventional way for customizable
>> > features such as tablesample and access methods, and we can simplify
>> > this function.
>>
>> OK. 0008 in the attached v37 patch set does it.
>>
>
> tl/dr;
>
> We need to exclude from our SQL function search any function that doesn't declare copy_handler as its return type.
> ("function text must return type copy_handler" is not an acceptable error message)
>
> We need to accept identifiers in FORMAT and parse the optional catalog, schema, and object name portions.
> (Restrict our function search to the named schema if provided.)
>
> Detail:
>
> Fun thing...(not sure how much of this is covered above: I do see, but didn't scour, the security discussion):
>
> -- create some poison
> create function public.text(internal) returns boolean language c as '/home/davidj/gotya/gotya', 'gotit';
> CREATE FUNCTION
>
> -- inject it
> postgres=# set search_path to public,pg_catalog;
> SET
>
> -- watch it die
> postgres=# copy (select 1) to stdout (format text);
> ERROR:  function text must return type copy_handler
> LINE 1: copy (select 1) to stdout (format text);
>
> I'm especially concerned about extensions here.
>
> We shouldn't be locating any SQL function that doesn't have a copy_handler return type.  Unfortunately,
LookupFuncNameseems incapable of doing what we want here.  I suggest we create a new lookup routine where we can
specifythe return argument type as a required element.  That would cleanly mitigate the denial-of-service
attack/accidentvector demonstrated above (the text returning function should have zero impact on how this feature
behaves). If someone does create a handler SQL function without using copy_handler return type we'd end up showing
"COPYformat 'david' not recognized" - a developer should be able to figure out they didn't put a correct return type on
theirhandler function and that is why the system did not register it. 

Just to be clear, the patch checks the function's return type before calling it:

       funcargtypes[0] = INTERNALOID;
       handlerOid = LookupFuncName(list_make1(makeString(format)), 1,

funcargtypes, true);
       if (!OidIsValid(handlerOid))
               ereport(ERROR,
                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                                errmsg("COPY format \"%s\" not
recognized", format),
                                parser_errposition(pstate, defel->location)));

       /* check that handler has correct return type */
       if (get_func_rettype(handlerOid) != COPY_HANDLEROID)
               ereport(ERROR,
                               (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                                errmsg("function %s must return type %s",
                                               format, "copy_handler"),
                                parser_errposition(pstate, defel->location)));

So would changing the error message to like "COPY format 'text' not
recognized" untangle your concern?

FYI the same is true for TABLESAMPLE; it invokes a function with the
specified method name and checks the returned Node type:

=# select * from pg_class tablesample text (0);
ERROR:  function text must return type tsm_handler

A difference between TABLESAMPLE and COPY format is that the former
accepts a qualified name but the latter doesn't:

=# create extension tsm_system_rows ;
=# create schema s1;
=# create function s1.system_rows(internal) returns void language c as
'tsm_system_rows.so', 'tsm_system_rows_handler';
=# \df *.system_rows
                          List of functions
 Schema |    Name     | Result data type | Argument data types | Type
--------+-------------+------------------+---------------------+------
 public | system_rows | tsm_handler      | internal            | func
 s1     | system_rows | void             | internal            | func
(2 rows)
postgres(1:1194923)=# select count(*) from pg_class tablesample system_rows(0);
 count
-------
     0
(1 row)

postgres(1:1194923)=# select count(*) from pg_class tablesample
s1.system_rows(0);
ERROR:  function s1.system_rows must return type tsm_handler

> A second concern is simply people wanting to name things the same; or, why namespaces were invented.

Yeah, I think that the custom COPY format should support qualified
names at least.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: [PATCH] SVE popcount support
Next
From: "David G. Johnston"
Date:
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations