Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | CAD21AoCjK=NC62Z+VUip5-rc_rZz-6h9Dqacxo5G5PdK4kKKwQ@mail.gmail.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations ("David G. Johnston" <david.g.johnston@gmail.com>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
|
List | pgsql-hackers |
On Fri, Mar 21, 2025 at 5:32 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Tue, Mar 18, 2025 at 7:56 PM Sutou Kouhei <kou@clear-code.com> wrote: >> >> Hi, >> >> In <CAD21AoDU=bYRDDY8MzCXAfg4h9XTeTBdM-wVJaO1t4UcseCpuA@mail.gmail.com> >> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 17 Mar 2025 13:50:03 -0700, >> Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> > I think that built-in formats also need to have their handler >> > functions. This seems to be a conventional way for customizable >> > features such as tablesample and access methods, and we can simplify >> > this function. >> >> OK. 0008 in the attached v37 patch set does it. >> > > tl/dr; > > We need to exclude from our SQL function search any function that doesn't declare copy_handler as its return type. > ("function text must return type copy_handler" is not an acceptable error message) > > We need to accept identifiers in FORMAT and parse the optional catalog, schema, and object name portions. > (Restrict our function search to the named schema if provided.) > > Detail: > > Fun thing...(not sure how much of this is covered above: I do see, but didn't scour, the security discussion): > > -- create some poison > create function public.text(internal) returns boolean language c as '/home/davidj/gotya/gotya', 'gotit'; > CREATE FUNCTION > > -- inject it > postgres=# set search_path to public,pg_catalog; > SET > > -- watch it die > postgres=# copy (select 1) to stdout (format text); > ERROR: function text must return type copy_handler > LINE 1: copy (select 1) to stdout (format text); > > I'm especially concerned about extensions here. > > We shouldn't be locating any SQL function that doesn't have a copy_handler return type. Unfortunately, LookupFuncNameseems incapable of doing what we want here. I suggest we create a new lookup routine where we can specifythe return argument type as a required element. That would cleanly mitigate the denial-of-service attack/accidentvector demonstrated above (the text returning function should have zero impact on how this feature behaves). If someone does create a handler SQL function without using copy_handler return type we'd end up showing "COPYformat 'david' not recognized" - a developer should be able to figure out they didn't put a correct return type on theirhandler function and that is why the system did not register it. Just to be clear, the patch checks the function's return type before calling it: funcargtypes[0] = INTERNALOID; handlerOid = LookupFuncName(list_make1(makeString(format)), 1, funcargtypes, true); if (!OidIsValid(handlerOid)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("COPY format \"%s\" not recognized", format), parser_errposition(pstate, defel->location))); /* check that handler has correct return type */ if (get_func_rettype(handlerOid) != COPY_HANDLEROID) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("function %s must return type %s", format, "copy_handler"), parser_errposition(pstate, defel->location))); So would changing the error message to like "COPY format 'text' not recognized" untangle your concern? FYI the same is true for TABLESAMPLE; it invokes a function with the specified method name and checks the returned Node type: =# select * from pg_class tablesample text (0); ERROR: function text must return type tsm_handler A difference between TABLESAMPLE and COPY format is that the former accepts a qualified name but the latter doesn't: =# create extension tsm_system_rows ; =# create schema s1; =# create function s1.system_rows(internal) returns void language c as 'tsm_system_rows.so', 'tsm_system_rows_handler'; =# \df *.system_rows List of functions Schema | Name | Result data type | Argument data types | Type --------+-------------+------------------+---------------------+------ public | system_rows | tsm_handler | internal | func s1 | system_rows | void | internal | func (2 rows) postgres(1:1194923)=# select count(*) from pg_class tablesample system_rows(0); count ------- 0 (1 row) postgres(1:1194923)=# select count(*) from pg_class tablesample s1.system_rows(0); ERROR: function s1.system_rows must return type tsm_handler > A second concern is simply people wanting to name things the same; or, why namespaces were invented. Yeah, I think that the custom COPY format should support qualified names at least. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: