Hi,
In <CAD21AoDU=bYRDDY8MzCXAfg4h9XTeTBdM-wVJaO1t4UcseCpuA@mail.gmail.com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 17 Mar 2025 13:50:03 -0700,
Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I think that built-in formats also need to have their handler
> functions. This seems to be a conventional way for customizable
> features such as tablesample and access methods, and we can simplify
> this function.
OK. 0008 in the attached v37 patch set does it.
tl/dr;
We need to exclude from our SQL function search any function that doesn't declare copy_handler as its return type.
("function text must return type copy_handler" is not an acceptable error message)
We need to accept identifiers in FORMAT and parse the optional catalog, schema, and object name portions.
(Restrict our function search to the named schema if provided.)
Detail:
Fun thing...(not sure how much of this is covered above: I do see, but didn't scour, the security discussion):
-- create some poison
create function public.text(internal) returns boolean language c as '/home/davidj/gotya/gotya', 'gotit';
CREATE FUNCTION
-- inject it
postgres=# set search_path to public,pg_catalog;
SET
-- watch it die
postgres=# copy (select 1) to stdout (format text);
ERROR: function text must return type copy_handler
LINE 1: copy (select 1) to stdout (format text);
I'm especially concerned about extensions here.
We shouldn't be locating any SQL function that doesn't have a copy_handler return type. Unfortunately, LookupFuncName seems incapable of doing what we want here. I suggest we create a new lookup routine where we can specify the return argument type as a required element. That would cleanly mitigate the denial-of-service attack/accident vector demonstrated above (the text returning function should have zero impact on how this feature behaves). If someone does create a handler SQL function without using copy_handler return type we'd end up showing "COPY format 'david' not recognized" - a developer should be able to figure out they didn't put a correct return type on their handler function and that is why the system did not register it.
A second concern is simply people wanting to name things the same; or, why namespaces were invented.
Can we just accept a proper identifier after FORMAT so we can use schema-qualified names?
(FORMAT "davescopyformat"."david")
We can special case the internal schema-less names and internally force pg_catalog to avoid them being shadowed.
David J.