Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | CAD21AoDkoGL6yJ_HjNOg9cU=aAdW8uQ3rSQOeRS0SX85LPPNwQ@mail.gmail.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
Re: Make COPY format extendable: Extract COPY TO format implementations |
List | pgsql-hackers |
On Fri, Dec 8, 2023 at 2:17 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Dec 08, 2023 at 10:32:27AM +0800, Junwang Zhao wrote: > > I can see FDW related utility commands but no TABLESAMPLE related, > > and there is a pg_foreign_data_wrapper system catalog which has > > a *fdwhandler* field. > > + */ +CATALOG(pg_copy_handler,4551,CopyHandlerRelationId) > > Using a catalog is an over-engineered design. Others have provided > hints about that upthread, but it would be enough to have one or two > handler types that are wrapped around one or two SQL *functions*, like > tablesamples. It seems like you've missed it, but feel free to read > about tablesample-method.sgml, that explains how this is achieved for > tablesamples. Agreed. My previous example of FDW was not a good one, I missed something. > > > If we want extensions to create a new copy handler, I think > > something like pg_copy_hander should be necessary. > > A catalog is not necessary, that's the point, because it can be > replaced by a scan of pg_proc with the function name defined in a COPY > query (be it through a FORMAT, or different option in a DefElem). > An example of extension with tablesamples is contrib/tsm_system_rows/, > that just uses a function returning a tsm_handler: > CREATE FUNCTION system_rows(internal) > RETURNS tsm_handler > AS 'MODULE_PATHNAME', 'tsm_system_rows_handler' > LANGUAGE C STRICT; > > Then SELECT queries rely on the contents of the TABLESAMPLE clause to > find the set of callbacks it should use by calling the function. > > +/* Routines for a COPY HANDLER implementation. */ > +typedef struct CopyRoutine > +{ > > FWIW, I find weird the concept of having one handler for both COPY > FROM and COPY TO as each one of them has callbacks that are mutually > exclusive to the other, but I'm OK if there is a consensus of only > one. So I'd suggest to use *two* NodeTags instead for a cleaner > split, meaning that we'd need two functions for each method. My point > is that a custom COPY handler could just define a COPY TO handler or a > COPY FROM handler, though it mostly comes down to a matter of taste > regarding how clean the error handling becomes if one tries to use a > set of callbacks with a COPY type (TO or FROM) not matching it. I tend to agree to have separate two functions for each method. But given we implement it in tablesample-way, I think we need to make it clear how to call one of the two functions depending on COPY TO and FROM. IIUC in tablesamples cases, we scan pg_proc to find the handler function like system_rows(internal) by the method name specified in the query. On the other hand, in COPY cases, the queries would be going to be like: COPY tab TO stdout WITH (format = 'arrow'); and COPY tab FROM stdin WITH (format = 'arrow'); So a custom COPY extension would not be able to define SQL functions just like arrow(internal) for example. We might need to define a rule like the function returning copy_in/out_handler must be defined as <method name>_to(internal) and <method_name>_from(internal). Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: