Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Sutou Kouhei
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id 20240125.174543.1042528588176841299.kou@clear-code.com
Whole thread Raw
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Make COPY format extendable: Extract COPY TO format implementations
List pgsql-hackers
Hi,

In <ZbHS439y-Bs6HIAR@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 25 Jan 2024 12:17:55 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:

> +typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
> +typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
> +typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
> +typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot *slot);
> +typedef void (*CopyToEnd_function) (CopyToState cstate);
> 
> We don't really need a set of typedefs here, let's put the definitions
> in the CopyToRoutine struct instead.

OK. I'll do it.

> +extern CopyToRoutine CopyToRoutineText;
> +extern CopyToRoutine CopyToRoutineCSV;
> +extern CopyToRoutine CopyToRoutineBinary;
> 
> All that should IMO remain in copyto.c and copyfrom.c in the initial
> patch doing the refactoring.  Why not using a fetch function instead
> that uses a string in input?  Then you can call that once after
> parsing the List of options in ProcessCopyOptions().

OK. How about the following for the fetch function
signature?

extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format);

We may introduce an enum and use it:

typedef enum CopyBuiltinFormat
{
    COPY_BUILTIN_FORMAT_TEXT = 0,
    COPY_BUILTIN_FORMAT_CSV,
    COPY_BUILTIN_FORMAT_BINARY,
} CopyBuiltinFormat;

extern CopyToRoutine *GetBuiltinCopyToRoutine(CopyBuiltinFormat format);

> +/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
> + * move the code to here later. */
> Some areas, like this comment, are written in an incorrect format.

Oh, sorry. I assumed that the comment style was adjusted by
pgindent.

I'll use the following style:

/*
 * ...
 */

> +    getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
> +    fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
> 
> Actually, this split is interesting.  It is possible for a custom
> format to plug in a custom set of out functions.  Did you make use of
> something custom for your own stuff?

I didn't. My PoC custom COPY format handler for Apache Arrow
just handles integer and text for now. It doesn't use
cstate->out_functions because cstate->out_functions may not
return a valid binary format value for Apache Arrow. So it
formats each value by itself.

I'll chose one of them for a custom type (that isn't
supported by Apache Arrow, e.g. PostGIS types):

1. Report an unsupported error
2. Call output function for Apache Arrow provided by the
   custom type

>                                       Actually, could it make sense to
> split the assignment of cstate->out_functions into its own callback?

Yes. Because we need to use getTypeBinaryOutputInfo() for
"binary" and use getTypeOutputInfo() for "text" and "csv".

> Sure, that's part of the start phase, but at least it would make clear
> that a custom method *has* to assign these OIDs to work.  The patch
> implies that as a rule, without a comment that CopyToStart *must* set
> up these OIDs.

CopyToStart doesn't need to set up them if the handler
doesn't use cstate->out_functions.

> I think that 0001 and 0005 should be handled first, as pieces
> independent of the rest.  Then we could move on with 0002~0004 and
> 0006~0008.

OK. I'll focus on 0001 and 0005 for now. I'll restart
0002-0004/0006-0008 after 0001 and 0005 are accepted.


Thanks,
-- 
kou



pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Re: Split index and table statistics into different types of stats
Next
From: Sutou Kouhei
Date:
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations