Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Sutou Kouhei |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | 20240125.174543.1042528588176841299.kou@clear-code.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
|
List | pgsql-hackers |
Hi, In <ZbHS439y-Bs6HIAR@paquier.xyz> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 25 Jan 2024 12:17:55 +0900, Michael Paquier <michael@paquier.xyz> wrote: > +typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel); > +typedef int16 (*CopyToGetFormat_function) (CopyToState cstate); > +typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc); > +typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot *slot); > +typedef void (*CopyToEnd_function) (CopyToState cstate); > > We don't really need a set of typedefs here, let's put the definitions > in the CopyToRoutine struct instead. OK. I'll do it. > +extern CopyToRoutine CopyToRoutineText; > +extern CopyToRoutine CopyToRoutineCSV; > +extern CopyToRoutine CopyToRoutineBinary; > > All that should IMO remain in copyto.c and copyfrom.c in the initial > patch doing the refactoring. Why not using a fetch function instead > that uses a string in input? Then you can call that once after > parsing the List of options in ProcessCopyOptions(). OK. How about the following for the fetch function signature? extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format); We may introduce an enum and use it: typedef enum CopyBuiltinFormat { COPY_BUILTIN_FORMAT_TEXT = 0, COPY_BUILTIN_FORMAT_CSV, COPY_BUILTIN_FORMAT_BINARY, } CopyBuiltinFormat; extern CopyToRoutine *GetBuiltinCopyToRoutine(CopyBuiltinFormat format); > +/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may > + * move the code to here later. */ > Some areas, like this comment, are written in an incorrect format. Oh, sorry. I assumed that the comment style was adjusted by pgindent. I'll use the following style: /* * ... */ > + getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena); > + fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); > > Actually, this split is interesting. It is possible for a custom > format to plug in a custom set of out functions. Did you make use of > something custom for your own stuff? I didn't. My PoC custom COPY format handler for Apache Arrow just handles integer and text for now. It doesn't use cstate->out_functions because cstate->out_functions may not return a valid binary format value for Apache Arrow. So it formats each value by itself. I'll chose one of them for a custom type (that isn't supported by Apache Arrow, e.g. PostGIS types): 1. Report an unsupported error 2. Call output function for Apache Arrow provided by the custom type > Actually, could it make sense to > split the assignment of cstate->out_functions into its own callback? Yes. Because we need to use getTypeBinaryOutputInfo() for "binary" and use getTypeOutputInfo() for "text" and "csv". > Sure, that's part of the start phase, but at least it would make clear > that a custom method *has* to assign these OIDs to work. The patch > implies that as a rule, without a comment that CopyToStart *must* set > up these OIDs. CopyToStart doesn't need to set up them if the handler doesn't use cstate->out_functions. > I think that 0001 and 0005 should be handled first, as pieces > independent of the rest. Then we could move on with 0002~0004 and > 0006~0008. OK. I'll focus on 0001 and 0005 for now. I'll restart 0002-0004/0006-0008 after 0001 and 0005 are accepted. Thanks, -- kou
pgsql-hackers by date: