Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Junwang Zhao |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | CAEG8a3LxnBwNRPRwvmimDvOkPvYL8pB1+rhLBnxjeddFt3MeNw@mail.gmail.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
|
List | pgsql-hackers |
On Fri, Feb 2, 2024 at 2:21 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Feb 02, 2024 at 09:40:56AM +0900, Sutou Kouhei wrote: > > Thanks. It'll help us. > > I have done a review of v10, see v11 attached which is still WIP, with > the patches for COPY TO and COPY FROM merged together. Note that I'm > thinking to merge them into a single commit. > > @@ -74,11 +75,11 @@ typedef struct CopyFormatOptions > bool convert_selectively; /* do selective binary conversion? */ > CopyOnErrorChoice on_error; /* what to do when error happened */ > List *convert_select; /* list of column names (can be NIL) */ > + const CopyToRoutine *to_routine; /* callback routines for COPY TO */ > } CopyFormatOptions; > > Adding the routines to the structure for the format options is in my > opinion incorrect. The elements of this structure are first processed > in the option deparsing path, and then we need to use the options to > guess which routines we need. A more natural location is cstate > itself, so as the pointer to the routines is isolated within copyto.c I agree CopyToRoutine should be placed into CopyToStateData, but why set it after ProcessCopyOptions, the implementation of CopyToGetRoutine doesn't make sense if we want to support custom format in the future. Seems the refactor of v11 only considered performance but not *extendable copy format*. > and copyfrom_internal.h. My point is: the routines are an > implementation detail that the centralized copy.c has no need to know > about. This also led to a strange separation with > ProcessCopyOptionFormatFrom() and ProcessCopyOptionFormatTo() to fit > the hole in-between. > > The separation between cstate and the format-related fields could be > much better, though I am not sure if it is worth doing as it > introduces more duplication. For example, max_fields and raw_fields > are specific to text and csv, while binary does not care much. > Perhaps this is just useful to be for custom formats. I think those can be placed in format specific fields by utilizing the opaque space, but yeah, this will introduce duplication. > > copyapi.h needs more documentation, like what is expected for > extension developers when using these, what are the arguments, etc. I > have added what I had in mind for now. > > +typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int m); > > CopyReadAttributes and PostpareColumnValue are also callbacks specific > to text and csv, except that they are used within the per-row > callbacks. The same can be said about CopyAttributeOutHeaderFunction. > It seems to me that it would be less confusing to store pointers to > them in the routine structures, where the final picture involves not > having multiple layers of APIs like CopyToCSVStart, > CopyAttributeOutTextValue, etc. These *have* to be documented > properly in copyapi.h, and this is much easier now that cstate stores > the routine pointers. That would also make simpler function stacks. > Note that I have not changed that in the v11 attached. > > This business with the extra callbacks required for csv and text is my > main point of contention, but I'd be OK once the model of the APIs is > more linear, with everything in Copy{From,To}State. The changes would > be rather simple, and I'd be OK to put my hands on it. Just, > Sutou-san, would you agree with my last point about these extra > callbacks? > -- > Michael If V7 and V10 have no performance reduction, then I think V6 is also good with performance, since most of the time goes to CopyToOneRow and CopyFromOneRow. I just think we should take the *extendable* into consideration at the beginning. -- Regards Junwang Zhao
pgsql-hackers by date: