Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | CAD21AoAfWrjpTDJ0garVUoXY0WC3Ud4Cu51q+ccWiotm1uo_2A@mail.gmail.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Sutou Kouhei <kou@clear-code.com>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
|
List | pgsql-hackers |
On Wed, Mar 19, 2025 at 6:25 PM Sutou Kouhei <kou@clear-code.com> wrote: > > Hi, > > In <CAKFQuwaMAFMHqxDXR=SxA0mDjdmntrwxZd2w=nSruLNFH-OzLw@mail.gmail.com> > "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 19 Mar 2025 17:49:49 -0700, > "David G. Johnston" <david.g.johnston@gmail.com> wrote: > > >> And could someone help (take over if possible) writing a > >> document for this feature? I'm not good at writing a > >> document in English... 0009 in the attached v37 patch set > >> has a draft of it. It's based on existing documents in > >> doc/src/sgml/ and *.h. > >> > >> > > I haven't touched the innards of the structs aside from changing > > programlisting to synopsis. And redoing the two section opening paragraphs > > to better integrate with the content in the chapter opening. > > > > The rest I kinda went to town on... > > Thanks!!! It's very helpful!!! > > I've applied your patch. 0009 is only changed. Thank you for updating the patches. I've reviewed the main part of supporting the custom COPY format. Here are some random comments: --- +/* + * Process the "format" option. + * + * This function checks whether the option value is a built-in format such as + * "text" and "csv" or not. If the option value isn't a built-in format, this + * function finds a COPY format handler that returns a CopyToRoutine (for + * is_from == false) or CopyFromRountine (for is_from == true). If no COPY + * format handler is found, this function reports an error. + */ I think this comment needs to be updated as the part "If the option value isn't ..." is no longer true. I think we don't necessarily need to create a separate function ProcessCopyOptionFormat for processing the format option. We need more regression tests for handling the given format name. For example, - more various input patterns. - a function with the specified format name exists but it returns an unexpected Node. - looking for a handler function in a different namespace. etc. --- I think that we should accept qualified names too as the format name like tablesample does. That way, different extensions implementing the same format can be used. --- + if (routine == NULL || !IsA(routine, CopyFromRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%u did not return " + "CopyFromRoutine struct", + opts->handler))); It's not conventional to put a new line between 'ereport(' and 'ERROR' (similarly between 'errcode(' and 'ERRCODE_...'. Also, we don't need to split the error message into multiple lines as it's not long. --- + if (routine == NULL || !IsA(routine, CopyToRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%u did not return " + "CopyToRoutine struct", + opts->handler))); Same as the above comment. --- + descr => 'pseudo-type for the result of a copy to/from method function', s/method function/format function/ --- + Oid handler; /* handler function for custom format routine */ 'handler' is used also for built-in formats. --- +static void +CopyFromInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam) +{ + ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid))); +} OIDs could be changed across major versions even for built-in types. I think it's better to avoid using it for tests. --- +static void +CopyToOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid))); +} Similar to the above comment, the field name 'tts_nvalid' might also be changed in the future, let's use another name. --- +static const CopyFromRoutine CopyFromRoutineTestCopyFormat = { + .type = T_CopyFromRoutine, + .CopyFromInFunc = CopyFromInFunc, + .CopyFromStart = CopyFromStart, + .CopyFromOneRow = CopyFromOneRow, + .CopyFromEnd = CopyFromEnd, +}; I'd suggest not using the same function names as the fields. --- +/* + * Export CopySendEndOfRow() for extensions. We want to keep + * CopySendEndOfRow() as a static function for + * optimization. CopySendEndOfRow() calls in this file may be optimized by a + * compiler. + */ +void +CopyToStateFlush(CopyToState cstate) +{ + CopySendEndOfRow(cstate); +} Is there any reason to use a different name for public functions? --- +/* + * Export CopyGetData() for extensions. We want to keep CopyGetData() as a + * static function for optimization. CopyGetData() calls in this file may be + * optimized by a compiler. + */ +int +CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread) +{ + return CopyGetData(cstate, dest, minread, maxread); +} + The same as the above comment. --- + /* For custom format implementation */ + void *opaque; /* private space */ How about renaming 'private'? --- I've not reviewed the documentation patch yet but I think the patch seems to miss the updates to the description of the FORMAT option in the COPY command section. --- I think we can reorganize the patch set as follows: 1. Create copyto_internal.h and change COPY_XXX to COPY_SOURCE_XXX and COPY_DEST_XXX accordingly. 2. Support custom format for both COPY TO and COPY FROM. 3. Expose necessary helper functions such as CopySendEndOfRow(). 4. Add CopyFromSkipErrorRow(). 5. Documentation. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: