Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | CAD21AoBKMNsO+b6wahb6847xwFci1JCfV+JykoMziVgiFxB6cw@mail.gmail.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Sutou Kouhei <kou@clear-code.com>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
Re: Make COPY format extendable: Extract COPY TO format implementations |
List | pgsql-hackers |
On Wed, Mar 26, 2025 at 8:28 PM Sutou Kouhei <kou@clear-code.com> wrote: > > > We need more regression tests for handling the given format name. For example, > > > > - more various input patterns. > > - a function with the specified format name exists but it returns an > > unexpected Node. > > - looking for a handler function in a different namespace. > > etc. > > I've added the following tests: > > * Wrong input type handler without namespace > * Wrong input type handler with namespace > * Wrong return type handler without namespace > * Wrong return type handler with namespace > * Wrong return value (Copy*Routine isn't returned) handler without namespace > * Wrong return value (Copy*Routine isn't returned) handler with namespace > * Nonexistent handler > * Invalid qualified name > * Valid handler without namespace and without search_path > * Valid handler without namespace and with search_path > * Valid handler with namespace Probably we can merge these newly added four files into one .sql file? Also we need to add some comments to describe what these queries test. For example, it's not clear to me at a glance what queries in no-schema.sql are going to test as there is no comment there. > > 0002 also includes this. > > > I think that we should accept qualified names too as the format name > > like tablesample does. That way, different extensions implementing the > > same format can be used. > > Implemented. It's implemented after parsing SQL. Is it OK? > (It seems that tablesample does it in parsing SQL.) I think it's okay. One problem in the following chunk I can see is: + qualified_format = stringToQualifiedNameList(format, NULL); + DeconstructQualifiedName(qualified_format, &schema, &fmt); + if (!schema || strcmp(schema, "pg_catalog") == 0) + { + if (strcmp(fmt, "csv") == 0) + opts_out->csv_mode = true; + else if (strcmp(fmt, "binary") == 0) + opts_out->binary = true; + } Non-qualified names depend on the search_path value so it's not necessarily a built-in format. If the user specifies 'csv' with seach_patch = 'myschema, pg_catalog', the COPY command unnecessarily sets csv_mode true. I think we can instead check if the retrieved handler function's OID matches the built-in formats' ones. Also, it's weired to me that cstate has csv_mode and binary fields even though the format should have already been known by the callback functions. Regarding the documentation for the existing options, it says "... only when not using XXX format." some places, where XXX can be replaced with binary or CSV. Once we support custom formats, 'non-CSV mode' would actually include custom formats as well, so we need to update the description too. > > > --- > > +static void > > +CopyToOneRow(CopyToState cstate, TupleTableSlot *slot) > > +{ > > + ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", > > slot->tts_nvalid))); > > +} > > > > Similar to the above comment, the field name 'tts_nvalid' might also > > be changed in the future, let's use another name. > > Hmm. If the field name is changed, we need to change this > code. Yes, but if we use independe name in the NOTICE message we would not need to update the expected files. > > > --- > > +/* > > + * Export CopySendEndOfRow() for extensions. We want to keep > > + * CopySendEndOfRow() as a static function for > > + * optimization. CopySendEndOfRow() calls in this file may be optimized by a > > + * compiler. > > + */ > > +void > > +CopyToStateFlush(CopyToState cstate) > > +{ > > + CopySendEndOfRow(cstate); > > +} > > > > Is there any reason to use a different name for public functions? > > In this patch set, I use "CopyFrom"/"CopyTo" prefixes for > public APIs for custom COPY FORMAT handler extensions. It > will help understanding related APIs. Is it strange in > PostgreSQL? I see your point. Probably we need to find a better name as the name CopyToStateFlush doesn't sound well like this API should be called only once at the end of a row (IOW user might try to call it multiple times to 'flush' the state while processing a row). How about CopyToEndOfRow()? > > > --- > > + /* For custom format implementation */ > > + void *opaque; /* private space */ > > > > How about renaming 'private'? > > We should not use "private" because it's a keyword in > C++. If we use "private" here, we can't include this file > from C++ code. Understood. > > > --- > > I've not reviewed the documentation patch yet but I think the patch > > seems to miss the updates to the description of the FORMAT option in > > the COPY command section. > > I defer this for now. We can revisit the last documentation > patch after we finalize our API. (Or could someone help us?) > > > I think we can reorganize the patch set as follows: > > > > 1. Create copyto_internal.h and change COPY_XXX to COPY_SOURCE_XXX and > > COPY_DEST_XXX accordingly. > > 2. Support custom format for both COPY TO and COPY FROM. > > 3. Expose necessary helper functions such as CopySendEndOfRow(). > > 4. Add CopyFromSkipErrorRow(). > > 5. Documentation. > > The attached v39 patch set uses the followings: > > 0001: Create copyto_internal.h and change COPY_XXX to > COPY_SOURCE_XXX and COPY_DEST_XXX accordingly. > (Same as 1. in your suggestion) > 0002: Support custom format for both COPY TO and COPY FROM. > (Same as 2. in your suggestion) > 0003: Expose necessary helper functions such as CopySendEndOfRow() > and add CopyFromSkipErrorRow(). > (3. + 4. in your suggestion) > 0004: Define handler functions for built-in formats. > (Not included in your suggestion) > 0005: Documentation. (WIP) > (Same as 5. in your suggestion) Can we merge 0002 and 0004? > We can merge 0001 quickly, right? I don't think it makes sense to push only 0001 as it's a completely preliminary patch for subsequent patches. It would be prudent to push it once other patches are ready too. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: