Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id CAD21AoAfWrjpTDJ0garVUoXY0WC3Ud4Cu51q+ccWiotm1uo_2A@mail.gmail.com
Whole thread Raw
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Responses Re: Make COPY format extendable: Extract COPY TO format implementations
List pgsql-hackers
On Wed, Mar 19, 2025 at 6:25 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAKFQuwaMAFMHqxDXR=SxA0mDjdmntrwxZd2w=nSruLNFH-OzLw@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 19 Mar 2025 17:49:49 -0700,
>   "David G. Johnston" <david.g.johnston@gmail.com> wrote:
>
> >> And could someone help (take over if possible) writing a
> >> document for this feature? I'm not good at writing a
> >> document in English... 0009 in the attached v37 patch set
> >> has a draft of it. It's based on existing documents in
> >> doc/src/sgml/ and *.h.
> >>
> >>
> > I haven't touched the innards of the structs aside from changing
> > programlisting to synopsis.  And redoing the two section opening paragraphs
> > to better integrate with the content in the chapter opening.
> >
> > The rest I kinda went to town on...
>
> Thanks!!! It's very helpful!!!
>
> I've applied your patch. 0009 is only changed.

Thank you for updating the patches. I've reviewed the main part of
supporting the custom COPY format. Here are some random comments:

---
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false) or CopyFromRountine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
+ */

I think this comment needs to be updated as the part "If the option
value isn't ..." is no longer true.

I think we don't necessarily need to create a separate function
ProcessCopyOptionFormat for processing the format option.

We need more regression tests for handling the given format name. For example,

- more various input patterns.
- a function with the specified format name exists but it returns an
unexpected Node.
- looking for a handler function in a different namespace.
etc.

---
I think that we should accept qualified names too as the format name
like tablesample does. That way, different extensions implementing the
same format can be used.

---
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+                ereport(
+                                ERROR,
+                                (errcode(
+
ERRCODE_INVALID_PARAMETER_VALUE),
+                                 errmsg("COPY handler function "
+                                                "%u did not return "
+                                                "CopyFromRoutine struct",
+                                                opts->handler)));

It's not conventional to put a new line between 'ereport(' and 'ERROR'
(similarly between 'errcode(' and 'ERRCODE_...'. Also, we don't need
to split the error message into multiple lines as it's not long.

---
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+                ereport(
+                                ERROR,
+                                (errcode(
+
ERRCODE_INVALID_PARAMETER_VALUE),
+                                 errmsg("COPY handler function "
+                                                "%u did not return "
+                                                "CopyToRoutine struct",
+                                                opts->handler)));

Same as the above comment.

---
+  descr => 'pseudo-type for the result of a copy to/from method function',

s/method function/format function/

---
+        Oid                    handler;                /* handler
function for custom format routine */

'handler' is used also for built-in formats.

---
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+                           FmgrInfo *finfo, Oid *typioparam)
+{
+        ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid)));
+}

OIDs could be changed across major versions even for built-in types. I
think it's better to avoid using it for tests.

---
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+        ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u",
slot->tts_nvalid)));
+}

Similar to the above comment, the field name 'tts_nvalid' might also
be changed in the future, let's use another name.

---
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+        .type = T_CopyFromRoutine,
+        .CopyFromInFunc = CopyFromInFunc,
+        .CopyFromStart = CopyFromStart,
+        .CopyFromOneRow = CopyFromOneRow,
+        .CopyFromEnd = CopyFromEnd,
+};

I'd suggest not using the same function names as the fields.

---
+/*
+ * Export CopySendEndOfRow() for extensions. We want to keep
+ * CopySendEndOfRow() as a static function for
+ * optimization. CopySendEndOfRow() calls in this file may be optimized by a
+ * compiler.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+        CopySendEndOfRow(cstate);
+}

Is there any reason to use a different name for public functions?

---
+/*
+ * Export CopyGetData() for extensions. We want to keep CopyGetData() as a
+ * static function for optimization. CopyGetData() calls in this file may be
+ * optimized by a compiler.
+ */
+int
+CopyFromStateGetData(CopyFromState cstate, void *dest, int minread,
int maxread)
+{
+        return CopyGetData(cstate, dest, minread, maxread);
+}
+

The same as the above comment.

---
+        /* For custom format implementation */
+        void      *opaque;                     /* private space */

How about renaming 'private'?

---
I've not reviewed the documentation patch yet but I think the patch
seems to miss the updates to the description of the FORMAT option in
the COPY command section.

---
I think we can reorganize the patch set as follows:

1. Create copyto_internal.h and change COPY_XXX to COPY_SOURCE_XXX and
COPY_DEST_XXX accordingly.
2. Support custom format for both COPY TO and COPY FROM.
3. Expose necessary helper functions such as CopySendEndOfRow().
4. Add CopyFromSkipErrorRow().
5. Documentation.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: Reduce "Var IS [NOT] NULL" quals during constant folding
Next
From: Richard Guo
Date:
Subject: Re: Reduce "Var IS [NOT] NULL" quals during constant folding