Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Sutou Kouhei
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id 20250327.122840.991624455773269772.kou@clear-code.com
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Hi,

In <CAD21AoAfWrjpTDJ0garVUoXY0WC3Ud4Cu51q+ccWiotm1uo_2A@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sun, 23 Mar 2025 02:01:59 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> ---
> +/*
> + * Process the "format" option.
> + *
> + * This function checks whether the option value is a built-in format such as
> + * "text" and "csv" or not. If the option value isn't a built-in format, this
> + * function finds a COPY format handler that returns a CopyToRoutine (for
> + * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
> + * format handler is found, this function reports an error.
> + */
> 
> I think this comment needs to be updated as the part "If the option
> value isn't ..." is no longer true.
> 
> I think we don't necessarily need to create a separate function
> ProcessCopyOptionFormat for processing the format option.

Hmm. I think that the separate function would improve
readability by reducing indentation. But I've removed it
as you suggested, so the comment is removed entirely as
well.

0002 includes this.

> We need more regression tests for handling the given format name. For example,
> 
> - more various input patterns.
> - a function with the specified format name exists but it returns an
> unexpected Node.
> - looking for a handler function in a different namespace.
> etc.

I've added the following tests:

* Wrong input type handler without namespace
* Wrong input type handler with namespace
* Wrong return type handler without namespace
* Wrong return type handler with namespace
* Wrong return value (Copy*Routine isn't returned) handler without namespace
* Wrong return value (Copy*Routine isn't returned) handler with namespace
* Nonexistent handler
* Invalid qualified name
* Valid handler without namespace and without search_path
* Valid handler without namespace and with search_path
* Valid handler with namespace

0002 also includes this.
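
To illustrate the "wrong return value" cases above, here is a minimal standalone sketch of the check. It is not the patch code: DemoTag and the demo_* names are hypothetical stand-ins for NodeTag, IsA() and a real handler, and the only assumption is that every routine struct carries a type tag as its first member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's NodeTag; not the real enum. */
typedef enum DemoTag
{
    DEMO_T_Invalid = 0,
    DEMO_T_CopyToRoutine,
    DEMO_T_CopyFromRoutine
} DemoTag;

/* Each routine struct starts with a tag so a generic pointer can be checked. */
typedef struct DemoCopyToRoutine
{
    DemoTag     type;
    const char *name;
} DemoCopyToRoutine;

typedef struct DemoCopyFromRoutine
{
    DemoTag     type;
    const char *name;
} DemoCopyFromRoutine;

static const DemoCopyToRoutine demo_to = {.type = DEMO_T_CopyToRoutine, .name = "to"};
static const DemoCopyFromRoutine demo_from = {.type = DEMO_T_CopyFromRoutine, .name = "from"};

/* A well-behaved handler: picks the routine from the is_from flag. */
const void *
demo_good_handler(bool is_from)
{
    return is_from ? (const void *) &demo_from : (const void *) &demo_to;
}

/* A broken handler: ignores is_from and always returns the COPY TO routine. */
const void *
demo_bad_handler(bool is_from)
{
    (void) is_from;
    return &demo_to;
}

/* Returns true only if the handler produced the routine kind that was requested. */
bool
demo_handler_result_is_valid(const void *(*handler) (bool), bool is_from)
{
    const void *result;
    DemoTag     expected = is_from ? DEMO_T_CopyFromRoutine : DEMO_T_CopyToRoutine;

    assert(handler != NULL);
    result = handler(is_from);
    if (result == NULL)
        return false;
    /* Valid because the tag is the first member of every routine struct. */
    return *(const DemoTag *) result == expected;
}
```

With this sketch, demo_bad_handler passes for COPY TO but is rejected for COPY FROM, which is the situation the "Copy*Routine isn't returned" tests are meant to cover.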

> I think that we should accept qualified names too as the format name
> like tablesample does. That way, different extensions implementing the
> same format can be used.

Implemented. The qualified name is resolved after parsing SQL,
not during parsing. Is that OK?
(It seems that tablesample does it while parsing SQL.)

This is because "WITH (FORMAT XXX)" is processed as a generic
option in gram.y, and all generic options are processed as
strings. So I kept that behavior.

As a result, the syntax is "COPY ... WITH (FORMAT 'NAMESPACE.HANDLER_NAME')"
not "COPY ... WITH (FORMAT 'NAMESPACE'.'HANDLER_NAME')".


0002 also includes this.
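
Because the format arrives as one string, the namespace has to be split off after parsing. The patch does this with stringToQualifiedNameList(); the following is only a simplified standalone sketch of the idea. demo_split_format_name() is a hypothetical helper, and it assumes no quoted identifiers, no case folding, and at most one dot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified splitter for "NAMESPACE.HANDLER_NAME".
 * On success, copies both parts into the caller's buffers; schema is
 * the empty string when the name is unqualified.  Returns false for
 * empty parts, more than one dot, or buffers that are too small.
 */
bool
demo_split_format_name(const char *fmt,
                       char *schema, size_t schema_size,
                       char *name, size_t name_size)
{
    const char *dot;
    const char *name_start;
    size_t      schema_len;
    size_t      name_len;

    assert(fmt != NULL && schema != NULL && name != NULL);
    assert(schema_size > 0 && name_size > 0);

    dot = strchr(fmt, '.');
    name_start = (dot != NULL) ? dot + 1 : fmt;
    schema_len = (dot != NULL) ? (size_t) (dot - fmt) : 0;
    name_len = strlen(name_start);

    /* Reject ".name" and "a.b.c". */
    if (dot != NULL && (schema_len == 0 || strchr(name_start, '.') != NULL))
        return false;
    /* Reject "" and "schema.". */
    if (name_len == 0)
        return false;
    if (schema_len >= schema_size || name_len >= name_size)
        return false;

    memcpy(schema, fmt, schema_len);
    schema[schema_len] = '\0';
    memcpy(name, name_start, name_len + 1);     /* includes the terminator */
    return true;
}
```

An unqualified name leaves the schema empty, which corresponds to the "without namespace" test cases where the handler is found through search_path.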

> ---
> +        if (routine == NULL || !IsA(routine, CopyFromRoutine))
> +                ereport(
> +                                ERROR,
> +                                (errcode(
> +
> ERRCODE_INVALID_PARAMETER_VALUE),
> +                                 errmsg("COPY handler function "
> +                                                "%u did not return "
> +                                                "CopyFromRoutine struct",
> +                                                opts->handler)));
> 
> It's not conventional to put a new line between 'ereport(' and 'ERROR'
> (similarly between 'errcode(' and 'ERRCODE_...'). Also, we don't need
> to split the error message into multiple lines as it's not long.

Oh, sorry. I can't remember why I used this... I think I
trusted pgindent...

> ---
> +  descr => 'pseudo-type for the result of a copy to/from method function',
> 
> s/method function/format function/

Good catch. I used "handler function" not "format function"
because we use "handler" in other places.

> ---
> +        Oid                    handler;                /* handler function for custom format routine */
> 
> 'handler' is used also for built-in formats.

Updated in 0004.

> ---
> +static void
> +CopyFromInFunc(CopyFromState cstate, Oid atttypid,
> +                           FmgrInfo *finfo, Oid *typioparam)
> +{
> +        ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid)));
> +}
> 
> OIDs could be changed across major versions even for built-in types. I
> think it's better to avoid using it for tests.

Oh, I didn't know that. I've changed the test to print the type
name instead of the OID. It should be more stable than the OID.

> ---
> +static void
> +CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
> +{
> +        ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u",
> slot->tts_nvalid)));
> +}
> 
> Similar to the above comment, the field name 'tts_nvalid' might also
> be changed in the future, let's use another name.

Hmm. If the field name is changed, this code needs to change
anyway, so changing the tests too wouldn't be strange. Anyway,
I've switched to more generic text.

> ---
> +static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
> +        .type = T_CopyFromRoutine,
> +        .CopyFromInFunc = CopyFromInFunc,
> +        .CopyFromStart = CopyFromStart,
> +        .CopyFromOneRow = CopyFromOneRow,
> +        .CopyFromEnd = CopyFromEnd,
> +};
> 
> I'd suggest not using the same function names as the fields.

OK. I've added "Test" prefix.

> ---
> +/*
> + * Export CopySendEndOfRow() for extensions. We want to keep
> + * CopySendEndOfRow() as a static function for
> + * optimization. CopySendEndOfRow() calls in this file may be optimized by a
> + * compiler.
> + */
> +void
> +CopyToStateFlush(CopyToState cstate)
> +{
> +        CopySendEndOfRow(cstate);
> +}
> 
> Is there any reason to use a different name for public functions?

In this patch set, I use "CopyFrom"/"CopyTo" prefixes for
public APIs for custom COPY FORMAT handler extensions. That
should make related APIs easier to find. Is this unusual in
PostgreSQL?
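
To make the naming idea concrete, here is a small standalone sketch of the pattern: a file-local static worker plus a thin exported wrapper that carries the public prefix. The Demo* names and the buffer layout are hypothetical and much simpler than the real CopyToStateData.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for the COPY TO state. */
typedef struct DemoCopyToState
{
    char        row[64];        /* bytes accumulated for the current row */
    size_t      row_len;
    char        out[256];       /* stand-in for the real destination */
    size_t      out_len;
} DemoCopyToState;

/*
 * File-local worker.  Keeping it static lets the compiler optimize the
 * calls made inside this file.
 */
static void
demo_send_end_of_row(DemoCopyToState *state)
{
    size_t      room = sizeof(state->out) - state->out_len;
    size_t      n = state->row_len < room ? state->row_len : room;

    memcpy(state->out + state->out_len, state->row, n);
    state->out_len += n;
    state->row_len = 0;
}

/* Append bytes to the current row, truncating if the row buffer is full. */
void
DemoCopyToAppend(DemoCopyToState *state, const char *data, size_t len)
{
    size_t      room;

    assert(state != NULL && data != NULL);
    room = sizeof(state->row) - state->row_len;
    if (len > room)
        len = room;
    memcpy(state->row + state->row_len, data, len);
    state->row_len += len;
}

/* Public entry point for extensions: a thin wrapper with the shared prefix. */
void
DemoCopyToStateFlush(DemoCopyToState *state)
{
    assert(state != NULL);
    demo_send_end_of_row(state);
}
```

An extension would only ever see the DemoCopyTo* functions, while the static worker stays private to the file.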

> ---
> +        /* For custom format implementation */
> +        void      *opaque;                     /* private space */
> 
> How about renaming 'private'?

We should not use "private" because it's a keyword in
C++. If we use "private" here, we can't include this file
from C++ code.
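
As a rough illustration of how an extension might use the field, here is a standalone sketch. The Demo* types and functions are hypothetical, not part of the patch; the point is only that a member named "opaque" keeps the header usable from both C and C++.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a state struct shared with extensions. */
typedef struct DemoFormatState
{
    /* For custom format implementation */
    void       *opaque;         /* private space; "private" is reserved in C++ */
} DemoFormatState;

/* Private data owned by a hypothetical custom format. */
typedef struct DemoRowCounter
{
    unsigned long rows;
} DemoRowCounter;

/* Attach the format's private data to the shared state. */
void
demo_format_start(DemoFormatState *state, DemoRowCounter *counter)
{
    assert(state != NULL && counter != NULL);
    counter->rows = 0;
    state->opaque = counter;
}

/* Per-row callback: recover the private data and update it. */
void
demo_format_one_row(DemoFormatState *state)
{
    DemoRowCounter *counter;

    assert(state != NULL && state->opaque != NULL);
    /* Explicit cast so the same code also compiles as C++. */
    counter = (DemoRowCounter *) state->opaque;
    counter->rows++;
}

/* Report how many rows the format has seen so far. */
unsigned long
demo_format_rows(const DemoFormatState *state)
{
    assert(state != NULL && state->opaque != NULL);
    return ((const DemoRowCounter *) state->opaque)->rows;
}
```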

> ---
> I've not reviewed the documentation patch yet but I think the patch
> seems to miss the updates to the description of the FORMAT option in
> the COPY command section.

I'll defer this for now. We can revisit the documentation
patch after we finalize our API. (Or could someone help us?)

> I think we can reorganize the patch set as follows:
> 
> 1. Create copyto_internal.h and change COPY_XXX to COPY_SOURCE_XXX and
> COPY_DEST_XXX accordingly.
> 2. Support custom format for both COPY TO and COPY FROM.
> 3. Expose necessary helper functions such as CopySendEndOfRow().
> 4. Add CopyFromSkipErrorRow().
> 5. Documentation.

The attached v39 patch set is organized as follows:

0001: Create copyto_internal.h and change COPY_XXX to
      COPY_SOURCE_XXX and COPY_DEST_XXX accordingly.
      (Same as 1. in your suggestion)
0002: Support custom format for both COPY TO and COPY FROM.
      (Same as 2. in your suggestion)
0003: Expose necessary helper functions such as CopySendEndOfRow()
      and add CopyFromSkipErrorRow().
      (3. + 4. in your suggestion)
0004: Define handler functions for built-in formats.
      (Not included in your suggestion)
0005: Documentation. (WIP)
      (Same as 5. in your suggestion)

We can merge 0001 quickly, right?


Thanks,
-- 
kou
From 76f8134652f14210817e872daab3c0a8b3c0318a Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v39 1/5] Export CopyDest as private data

This is a preparation to export CopyToStateData as private data.

CopyToStateData depends on CopyDest. So we need to export CopyDest
too.

But CopyDest and CopySource have the same enum value names. So we
can't export CopyDest as-is.

This uses the COPY_DEST_ prefix for CopyDest enum values. CopySource
uses the COPY_SOURCE_ prefix for consistency.
---
 src/backend/commands/copyfrom.c          |  4 ++--
 src/backend/commands/copyfromparse.c     | 10 ++++-----
 src/backend/commands/copyto.c            | 28 ++++++++----------------
 src/include/commands/copyfrom_internal.h |  6 ++---
 src/include/commands/copyto_internal.h   | 28 ++++++++++++++++++++++++
 5 files changed, 47 insertions(+), 29 deletions(-)
 create mode 100644 src/include/commands/copyto_internal.h

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index bcf66f0adf8..f58497d4187 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1709,7 +1709,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1837,7 +1837,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..17e51f02e04 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1172,7 +1172,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..05ad87d8220 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -36,17 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
 /*
  * This struct contains all the state variables used throughout a COPY TO
  * operation.
@@ -401,7 +391,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -448,7 +438,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -482,11 +472,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -507,7 +497,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -515,7 +505,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -900,12 +890,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..3a306e3286e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -24,9 +24,9 @@
  */
 typedef enum CopySource
 {
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
 } CopySource;
 
 /*
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
new file mode 100644
index 00000000000..42ddb37a8a2
--- /dev/null
+++ b/src/include/commands/copyto_internal.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyto_internal.h
+ *      Internal definitions for COPY TO command.
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyto_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYTO_INTERNAL_H
+#define COPYTO_INTERNAL_H
+
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+#endif                            /* COPYTO_INTERNAL_H */
-- 
2.47.2

From bcc2c19e59b4dbc15fcfd3b5c7d4837314e00518 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Thu, 27 Mar 2025 11:14:43 +0900
Subject: [PATCH v39 2/5] Add support for adding custom COPY format

This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a handler returns a CopyToRoutine for COPY TO and a CopyFromRoutine
for COPY FROM.

Whether this is COPY TO or COPY FROM is passed as the "is_from"
argument:

    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine

This also adds a test module for custom COPY handler.
---
 src/backend/commands/copy.c                   |  31 ++++-
 src/backend/commands/copyfrom.c               |  20 +++-
 src/backend/commands/copyto.c                 |  70 +++--------
 src/backend/nodes/Makefile                    |   1 +
 src/backend/nodes/gen_node_support.pl         |   2 +
 src/backend/utils/adt/pseudotypes.c           |   1 +
 src/include/catalog/pg_proc.dat               |   6 +
 src/include/catalog/pg_type.dat               |   6 +
 src/include/commands/copy.h                   |   3 +-
 src/include/commands/copyapi.h                |   4 +
 src/include/commands/copyto_internal.h        |  55 +++++++++
 src/include/nodes/meson.build                 |   1 +
 src/test/modules/Makefile                     |   1 +
 src/test/modules/meson.build                  |   1 +
 src/test/modules/test_copy_format/.gitignore  |   4 +
 src/test/modules/test_copy_format/Makefile    |  23 ++++
 .../test_copy_format/expected/invalid.out     |  61 ++++++++++
 .../test_copy_format/expected/no_schema.out   |  23 ++++
 .../test_copy_format/expected/schema.out      |  56 +++++++++
 src/test/modules/test_copy_format/meson.build |  35 ++++++
 .../modules/test_copy_format/sql/invalid.sql  |  29 +++++
 .../test_copy_format/sql/no_schema.sql        |   8 ++
 .../modules/test_copy_format/sql/schema.sql   |  24 ++++
 .../test_copy_format--1.0.sql                 |  24 ++++
 .../test_copy_format/test_copy_format.c       | 113 ++++++++++++++++++
 .../test_copy_format/test_copy_format.control |   4 +
 26 files changed, 549 insertions(+), 57 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/invalid.out
 create mode 100644 src/test/modules/test_copy_format/expected/no_schema.out
 create mode 100644 src/test/modules/test_copy_format/expected/schema.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/invalid.sql
 create mode 100644 src/test/modules/test_copy_format/sql/no_schema.sql
 create mode 100644 src/test/modules/test_copy_format/sql/schema.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..2539aee3a53 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,10 +32,12 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/regproc.h"
 #include "utils/rel.h"
 #include "utils/rls.h"
 
@@ -531,10 +533,31 @@ ProcessCopyOptions(ParseState *pstate,
             else if (strcmp(fmt, "binary") == 0)
                 opts_out->binary = true;
             else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            {
+                List       *qualified_format;
+                Oid            arg_types[1];
+                Oid            handler = InvalidOid;
+
+                qualified_format = stringToQualifiedNameList(fmt, NULL);
+                arg_types[0] = INTERNALOID;
+                handler = LookupFuncName(qualified_format, 1,
+                                         arg_types, true);
+                if (!OidIsValid(handler))
+                    ereport(ERROR,
+                            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                             errmsg("COPY format \"%s\" not recognized", fmt),
+                             parser_errposition(pstate, defel->location)));
+
+                /* check that handler has correct return type */
+                if (get_func_rettype(handler) != COPY_HANDLEROID)
+                    ereport(ERROR,
+                            (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                             errmsg("function %s must return type %s",
+                                    fmt, "copy_handler"),
+                             parser_errposition(pstate, defel->location)));
+
+                opts_out->handler = handler;
+            }
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f58497d4187..91f44193abf 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -129,6 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
 
 /* text format */
 static const CopyFromRoutine CopyFromRoutineText = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromTextOneRow,
@@ -137,6 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 
 /* CSV format */
 static const CopyFromRoutine CopyFromRoutineCSV = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromCSVOneRow,
@@ -145,6 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
 
 /* binary format */
 static const CopyFromRoutine CopyFromRoutineBinary = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromBinaryInFunc,
     .CopyFromStart = CopyFromBinaryStart,
     .CopyFromOneRow = CopyFromBinaryOneRow,
@@ -155,7 +158,22 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-    if (opts->csv_mode)
+    if (OidIsValid(opts->handler))
+    {
+        Datum        datum;
+        Node       *routine;
+
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(true));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function %s.%s did not return CopyFromRoutine struct",
+                            get_namespace_name(get_func_namespace(opts->handler)),
+                            get_func_name(opts->handler))));
+        return castNode(CopyFromRoutine, routine);
+    }
+    else if (opts->csv_mode)
         return &CopyFromRoutineCSV;
     else if (opts->binary)
         return &CopyFromRoutineBinary;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 05ad87d8220..b7ff6466ce3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -37,56 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format-specific routines */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -140,6 +90,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 
 /* text format */
 static const CopyToRoutine CopyToRoutineText = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToTextOneRow,
@@ -148,6 +99,7 @@ static const CopyToRoutine CopyToRoutineText = {
 
 /* CSV format */
 static const CopyToRoutine CopyToRoutineCSV = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToCSVOneRow,
@@ -156,6 +108,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
     .CopyToOneRow = CopyToBinaryOneRow,
@@ -166,7 +119,22 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-    if (opts->csv_mode)
+    if (OidIsValid(opts->handler))
+    {
+        Datum        datum;
+        Node       *routine;
+
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(false));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function %s.%s did not return CopyToRoutine struct",
+                            get_namespace_name(get_func_namespace(opts->handler)),
+                            get_func_name(opts->handler))));
+        return castNode(CopyToRoutine, routine);
+    }
+    else if (opts->csv_mode)
         return &CopyToRoutineCSV;
     else if (opts->binary)
         return &CopyToRoutineBinary;
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 77ddb9ca53f..dc6c1087361 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -50,6 +50,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
old mode 100644
new mode 100755
index 40994b53fb2..d7e8d16e789
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -62,6 +62,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -86,6 +87,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index 317a1f2b282..f2ebc21ca56 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0d29ef50ff2..79b5a36088c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7838,6 +7838,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 6dca77e0a22..bddf9fb4fbe 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a COPY TO/FROM handler function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..6df1f8a3b9b 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,9 +87,10 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
+    Oid            handler;        /* handler function for custom format routine */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* These are private in commands/copy[from|to]_internal.h */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2a2d2f9876b..53ad3337f86 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -22,6 +22,8 @@
  */
 typedef struct CopyToRoutine
 {
+    NodeTag        type;
+
     /*
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
@@ -60,6 +62,8 @@ typedef struct CopyToRoutine
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag        type;
+
     /*
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 42ddb37a8a2..12c4a0f5979 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -14,6 +14,11 @@
 #ifndef COPYTO_INTERNAL_H
 #define COPYTO_INTERNAL_H
 
+#include "commands/copy.h"
+#include "executor/execdesc.h"
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
 /*
  * Represents the different dest cases we need to worry about at
  * the bottom level
@@ -25,4 +30,54 @@ typedef enum CopyDest
     COPY_DEST_CALLBACK,            /* to callback function */
 } CopyDest;
 
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 #endif                            /* COPYTO_INTERNAL_H */
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index d1ca24dd32f..96e70e7f38b 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -12,6 +12,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 4e4be3fa511..c9da440eed0 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -16,6 +16,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 2b057451473..d33bbbd4092 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -15,6 +15,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/invalid.out b/src/test/modules/test_copy_format/expected/invalid.out
new file mode 100644
index 00000000000..306c9928431
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/invalid.out
@@ -0,0 +1,61 @@
+CREATE SCHEMA test_schema;
+CREATE EXTENSION test_copy_format WITH SCHEMA test_schema;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+SET search_path = public,test_schema;
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_input_type');
+ERROR:  COPY format "test_copy_format_wrong_input_type" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_w...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_input_type');
+ERROR:  COPY format "test_copy_format_wrong_input_type" not recognized
+LINE 1: COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wr...
+                                         ^
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_return_type');
+ERROR:  function test_copy_format_wrong_return_type must return type copy_handler
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_w...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_return_type');
+ERROR:  function test_copy_format_wrong_return_type must return type copy_handler
+LINE 1: COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wr...
+                                         ^
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_return_value');
+ERROR:  COPY handler function test_schema.test_copy_format_wrong_return_value did not return CopyFromRoutine struct
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_return_value');
+ERROR:  COPY handler function test_schema.test_copy_format_wrong_return_value did not return CopyToRoutine struct
+RESET search_path;
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type');
+ERROR:  COPY format "test_schema.test_copy_format_wrong_input_type" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_c...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type');
+ERROR:  COPY format "test_schema.test_copy_format_wrong_input_type" not recognized
+LINE 1: COPY public.test TO stdout WITH (FORMAT 'test_schema.test_co...
+                                         ^
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type');
+ERROR:  function test_schema.test_copy_format_wrong_return_type must return type copy_handler
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_c...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type');
+ERROR:  function test_schema.test_copy_format_wrong_return_type must return type copy_handler
+LINE 1: COPY public.test TO stdout WITH (FORMAT 'test_schema.test_co...
+                                         ^
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value');
+ERROR:  COPY handler function test_schema.test_copy_format_wrong_return_value did not return CopyFromRoutine struct
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value');
+ERROR:  COPY handler function test_schema.test_copy_format_wrong_return_value did not return CopyToRoutine struct
+COPY public.test FROM stdin WITH (FORMAT 'nonexistent');
+ERROR:  COPY format "nonexistent" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'nonexistent');
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'nonexistent');
+ERROR:  COPY format "nonexistent" not recognized
+LINE 1: COPY public.test TO stdout WITH (FORMAT 'nonexistent');
+                                         ^
+COPY public.test FROM stdin WITH (FORMAT 'invalid.qualified.name');
+ERROR:  cross-database references are not implemented: invalid.qualified.name
+COPY public.test TO stdout WITH (FORMAT 'invalid.qualified.name');
+ERROR:  cross-database references are not implemented: invalid.qualified.name
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
+DROP SCHEMA test_schema;
diff --git a/src/test/modules/test_copy_format/expected/no_schema.out b/src/test/modules/test_copy_format/expected/no_schema.out
new file mode 100644
index 00000000000..d5903632b2e
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/no_schema.out
@@ -0,0 +1,23 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: attribute: smallint
+NOTICE:  CopyToOutFunc: attribute: integer
+NOTICE:  CopyToOutFunc: attribute: bigint
+NOTICE:  CopyToStart: the number of attributes: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToEnd
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
diff --git a/src/test/modules/test_copy_format/expected/schema.out b/src/test/modules/test_copy_format/expected/schema.out
new file mode 100644
index 00000000000..698189fbeae
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/schema.out
@@ -0,0 +1,56 @@
+CREATE SCHEMA test_schema;
+CREATE EXTENSION test_copy_format WITH SCHEMA test_schema;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- Qualified name
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format');
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: attribute: smallint
+NOTICE:  CopyToOutFunc: attribute: integer
+NOTICE:  CopyToOutFunc: attribute: bigint
+NOTICE:  CopyToStart: the number of attributes: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToEnd
+-- No schema, no search path
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+                                         ^
+-- No schema, with search path
+SET search_path = test_schema,public;
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: attribute: smallint
+NOTICE:  CopyToOutFunc: attribute: integer
+NOTICE:  CopyToOutFunc: attribute: bigint
+NOTICE:  CopyToStart: the number of attributes: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToEnd
+RESET search_path;
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
+DROP SCHEMA test_schema;
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..8010659585b
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,35 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'invalid',
+      'no_schema',
+      'schema',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/invalid.sql b/src/test/modules/test_copy_format/sql/invalid.sql
new file mode 100644
index 00000000000..e475f6a38c6
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/invalid.sql
@@ -0,0 +1,29 @@
+CREATE SCHEMA test_schema;
+CREATE EXTENSION test_copy_format WITH SCHEMA test_schema;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+
+SET search_path = public,test_schema;
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_input_type');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_input_type');
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_return_type');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_return_type');
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_return_value');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_return_value');
+RESET search_path;
+
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type');
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type');
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type');
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type');
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value');
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value');
+
+COPY public.test FROM stdin WITH (FORMAT 'nonexistent');
+COPY public.test TO stdout WITH (FORMAT 'nonexistent');
+COPY public.test FROM stdin WITH (FORMAT 'invalid.qualified.name');
+COPY public.test TO stdout WITH (FORMAT 'invalid.qualified.name');
+
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
+DROP SCHEMA test_schema;
diff --git a/src/test/modules/test_copy_format/sql/no_schema.sql b/src/test/modules/test_copy_format/sql/no_schema.sql
new file mode 100644
index 00000000000..1e049f799f0
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/no_schema.sql
@@ -0,0 +1,8 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+\.
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
diff --git a/src/test/modules/test_copy_format/sql/schema.sql b/src/test/modules/test_copy_format/sql/schema.sql
new file mode 100644
index 00000000000..ab9492158e1
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/schema.sql
@@ -0,0 +1,24 @@
+CREATE SCHEMA test_schema;
+CREATE EXTENSION test_copy_format WITH SCHEMA test_schema;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+
+-- Qualified name
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format');
+\.
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format');
+
+-- No schema, no search path
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+
+-- No schema, with search path
+SET search_path = test_schema,public;
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+\.
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+RESET search_path;
+
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
+DROP SCHEMA test_schema;
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..c1a137181f8
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,24 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION test_copy_format_wrong_input_type(bool)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION test_copy_format_wrong_return_type(internal)
+    RETURNS bool
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION test_copy_format_wrong_return_value(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format_wrong_return_value'
+    LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..1d754201336
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,113 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+#include "utils/builtins.h"
+
+PG_MODULE_MAGIC;
+
+static void
+TestCopyFromInFunc(CopyFromState cstate, Oid atttypid,
+                   FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: attribute: %s", format_type_be(atttypid))));
+}
+
+static void
+TestCopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: the number of attributes: %d", tupDesc->natts)));
+}
+
+static bool
+TestCopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+TestCopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = TestCopyFromInFunc,
+    .CopyFromStart = TestCopyFromStart,
+    .CopyFromOneRow = TestCopyFromOneRow,
+    .CopyFromEnd = TestCopyFromEnd,
+};
+
+static void
+TestCopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: attribute: %s", format_type_be(atttypid))));
+}
+
+static void
+TestCopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: the number of attributes: %d", tupDesc->natts)));
+}
+
+static void
+TestCopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: the number of valid values: %u", slot->tts_nvalid)));
+}
+
+static void
+TestCopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = TestCopyToOutFunc,
+    .CopyToStart = TestCopyToStart,
+    .CopyToOneRow = TestCopyToOneRow,
+    .CopyToEnd = TestCopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
+
+PG_FUNCTION_INFO_V1(test_copy_format_wrong_return_value);
+Datum
+test_copy_format_wrong_return_value(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_CSTRING(pstrdup("is_from=true"));
+    else
+        PG_RETURN_CSTRING(pstrdup("is_from=false"));
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.47.2
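
A note on the 0002 patch above: the expected output in invalid.out shows that a handler whose result is not a CopyFromRoutine (or CopyToRoutine) is rejected with "did not return ... struct". The standalone sketch below illustrates that kind of tag check. It is only an illustration under stated assumptions: every Mock* name is a hypothetical stand-in and not a PostgreSQL API, and the real check on the server side is not part of this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's NodeTag; not the real enum. */
typedef enum MockNodeTag
{
    MOCK_T_Invalid = 0,
    MOCK_T_CopyToRoutine,
    MOCK_T_CopyFromRoutine,
    MOCK_T_Other
} MockNodeTag;

/* The tag is the first member so it can be read through a generic pointer. */
typedef struct MockCopyToRoutine
{
    MockNodeTag type;
    const char *name;
} MockCopyToRoutine;

typedef struct MockCopyFromRoutine
{
    MockNodeTag type;
    const char *name;
} MockCopyFromRoutine;

/* Some unrelated tagged struct, standing in for a "wrong return value". */
typedef struct MockOther
{
    MockNodeTag type;
} MockOther;

static const MockCopyToRoutine mock_to_routine = {MOCK_T_CopyToRoutine, "to"};
static const MockCopyFromRoutine mock_from_routine = {MOCK_T_CopyFromRoutine, "from"};
static const MockOther mock_other = {MOCK_T_Other};

/* Hypothetical handler signature: one function serves both directions. */
typedef const void *(*MockHandler) (bool is_from);

/* A well-behaved handler returns the routine matching the direction. */
static const void *
mock_good_handler(bool is_from)
{
    if (is_from)
        return &mock_from_routine;
    return &mock_to_routine;
}

/* A misbehaving handler returns something that is not a COPY routine. */
static const void *
mock_bad_handler(bool is_from)
{
    (void) is_from;
    return &mock_other;
}

/*
 * Call the handler and accept its result only if the leading tag matches
 * the requested direction. Returns NULL when the result is unusable, which
 * is where a caller would report an error.
 */
static const void *
mock_resolve_routine(MockHandler handler, bool is_from)
{
    MockNodeTag expected = is_from ? MOCK_T_CopyFromRoutine : MOCK_T_CopyToRoutine;
    const void *result;

    assert(handler != NULL);
    result = handler(is_from);

    if (result == NULL || *(const MockNodeTag *) result != expected)
        return NULL;
    return result;
}
```

With these definitions, mock_resolve_routine(mock_good_handler, true) yields the FROM routine, while mock_bad_handler is rejected in both directions. Putting the tag first in each struct is what makes the check possible without knowing the concrete type in advance.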

From f0fcd9ee75b82d7365306317f99dbe1efff1f34b Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Thu, 27 Mar 2025 11:24:15 +0900
Subject: [PATCH v39 3/5] Add support for implementing custom COPY handler as
 extension

* TO: Add CopyToStateData::opaque that can be used to keep
  data for custom COPY TO handler implementation
* TO: Export CopySendEndOfRow() to send end of row data as
  CopyToStateFlush()
* FROM: Add CopyFromStateData::opaque that can be used to
  keep data for custom COPY FROM handler implementation
* FROM: Export CopyGetData() to get the next data as
  CopyFromStateGetData()
* FROM: Add CopyFromSkipErrorRow() for "ON_ERROR ignore" and
  "LOG_VERBOSITY verbose"

COPY FROM extensions must call CopyFromSkipErrorRow() when
their CopyFromOneRow callback reports an error via
errsave(). CopyFromSkipErrorRow() handles the "ON_ERROR ignore"
and "LOG_VERBOSITY verbose" cases.
---
 src/backend/commands/copyfromparse.c          | 93 ++++++++++++-------
 src/backend/commands/copyto.c                 | 12 +++
 src/include/commands/copyapi.h                |  6 ++
 src/include/commands/copyfrom_internal.h      |  3 +
 src/include/commands/copyto_internal.h        |  3 +
 .../test_copy_format/expected/no_schema.out   | 47 ++++++++++
 .../test_copy_format/sql/no_schema.sql        | 24 +++++
 .../test_copy_format/test_copy_format.c       | 80 +++++++++++++++-
 8 files changed, 231 insertions(+), 37 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 17e51f02e04..2070f51a963 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -739,6 +739,17 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+/*
+ * Export CopyGetData() for extensions. CopyGetData() itself stays a static
+ * function so that the compiler can keep optimizing (e.g. inlining) the
+ * calls to it within this file.
+ */
+int
+CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread)
+{
+    return CopyGetData(cstate, dest, minread, maxread);
+}
+
 /*
  * This function is exposed for use by extensions that read raw fields in the
  * next line. See NextCopyFromRawFieldsInternal() for details.
@@ -927,6 +938,51 @@ CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
     return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
 }
 
+/*
+ * Call this when you report an error by errsave() in your CopyFromOneRow
+ * callback. This handles the "ON_ERROR ignore" and "LOG_VERBOSITY verbose"
+ * cases for you. It must not be called with "ON_ERROR stop".
+ */
+void
+CopyFromSkipErrorRow(CopyFromState cstate)
+{
+    Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+    cstate->num_errors++;
+
+    if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+    {
+        /*
+         * Since we emit line number and column info in the below notice
+         * message, we suppress error context information other than the
+         * relation name.
+         */
+        Assert(!cstate->relname_only);
+        cstate->relname_only = true;
+
+        if (cstate->cur_attval)
+        {
+            char       *attval;
+
+            attval = CopyLimitPrintoutLength(cstate->cur_attval);
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname,
+                           attval));
+            pfree(attval);
+        }
+        else
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname));
+
+        /* reset relname_only */
+        cstate->relname_only = false;
+    }
+}
+
 /*
  * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
  *
@@ -1033,42 +1089,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
                                         (Node *) cstate->escontext,
                                         &values[m]))
         {
-            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-            cstate->num_errors++;
-
-            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-            {
-                /*
-                 * Since we emit line number and column info in the below
-                 * notice message, we suppress error context information other
-                 * than the relation name.
-                 */
-                Assert(!cstate->relname_only);
-                cstate->relname_only = true;
-
-                if (cstate->cur_attval)
-                {
-                    char       *attval;
-
-                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname,
-                                   attval));
-                    pfree(attval);
-                }
-                else
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname));
-
-                /* reset relname_only */
-                cstate->relname_only = false;
-            }
-
+            CopyFromSkipErrorRow(cstate);
             return true;
         }
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index b7ff6466ce3..23cbdad184c 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -456,6 +456,18 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Export CopySendEndOfRow() for extensions. CopySendEndOfRow() itself
+ * stays a static function so that the compiler can keep optimizing
+ * (e.g. inlining) the calls to it within this file; extensions go
+ * through this thin wrapper instead.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
  * line termination and do common appropriate things for the end of row.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 53ad3337f86..500ece7d5bb 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -56,6 +56,8 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 /*
  * API structure for a COPY FROM format implementation. Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
@@ -106,4 +108,8 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+extern int    CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread);
+
+extern void CopyFromSkipErrorRow(CopyFromState cstate);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 3a306e3286e..af425cf5fd9 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -181,6 +181,9 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyFromStateData;
 
 extern void ReceiveCopyBegin(CopyFromState cstate);
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 12c4a0f5979..14ee0f50588 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -78,6 +78,9 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyToStateData;
 
 #endif                            /* COPYTO_INTERNAL_H */
diff --git a/src/test/modules/test_copy_format/expected/no_schema.out b/src/test/modules/test_copy_format/expected/no_schema.out
index d5903632b2e..05d160c1eae 100644
--- a/src/test/modules/test_copy_format/expected/no_schema.out
+++ b/src/test/modules/test_copy_format/expected/no_schema.out
@@ -1,6 +1,8 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=true
 NOTICE:  CopyFromInFunc: attribute: smallint
@@ -8,7 +10,50 @@ NOTICE:  CopyFromInFunc: attribute: integer
 NOTICE:  CopyFromInFunc: attribute: bigint
 NOTICE:  CopyFromStart: the number of attributes: 3
 NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  invalid value: "6"
+CONTEXT:  COPY test, line 2, column a: "6"
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
 NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  skipping row due to data type incompatibility at line 2 for column "a": "6"
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
+NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  too much lines: 3
+CONTEXT:  COPY test, line 3
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: attribute: smallint
@@ -18,6 +63,8 @@ NOTICE:  CopyToStart: the number of attributes: 3
 NOTICE:  CopyToOneRow: the number of valid values: 3
 NOTICE:  CopyToOneRow: the number of valid values: 3
 NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
 NOTICE:  CopyToEnd
 DROP TABLE public.test;
 DROP EXTENSION test_copy_format;
diff --git a/src/test/modules/test_copy_format/sql/no_schema.sql b/src/test/modules/test_copy_format/sql/no_schema.sql
index 1e049f799f0..1901c4a9f43 100644
--- a/src/test/modules/test_copy_format/sql/no_schema.sql
+++ b/src/test/modules/test_copy_format/sql/no_schema.sql
@@ -1,7 +1,31 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+321
 \.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 DROP TABLE public.test;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index 1d754201336..34ec693a7ec 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "commands/copyapi.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "utils/builtins.h"
 
@@ -35,8 +36,85 @@ TestCopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
 static bool
 TestCopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
 {
+    int            n_attributes = list_length(cstate->attnumlist);
+    char       *line;
+    int            line_size = n_attributes + 1;    /* +1 is for new line */
+    int            read_bytes;
+
     ereport(NOTICE, (errmsg("CopyFromOneRow")));
-    return false;
+
+    cstate->cur_lineno++;
+    line = palloc(line_size);
+    read_bytes = CopyFromStateGetData(cstate, line, line_size, line_size);
+    if (read_bytes == 0)
+        return false;
+    if (read_bytes != line_size)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("one line must be %d bytes: %d",
+                        line_size, read_bytes)));
+
+    if (cstate->cur_lineno == 1)
+    {
+        /* Success */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        ListCell   *cur;
+        int            i = 0;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            int            m = attnum - 1;
+            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+            if (att->atttypid == INT2OID)
+            {
+                values[i] = Int16GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT4OID)
+            {
+                values[i] = Int32GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT8OID)
+            {
+                values[i] = Int64GetDatum(line[i] - '0');
+            }
+            nulls[i] = false;
+            i++;
+        }
+    }
+    else if (cstate->cur_lineno == 2)
+    {
+        /* Soft error */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        int            attnum = lfirst_int(list_head(cstate->attnumlist));
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+        char        value[2];
+
+        cstate->cur_attname = NameStr(att->attname);
+        value[0] = line[0];
+        value[1] = '\0';
+        cstate->cur_attval = value;
+        errsave((Node *) cstate->escontext,
+                (
+                 errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                 errmsg("invalid value: \"%c\"", line[0])));
+        CopyFromSkipErrorRow(cstate);
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+        return true;
+    }
+    else
+    {
+        /* Hard error */
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("too much lines: %llu",
+                        (unsigned long long) cstate->cur_lineno)));
+    }
+
+    return true;
 }
 
 static void
-- 
2.47.2
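
The per-line behavior of TestCopyFromOneRow() in the patch above (line 1 succeeds, line 2 raises a soft error, any later line raises a hard error) can be modeled without PostgreSQL headers. The sketch below is only a standalone illustration: CopyModelResult and model_copy_from() are hypothetical names that do not appear in the patch, and the model assumes that errsave() turns into a hard error when ON_ERROR is stop, which is what the expected output in no_schema.out shows.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one simulated COPY FROM run. */
typedef struct CopyModelResult
{
    int         accepted;       /* rows accepted before any hard error */
    int         skipped;        /* rows skipped because of soft errors */
    int         error_line;     /* line that raised a hard error, 0 if none */
} CopyModelResult;

/*
 * Simulate feeding n_lines input lines to the test handler.
 * on_error_ignore models "ON_ERROR ignore"; false models the default "stop".
 */
CopyModelResult
model_copy_from(int n_lines, bool on_error_ignore)
{
    CopyModelResult result = {0, 0, 0};
    int         lineno;

    assert(n_lines >= 0);

    for (lineno = 1; lineno <= n_lines; lineno++)
    {
        if (lineno == 1)
        {
            /* First line: values are converted and the row is accepted. */
            result.accepted++;
        }
        else if (lineno == 2)
        {
            /* Second line: soft error; fatal unless ON_ERROR is ignore. */
            if (!on_error_ignore)
            {
                result.error_line = lineno;
                break;
            }
            result.skipped++;
        }
        else
        {
            /* Third line and later: always a hard error. */
            result.error_line = lineno;
            break;
        }
    }

    return result;
}
```

Under these assumptions, model_copy_from(2, false) stops at line 2, model_copy_from(2, true) skips one row and finishes, and model_copy_from(3, true) stops at line 3, matching the four COPY FROM cases in the regression test.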

From 78c7ba67fb58ff44b492a207cf1833e296547175 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Thu, 27 Mar 2025 11:56:45 +0900
Subject: [PATCH v39 4/5] Use copy handlers for built-in formats

This adds copy handlers for text, csv and binary, which lets us
simplify Copy{To,From}GetRoutine(). Once we add more callbacks to
Copy{To,From}Routine and move the format-specific routines into
Copy{To,From}Routine::*, we will also be able to remove
CopyFormatOptions::{binary,csv_mode}.
---
 src/backend/commands/copy.c                   | 101 ++++++++++++------
 src/backend/commands/copyfrom.c               |  42 ++++----
 src/backend/commands/copyto.c                 |  42 ++++----
 src/include/catalog/pg_proc.dat               |  11 ++
 src/include/commands/copy.h                   |   2 +-
 src/include/commands/copyfrom_internal.h      |   6 +-
 src/include/commands/copyto_internal.h        |   6 +-
 .../test_copy_format/expected/builtin.out     |  34 ++++++
 src/test/modules/test_copy_format/meson.build |   1 +
 .../modules/test_copy_format/sql/builtin.sql  |  30 ++++++
 .../test_copy_format--1.0.sql                 |  15 +++
 11 files changed, 209 insertions(+), 81 deletions(-)
 create mode 100644 src/test/modules/test_copy_format/expected/builtin.out
 create mode 100644 src/test/modules/test_copy_format/sql/builtin.sql

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2539aee3a53..2bd8989b1ae 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -22,7 +22,9 @@
 #include "access/table.h"
 #include "access/xact.h"
 #include "catalog/pg_authid.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
@@ -521,43 +523,45 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
+            char       *format = defGetString(defel);
+            List       *qualified_format;
+            char       *schema;
+            char       *fmt;
+            Oid            arg_types[1];
+            Oid            handler = InvalidOid;
 
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
+
+            qualified_format = stringToQualifiedNameList(format, NULL);
+            DeconstructQualifiedName(qualified_format, &schema, &fmt);
+            if (!schema || strcmp(schema, "pg_catalog") == 0)
             {
-                List       *qualified_format;
-                Oid            arg_types[1];
-                Oid            handler = InvalidOid;
-
-                qualified_format = stringToQualifiedNameList(fmt, NULL);
-                arg_types[0] = INTERNALOID;
-                handler = LookupFuncName(qualified_format, 1,
-                                         arg_types, true);
-                if (!OidIsValid(handler))
-                    ereport(ERROR,
-                            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                             errmsg("COPY format \"%s\" not recognized", fmt),
-                             parser_errposition(pstate, defel->location)));
-
-                /* check that handler has correct return type */
-                if (get_func_rettype(handler) != COPY_HANDLEROID)
-                    ereport(ERROR,
-                            (errcode(ERRCODE_WRONG_OBJECT_TYPE),
-                             errmsg("function %s must return type %s",
-                                    fmt, "copy_handler"),
-                             parser_errposition(pstate, defel->location)));
-
-                opts_out->handler = handler;
+                if (strcmp(fmt, "csv") == 0)
+                    opts_out->csv_mode = true;
+                else if (strcmp(fmt, "binary") == 0)
+                    opts_out->binary = true;
             }
+
+            arg_types[0] = INTERNALOID;
+            handler = LookupFuncName(qualified_format, 1,
+                                     arg_types, true);
+            if (!OidIsValid(handler))
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                         errmsg("COPY format \"%s\" not recognized", format),
+                         parser_errposition(pstate, defel->location)));
+
+            /* check that handler has correct return type */
+            if (get_func_rettype(handler) != COPY_HANDLEROID)
+                ereport(ERROR,
+                        (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                         errmsg("function %s must return type %s",
+                                format, "copy_handler"),
+                         parser_errposition(pstate, defel->location)));
+
+            opts_out->handler = handler;
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
@@ -1040,3 +1044,36 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)
 
     return attnums;
 }
+
+Datum
+copy_text_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineText);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineText);
+}
+
+Datum
+copy_csv_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineCSV);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineCSV);
+}
+
+Datum
+copy_binary_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineBinary);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineBinary);
+}
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 91f44193abf..7244eb6368a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -45,6 +45,7 @@
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/portal.h"
@@ -128,7 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
  */
 
 /* text format */
-static const CopyFromRoutine CopyFromRoutineText = {
+const CopyFromRoutine CopyFromRoutineText = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
@@ -137,7 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 };
 
 /* CSV format */
-static const CopyFromRoutine CopyFromRoutineCSV = {
+const CopyFromRoutine CopyFromRoutineCSV = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
@@ -146,7 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
 };
 
 /* binary format */
-static const CopyFromRoutine CopyFromRoutineBinary = {
+const CopyFromRoutine CopyFromRoutineBinary = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromBinaryInFunc,
     .CopyFromStart = CopyFromBinaryStart,
@@ -158,28 +159,23 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-    if (OidIsValid(opts->handler))
-    {
-        Datum        datum;
-        Node       *routine;
-
-        datum = OidFunctionCall1(opts->handler, BoolGetDatum(true));
-        routine = (Node *) DatumGetPointer(datum);
-        if (routine == NULL || !IsA(routine, CopyFromRoutine))
-            ereport(ERROR,
-                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                     errmsg("COPY handler function %s.%s did not return CopyFromRoutine struct",
-                            get_namespace_name(get_func_namespace(opts->handler)),
-                            get_func_name(opts->handler))));
-        return castNode(CopyFromRoutine, routine);
-    }
-    else if (opts->csv_mode)
-        return &CopyFromRoutineCSV;
-    else if (opts->binary)
-        return &CopyFromRoutineBinary;
+    Oid            handler = opts->handler;
+    Datum        datum;
+    Node       *routine;
 
     /* default is text */
-    return &CopyFromRoutineText;
+    if (!OidIsValid(handler))
+        handler = F_TEXT_INTERNAL;
+
+    datum = OidFunctionCall1(handler, BoolGetDatum(true));
+    routine = (Node *) DatumGetPointer(datum);
+    if (routine == NULL || !IsA(routine, CopyFromRoutine))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY handler function %s.%s did not return CopyFromRoutine struct",
+                        get_namespace_name(get_func_namespace(handler)),
+                        get_func_name(handler))));
+    return castNode(CopyFromRoutine, routine);
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 23cbdad184c..b244167a568 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -32,6 +32,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -89,7 +90,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
  */
 
 /* text format */
-static const CopyToRoutine CopyToRoutineText = {
+const CopyToRoutine CopyToRoutineText = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
@@ -98,7 +99,7 @@ static const CopyToRoutine CopyToRoutineText = {
 };
 
 /* CSV format */
-static const CopyToRoutine CopyToRoutineCSV = {
+const CopyToRoutine CopyToRoutineCSV = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
@@ -107,7 +108,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 };
 
 /* binary format */
-static const CopyToRoutine CopyToRoutineBinary = {
+const CopyToRoutine CopyToRoutineBinary = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
@@ -119,28 +120,23 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-    if (OidIsValid(opts->handler))
-    {
-        Datum        datum;
-        Node       *routine;
-
-        datum = OidFunctionCall1(opts->handler, BoolGetDatum(false));
-        routine = (Node *) DatumGetPointer(datum);
-        if (routine == NULL || !IsA(routine, CopyToRoutine))
-            ereport(ERROR,
-                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                     errmsg("COPY handler function %s.%s did not return CopyToRoutine struct",
-                            get_namespace_name(get_func_namespace(opts->handler)),
-                            get_func_name(opts->handler))));
-        return castNode(CopyToRoutine, routine);
-    }
-    else if (opts->csv_mode)
-        return &CopyToRoutineCSV;
-    else if (opts->binary)
-        return &CopyToRoutineBinary;
+    Oid            handler = opts->handler;
+    Datum        datum;
+    Node       *routine;
 
     /* default is text */
-    return &CopyToRoutineText;
+    if (!OidIsValid(handler))
+        handler = F_TEXT_INTERNAL;
+
+    datum = OidFunctionCall1(handler, BoolGetDatum(false));
+    routine = (Node *) DatumGetPointer(datum);
+    if (routine == NULL || !IsA(routine, CopyToRoutine))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY handler function %s.%s did not return CopyToRoutine struct",
+                        get_namespace_name(get_func_namespace(handler)),
+                        get_func_name(handler))));
+    return castNode(CopyToRoutine, routine);
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 79b5a36088c..09bacb80084 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12485,4 +12485,15 @@
   proargtypes => 'int4',
   prosrc => 'gist_stratnum_common' },
 
+# COPY handlers
+{ oid => '8100', descr => 'text COPY FORMAT handler',
+  proname => 'text', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_text_handler' },
+{ oid => '8101', descr => 'csv COPY FORMAT handler',
+  proname => 'csv', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_csv_handler' },
+{ oid => '8102', descr => 'binary COPY FORMAT handler',
+  proname => 'binary', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_binary_handler' },
+
 ]
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 6df1f8a3b9b..4525261fcc4 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,7 +87,7 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
-    Oid            handler;        /* handler function for custom format routine */
+    Oid            handler;        /* handler function */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to]_internal.h */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index af425cf5fd9..abeccf85c1c 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYFROM_INTERNAL_H
 #define COPYFROM_INTERNAL_H
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -197,4 +197,8 @@ extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
 extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
                                  Datum *values, bool *nulls);
 
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineText;
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineCSV;
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineBinary;
+
 #endif                            /* COPYFROM_INTERNAL_H */
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 14ee0f50588..1bd83fa7f14 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYTO_INTERNAL_H
 #define COPYTO_INTERNAL_H
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "executor/execdesc.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
@@ -83,4 +83,8 @@ typedef struct CopyToStateData
     void       *opaque;            /* private space */
 } CopyToStateData;
 
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineText;
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineCSV;
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineBinary;
+
 #endif                            /* COPYTO_INTERNAL_H */
diff --git a/src/test/modules/test_copy_format/expected/builtin.out b/src/test/modules/test_copy_format/expected/builtin.out
new file mode 100644
index 00000000000..11b1053c84e
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/builtin.out
@@ -0,0 +1,34 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- public.text must not be used
+COPY public.test FROM stdin WITH (FORMAT text);
+COPY public.test TO stdout WITH (FORMAT text);
+1    2    3
+12    34    56
+123    456    789
+COPY public.test FROM stdin WITH (FORMAT 'pg_catalog.text');
+COPY public.test TO stdout WITH (FORMAT 'pg_catalog.text');
+1    2    3
+12    34    56
+123    456    789
+-- public.csv must not be used
+COPY public.test FROM stdin WITH (FORMAT csv);
+COPY public.test TO stdout WITH (FORMAT csv);
+1,2,3
+12,34,56
+123,456,789
+COPY public.test FROM stdin WITH (FORMAT 'pg_catalog.csv');
+COPY public.test TO stdout WITH (FORMAT 'pg_catalog.csv');
+1,2,3
+12,34,56
+123,456,789
+-- public.binary must not be used
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/binary.data'
+COPY public.test TO :'filename' WITH (FORMAT binary);
+COPY public.test FROM :'filename' WITH (FORMAT binary);
+COPY public.test TO :'filename' WITH (FORMAT 'pg_catalog.binary');
+COPY public.test FROM :'filename' WITH (FORMAT 'pg_catalog.binary');
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
index 8010659585b..86ba610d4ff 100644
--- a/src/test/modules/test_copy_format/meson.build
+++ b/src/test/modules/test_copy_format/meson.build
@@ -27,6 +27,7 @@ tests += {
   'bd': meson.current_build_dir(),
   'regress': {
     'sql': [
+      'builtin',
       'invalid',
       'no_schema',
       'schema',
diff --git a/src/test/modules/test_copy_format/sql/builtin.sql b/src/test/modules/test_copy_format/sql/builtin.sql
new file mode 100644
index 00000000000..2d24069b538
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/builtin.sql
@@ -0,0 +1,30 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+
+-- public.text must not be used
+COPY public.test FROM stdin WITH (FORMAT text);
+\.
+COPY public.test TO stdout WITH (FORMAT text);
+COPY public.test FROM stdin WITH (FORMAT 'pg_catalog.text');
+\.
+COPY public.test TO stdout WITH (FORMAT 'pg_catalog.text');
+
+-- public.csv must not be used
+COPY public.test FROM stdin WITH (FORMAT csv);
+\.
+COPY public.test TO stdout WITH (FORMAT csv);
+COPY public.test FROM stdin WITH (FORMAT 'pg_catalog.csv');
+\.
+COPY public.test TO stdout WITH (FORMAT 'pg_catalog.csv');
+
+-- public.binary must not be used
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/binary.data'
+COPY public.test TO :'filename' WITH (FORMAT binary);
+COPY public.test FROM :'filename' WITH (FORMAT binary);
+COPY public.test TO :'filename' WITH (FORMAT 'pg_catalog.binary');
+COPY public.test FROM :'filename' WITH (FORMAT 'pg_catalog.binary');
+
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
index c1a137181f8..bfa1900e828 100644
--- a/src/test/modules/test_copy_format/test_copy_format--1.0.sql
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -22,3 +22,18 @@ CREATE FUNCTION test_copy_format_wrong_return_value(internal)
     RETURNS copy_handler
     AS 'MODULE_PATHNAME', 'test_copy_format_wrong_return_value'
     LANGUAGE C;
+
+CREATE FUNCTION text(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION csv(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION binary(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
-- 
2.47.2
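
The FORMAT handling in ProcessCopyOptions() above sets csv_mode or binary only when the name is unqualified or qualified with pg_catalog; the handler lookup itself then runs for every format. A minimal standalone sketch of that flag decision follows. FormatFlags and classify_format() are hypothetical names used for illustration only, and the sketch splits on the first '.' instead of calling stringToQualifiedNameList(), so it ignores quoted identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two CopyFormatOptions flags. */
typedef struct FormatFlags
{
    bool        csv_mode;
    bool        binary;
} FormatFlags;

/*
 * Decide which built-in flag a FORMAT value sets.
 * Simplification: the schema is everything before the first '.'.
 */
FormatFlags
classify_format(const char *format)
{
    FormatFlags flags = {false, false};
    const char *dot;
    const char *name;
    bool        builtin_schema;

    assert(format != NULL);

    dot = strchr(format, '.');
    name = dot ? dot + 1 : format;

    /* No schema, or exactly "pg_catalog", selects the built-in formats. */
    builtin_schema = (dot == NULL) ||
        ((size_t) (dot - format) == strlen("pg_catalog") &&
         strncmp(format, "pg_catalog", strlen("pg_catalog")) == 0);

    if (builtin_schema)
    {
        if (strcmp(name, "csv") == 0)
            flags.csv_mode = true;
        else if (strcmp(name, "binary") == 0)
            flags.binary = true;
        /* "text" is the default and sets neither flag. */
    }

    return flags;
}
```

In this model, "csv" and "pg_catalog.csv" both set csv_mode, while "public.csv" sets neither flag, which matches the patch: a handler in another schema does not switch on the built-in CSV or binary code paths.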

From e35257a1f7b9ae8aec73c4cbf41428f7acc10823 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 19 Mar 2025 11:46:34 +0900
Subject: [PATCH v39 5/5] Add document how to write a COPY handler

This is WIP because we haven't decided our API yet.

Co-authored-by: David G. Johnston <david.g.johnston@gmail.com>
---
 doc/src/sgml/copy-handler.sgml | 394 +++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml     |   1 +
 doc/src/sgml/postgres.sgml     |   1 +
 src/include/commands/copyapi.h |   9 +-
 4 files changed, 401 insertions(+), 4 deletions(-)
 create mode 100644 doc/src/sgml/copy-handler.sgml

diff --git a/doc/src/sgml/copy-handler.sgml b/doc/src/sgml/copy-handler.sgml
new file mode 100644
index 00000000000..5bc87d16662
--- /dev/null
+++ b/doc/src/sgml/copy-handler.sgml
@@ -0,0 +1,394 @@
+<!-- doc/src/sgml/copy-handler.sgml -->
+
+<chapter id="copy-handler">
+ <title>Writing a Copy Handler</title>
+
+ <indexterm zone="copy-handler">
+  <primary><literal>COPY</literal> handler</primary>
+ </indexterm>
+
+ <para>
+  <productname>PostgreSQL</productname> supports
+  custom <link linkend="sql-copy"><literal>COPY</literal></link> handlers,
+  which add <replaceable>format_name</replaceable> options to
+  the <literal>FORMAT</literal> clause.
+ </para>
+
+ <para>
+  At the SQL level, a copy handler method is represented by a single SQL
+  function (see <xref linkend="sql-createfunction"/>), typically implemented in
+  C, having the signature
+<synopsis>
+<replaceable>format_name</replaceable>(internal) RETURNS <literal>copy_handler</literal>
+</synopsis>
+  The function's name is then accepted as a
+  valid <replaceable>format_name</replaceable>. The return
+  pseudo-type <literal>copy_handler</literal> informs the system that this
+  function needs to be registered as a copy handler.
+  The <type>internal</type> argument is a dummy that prevents this function
+  from being called directly from an SQL command. Because the handler
+  implementation must remain unchanged for the lifetime of the server, this
+  SQL function should be marked immutable. The <literal>link_symbol</literal>
+  for this function is the name of the implementation function, described
+  next.
+ </para>
+
+ <para>
+  The implementation function signature expected for the function named
+  in the <literal>link_symbol</literal> is:
+<synopsis>
+Datum
+<replaceable>copy_format_handler</replaceable>(PG_FUNCTION_ARGS)
+</synopsis>
+  The convention for the name is to replace the word
+  <replaceable>format</replaceable> in the placeholder above with the value given
+  to <replaceable>format_name</replaceable> in the SQL function.
+  The first argument is a <type>boolean</type>. If <literal>true</literal>,
+  the handler must return a pointer to its <literal>COPY FROM</literal>
+  implementation (a <type>CopyFromRoutine *</type>). If <literal>false</literal>,
+  it must return a pointer to its <literal>COPY TO</literal> implementation
+  (a <type>CopyToRoutine *</type>). These structs are declared in
+  <filename>src/include/commands/copyapi.h</filename>.
+ </para>
+
+ <para>
+  The structs hold pointers to implementation functions for initializing,
+  starting, processing rows, and ending a copy operation. The specific
+  structures vary a bit between <literal>COPY FROM</literal> and
+  <literal>COPY TO</literal>, so the next two sections describe each
+  in detail.
+ </para>
+
+ <sect1 id="copy-handler-from">
+  <title>Copy From Handler</title>
+
+  <para>
+   The opening to this chapter describes how the executor will call the main
+   handler function with, in this case,
+   a <type>boolean</type> <literal>true</literal>, and expect to receive a
+   <type>CopyFromRoutine *</type> <type>Datum</type>. This section describes
+   the components of the <type>CopyFromRoutine</type> struct.
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromInFunc(CopyFromState cstate,
+               Oid atttypid,
+               FmgrInfo *finfo,
+               Oid *typioparam);
+</programlisting>
+
+   This sets input function information for the
+   given <literal>atttypid</literal> attribute. This function is called once
+   at the beginning of <literal>COPY FROM</literal>. If
+   this <literal>COPY</literal> handler doesn't use any input functions, this
+   function doesn't need to do anything.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid atttypid</literal></term>
+     <listitem>
+      <para>
+       This is the OID of the data type used by the relation's attribute.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>FmgrInfo *finfo</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to provide the catalog information of
+       the input function.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid *typioparam</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to define the OID of the type to
+       pass to the input function.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromStart(CopyFromState cstate,
+              TupleDesc tupDesc);
+</programlisting>
+
+   This starts a <literal>COPY FROM</literal>. This function is called once at
+   the beginning of <literal>COPY FROM</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleDesc tupDesc</literal></term>
+     <listitem>
+      <para>
+       This is the tuple descriptor of the relation where the data needs to be
+       copied. This can be used for any initialization steps required by a
+       format.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+bool
+CopyFromOneRow(CopyFromState cstate,
+               ExprContext *econtext,
+               Datum *values,
+               bool *nulls);
+</programlisting>
+
+   This reads one row from the source and fills <literal>values</literal>
+   and <literal>nulls</literal>. If a tuple was read,
+   this must return <literal>true</literal>. If there are no more tuples to
+   read, this must return <literal>false</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>ExprContext *econtext</literal></term>
+     <listitem>
+      <para>
+       This is used to evaluate the default expression for each column that is
+       either not read from the file or is using
+       the <literal>DEFAULT</literal> option of <literal>COPY
+       FROM</literal>. It is <literal>NULL</literal> if no default values are
+       used.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Datum *values</literal></term>
+     <listitem>
+      <para>
+       This is an output array that receives the column values of the row read.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>bool *nulls</literal></term>
+     <listitem>
+      <para>
+       This is an output variable to store whether the read columns
+       are <literal>NULL</literal> or not.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromEnd(CopyFromState cstate);
+</programlisting>
+
+   This ends a <literal>COPY FROM</literal>. This function is called once at
+   the end of <literal>COPY FROM</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+   TODO: Add CopyFromStateGetData() and CopyFromSkipErrorRow()?
+  </para>
+ </sect1>
+
+ <sect1 id="copy-handler-to">
+  <title>Copy To Handler</title>
+
+  <para>
+   The <literal>COPY</literal> handler function for <literal>COPY
+   TO</literal> returns a <type>CopyToRoutine</type> struct containing
+   pointers to the functions described below. All functions are required.
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToOutFunc(CopyToState cstate,
+              Oid atttypid,
+              FmgrInfo *finfo);
+</programlisting>
+
+   This sets output function information for the
+   given <literal>atttypid</literal> attribute. This function is called once
+   at the beginning of <literal>COPY TO</literal>. If
+   this <literal>COPY</literal> handler doesn't use any output functions, this
+   function doesn't need to do anything.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid atttypid</literal></term>
+     <listitem>
+      <para>
+       This is the OID of the data type used by the relation's attribute.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>FmgrInfo *finfo</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to provide the catalog information of
+       the output function.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToStart(CopyToState cstate,
+            TupleDesc tupDesc);
+</programlisting>
+
+   This starts a <literal>COPY TO</literal>. This function is called once at
+   the beginning of <literal>COPY TO</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleDesc tupDesc</literal></term>
+     <listitem>
+      <para>
+       This is the tuple descriptor of the relation from which the data is read.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToOneRow(CopyToState cstate,
+             TupleTableSlot *slot);
+</programlisting>
+
+   This writes one row stored in <literal>slot</literal> to the destination.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleTableSlot *slot</literal></term>
+     <listitem>
+      <para>
+       This holds the row to be written.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToEnd(CopyToState cstate);
+</programlisting>
+
+   This ends a <literal>COPY TO</literal>. This function is called once at
+   the end of <literal>COPY TO</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+   TODO: Add CopyToStateFlush()?
+  </para>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 25fb99cee69..1fd6d32d5ec 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -107,6 +107,7 @@
 <!ENTITY storage    SYSTEM "storage.sgml">
 <!ENTITY transaction     SYSTEM "xact.sgml">
 <!ENTITY tablesample-method SYSTEM "tablesample-method.sgml">
+<!ENTITY copy-handler SYSTEM "copy-handler.sgml">
 <!ENTITY wal-for-extensions SYSTEM "wal-for-extensions.sgml">
 <!ENTITY generic-wal SYSTEM "generic-wal.sgml">
 <!ENTITY custom-rmgr SYSTEM "custom-rmgr.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index af476c82fcc..8ba319ae2df 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -254,6 +254,7 @@ break is not needed in a wider output rendering.
   &plhandler;
   &fdwhandler;
   &tablesample-method;
+  &copy-handler;
   &custom-scan;
   &geqo;
   &tableam;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 500ece7d5bb..24710cb667a 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -28,10 +28,10 @@ typedef struct CopyToRoutine
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
      *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     *
      * 'finfo' can be optionally filled to provide the catalog information of
      * the output function.
-     *
-     * 'atttypid' is the OID of data type used by the relation's attribute.
      */
     void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
                                   FmgrInfo *finfo);
@@ -70,12 +70,13 @@ typedef struct CopyFromRoutine
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
      *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     *
      * 'finfo' can be optionally filled to provide the catalog information of
      * the input function.
      *
      * 'typioparam' can be optionally filled to define the OID of the type to
-     * pass to the input function.'atttypid' is the OID of data type used by
-     * the relation's attribute.
+     * pass to the input function.
      */
     void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
                                    FmgrInfo *finfo, Oid *typioparam);
-- 
2.47.2
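
As a rough illustration of the call sequence that the new copy-handler.sgml
chapter describes for COPY TO (start once, one callback per row, end once),
here is a standalone sketch. It deliberately does not use the PostgreSQL
headers: MockCopyToState, MockCopyToRoutine, demo_routine and run_copy_to are
hypothetical stand-ins invented for this example, and the real CopyToRoutine
callbacks take a CopyToState, a TupleDesc and a TupleTableSlot rather than
plain strings. It only shows the dispatch-through-function-pointers pattern,
not an actual handler.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for CopyToState (the real struct is internal). */
typedef struct MockCopyToState
{
    char        out[128];       /* output collected by the callbacks */
    int         rows;           /* number of rows written so far */
} MockCopyToState;

/* Hypothetical struct mirroring the start / one-row / end callbacks. */
typedef struct MockCopyToRoutine
{
    void        (*CopyToStart) (MockCopyToState *cstate);
    void        (*CopyToOneRow) (MockCopyToState *cstate, const char *row);
    void        (*CopyToEnd) (MockCopyToState *cstate);
} MockCopyToRoutine;

/* Called once before any row: reset state and emit a header. */
static void
demo_start(MockCopyToState *cstate)
{
    cstate->out[0] = '\0';
    cstate->rows = 0;
    strcat(cstate->out, "BEGIN;");
}

/* Called once per row: append the row followed by a separator. */
static void
demo_one_row(MockCopyToState *cstate, const char *row)
{
    strcat(cstate->out, row);
    strcat(cstate->out, ";");
    cstate->rows++;
}

/* Called once after the last row: emit a trailer. */
static void
demo_end(MockCopyToState *cstate)
{
    strcat(cstate->out, "END");
}

/* What a format handler would hand back for the COPY TO direction. */
const MockCopyToRoutine demo_routine = {
    .CopyToStart = demo_start,
    .CopyToOneRow = demo_one_row,
    .CopyToEnd = demo_end,
};

/* Hypothetical driver playing the role of the server's COPY TO loop. */
void
run_copy_to(const MockCopyToRoutine *routine, MockCopyToState *cstate,
            const char *const *rows, int nrows)
{
    routine->CopyToStart(cstate);
    for (int i = 0; i < nrows; i++)
        routine->CopyToOneRow(cstate, rows[i]);
    routine->CopyToEnd(cstate);
}
```

Calling run_copy_to(&demo_routine, &cstate, rows, 2) with the rows "a" and "b"
leaves "BEGIN;a;b;END" in cstate.out. The 128-byte buffer is only sized for
this toy input.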

