Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Sutou Kouhei |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | 20250319.115617.1633674031127518041.kou@clear-code.com Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
Re: Make COPY format extendable: Extract COPY TO format implementations |
List | pgsql-hackers |
Hi, In <CAD21AoDU=bYRDDY8MzCXAfg4h9XTeTBdM-wVJaO1t4UcseCpuA@mail.gmail.com> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 17 Mar 2025 13:50:03 -0700, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I think that built-in formats also need to have their handler > functions. This seems to be a conventional way for customizable > features such as tablesample and access methods, and we can simplify > this function. OK. 0008 in the attached v37 patch set does it. > I think we need to update the documentation to describe how users can > define the handler functions and what each callback function is > responsible for. I agree with it but we haven't finalized public APIs yet. Can we defer it after we finalize public APIs? (Proposed public APIs exist in 0003, 0006 and 0007.) And could someone help (take over if possible) writing a document for this feature? I'm not good at writing a document in English... 0009 in the attached v37 patch set has a draft of it. It's based on existing documents in doc/src/sgml/ and *.h. 0001-0007 aren't changed from v36 patch set. Thanks, -- kou From bd21411860e1ac8f751b77658bc7f1978a110d5f Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 12:19:15 +0900 Subject: [PATCH v37 1/9] Add support for adding custom COPY TO format This uses the handler approach like tablesample. The approach creates an internal function that returns an internal struct. In this case, a COPY TO handler returns a CopyToRoutine. This also add a test module for custom COPY TO handler. --- src/backend/commands/copy.c | 79 ++++++++++++++++--- src/backend/commands/copyto.c | 28 ++++++- src/backend/nodes/Makefile | 1 + src/backend/nodes/gen_node_support.pl | 2 + src/backend/utils/adt/pseudotypes.c | 1 + src/include/catalog/pg_proc.dat | 6 ++ src/include/catalog/pg_type.dat | 6 ++ src/include/commands/copy.h | 1 + src/include/commands/copyapi.h | 2 + src/include/nodes/meson.build | 1 + src/test/modules/Makefile | 1 + src/test/modules/meson.build | 1 + src/test/modules/test_copy_format/.gitignore | 4 + src/test/modules/test_copy_format/Makefile | 23 ++++++ .../expected/test_copy_format.out | 17 ++++ src/test/modules/test_copy_format/meson.build | 33 ++++++++ .../test_copy_format/sql/test_copy_format.sql | 5 ++ .../test_copy_format--1.0.sql | 8 ++ .../test_copy_format/test_copy_format.c | 63 +++++++++++++++ .../test_copy_format/test_copy_format.control | 4 + 20 files changed, 269 insertions(+), 17 deletions(-) mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl create mode 100644 src/test/modules/test_copy_format/.gitignore create mode 100644 src/test/modules/test_copy_format/Makefile create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out create mode 100644 src/test/modules/test_copy_format/meson.build create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql create mode 100644 src/test/modules/test_copy_format/test_copy_format.c create mode 100644 src/test/modules/test_copy_format/test_copy_format.control diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index cfca9d9dc29..8d94bc313eb 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -32,6 +32,7 @@ #include "parser/parse_coerce.h" #include "parser/parse_collate.h" #include "parser/parse_expr.h" +#include "parser/parse_func.h" #include "parser/parse_relation.h" #include "utils/acl.h" #include "utils/builtins.h" @@ -476,6 +477,70 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate) return COPY_LOG_VERBOSITY_DEFAULT; /* keep compiler quiet */ } +/* + * Process the "format" option. + * + * This function checks whether the option value is a built-in format such as + * "text" and "csv" or not. If the option value isn't a built-in format, this + * function finds a COPY format handler that returns a CopyToRoutine (for + * is_from == false). If no COPY format handler is found, this function + * reports an error. + */ +static void +ProcessCopyOptionFormat(ParseState *pstate, + CopyFormatOptions *opts_out, + bool is_from, + DefElem *defel) +{ + char *format; + Oid funcargtypes[1]; + Oid handlerOid = InvalidOid; + + format = defGetString(defel); + + opts_out->csv_mode = false; + opts_out->binary = false; + /* built-in formats */ + if (strcmp(format, "text") == 0) + { + /* "csv_mode == false && binary == false" means "text" */ + return; + } + else if (strcmp(format, "csv") == 0) + { + opts_out->csv_mode = true; + return; + } + else if (strcmp(format, "binary") == 0) + { + opts_out->binary = true; + return; + } + + /* custom format */ + if (!is_from) + { + funcargtypes[0] = INTERNALOID; + handlerOid = LookupFuncName(list_make1(makeString(format)), 1, + funcargtypes, true); + } + if (!OidIsValid(handlerOid)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY format \"%s\" not recognized", format), + parser_errposition(pstate, defel->location))); + + /* check that handler has correct return type */ + if (get_func_rettype(handlerOid) != COPY_HANDLEROID) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("function %s must return type %s", + format, "copy_handler"), + parser_errposition(pstate, defel->location))); + + opts_out->handler = handlerOid; +} + /* * Process the statement option list for COPY. * @@ -519,22 +584,10 @@ ProcessCopyOptions(ParseState *pstate, if (strcmp(defel->defname, "format") == 0) { - char *fmt = defGetString(defel); - if (format_specified) errorConflictingDefElem(defel, pstate); format_specified = true; - if (strcmp(fmt, "text") == 0) - /* default format */ ; - else if (strcmp(fmt, "csv") == 0) - opts_out->csv_mode = true; - else if (strcmp(fmt, "binary") == 0) - opts_out->binary = true; - else - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY format \"%s\" not recognized", fmt), - parser_errposition(pstate, defel->location))); + ProcessCopyOptionFormat(pstate, opts_out, is_from, defel); } else if (strcmp(defel->defname, "freeze") == 0) { diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 84a3f3879a8..fce8501dc30 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -150,6 +150,7 @@ static void CopySendInt16(CopyToState cstate, int16 val); /* text format */ static const CopyToRoutine CopyToRoutineText = { + .type = T_CopyToRoutine, .CopyToStart = CopyToTextLikeStart, .CopyToOutFunc = CopyToTextLikeOutFunc, .CopyToOneRow = CopyToTextOneRow, @@ -158,6 +159,7 @@ static const CopyToRoutine CopyToRoutineText = { /* CSV format */ static const CopyToRoutine CopyToRoutineCSV = { + .type = T_CopyToRoutine, .CopyToStart = CopyToTextLikeStart, .CopyToOutFunc = CopyToTextLikeOutFunc, .CopyToOneRow = CopyToCSVOneRow, @@ -166,6 +168,7 @@ static const CopyToRoutine CopyToRoutineCSV = { /* binary format */ static const CopyToRoutine CopyToRoutineBinary = { + .type = T_CopyToRoutine, .CopyToStart = CopyToBinaryStart, .CopyToOutFunc = CopyToBinaryOutFunc, .CopyToOneRow = CopyToBinaryOneRow, @@ -176,13 +179,30 @@ static const CopyToRoutine CopyToRoutineBinary = { static const CopyToRoutine * CopyToGetRoutine(const CopyFormatOptions *opts) { - if (opts->csv_mode) + if (OidIsValid(opts->handler)) + { + Datum datum; + Node *routine; + + datum = OidFunctionCall1(opts->handler, BoolGetDatum(false)); + routine = (Node *) DatumGetPointer(datum); + if (routine == NULL || !IsA(routine, CopyToRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%u did not return " + "CopyToRoutine struct", + opts->handler))); + return castNode(CopyToRoutine, routine); + } + else if (opts->csv_mode) return &CopyToRoutineCSV; else if (opts->binary) return &CopyToRoutineBinary; - - /* default is text */ - return &CopyToRoutineText; + else + return &CopyToRoutineText; } /* Implementation of the start callback for text and CSV formats */ diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile index 77ddb9ca53f..dc6c1087361 100644 --- a/src/backend/nodes/Makefile +++ b/src/backend/nodes/Makefile @@ -50,6 +50,7 @@ node_headers = \ access/sdir.h \ access/tableam.h \ access/tsmapi.h \ + commands/copyapi.h \ commands/event_trigger.h \ commands/trigger.h \ executor/tuptable.h \ diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl old mode 100644 new mode 100755 index 1a657f7e0ae..fb90635a245 --- a/src/backend/nodes/gen_node_support.pl +++ b/src/backend/nodes/gen_node_support.pl @@ -62,6 +62,7 @@ my @all_input_files = qw( access/sdir.h access/tableam.h access/tsmapi.h + commands/copyapi.h commands/event_trigger.h commands/trigger.h executor/tuptable.h @@ -86,6 +87,7 @@ my @nodetag_only_files = qw( access/sdir.h access/tableam.h access/tsmapi.h + commands/copyapi.h commands/event_trigger.h commands/trigger.h executor/tuptable.h diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c index 317a1f2b282..f2ebc21ca56 100644 --- a/src/backend/utils/adt/pseudotypes.c +++ b/src/backend/utils/adt/pseudotypes.c @@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler); +PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(internal); PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement); PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray); diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 890822eaf79..7c2a510fa3f 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -7838,6 +7838,12 @@ { oid => '3312', descr => 'I/O', proname => 'tsm_handler_out', prorettype => 'cstring', proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' }, +{ oid => '8753', descr => 'I/O', + proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler', + proargtypes => 'cstring', prosrc => 'copy_handler_in' }, +{ oid => '8754', descr => 'I/O', + proname => 'copy_handler_out', prorettype => 'cstring', + proargtypes => 'copy_handler', prosrc => 'copy_handler_out' }, { oid => '267', descr => 'I/O', proname => 'table_am_handler_in', proisstrict => 'f', prorettype => 'table_am_handler', proargtypes => 'cstring', diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index 6dca77e0a22..340e0cd0a8d 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -633,6 +633,12 @@ typcategory => 'P', typinput => 'tsm_handler_in', typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-', typalign => 'i' }, +{ oid => '8752', + descr => 'pseudo-type for the result of a copy to method function', + typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p', + typcategory => 'P', typinput => 'copy_handler_in', + typoutput => 'copy_handler_out', typreceive => '-', typsend => '-', + typalign => 'i' }, { oid => '269', descr => 'pseudo-type for the result of a table AM handler function', typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p', diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 06dfdfef721..332628d67cc 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -87,6 +87,7 @@ typedef struct CopyFormatOptions CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ int64 reject_limit; /* maximum tolerable number of errors */ List *convert_select; /* list of column names (can be NIL) */ + Oid handler; /* handler function for custom format routine */ } CopyFormatOptions; /* These are private in commands/copy[from|to].c */ diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 2a2d2f9876b..4f4ffabf882 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -22,6 +22,8 @@ */ typedef struct CopyToRoutine { + NodeTag type; + /* * Set output function information. This callback is called once at the * beginning of COPY TO. diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build index d1ca24dd32f..96e70e7f38b 100644 --- a/src/include/nodes/meson.build +++ b/src/include/nodes/meson.build @@ -12,6 +12,7 @@ node_support_input_i = [ 'access/sdir.h', 'access/tableam.h', 'access/tsmapi.h', + 'commands/copyapi.h', 'commands/event_trigger.h', 'commands/trigger.h', 'executor/tuptable.h', diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index 4e4be3fa511..c9da440eed0 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -16,6 +16,7 @@ SUBDIRS = \ spgist_name_ops \ test_bloomfilter \ test_copy_callbacks \ + test_copy_format \ test_custom_rmgrs \ test_ddl_deparse \ test_dsa \ diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build index 2b057451473..d33bbbd4092 100644 --- a/src/test/modules/meson.build +++ b/src/test/modules/meson.build @@ -15,6 +15,7 @@ subdir('spgist_name_ops') subdir('ssl_passphrase_callback') subdir('test_bloomfilter') subdir('test_copy_callbacks') +subdir('test_copy_format') subdir('test_custom_rmgrs') subdir('test_ddl_deparse') subdir('test_dsa') diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore new file mode 100644 index 00000000000..5dcb3ff9723 --- /dev/null +++ b/src/test/modules/test_copy_format/.gitignore @@ -0,0 +1,4 @@ +# Generated subdirectories +/log/ +/results/ +/tmp_check/ diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile new file mode 100644 index 00000000000..8497f91624d --- /dev/null +++ b/src/test/modules/test_copy_format/Makefile @@ -0,0 +1,23 @@ +# src/test/modules/test_copy_format/Makefile + +MODULE_big = test_copy_format +OBJS = \ + $(WIN32RES) \ + test_copy_format.o +PGFILEDESC = "test_copy_format - test custom COPY FORMAT" + +EXTENSION = test_copy_format +DATA = test_copy_format--1.0.sql + +REGRESS = test_copy_format + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = src/test/modules/test_copy_format +top_builddir = ../../../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out new file mode 100644 index 00000000000..adfe7d1572a --- /dev/null +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -0,0 +1,17 @@ +CREATE EXTENSION test_copy_format; +CREATE TABLE public.test (a smallint, b integer, c bigint); +INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +ERROR: COPY format "test_copy_format" not recognized +LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')... + ^ +COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); +NOTICE: test_copy_format: is_from=false +NOTICE: CopyToOutFunc: atttypid=21 +NOTICE: CopyToOutFunc: atttypid=23 +NOTICE: CopyToOutFunc: atttypid=20 +NOTICE: CopyToStart: natts=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToEnd diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build new file mode 100644 index 00000000000..a45a2e0a039 --- /dev/null +++ b/src/test/modules/test_copy_format/meson.build @@ -0,0 +1,33 @@ +# Copyright (c) 2025, PostgreSQL Global Development Group + +test_copy_format_sources = files( + 'test_copy_format.c', +) + +if host_system == 'windows' + test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [ + '--NAME', 'test_copy_format', + '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',]) +endif + +test_copy_format = shared_module('test_copy_format', + test_copy_format_sources, + kwargs: pg_test_mod_args, +) +test_install_libs += test_copy_format + +test_install_data += files( + 'test_copy_format.control', + 'test_copy_format--1.0.sql', +) + +tests += { + 'name': 'test_copy_format', + 'sd': meson.current_source_dir(), + 'bd': meson.current_build_dir(), + 'regress': { + 'sql': [ + 'test_copy_format', + ], + }, +} diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql new file mode 100644 index 00000000000..810b3d8cedc --- /dev/null +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -0,0 +1,5 @@ +CREATE EXTENSION test_copy_format; +CREATE TABLE public.test (a smallint, b integer, c bigint); +INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql new file mode 100644 index 00000000000..d24ea03ce99 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql @@ -0,0 +1,8 @@ +/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit + +CREATE FUNCTION test_copy_format(internal) + RETURNS copy_handler + AS 'MODULE_PATHNAME' LANGUAGE C; diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c new file mode 100644 index 00000000000..b42d472d851 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -0,0 +1,63 @@ +/*-------------------------------------------------------------------------- + * + * test_copy_format.c + * Code for testing custom COPY format. + * + * Portions Copyright (c) 2025, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/test/modules/test_copy_format/test_copy_format.c + * + * ------------------------------------------------------------------------- + */ + +#include "postgres.h" + +#include "commands/copyapi.h" +#include "commands/defrem.h" + +PG_MODULE_MAGIC; + +static void +CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%d", atttypid))); +} + +static void +CopyToStart(CopyToState cstate, TupleDesc tupDesc) +{ + ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts))); +} + +static void +CopyToOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid))); +} + +static void +CopyToEnd(CopyToState cstate) +{ + ereport(NOTICE, (errmsg("CopyToEnd"))); +} + +static const CopyToRoutine CopyToRoutineTestCopyFormat = { + .type = T_CopyToRoutine, + .CopyToOutFunc = CopyToOutFunc, + .CopyToStart = CopyToStart, + .CopyToOneRow = CopyToOneRow, + .CopyToEnd = CopyToEnd, +}; + +PG_FUNCTION_INFO_V1(test_copy_format); +Datum +test_copy_format(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + + ereport(NOTICE, + (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false"))); + + PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); +} diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control new file mode 100644 index 00000000000..f05a6362358 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format.control @@ -0,0 +1,4 @@ +comment = 'Test code for custom COPY format' +default_version = '1.0' +module_pathname = '$libdir/test_copy_format' +relocatable = true -- 2.47.2 From e59ca937314580133c4590350d9f9d67bb417b2e Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 13:58:33 +0900 Subject: [PATCH v37 2/9] Export CopyToStateData as private data It's for custom COPY TO format handlers implemented as extension. This just moves codes. This doesn't change codes except CopyDest enum values. CopyDest/CopyFrom enum values such as COPY_FILE are conflicted each other. So COPY_DEST_ prefix instead of COPY_ prefix is used for CopyDest enum values. For example, COPY_FILE in CopyDest is renamed to COPY_DEST_FILE. Note that this isn't enough to implement custom COPY TO format handlers as extension. We'll do the followings in a subsequent commit: 1. Add an opaque space for custom COPY TO format handler 2. Export CopySendEndOfRow() to flush buffer --- src/backend/commands/copyto.c | 78 +++--------------------- src/include/commands/copy.h | 2 +- src/include/commands/copyto_internal.h | 83 ++++++++++++++++++++++++++ 3 files changed, 93 insertions(+), 70 deletions(-) create mode 100644 src/include/commands/copyto_internal.h diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index fce8501dc30..99c2f2dd699 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -20,6 +20,7 @@ #include "access/tableam.h" #include "commands/copyapi.h" +#include "commands/copyto_internal.h" #include "commands/progress.h" #include "executor/execdesc.h" #include "executor/executor.h" @@ -36,67 +37,6 @@ #include "utils/rel.h" #include "utils/snapmgr.h" -/* - * Represents the different dest cases we need to worry about at - * the bottom level - */ -typedef enum CopyDest -{ - COPY_FILE, /* to file (or a piped program) */ - COPY_FRONTEND, /* to frontend */ - COPY_CALLBACK, /* to callback function */ -} CopyDest; - -/* - * This struct contains all the state variables used throughout a COPY TO - * operation. - * - * Multi-byte encodings: all supported client-side encodings encode multi-byte - * characters by having the first byte's high bit set. Subsequent bytes of the - * character can have the high bit not set. When scanning data in such an - * encoding to look for a match to a single-byte (ie ASCII) character, we must - * use the full pg_encoding_mblen() machinery to skip over multibyte - * characters, else we might find a false match to a trailing byte. In - * supported server encodings, there is no possibility of a false match, and - * it's faster to make useless comparisons to trailing bytes than it is to - * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true - * when we have to do it the hard way. - */ -typedef struct CopyToStateData -{ - /* format-specific routines */ - const CopyToRoutine *routine; - - /* low-level state data */ - CopyDest copy_dest; /* type of copy source/destination */ - FILE *copy_file; /* used if copy_dest == COPY_FILE */ - StringInfo fe_msgbuf; /* used for all dests during COPY TO */ - - int file_encoding; /* file or remote side's character encoding */ - bool need_transcoding; /* file encoding diff from server? */ - bool encoding_embeds_ascii; /* ASCII can be non-first byte? */ - - /* parameters from the COPY command */ - Relation rel; /* relation to copy to */ - QueryDesc *queryDesc; /* executable query to copy from */ - List *attnumlist; /* integer list of attnums to copy */ - char *filename; /* filename, or NULL for STDOUT */ - bool is_program; /* is 'filename' a program to popen? */ - copy_data_dest_cb data_dest_cb; /* function for writing data */ - - CopyFormatOptions opts; - Node *whereClause; /* WHERE condition (or NULL) */ - - /* - * Working state - */ - MemoryContext copycontext; /* per-copy execution context */ - - FmgrInfo *out_functions; /* lookup info for output functions */ - MemoryContext rowcontext; /* per-row evaluation context */ - uint64 bytes_processed; /* number of bytes processed so far */ -} CopyToStateData; - /* DestReceiver for COPY (query) TO */ typedef struct { @@ -421,7 +361,7 @@ SendCopyBegin(CopyToState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_dest = COPY_FRONTEND; + cstate->copy_dest = COPY_DEST_FRONTEND; } static void @@ -468,7 +408,7 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -502,11 +442,11 @@ CopySendEndOfRow(CopyToState cstate) errmsg("could not write to COPY file: %m"))); } break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* Dump the accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; - case COPY_CALLBACK: + case COPY_DEST_CALLBACK: cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len); break; } @@ -527,7 +467,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate) { switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: /* Default line termination depends on platform */ #ifndef WIN32 CopySendChar(cstate, '\n'); @@ -535,7 +475,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate) CopySendString(cstate, "\r\n"); #endif break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* The FE/BE protocol uses \n as newline for all platforms */ CopySendChar(cstate, '\n'); break; @@ -920,12 +860,12 @@ BeginCopyTo(ParseState *pstate, /* See Multibyte encoding comment above */ cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding); - cstate->copy_dest = COPY_FILE; /* default */ + cstate->copy_dest = COPY_DEST_FILE; /* default */ if (data_dest_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_dest = COPY_CALLBACK; + cstate->copy_dest = COPY_DEST_CALLBACK; cstate->data_dest_cb = data_dest_cb; } else if (pipe) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 332628d67cc..6df1f8a3b9b 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -90,7 +90,7 @@ typedef struct CopyFormatOptions Oid handler; /* handler function for custom format routine */ } CopyFormatOptions; -/* These are private in commands/copy[from|to].c */ +/* These are private in commands/copy[from|to]_internal.h */ typedef struct CopyFromStateData *CopyFromState; typedef struct CopyToStateData *CopyToState; diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h new file mode 100644 index 00000000000..1b58b36c0a3 --- /dev/null +++ b/src/include/commands/copyto_internal.h @@ -0,0 +1,83 @@ +/*------------------------------------------------------------------------- + * + * copyto_internal.h + * Internal definitions for COPY TO command. + * + * + * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/commands/copyto_internal.h + * + *------------------------------------------------------------------------- + */ +#ifndef COPYTO_INTERNAL_H +#define COPYTO_INTERNAL_H + +#include "commands/copy.h" +#include "executor/execdesc.h" +#include "executor/tuptable.h" +#include "nodes/execnodes.h" + +/* + * Represents the different dest cases we need to worry about at + * the bottom level + */ +typedef enum CopyDest +{ + COPY_DEST_FILE, /* to file (or a piped program) */ + COPY_DEST_FRONTEND, /* to frontend */ + COPY_DEST_CALLBACK, /* to callback function */ +} CopyDest; + +/* + * This struct contains all the state variables used throughout a COPY TO + * operation. + * + * Multi-byte encodings: all supported client-side encodings encode multi-byte + * characters by having the first byte's high bit set. Subsequent bytes of the + * character can have the high bit not set. When scanning data in such an + * encoding to look for a match to a single-byte (ie ASCII) character, we must + * use the full pg_encoding_mblen() machinery to skip over multibyte + * characters, else we might find a false match to a trailing byte. In + * supported server encodings, there is no possibility of a false match, and + * it's faster to make useless comparisons to trailing bytes than it is to + * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true + * when we have to do it the hard way. + */ +typedef struct CopyToStateData +{ + /* format-specific routines */ + const struct CopyToRoutine *routine; + + /* low-level state data */ + CopyDest copy_dest; /* type of copy source/destination */ + FILE *copy_file; /* used if copy_dest == COPY_FILE */ + StringInfo fe_msgbuf; /* used for all dests during COPY TO */ + + int file_encoding; /* file or remote side's character encoding */ + bool need_transcoding; /* file encoding diff from server? */ + bool encoding_embeds_ascii; /* ASCII can be non-first byte? */ + + /* parameters from the COPY command */ + Relation rel; /* relation to copy to */ + QueryDesc *queryDesc; /* executable query to copy from */ + List *attnumlist; /* integer list of attnums to copy */ + char *filename; /* filename, or NULL for STDOUT */ + bool is_program; /* is 'filename' a program to popen? */ + copy_data_dest_cb data_dest_cb; /* function for writing data */ + + CopyFormatOptions opts; + Node *whereClause; /* WHERE condition (or NULL) */ + + /* + * Working state + */ + MemoryContext copycontext; /* per-copy execution context */ + + FmgrInfo *out_functions; /* lookup info for output functions */ + MemoryContext rowcontext; /* per-row evaluation context */ + uint64 bytes_processed; /* number of bytes processed so far */ +} CopyToStateData; + +#endif /* COPYTO_INTERNAL_H */ -- 2.47.2 From 40bade7212e8894220d7901fe9bb2a2eb07572cd Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:01:18 +0900 Subject: [PATCH v37 3/9] Add support for implementing custom COPY TO format as extension * Add CopyToStateData::opaque that can be used to keep data for custom COPY TO format implementation * Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf as CopyToStateFlush() --- src/backend/commands/copyto.c | 12 ++++++++++++ src/include/commands/copyapi.h | 2 ++ src/include/commands/copyto_internal.h | 3 +++ 3 files changed, 17 insertions(+) diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 99c2f2dd699..f5ed3efbace 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -458,6 +458,18 @@ CopySendEndOfRow(CopyToState cstate) resetStringInfo(fe_msgbuf); } +/* + * Export CopySendEndOfRow() for extensions. We want to keep + * CopySendEndOfRow() as a static function for + * optimization. CopySendEndOfRow() calls in this file may be optimized by a + * compiler. + */ +void +CopyToStateFlush(CopyToState cstate) +{ + CopySendEndOfRow(cstate); +} + /* * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the * line termination and do common appropriate things for the end of row. diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 4f4ffabf882..5c5ea6592e3 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -56,6 +56,8 @@ typedef struct CopyToRoutine void (*CopyToEnd) (CopyToState cstate); } CopyToRoutine; +extern void CopyToStateFlush(CopyToState cstate); + /* * API structure for a COPY FROM format implementation. Note this must be * allocated in a server-lifetime manner, typically as a static const struct. diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h index 1b58b36c0a3..ce1c33a4004 100644 --- a/src/include/commands/copyto_internal.h +++ b/src/include/commands/copyto_internal.h @@ -78,6 +78,9 @@ typedef struct CopyToStateData FmgrInfo *out_functions; /* lookup info for output functions */ MemoryContext rowcontext; /* per-row evaluation context */ uint64 bytes_processed; /* number of bytes processed so far */ + + /* For custom format implementation */ + void *opaque; /* private space */ } CopyToStateData; #endif /* COPYTO_INTERNAL_H */ -- 2.47.2 From 7025e1ca17b8c1693e94560c499b814b45a7cc45 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:11:55 +0900 Subject: [PATCH v37 4/9] Add support for adding custom COPY FROM format This uses the same handler for COPY TO and COPY FROM but uses different routine. This uses CopyToRoutine for COPY TO and CopyFromRoutine for COPY FROM. PostgreSQL calls a COPY TO/FROM handler with "is_from" argument. It's true for COPY FROM and false for COPY TO: copy_handler(true) returns CopyToRoutine copy_handler(false) returns CopyFromRoutine This also add a test module for custom COPY FROM handler. --- src/backend/commands/copy.c | 13 +++---- src/backend/commands/copyfrom.c | 28 +++++++++++-- src/include/catalog/pg_type.dat | 2 +- src/include/commands/copyapi.h | 2 + .../expected/test_copy_format.out | 10 +++-- .../test_copy_format/sql/test_copy_format.sql | 1 + .../test_copy_format/test_copy_format.c | 39 ++++++++++++++++++- 7 files changed, 78 insertions(+), 17 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 8d94bc313eb..b4417bb6819 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -483,8 +483,8 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate) * This function checks whether the option value is a built-in format such as * "text" and "csv" or not. If the option value isn't a built-in format, this * function finds a COPY format handler that returns a CopyToRoutine (for - * is_from == false). If no COPY format handler is found, this function - * reports an error. + * is_from == false) or CopyFromRountine (for is_from == true). If no COPY + * format handler is found, this function reports an error. */ static void ProcessCopyOptionFormat(ParseState *pstate, @@ -518,12 +518,9 @@ ProcessCopyOptionFormat(ParseState *pstate, } /* custom format */ - if (!is_from) - { - funcargtypes[0] = INTERNALOID; - handlerOid = LookupFuncName(list_make1(makeString(format)), 1, - funcargtypes, true); - } + funcargtypes[0] = INTERNALOID; + handlerOid = LookupFuncName(list_make1(makeString(format)), 1, + funcargtypes, true); if (!OidIsValid(handlerOid)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index bcf66f0adf8..0809766f910 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -129,6 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate); /* text format */ static const CopyFromRoutine CopyFromRoutineText = { + .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromTextLikeInFunc, .CopyFromStart = CopyFromTextLikeStart, .CopyFromOneRow = CopyFromTextOneRow, @@ -137,6 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = { /* CSV format */ static const CopyFromRoutine CopyFromRoutineCSV = { + .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromTextLikeInFunc, .CopyFromStart = CopyFromTextLikeStart, .CopyFromOneRow = CopyFromCSVOneRow, @@ -145,6 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = { /* binary format */ static const CopyFromRoutine CopyFromRoutineBinary = { + .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromBinaryInFunc, .CopyFromStart = CopyFromBinaryStart, .CopyFromOneRow = CopyFromBinaryOneRow, @@ -155,13 +158,30 @@ static const CopyFromRoutine CopyFromRoutineBinary = { static const CopyFromRoutine * CopyFromGetRoutine(const CopyFormatOptions *opts) { - if (opts->csv_mode) + if (OidIsValid(opts->handler)) + { + Datum datum; + Node *routine; + + datum = OidFunctionCall1(opts->handler, BoolGetDatum(true)); + routine = (Node *) DatumGetPointer(datum); + if (routine == NULL || !IsA(routine, CopyFromRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%u did not return " + "CopyFromRoutine struct", + opts->handler))); + return castNode(CopyFromRoutine, routine); + } + else if (opts->csv_mode) return &CopyFromRoutineCSV; else if (opts->binary) return &CopyFromRoutineBinary; - - /* default is text */ - return &CopyFromRoutineText; + else + return &CopyFromRoutineText; } /* Implementation of the start callback for text and CSV formats */ diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index 340e0cd0a8d..63b7d65f982 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -634,7 +634,7 @@ typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-', typalign => 'i' }, { oid => '8752', - descr => 'pseudo-type for the result of a copy to method function', + descr => 'pseudo-type for the result of a copy to/from method function', typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p', typcategory => 'P', typinput => 'copy_handler_in', typoutput => 'copy_handler_out', typreceive => '-', typsend => '-', diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 5c5ea6592e3..895c105d8d8 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -64,6 +64,8 @@ extern void CopyToStateFlush(CopyToState cstate); */ typedef struct CopyFromRoutine { + NodeTag type; + /* * Set input function information. This callback is called once at the * beginning of COPY FROM. diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out index adfe7d1572a..016893e7026 100644 --- a/src/test/modules/test_copy_format/expected/test_copy_format.out +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -2,9 +2,13 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); -ERROR: COPY format "test_copy_format" not recognized -LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')... - ^ +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromEnd COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=false NOTICE: CopyToOutFunc: atttypid=21 diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql index 810b3d8cedc..0dfdfa00080 100644 --- a/src/test/modules/test_copy_format/sql/test_copy_format.sql +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -2,4 +2,5 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +\. COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c index b42d472d851..abafc668463 100644 --- a/src/test/modules/test_copy_format/test_copy_format.c +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -18,6 +18,40 @@ PG_MODULE_MAGIC; +static void +CopyFromInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam) +{ + ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid))); +} + +static void +CopyFromStart(CopyFromState cstate, TupleDesc tupDesc) +{ + ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts))); +} + +static bool +CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + ereport(NOTICE, (errmsg("CopyFromOneRow"))); + return false; +} + +static void +CopyFromEnd(CopyFromState cstate) +{ + ereport(NOTICE, (errmsg("CopyFromEnd"))); +} + +static const CopyFromRoutine CopyFromRoutineTestCopyFormat = { + .type = T_CopyFromRoutine, + .CopyFromInFunc = CopyFromInFunc, + .CopyFromStart = CopyFromStart, + .CopyFromOneRow = CopyFromOneRow, + .CopyFromEnd = CopyFromEnd, +}; + static void CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) { @@ -59,5 +93,8 @@ test_copy_format(PG_FUNCTION_ARGS) ereport(NOTICE, (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false"))); - PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); + if (is_from) + PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat); + else + PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); } -- 2.47.2 From 82e7ebbdd3323b05706cce573d8d293c3b559f2c Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:19:34 +0900 Subject: [PATCH v37 5/9] Use COPY_SOURCE_ prefix for CopySource enum values This is for consistency with CopyDest. --- src/backend/commands/copyfrom.c | 4 ++-- src/backend/commands/copyfromparse.c | 10 +++++----- src/include/commands/copyfrom_internal.h | 6 +++--- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 0809766f910..76662e04260 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1729,7 +1729,7 @@ BeginCopyFrom(ParseState *pstate, pg_encoding_to_char(GetDatabaseEncoding())))); } - cstate->copy_src = COPY_FILE; /* default */ + cstate->copy_src = COPY_SOURCE_FILE; /* default */ cstate->whereClause = whereClause; @@ -1857,7 +1857,7 @@ BeginCopyFrom(ParseState *pstate, if (data_source_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_src = COPY_CALLBACK; + cstate->copy_src = COPY_SOURCE_CALLBACK; cstate->data_source_cb = data_source_cb; } else if (pipe) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index e8128f85e6b..17e51f02e04 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_src = COPY_FRONTEND; + cstate->copy_src = COPY_SOURCE_FRONTEND; cstate->fe_msgbuf = makeStringInfo(); /* We *must* flush here to ensure FE knows it can send. */ pq_flush(); @@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) switch (cstate->copy_src) { - case COPY_FILE: + case COPY_SOURCE_FILE: bytesread = fread(databuf, 1, maxread, cstate->copy_file); if (ferror(cstate->copy_file)) ereport(ERROR, @@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) if (bytesread == 0) cstate->raw_reached_eof = true; break; - case COPY_FRONTEND: + case COPY_SOURCE_FRONTEND: while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof) { int avail; @@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) bytesread += avail; } break; - case COPY_CALLBACK: + case COPY_SOURCE_CALLBACK: bytesread = cstate->data_source_cb(databuf, minread, maxread); break; } @@ -1172,7 +1172,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv) * after \. up to the protocol end of copy data. (XXX maybe better * not to treat \. as special?) */ - if (cstate->copy_src == COPY_FRONTEND) + if (cstate->copy_src == COPY_SOURCE_FRONTEND) { int inbytes; diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index c8b22af22d8..3a306e3286e 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -24,9 +24,9 @@ */ typedef enum CopySource { - COPY_FILE, /* from file (or a piped program) */ - COPY_FRONTEND, /* from frontend */ - COPY_CALLBACK, /* from callback function */ + COPY_SOURCE_FILE, /* from file (or a piped program) */ + COPY_SOURCE_FRONTEND, /* from frontend */ + COPY_SOURCE_CALLBACK, /* from callback function */ } CopySource; /* -- 2.47.2 From 97a71244245787f895fcb4f05ee0b212e78c863c Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:21:39 +0900 Subject: [PATCH v37 6/9] Add support for implementing custom COPY FROM format as extension * Add CopyFromStateData::opaque that can be used to keep data for custom COPY From format implementation * Export CopyGetData() to get the next data as CopyFromStateGetData() --- src/backend/commands/copyfromparse.c | 11 +++++++++++ src/include/commands/copyapi.h | 2 ++ src/include/commands/copyfrom_internal.h | 3 +++ 3 files changed, 16 insertions(+) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 17e51f02e04..d8fd238e72b 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -739,6 +739,17 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes) return copied_bytes; } +/* + * Export CopyGetData() for extensions. We want to keep CopyGetData() as a + * static function for optimization. CopyGetData() calls in this file may be + * optimized by a compiler. + */ +int +CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread) +{ + return CopyGetData(cstate, dest, minread, maxread); +} + /* * This function is exposed for use by extensions that read raw fields in the * next line. See NextCopyFromRawFieldsInternal() for details. diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 895c105d8d8..2044d8b8c4c 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -108,4 +108,6 @@ typedef struct CopyFromRoutine void (*CopyFromEnd) (CopyFromState cstate); } CopyFromRoutine; +extern int CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread); + #endif /* COPYAPI_H */ diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index 3a306e3286e..af425cf5fd9 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -181,6 +181,9 @@ typedef struct CopyFromStateData #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) uint64 bytes_processed; /* number of bytes processed so far */ + + /* For custom format implementation */ + void *opaque; /* private space */ } CopyFromStateData; extern void ReceiveCopyBegin(CopyFromState cstate); -- 2.47.2 From e6bf46e4006b25c0de110e14b8c2d57247f48b42 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Wed, 27 Nov 2024 16:23:55 +0900 Subject: [PATCH v37 7/9] Add CopyFromSkipErrorRow() for custom COPY format extension Extensions must call CopyFromSkipErrorRow() when CopyFromOneRow callback reports an error by errsave(). CopyFromSkipErrorRow() handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases. --- src/backend/commands/copyfromparse.c | 82 +++++++++++-------- src/include/commands/copyapi.h | 2 + .../expected/test_copy_format.out | 47 +++++++++++ .../test_copy_format/sql/test_copy_format.sql | 24 ++++++ .../test_copy_format/test_copy_format.c | 80 +++++++++++++++++- 5 files changed, 198 insertions(+), 37 deletions(-) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index d8fd238e72b..2070f51a963 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -938,6 +938,51 @@ CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true); } +/* + * Call this when you report an error by errsave() in your CopyFromOneRow + * callback. This handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases + * for you. + */ +void +CopyFromSkipErrorRow(CopyFromState cstate) +{ + Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); + + cstate->num_errors++; + + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + { + /* + * Since we emit line number and column info in the below notice + * message, we suppress error context information other than the + * relation name. + */ + Assert(!cstate->relname_only); + cstate->relname_only = true; + + if (cstate->cur_attval) + { + char *attval; + + attval = CopyLimitPrintoutLength(cstate->cur_attval); + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", + (unsigned long long) cstate->cur_lineno, + cstate->cur_attname, + attval)); + pfree(attval); + } + else + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input", + (unsigned long long) cstate->cur_lineno, + cstate->cur_attname)); + + /* reset relname_only */ + cstate->relname_only = false; + } +} + /* * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow(). * @@ -1044,42 +1089,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, (Node *) cstate->escontext, &values[m])) { - Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); - - cstate->num_errors++; - - if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) - { - /* - * Since we emit line number and column info in the below - * notice message, we suppress error context information other - * than the relation name. - */ - Assert(!cstate->relname_only); - cstate->relname_only = true; - - if (cstate->cur_attval) - { - char *attval; - - attval = CopyLimitPrintoutLength(cstate->cur_attval); - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname, - attval)); - pfree(attval); - } - else - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname)); - - /* reset relname_only */ - cstate->relname_only = false; - } - + CopyFromSkipErrorRow(cstate); return true; } diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 2044d8b8c4c..500ece7d5bb 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -110,4 +110,6 @@ typedef struct CopyFromRoutine extern int CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread); +extern void CopyFromSkipErrorRow(CopyFromState cstate); + #endif /* COPYAPI_H */ diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out index 016893e7026..b9a6baa85c0 100644 --- a/src/test/modules/test_copy_format/expected/test_copy_format.out +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -1,6 +1,8 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +-- 987 is accepted. +-- 654 is a hard error because ON_ERROR is stop by default. COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=true NOTICE: CopyFromInFunc: atttypid=21 @@ -8,7 +10,50 @@ NOTICE: CopyFromInFunc: atttypid=23 NOTICE: CopyFromInFunc: atttypid=20 NOTICE: CopyFromStart: natts=3 NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +ERROR: invalid value: "6" +CONTEXT: COPY test, line 2, column a: "6" +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: 1 row was skipped due to data type incompatibility NOTICE: CopyFromEnd +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: skipping row due to data type incompatibility at line 2 for column "a": "6" +NOTICE: CopyFromOneRow +NOTICE: 1 row was skipped due to data type incompatibility +NOTICE: CopyFromEnd +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +-- 321 is a hard error. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +ERROR: too much lines: 3 +CONTEXT: COPY test, line 3 COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=false NOTICE: CopyToOutFunc: atttypid=21 @@ -18,4 +63,6 @@ NOTICE: CopyToStart: natts=3 NOTICE: CopyToOneRow: tts_nvalid=3 NOTICE: CopyToOneRow: tts_nvalid=3 NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 NOTICE: CopyToEnd diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql index 0dfdfa00080..86db71bce7f 100644 --- a/src/test/modules/test_copy_format/sql/test_copy_format.sql +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -1,6 +1,30 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +-- 987 is accepted. +-- 654 is a hard error because ON_ERROR is stop by default. COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +987 +654 +\. +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +987 +654 +\. +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose); +987 +654 +\. +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +-- 321 is a hard error. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +987 +654 +321 \. COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c index abafc668463..96a54dab7ec 100644 --- a/src/test/modules/test_copy_format/test_copy_format.c +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -14,6 +14,7 @@ #include "postgres.h" #include "commands/copyapi.h" +#include "commands/copyfrom_internal.h" #include "commands/defrem.h" PG_MODULE_MAGIC; @@ -34,8 +35,85 @@ CopyFromStart(CopyFromState cstate, TupleDesc tupDesc) static bool CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) { + int n_attributes = list_length(cstate->attnumlist); + char *line; + int line_size = n_attributes + 1; /* +1 is for new line */ + int read_bytes; + ereport(NOTICE, (errmsg("CopyFromOneRow"))); - return false; + + cstate->cur_lineno++; + line = palloc(line_size); + read_bytes = CopyFromStateGetData(cstate, line, line_size, line_size); + if (read_bytes == 0) + return false; + if (read_bytes != line_size) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("one line must be %d bytes: %d", + line_size, read_bytes))); + + if (cstate->cur_lineno == 1) + { + /* Success */ + TupleDesc tupDesc = RelationGetDescr(cstate->rel); + ListCell *cur; + int i = 0; + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + if (att->atttypid == INT2OID) + { + values[i] = Int16GetDatum(line[i] - '0'); + } + else if (att->atttypid == INT4OID) + { + values[i] = Int32GetDatum(line[i] - '0'); + } + else if (att->atttypid == INT8OID) + { + values[i] = Int64GetDatum(line[i] - '0'); + } + nulls[i] = false; + i++; + } + } + else if (cstate->cur_lineno == 2) + { + /* Soft error */ + TupleDesc tupDesc = RelationGetDescr(cstate->rel); + int attnum = lfirst_int(list_head(cstate->attnumlist)); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + char value[2]; + + cstate->cur_attname = NameStr(att->attname); + value[0] = line[0]; + value[1] = '\0'; + cstate->cur_attval = value; + errsave((Node *) cstate->escontext, + ( + errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid value: \"%c\"", line[0]))); + CopyFromSkipErrorRow(cstate); + cstate->cur_attname = NULL; + cstate->cur_attval = NULL; + return true; + } + else + { + /* Hard error */ + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("too much lines: %llu", + (unsigned long long) cstate->cur_lineno))); + } + + return true; } static void -- 2.47.2 From 9a4dfa53cdd1683c9386644886654f3f7ee3e49a Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Tue, 18 Mar 2025 19:09:09 +0900 Subject: [PATCH v37 8/9] Use copy handlers for built-in formats This adds copy handlers for text, csv and binary. We can simplify Copy{To,From}GetRoutine() by this. We'll be able to remove CopyFormatOptions::{binary,csv_mode} when we add more callbacks to Copy{To,From}Routine and move format specific routines to Copy{To,From}Routine::*. --- src/backend/commands/copy.c | 48 ++++++++++++++++++------ src/backend/commands/copyfrom.c | 48 +++++++++++------------- src/backend/commands/copyto.c | 48 +++++++++++------------- src/include/catalog/pg_proc.dat | 11 ++++++ src/include/commands/copyfrom_internal.h | 6 ++- src/include/commands/copyto_internal.h | 6 ++- 6 files changed, 102 insertions(+), 65 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index b4417bb6819..24bd2547e3b 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -22,7 +22,9 @@ #include "access/table.h" #include "access/xact.h" #include "catalog/pg_authid.h" -#include "commands/copy.h" +#include "commands/copyapi.h" +#include "commands/copyto_internal.h" +#include "commands/copyfrom_internal.h" #include "commands/defrem.h" #include "executor/executor.h" #include "mb/pg_wchar.h" @@ -500,24 +502,15 @@ ProcessCopyOptionFormat(ParseState *pstate, opts_out->csv_mode = false; opts_out->binary = false; - /* built-in formats */ - if (strcmp(format, "text") == 0) - { - /* "csv_mode == false && binary == false" means "text" */ - return; - } - else if (strcmp(format, "csv") == 0) + if (strcmp(format, "csv") == 0) { opts_out->csv_mode = true; - return; } else if (strcmp(format, "binary") == 0) { opts_out->binary = true; - return; } - /* custom format */ funcargtypes[0] = INTERNALOID; handlerOid = LookupFuncName(list_make1(makeString(format)), 1, funcargtypes, true); @@ -1067,3 +1060,36 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist) return attnums; } + +Datum +copy_text_handler(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + + if (is_from) + PG_RETURN_POINTER(&CopyFromRoutineText); + else + PG_RETURN_POINTER(&CopyToRoutineText); +} + +Datum +copy_csv_handler(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + + if (is_from) + PG_RETURN_POINTER(&CopyFromRoutineCSV); + else + PG_RETURN_POINTER(&CopyToRoutineCSV); +} + +Datum +copy_binary_handler(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + + if (is_from) + PG_RETURN_POINTER(&CopyFromRoutineBinary); + else + PG_RETURN_POINTER(&CopyToRoutineBinary); +} diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 76662e04260..2677f2ac1bc 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -45,6 +45,7 @@ #include "rewrite/rewriteHandler.h" #include "storage/fd.h" #include "tcop/tcopprot.h" +#include "utils/fmgroids.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/portal.h" @@ -128,7 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate); */ /* text format */ -static const CopyFromRoutine CopyFromRoutineText = { +const CopyFromRoutine CopyFromRoutineText = { .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromTextLikeInFunc, .CopyFromStart = CopyFromTextLikeStart, @@ -137,7 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = { }; /* CSV format */ -static const CopyFromRoutine CopyFromRoutineCSV = { +const CopyFromRoutine CopyFromRoutineCSV = { .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromTextLikeInFunc, .CopyFromStart = CopyFromTextLikeStart, @@ -146,7 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = { }; /* binary format */ -static const CopyFromRoutine CopyFromRoutineBinary = { +const CopyFromRoutine CopyFromRoutineBinary = { .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromBinaryInFunc, .CopyFromStart = CopyFromBinaryStart, @@ -158,30 +159,25 @@ static const CopyFromRoutine CopyFromRoutineBinary = { static const CopyFromRoutine * CopyFromGetRoutine(const CopyFormatOptions *opts) { - if (OidIsValid(opts->handler)) - { - Datum datum; - Node *routine; + Oid handler = opts->handler; + Datum datum; + Node *routine; - datum = OidFunctionCall1(opts->handler, BoolGetDatum(true)); - routine = (Node *) DatumGetPointer(datum); - if (routine == NULL || !IsA(routine, CopyFromRoutine)) - ereport( - ERROR, - (errcode( - ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY handler function " - "%u did not return " - "CopyFromRoutine struct", - opts->handler))); - return castNode(CopyFromRoutine, routine); - } - else if (opts->csv_mode) - return &CopyFromRoutineCSV; - else if (opts->binary) - return &CopyFromRoutineBinary; - else - return &CopyFromRoutineText; + if (!OidIsValid(handler)) + handler = F_TEXT_INTERNAL; + + datum = OidFunctionCall1(handler, BoolGetDatum(true)); + routine = (Node *) DatumGetPointer(datum); + if (routine == NULL || !IsA(routine, CopyFromRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%u did not return " + "CopyFromRoutine struct", + opts->handler))); + return castNode(CopyFromRoutine, routine); } /* Implementation of the start callback for text and CSV formats */ diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index f5ed3efbace..757d24736e3 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -32,6 +32,7 @@ #include "pgstat.h" #include "storage/fd.h" #include "tcop/tcopprot.h" +#include "utils/fmgroids.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/rel.h" @@ -89,7 +90,7 @@ static void CopySendInt16(CopyToState cstate, int16 val); */ /* text format */ -static const CopyToRoutine CopyToRoutineText = { +const CopyToRoutine CopyToRoutineText = { .type = T_CopyToRoutine, .CopyToStart = CopyToTextLikeStart, .CopyToOutFunc = CopyToTextLikeOutFunc, @@ -98,7 +99,7 @@ static const CopyToRoutine CopyToRoutineText = { }; /* CSV format */ -static const CopyToRoutine CopyToRoutineCSV = { +const CopyToRoutine CopyToRoutineCSV = { .type = T_CopyToRoutine, .CopyToStart = CopyToTextLikeStart, .CopyToOutFunc = CopyToTextLikeOutFunc, @@ -107,7 +108,7 @@ static const CopyToRoutine CopyToRoutineCSV = { }; /* binary format */ -static const CopyToRoutine CopyToRoutineBinary = { +const CopyToRoutine CopyToRoutineBinary = { .type = T_CopyToRoutine, .CopyToStart = CopyToBinaryStart, .CopyToOutFunc = CopyToBinaryOutFunc, @@ -119,30 +120,25 @@ static const CopyToRoutine CopyToRoutineBinary = { static const CopyToRoutine * CopyToGetRoutine(const CopyFormatOptions *opts) { - if (OidIsValid(opts->handler)) - { - Datum datum; - Node *routine; + Oid handler = opts->handler; + Datum datum; + Node *routine; - datum = OidFunctionCall1(opts->handler, BoolGetDatum(false)); - routine = (Node *) DatumGetPointer(datum); - if (routine == NULL || !IsA(routine, CopyToRoutine)) - ereport( - ERROR, - (errcode( - ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY handler function " - "%u did not return " - "CopyToRoutine struct", - opts->handler))); - return castNode(CopyToRoutine, routine); - } - else if (opts->csv_mode) - return &CopyToRoutineCSV; - else if (opts->binary) - return &CopyToRoutineBinary; - else - return &CopyToRoutineText; + if (!OidIsValid(handler)) + handler = F_TEXT_INTERNAL; + + datum = OidFunctionCall1(handler, BoolGetDatum(false)); + routine = (Node *) DatumGetPointer(datum); + if (routine == NULL || !IsA(routine, CopyToRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%u did not return " + "CopyToRoutine struct", + opts->handler))); + return castNode(CopyToRoutine, routine); } /* Implementation of the start callback for text and CSV formats */ diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 7c2a510fa3f..0737eb73c9a 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12485,4 +12485,15 @@ proargtypes => 'int4', prosrc => 'gist_stratnum_common' }, +# COPY handlers +{ oid => '8100', descr => 'text COPY FORMAT handler', + proname => 'text', provolatile => 'i', prorettype => 'copy_handler', + proargtypes => 'internal', prosrc => 'copy_text_handler' }, +{ oid => '8101', descr => 'csv COPY FORMAT handler', + proname => 'csv', provolatile => 'i', prorettype => 'copy_handler', + proargtypes => 'internal', prosrc => 'copy_csv_handler' }, +{ oid => '8102', descr => 'binary COPY FORMAT handler', + proname => 'binary', provolatile => 'i', prorettype => 'copy_handler', + proargtypes => 'internal', prosrc => 'copy_binary_handler' }, + ] diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index af425cf5fd9..abeccf85c1c 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -14,7 +14,7 @@ #ifndef COPYFROM_INTERNAL_H #define COPYFROM_INTERNAL_H -#include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/trigger.h" #include "nodes/miscnodes.h" @@ -197,4 +197,8 @@ extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); +extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineText; +extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineCSV; +extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineBinary; + #endif /* COPYFROM_INTERNAL_H */ diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h index ce1c33a4004..85412660f7f 100644 --- a/src/include/commands/copyto_internal.h +++ b/src/include/commands/copyto_internal.h @@ -14,7 +14,7 @@ #ifndef COPYTO_INTERNAL_H #define COPYTO_INTERNAL_H -#include "commands/copy.h" +#include "commands/copyapi.h" #include "executor/execdesc.h" #include "executor/tuptable.h" #include "nodes/execnodes.h" @@ -83,4 +83,8 @@ typedef struct CopyToStateData void *opaque; /* private space */ } CopyToStateData; +extern PGDLLIMPORT const CopyToRoutine CopyToRoutineText; +extern PGDLLIMPORT const CopyToRoutine CopyToRoutineCSV; +extern PGDLLIMPORT const CopyToRoutine CopyToRoutineBinary; + #endif /* COPYTO_INTERNAL_H */ -- 2.47.2 From deb16adfcdba9ae9cac0d1c4e55b9cf35ba2a179 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Wed, 19 Mar 2025 11:46:34 +0900 Subject: [PATCH v37 9/9] Add document how to write a COPY handler This is WIP because we haven't decided our API yet. --- doc/src/sgml/copy-handler.sgml | 375 +++++++++++++++++++++++++++++++++ doc/src/sgml/filelist.sgml | 1 + doc/src/sgml/postgres.sgml | 1 + src/include/commands/copyapi.h | 9 +- 4 files changed, 382 insertions(+), 4 deletions(-) create mode 100644 doc/src/sgml/copy-handler.sgml diff --git a/doc/src/sgml/copy-handler.sgml b/doc/src/sgml/copy-handler.sgml new file mode 100644 index 00000000000..f602debae61 --- /dev/null +++ b/doc/src/sgml/copy-handler.sgml @@ -0,0 +1,375 @@ +<!-- doc/src/sgml/copy-handler.sgml --> + +<chapter id="copy-handler"> + <title>Writing a Copy Handler</title> + + <indexterm zone="copy-handler"> + <primary><literal>COPY</literal> handler</primary> + </indexterm> + + <para> + <productname>PostgreSQL</productname> supports + custom <link linkend="sql-copy"><literal>COPY</literal></link> + handlers. The <literal>COPY</literal> handlers can use different copy format + instead of built-in <literal>text</literal>, <literal>csv</literal> + and <literal>binary</literal>. + </para> + + <para> + At the SQL level, a table sampling method is represented by a single SQL + function, typically implemented in C, having the signature +<programlisting> +format_name(internal) RETURNS copy_handler +</programlisting> + The name of the function is the same name appearing in + the <literal>FORMAT</literal> option. The <type>internal</type> argument is + a dummy that simply serves to prevent this function from being called + directly from an SQL command. The real argument is <literal>bool + is_from</literal>. If the handler is used by <literal>COPY FROM</literal>, + it's <literal>true</literal>. If the handler is used by <literal>COPY + FROM</literal>, it's <literal>false</literal>. + </para> + + <para> + The function must return <type>CopyFromRoutine *</type> when + the <literal>is_from</literal> argument is <literal>true</literal>. + The function must return <type>CopyToRoutine *</type> when + the <literal>is_from</literal> argument is <literal>false</literal>. + </para> + + <para> + The <type>CopyFromRoutine</type> and <type>CopyToRoutine</type> struct types + are declared in <filename>src/include/commands/copyapi.h</filename>, + which see for additional details. + </para> + + <sect1 id="copy-handler-from"> + <title>Copy From Handler</title> + + <para> + The <literal>COPY</literal> handler function for <literal>COPY + FROM</literal> returns a <type>CopyFromRoutine</type> struct containing + pointers to the functions described below. All functions are required. + </para> + + <para> +<programlisting> +void +CopyFromInFunc(CopyFromState cstate, + Oid atttypid, + FmgrInfo *finfo, + Oid *typioparam); +</programlisting> + + This sets input function information for the + given <literal>atttypid</literal> attribute. This function is called once + at the beginning of <literal>COPY FROM</literal>. If + this <literal>COPY</literal> handler doesn't use any input functions, this + function doesn't need to do anything. + + <variablelist> + <varlistentry> + <term><literal>CopyFromState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY FROM</literal> operation. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>Oid atttypid</literal></term> + <listitem> + <para> + This is the OID of data type used by the relation's attribute. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>FmgrInfo *finfo</literal></term> + <listitem> + <para> + This can be optionally filled to provide the catalog information of + the input function. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>Oid *typioparam</literal></term> + <listitem> + <para> + This can be optionally filled to define the OID of the type to + pass to the input function. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> +<programlisting> +void +CopyFromStart(CopyFromState cstate, + TupleDesc tupDesc); +</programlisting> + + This starts a <literal>COPY FROM</literal>. This function is called once at + the beginning of <literal>COPY FROM</literal>. + + <variablelist> + <varlistentry> + <term><literal>CopyFromState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY FROM</literal> operation. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>TupleDesc tupDesc</literal></term> + <listitem> + <para> + This is the tuple descriptor of the relation where the data needs to be + copied. This can be used for any initialization steps required by a + format. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> +<programlisting> +bool +CopyFromOneRow(CopyFromState cstate, + ExprContext *econtext, + Datum *values, + bool *nulls); +</programlisting> + + This reads one row from the source and fill <literal>values</literal> + and <literal>nulls</literal>. If there is one or more tuples to be read, + this must return <literal>true</literal>. If there are no more tuples to + read, this must return <literal>false</literal>. + + <variablelist> + <varlistentry> + <term><literal>CopyFromState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY FROM</literal> operation. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>ExprContext *econtext</literal></term> + <listitem> + <para> + This is used to evaluate default expression for each column that is + either not read from the file or is using + the <literal>DEFAULT</literal> option of <literal>COPY + FROM</literal>. It is <literal>NULL</literal> if no default values are used. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>Datum *values</literal></term> + <listitem> + <para> + This is an output variable to store read tuples. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>bool *nulls</literal></term> + <listitem> + <para> + This is an output variable to store whether the read columns + are <literal>NULL</literal> or not. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> +<programlisting> +void +CopyFromEnd(CopyFromState cstate); +</programlisting> + + This ends a <literal>COPY FROM</literal>. This function is called once at + the end of <literal>COPY FROM</literal>. + + <variablelist> + <varlistentry> + <term><literal>CopyFromState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY FROM</literal> operation. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> + TODO: Add CopyFromStateGetData() and CopyFromSkipErrowRow()? + </para> + </sect1> + + <sect1 id="copy-handler-to"> + <title>Copy To Handler</title> + + <para> + The <literal>COPY</literal> handler function for <literal>COPY + TO</literal> returns a <type>CopyToRoutine</type> struct containing + pointers to the functions described below. All functions are required. + </para> + + <para> +<programlisting> +void +CopyToOutFunc(CopyToState cstate, + Oid atttypid, + FmgrInfo *finfo); +</programlisting> + + This sets output function information for the + given <literal>atttypid</literal> attribute. This function is called once + at the beginning of <literal>COPY TO</literal>. If + this <literal>COPY</literal> handler doesn't use any output functions, this + function doesn't need to do anything. + + <variablelist> + <varlistentry> + <term><literal>CopyToState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY TO</literal> operation. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>Oid atttypid</literal></term> + <listitem> + <para> + This is the OID of data type used by the relation's attribute. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>FmgrInfo *finfo</literal></term> + <listitem> + <para> + This can be optionally filled to provide the catalog information of + the output function. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> +<programlisting> +void +CopyToStart(CopyToState cstate, + TupleDesc tupDesc); +</programlisting> + + This starts a <literal>COPY TO</literal>. This function is called once at + the beginning of <literal>COPY TO</literal>. + + <variablelist> + <varlistentry> + <term><literal>CopyToState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY TO</literal> operation. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>TupleDesc tupDesc</literal></term> + <listitem> + <para> + This is the tuple descriptor of the relation where the data is read. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> +<programlisting> +bool +CopyToOneRow(CopyToState cstate, + TupleTableSlot *slot); +</programlisting> + + This writes one row stored in <literal>slot</literal> to the destination. + + <variablelist> + <varlistentry> + <term><literal>CopyToState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY TO</literal> operation. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>TupleTableSlot *slot</literal></term> + <listitem> + <para> + This is used to get row to be written. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> +<programlisting> +void +CopyToEnd(CopyToState cstate); +</programlisting> + + This ends a <literal>COPY TO</literal>. This function is called once at + the end of <literal>COPY TO</literal>. + + <variablelist> + <varlistentry> + <term><literal>CopyToState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY TO</literal> operation. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> + TODO: Add CopyToStateFlush()? + </para> + </sect1> +</chapter> diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 25fb99cee69..1fd6d32d5ec 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -107,6 +107,7 @@ <!ENTITY storage SYSTEM "storage.sgml"> <!ENTITY transaction SYSTEM "xact.sgml"> <!ENTITY tablesample-method SYSTEM "tablesample-method.sgml"> +<!ENTITY copy-handler SYSTEM "copy-handler.sgml"> <!ENTITY wal-for-extensions SYSTEM "wal-for-extensions.sgml"> <!ENTITY generic-wal SYSTEM "generic-wal.sgml"> <!ENTITY custom-rmgr SYSTEM "custom-rmgr.sgml"> diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml index af476c82fcc..8ba319ae2df 100644 --- a/doc/src/sgml/postgres.sgml +++ b/doc/src/sgml/postgres.sgml @@ -254,6 +254,7 @@ break is not needed in a wider output rendering. &plhandler; &fdwhandler; &tablesample-method; + ©-handler; &custom-scan; &geqo; &tableam; diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 500ece7d5bb..24710cb667a 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -28,10 +28,10 @@ typedef struct CopyToRoutine * Set output function information. This callback is called once at the * beginning of COPY TO. * + * 'atttypid' is the OID of data type used by the relation's attribute. + * * 'finfo' can be optionally filled to provide the catalog information of * the output function. - * - * 'atttypid' is the OID of data type used by the relation's attribute. */ void (*CopyToOutFunc) (CopyToState cstate, Oid atttypid, FmgrInfo *finfo); @@ -70,12 +70,13 @@ typedef struct CopyFromRoutine * Set input function information. This callback is called once at the * beginning of COPY FROM. * + * 'atttypid' is the OID of data type used by the relation's attribute. + * * 'finfo' can be optionally filled to provide the catalog information of * the input function. * * 'typioparam' can be optionally filled to define the OID of the type to - * pass to the input function.'atttypid' is the OID of data type used by - * the relation's attribute. + * pass to the input function. */ void (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid, FmgrInfo *finfo, Oid *typioparam); -- 2.47.2
pgsql-hackers by date: