Re: Error-safe user functions - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Error-safe user functions
Date
Msg-id 2503569.1670104001@sss.pgh.pa.us
Whole thread Raw
In response to Re: Error-safe user functions  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Error-safe user functions  (Corey Huinker <corey.huinker@gmail.com>)
Re: Error-safe user functions  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> writes:
> Great. Let's hope we can get this settled early next week and then we
> can get to work on the next tranche of functions, those that will let
> the SQL/JSON work restart.

OK, here's a draft proposal.  I should start out by acknowledging that
this steals a great deal from Nikita's original patch as well as yours,
though I editorialized heavily.

0001 is the core infrastructure and documentation for the feature.
(I didn't bother breaking it down further than that.)

0002 fixes boolin and int4in.  That is the work that we're going to
have to replicate in an awful lot of places, and I am pleased by how
short-and-sweet it is.  Of course, stuff like the datetime functions
might be more complex to adapt.

Then 0003 is a quick-hack version of COPY that is able to exercise
all this.  I did not bother with the per-column flags as you had
them, because I'm not sure if they're worth the trouble compared
to a simple boolean; in any case we can add that refinement later.
What I did add was a WARN_ON_ERROR option that exercises the ability
to extract the error message after a soft error.  I'm not proposing
that as a shippable feature, it's just something for testing.

I think there are just a couple of loose ends here:

1. Bikeshedding on my name choices is welcome.  I know Robert is
dissatisfied with "ereturn", but I'm content with that so I didn't
change it here.

2. Everybody has struggled with just where to put the declaration
of the error context structure.  The most natural home for it
probably would be elog.h, but that's out because it cannot depend
on nodes.h, and the struct has to be a Node type to conform to
the fmgr safety guidelines.  What I've done here is to drop it
in nodes.h, as we've done with a couple of other hard-to-classify
node types; but I can't say I'm satisfied with that.

Other plausible answers seem to be:

* Drop it in fmgr.h.  The only real problem is that historically
we've not wanted fmgr.h to depend on nodes.h either.  But I'm not
sure how strong the argument for that really is/was.  If we did
do it like that we could clean up a few kluges, both in this patch
and pre-existing (fmNodePtr at least could go away).

* Invent a whole new header just for this struct.  But then we're
back to the question of what to call it.  Maybe something along the
lines of utils/elog_extras.h ?

            regards, tom lane

diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml
index 693423e524..e68a6d7231 100644
--- a/doc/src/sgml/ref/create_type.sgml
+++ b/doc/src/sgml/ref/create_type.sgml
@@ -900,6 +900,15 @@ CREATE TYPE <replaceable class="parameter">name</replaceable>
    function is written in C.
   </para>

+  <para>
+   In <productname>PostgreSQL</productname> version 16 and later, it is
+   desirable for base types' input functions to return <quote>safe</quote>
+   errors using the new <function>ereturn()</function> mechanism, rather
+   than throwing <function>ereport()</function> exceptions as in previous
+   versions.  See <filename>src/backend/utils/fmgr/README</filename> for
+   more information.
+  </para>
+
  </refsect1>

  <refsect1>
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 2585e24845..6c8736f0c4 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -686,6 +686,153 @@ errfinish(const char *filename, int lineno, const char *funcname)
 }


+/*
+ * ereturn_start --- begin a "safe" error-reporting cycle
+ *
+ * If "context" isn't an ErrorReturnContext node, this behaves as
+ * errstart(ERROR, domain).
+ *
+ * If it is an ErrorReturnContext node, but the node creator only wants
+ * notification of the fact of a safe error without any details, just set
+ * the error_occurred flag in the ErrorReturnContext node and return false,
+ * which will cause us to skip remaining error processing steps.
+ *
+ * Otherwise, create and initialize error stack entry and return true.
+ * Subsequently, errmsg() and perhaps other routines will be called to further
+ * populate the stack entry.  Finally, ereturn_finish() will be called to
+ * tidy up.
+ */
+bool
+ereturn_start(void *context, const char *domain)
+{
+    ErrorReturnContext *ercontext;
+    ErrorData  *edata;
+
+    /*
+     * Do we have a context for safe error reporting?  If not, just punt to
+     * errstart().
+     */
+    if (context == NULL || !IsA(context, ErrorReturnContext))
+        return errstart(ERROR, domain);
+
+    /* Report that an error was detected */
+    ercontext = (ErrorReturnContext *) context;
+    ercontext->error_occurred = true;
+
+    /* Nothing else to do if caller wants no further details */
+    if (!ercontext->details_please)
+        return false;
+
+    /*
+     * Okay, crank up a stack entry to store the info in.
+     */
+
+    recursion_depth++;
+    if (++errordata_stack_depth >= ERRORDATA_STACK_SIZE)
+    {
+        /*
+         * Wups, stack not big enough.  We treat this as a PANIC condition
+         * because it suggests an infinite loop of errors during error
+         * recovery.
+         */
+        errordata_stack_depth = -1; /* make room on stack */
+        ereport(PANIC, (errmsg_internal("ERRORDATA_STACK_SIZE exceeded")));
+    }
+
+    /* Initialize data for this error frame */
+    edata = &errordata[errordata_stack_depth];
+    MemSet(edata, 0, sizeof(ErrorData));
+    edata->elevel = LOG;        /* signal all is well to ereturn_finish */
+    /* the default text domain is the backend's */
+    edata->domain = domain ? domain : PG_TEXTDOMAIN("postgres");
+    /* initialize context_domain the same way (see set_errcontext_domain()) */
+    edata->context_domain = edata->domain;
+    /* Select default errcode based on the assumed elevel of ERROR */
+    edata->sqlerrcode = ERRCODE_INTERNAL_ERROR;
+    /* errno is saved here so that error parameter eval can't change it */
+    edata->saved_errno = errno;
+
+    /*
+     * Any allocations for this error state level should go into the caller's
+     * context.  We don't need to pollute ErrorContext, or even require it to
+     * exist, in this code path.
+     */
+    edata->assoc_context = CurrentMemoryContext;
+
+    recursion_depth--;
+    return true;
+}
+
+/*
+ * ereturn_finish --- end a "safe" error-reporting cycle
+ *
+ * If ereturn_start() decided this was a regular error, behave as
+ * errfinish().  Otherwise, package up the error details and save
+ * them in the ErrorReturnContext node.
+ */
+void
+ereturn_finish(void *context, const char *filename, int lineno,
+               const char *funcname)
+{
+    ErrorReturnContext *ercontext = (ErrorReturnContext *) context;
+    ErrorData  *edata = &errordata[errordata_stack_depth];
+
+    /* verify stack depth before accessing *edata */
+    CHECK_STACK_DEPTH();
+
+    /*
+     * If ereturn_start punted to errstart, then elevel will be ERROR or
+     * perhaps even PANIC.  Punt likewise to errfinish.
+     */
+    if (edata->elevel >= ERROR)
+        errfinish(filename, lineno, funcname);
+
+    /*
+     * Else, we should package up the stack entry contents and deliver them to
+     * the caller.
+     */
+    recursion_depth++;
+
+    /* Save the last few bits of error state into the stack entry */
+    if (filename)
+    {
+        const char *slash;
+
+        /* keep only base name, useful especially for vpath builds */
+        slash = strrchr(filename, '/');
+        if (slash)
+            filename = slash + 1;
+        /* Some Windows compilers use backslashes in __FILE__ strings */
+        slash = strrchr(filename, '\\');
+        if (slash)
+            filename = slash + 1;
+    }
+
+    edata->filename = filename;
+    edata->lineno = lineno;
+    edata->funcname = funcname;
+    edata->elevel = ERROR;        /* hide the LOG value used above */
+
+    /*
+     * We skip calling backtrace and context functions, which are more likely
+     * to cause trouble than provide useful context; they might act on the
+     * assumption that a transaction abort is about to occur.
+     */
+
+    /*
+     * Make a copy of the error info for the caller.  All the subsidiary
+     * strings are already in the caller's context, so it's sufficient to
+     * flat-copy the stack entry.
+     */
+    ercontext->error_data = palloc_object(ErrorData);
+    memcpy(ercontext->error_data, edata, sizeof(ErrorData));
+
+    /* Exit error-handling context */
+    errordata_stack_depth--;
+    recursion_depth--;
+}
+
+
 /*
  * errcode --- add SQLSTATE error code to the current error
  *
diff --git a/src/backend/utils/fmgr/README b/src/backend/utils/fmgr/README
index 49845f67ac..4aa1ce6d28 100644
--- a/src/backend/utils/fmgr/README
+++ b/src/backend/utils/fmgr/README
@@ -267,6 +267,66 @@ See windowapi.h for more information.
 information about the context of the CALL statement, particularly
 whether it is within an "atomic" execution context.

+* Some callers of datatype input functions (and in future perhaps
+other classes of functions) pass an instance of ErrorReturnContext.
+This indicates that the caller wishes to handle "safe" errors without
+a transaction-terminating exception being thrown: instead, the callee
+should store information about the error cause in the ErrorReturnContext
+struct and return a dummy result value.  Further details appear in
+"Handling Non-Exception Errors" below.
+
+
+Handling Non-Exception Errors
+-----------------------------
+
+Postgres' standard mechanism for reporting errors (ereport() or elog())
+is used for all sorts of error conditions.  This means that throwing
+an exception via ereport(ERROR) requires an expensive transaction or
+subtransaction abort and cleanup, since the exception catcher dare not
+make many assumptions about what has gone wrong.  There are situations
+where we would rather have a lighter-weight mechanism for dealing
+with errors that are known to be safe to recover from without a full
+transaction cleanup.  SQL-callable functions can support this need
+using the ErrorReturnContext context mechanism.
+
+To report a "safe" error, a SQL-callable function should call
+    ereturn(fcinfo->context, ...)
+where it would previously have done
+    ereport(ERROR, ...)
+If the passed "context" is NULL or is not an ErrorReturnContext node,
+then ereturn behaves precisely as ereport(ERROR): the exception is
+thrown via longjmp, so that control does not return.  If "context"
+is an ErrorReturnContext node, then the error information included in
+ereturn's subsidiary reporting calls is stored into the context node
+and control returns normally.  The function should then return a dummy
+value to its caller.  (SQL NULL is recommendable as the dummy value;
+but anything will do, since the caller is expected to ignore the
+function's return value once it sees that an error has been reported
+in the ErrorReturnContext node.)
+
+Considering datatype input functions as examples, typical "safe" error
+conditions include input syntax errors and out-of-range values.  An input
+function typically detects these cases with simple if-tests and can easily
+change the following ereport calls to ereturns.  Error conditions that
+should NOT be handled this way include out-of-memory, internal errors, and
+anything where there is any question about our ability to continue normal
+processing of the transaction.  Those should still be thrown with ereport.
+Because of this restriction, it's typically not necessary to pass the
+error context pointer down very far, as errors reported by palloc or
+other low-level functions are typically reasonable to consider internal.
+
+Because no transaction cleanup will occur, a function that is exiting
+after ereturn() returns normally still bears responsibility for resource
+cleanup.  It is not necessary to be concerned about small leakages of
+palloc'd memory, since the caller should be running the function in a
+short-lived memory context.  However, resources such as locks, open files,
+or buffer pins must be closed out cleanly, as they would be in the
+non-error code path.
+
+Conventions for callers that use the ErrorReturnContext mechanism
+to trap errors are discussed with the declaration of that struct,
+in nodes.h.
+

 Functions Accepting or Returning Sets
 -------------------------------------
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index 3c210297aa..d200b9c296 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -1548,6 +1548,63 @@ InputFunctionCall(FmgrInfo *flinfo, char *str, Oid typioparam, int32 typmod)
     return result;
 }

+/*
+ * Call a previously-looked-up datatype input function, with non-exception
+ * handling of "safe" errors.
+ *
+ * This is the same as InputFunctionCall, but the caller also passes a
+ * previously-initialized ErrorReturnContext node.  (We declare that as
+ * "void *" to avoid including nodes.h in fmgr.h, but it had better be an
+ * ErrorReturnContext.)  Any "safe" errors detected by the input function
+ * will be reported by filling the ercontext struct.  The caller must
+ * check ercontext->error_occurred before assuming that the function result
+ * is meaningful.
+ */
+Datum
+InputFunctionCallSafe(FmgrInfo *flinfo, char *str,
+                      Oid typioparam, int32 typmod,
+                      void *ercontext)
+{
+    LOCAL_FCINFO(fcinfo, 3);
+    Datum        result;
+
+    Assert(IsA(ercontext, ErrorReturnContext));
+
+    if (str == NULL && flinfo->fn_strict)
+        return (Datum) 0;        /* just return null result */
+
+    InitFunctionCallInfoData(*fcinfo, flinfo, 3, InvalidOid, ercontext, NULL);
+
+    fcinfo->args[0].value = CStringGetDatum(str);
+    fcinfo->args[0].isnull = false;
+    fcinfo->args[1].value = ObjectIdGetDatum(typioparam);
+    fcinfo->args[1].isnull = false;
+    fcinfo->args[2].value = Int32GetDatum(typmod);
+    fcinfo->args[2].isnull = false;
+
+    result = FunctionCallInvoke(fcinfo);
+
+    /* Result value is garbage, and could be null, if an error was reported */
+    if (((ErrorReturnContext *) ercontext)->error_occurred)
+        return (Datum) 0;
+
+    /* Otherwise, should get null result if and only if str is NULL */
+    if (str == NULL)
+    {
+        if (!fcinfo->isnull)
+            elog(ERROR, "input function %u returned non-NULL",
+                 flinfo->fn_oid);
+    }
+    else
+    {
+        if (fcinfo->isnull)
+            elog(ERROR, "input function %u returned NULL",
+                 flinfo->fn_oid);
+    }
+
+    return result;
+}
+
 /*
  * Call a previously-looked-up datatype output function.
  *
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index 380a82b9de..a95bad117f 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -700,6 +700,9 @@ extern Datum OidFunctionCall9Coll(Oid functionId, Oid collation,
 /* Special cases for convenient invocation of datatype I/O functions. */
 extern Datum InputFunctionCall(FmgrInfo *flinfo, char *str,
                                Oid typioparam, int32 typmod);
+extern Datum InputFunctionCallSafe(FmgrInfo *flinfo, char *str,
+                                   Oid typioparam, int32 typmod,
+                                   void *ercontext);
 extern Datum OidInputFunctionCall(Oid functionId, char *str,
                                   Oid typioparam, int32 typmod);
 extern char *OutputFunctionCall(FmgrInfo *flinfo, Datum val);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 1f33902947..7979aea16d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -430,4 +430,31 @@ typedef enum LimitOption
     LIMIT_OPTION_DEFAULT,        /* No limit present */
 } LimitOption;

+/*
+ * ErrorReturnContext -
+ *    function call context node for handling of "safe" errors
+ *
+ * A caller wishing to trap "safe" errors must initialize a struct like this
+ * with all fields zero/NULL except for the NodeTag.  Optionally, set
+ * details_please = true if more than the bare knowledge that a "safe" error
+ * occurred is required.  After calling code that might report an error this
+ * way, check error_occurred to see if an error happened.  If so, and if
+ * details_please is true, error_data has been filled with error details
+ * (stored in the callee's memory context!).  FreeErrorData() can be called
+ * to release error_data, although this step is typically not necessary
+ * if the called code was run in a short-lived context.
+ *
+ * nodes.h isn't a great place for this, but neither elog.h nor fmgr.h
+ * should depend on nodes.h, so we don't really have a better option.
+ */
+typedef struct ErrorReturnContext
+{
+    pg_node_attr(nodetag_only)    /* this is not a member of parse trees */
+
+    NodeTag        type;
+    bool        error_occurred; /* set to true if we detect a "safe" error */
+    bool        details_please; /* does caller want more info than that? */
+    struct ErrorData *error_data;    /* details of error, if so */
+} ErrorReturnContext;
+
 #endif                            /* NODES_H */
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index f107a818e8..2c8dd3d633 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -235,6 +235,46 @@ extern int    getinternalerrposition(void);
     ereport(elevel, errmsg_internal(__VA_ARGS__))


+/*----------
+ * Support for reporting "safe" errors that don't require a full transaction
+ * abort to clean up.  This is to be used in this way:
+ *        ereturn(context,
+ *                errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ *                errmsg("invalid input syntax for type %s: \"%s\"",
+ *                       "boolean", in_str),
+ *                ... other errxxx() fields as needed ...);
+ *
+ * "context" is a node pointer or NULL, and the remaining auxiliary calls
+ * provide the same error details as in ereport().  If context is not a
+ * pointer to an ErrorReturnContext node, then ereturn(context, ...)
+ * behaves identically to ereport(ERROR, ...).  If context is a pointer
+ * to an ErrorReturnContext node, then the information provided by the
+ * auxiliary calls is stored in the context node and control returns
+ * normally.  The caller of ereturn() must then do any required cleanup
+ * and return control back to its caller.  That caller must check the
+ * ErrorReturnContext node to see whether an error occurred before
+ * it can trust the function's result to be meaningful.
+ *
+ * ereturn_domain() allows a message domain to be specified; it is
+ * precisely analogous to ereport_domain().
+ *----------
+ */
+#define ereturn_domain(context, domain, ...)    \
+    do { \
+        void *context_ = (context); \
+        pg_prevent_errno_in_scope(); \
+        if (ereturn_start(context_, domain)) \
+            __VA_ARGS__, ereturn_finish(context_, __FILE__, __LINE__, __func__); \
+    } while(0)
+
+#define ereturn(context, ...)    \
+    ereturn_domain(context, TEXTDOMAIN, __VA_ARGS__)
+
+extern bool ereturn_start(void *context, const char *domain);
+extern void ereturn_finish(void *context, const char *filename, int lineno,
+                           const char *funcname);
+
+
 /* Support for constructing error strings separately from ereport() calls */

 extern void pre_format_elog_string(int errnumber, const char *domain);
diff --git a/src/backend/utils/adt/bool.c b/src/backend/utils/adt/bool.c
index cd7335287f..d78f862421 100644
--- a/src/backend/utils/adt/bool.c
+++ b/src/backend/utils/adt/bool.c
@@ -148,12 +148,12 @@ boolin(PG_FUNCTION_ARGS)
     if (parse_bool_with_len(str, len, &result))
         PG_RETURN_BOOL(result);

-    ereport(ERROR,
+    ereturn(fcinfo->context,
             (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
              errmsg("invalid input syntax for type %s: \"%s\"",
                     "boolean", in_str)));

-    /* not reached */
+    /* dummy result */
     PG_RETURN_BOOL(false);
 }

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 42ddae99ef..e1837bee71 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -291,7 +291,7 @@ int4in(PG_FUNCTION_ARGS)
 {
     char       *num = PG_GETARG_CSTRING(0);

-    PG_RETURN_INT32(pg_strtoint32(num));
+    PG_RETURN_INT32(pg_strtoint32_safe(num, fcinfo->context));
 }

 /*
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 834ec0b588..7050d5825d 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -164,8 +164,11 @@ invalid_syntax:
 /*
  * Convert input string to a signed 32 bit integer.
  *
- * Allows any number of leading or trailing whitespace characters. Will throw
- * ereport() upon bad input format or overflow.
+ * Allows any number of leading or trailing whitespace characters.
+ *
+ * pg_strtoint32() will throw ereport() upon bad input format or overflow;
+ * while pg_strtoint32_safe() instead returns such complaints in *ercontext,
+ * if it's an ErrorReturnContext.
  *
  * NB: Accumulate input as a negative number, to deal with two's complement
  * representation of the most negative number, which can't be represented as a
@@ -173,6 +176,12 @@ invalid_syntax:
  */
 int32
 pg_strtoint32(const char *s)
+{
+    return pg_strtoint32_safe(s, NULL);
+}
+
+int32
+pg_strtoint32_safe(const char *s, Node *ercontext)
 {
     const char *ptr = s;
     int32        tmp = 0;
@@ -223,18 +232,18 @@ pg_strtoint32(const char *s)
     return tmp;

 out_of_range:
-    ereport(ERROR,
+    ereturn(ercontext,
             (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
              errmsg("value \"%s\" is out of range for type %s",
                     s, "integer")));
+    return 0;                    /* dummy result */

 invalid_syntax:
-    ereport(ERROR,
+    ereturn(ercontext,
             (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
              errmsg("invalid input syntax for type %s: \"%s\"",
                     "integer", s)));
-
-    return 0;                    /* keep compiler quiet */
+    return 0;                    /* dummy result */
 }

 /*
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 81631f1645..894e3d8ba9 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -45,6 +45,7 @@ extern int    namestrcmp(Name name, const char *str);
 /* numutils.c */
 extern int16 pg_strtoint16(const char *s);
 extern int32 pg_strtoint32(const char *s);
+extern int32 pg_strtoint32_safe(const char *s, Node *ercontext);
 extern int64 pg_strtoint64(const char *s);
 extern int    pg_itoa(int16 i, char *a);
 extern int    pg_ultoa_n(uint32 value, char *a);
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c25b52d0cb..462e4d338b 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -42,6 +42,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
     FORCE_NOT_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
     FORCE_NULL ( <replaceable class="parameter">column_name</replaceable> [, ...] )
+    NULL_ON_ERROR [ <replaceable class="parameter">boolean</replaceable> ]
+    WARN_ON_ERROR [ <replaceable class="parameter">boolean</replaceable> ]
     ENCODING '<replaceable class="parameter">encoding_name</replaceable>'
 </synopsis>
  </refsynopsisdiv>
@@ -356,6 +358,27 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     </listitem>
    </varlistentry>

+   <varlistentry>
+    <term><literal>NULL_ON_ERROR</literal></term>
+    <listitem>
+     <para>
+      Requests silently replacing any erroneous input values with
+      <literal>NULL</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>WARN_ON_ERROR</literal></term>
+    <listitem>
+     <para>
+      Requests replacing any erroneous input values with
+      <literal>NULL</literal>, and emitting a warning message instead of
+      the usual error.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>ENCODING</literal></term>
     <listitem>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db4c9dbc23..d224167111 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -409,6 +409,7 @@ ProcessCopyOptions(ParseState *pstate,
     bool        format_specified = false;
     bool        freeze_specified = false;
     bool        header_specified = false;
+    bool        on_error_specified = false;
     ListCell   *option;

     /* Support external use for option sanity checking */
@@ -520,6 +521,20 @@ ProcessCopyOptions(ParseState *pstate,
                                 defel->defname),
                          parser_errposition(pstate, defel->location)));
         }
+        else if (strcmp(defel->defname, "null_on_error") == 0)
+        {
+            if (on_error_specified)
+                errorConflictingDefElem(defel, pstate);
+            on_error_specified = true;
+            opts_out->null_on_error = defGetBoolean(defel);
+        }
+        else if (strcmp(defel->defname, "warn_on_error") == 0)
+        {
+            if (on_error_specified)
+                errorConflictingDefElem(defel, pstate);
+            on_error_specified = true;
+            opts_out->warn_on_error = defGetBoolean(defel);
+        }
         else if (strcmp(defel->defname, "convert_selectively") == 0)
         {
             /*
@@ -701,6 +716,30 @@ ProcessCopyOptions(ParseState *pstate,
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                  errmsg("CSV quote character must not appear in the NULL specification")));
+
+    /*
+     * The XXX_ON_ERROR options are only supported for input, and only in text
+     * modes.  We could in future extend safe-errors support to datatype
+     * receive functions, but it'd take a lot more work.  Moreover, it's not
+     * clear that receive functions can detect errors very well, so the
+     * feature likely wouldn't work terribly well.
+     */
+    if (opts_out->null_on_error && !is_from)
+        ereport(ERROR,
+                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                 errmsg("COPY NULL_ON_ERROR only available using COPY FROM")));
+    if (opts_out->null_on_error && opts_out->binary)
+        ereport(ERROR,
+                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                 errmsg("cannot specify NULL_ON_ERROR in BINARY mode")));
+    if (opts_out->warn_on_error && !is_from)
+        ereport(ERROR,
+                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                 errmsg("COPY WARN_ON_ERROR only available using COPY FROM")));
+    if (opts_out->warn_on_error && opts_out->binary)
+        ereport(ERROR,
+                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                 errmsg("cannot specify WARN_ON_ERROR in BINARY mode")));
 }

 /*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 504afcb811..a268ca05d0 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1599,6 +1599,15 @@ BeginCopyFrom(ParseState *pstate,
         }
     }

+    /* For the XXX_ON_ERROR options, we'll need an ErrorReturnContext */
+    if (cstate->opts.null_on_error ||
+        cstate->opts.warn_on_error)
+    {
+        cstate->er_context = makeNode(ErrorReturnContext);
+        /* Error details are only needed for warnings */
+        if (cstate->opts.warn_on_error)
+            cstate->er_context->details_please = true;
+    }

     /* initialize progress */
     pgstat_progress_start_command(PROGRESS_COMMAND_COPY,
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 097414ef12..901cbea030 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -876,6 +876,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
         char      **field_strings;
         ListCell   *cur;
         int            fldct;
+        bool        safe_mode;
         int            fieldno;
         char       *string;

@@ -889,6 +890,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                      errmsg("extra data after last expected column")));

+        safe_mode = cstate->opts.null_on_error || cstate->opts.warn_on_error;
+
         fieldno = 0;

         /* Loop to read the user attributes on the line. */
@@ -938,12 +941,50 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,

             cstate->cur_attname = NameStr(att->attname);
             cstate->cur_attval = string;
-            values[m] = InputFunctionCall(&in_functions[m],
-                                          string,
-                                          typioparams[m],
-                                          att->atttypmod);
-            if (string != NULL)
-                nulls[m] = false;
+
+            if (safe_mode)
+            {
+                ErrorReturnContext *er_context = cstate->er_context;
+
+                /* Must reset the error_occurred flag each time */
+                er_context->error_occurred = false;
+
+                values[m] = InputFunctionCallSafe(&in_functions[m],
+                                                  string,
+                                                  typioparams[m],
+                                                  att->atttypmod,
+                                                  er_context);
+                if (er_context->error_occurred)
+                {
+                    /* nulls[m] is already true */
+                    if (cstate->opts.warn_on_error)
+                    {
+                        ErrorData  *edata = er_context->error_data;
+
+                        /* Note that our errcontext callback wasn't used */
+                        ereport(WARNING,
+                                errcode(edata->sqlerrcode),
+                                errmsg_internal("invalid input for column %s: %s",
+                                                cstate->cur_attname,
+                                                edata->message),
+                                errcontext("COPY %s, line %llu",
+                                           cstate->cur_relname,
+                                           (unsigned long long) cstate->cur_lineno));
+                    }
+                }
+                else if (string != NULL)
+                    nulls[m] = false;
+            }
+            else
+            {
+                values[m] = InputFunctionCall(&in_functions[m],
+                                              string,
+                                              typioparams[m],
+                                              att->atttypmod);
+                if (string != NULL)
+                    nulls[m] = false;
+            }
+
             cstate->cur_attname = NULL;
             cstate->cur_attval = NULL;
         }
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b77b935005..ee38bd0e28 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -57,6 +57,8 @@ typedef struct CopyFormatOptions
     bool       *force_notnull_flags;    /* per-column CSV FNN flags */
     List       *force_null;        /* list of column names */
     bool       *force_null_flags;    /* per-column CSV FN flags */
+    bool        null_on_error;    /* replace erroneous inputs with NULL? */
+    bool        warn_on_error;    /* ... and warn about it? */
     bool        convert_selectively;    /* do selective binary conversion? */
     List       *convert_select; /* list of column names (can be NIL) */
 } CopyFormatOptions;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 8d9cc5accd..d578c73107 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -97,6 +97,7 @@ typedef struct CopyFromStateData
     int           *defmap;            /* array of default att numbers */
     ExprState **defexprs;        /* array of default att expressions */
     bool        volatile_defexprs;    /* is any of defexprs volatile? */
+    ErrorReturnContext *er_context; /* used for XXX_ON_ERROR options */
     List       *range_table;
     ExprState  *qualexpr;

diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out
index 3fad1c52d1..fa095ec52d 100644
--- a/src/test/regress/expected/copy.out
+++ b/src/test/regress/expected/copy.out
@@ -240,3 +240,25 @@ SELECT * FROM header_copytest ORDER BY a;
 (5 rows)

 drop table header_copytest;
+-- "safe" error handling
+create table on_error_copytest(i int, b bool);
+copy on_error_copytest from stdin with (null_on_error);
+copy on_error_copytest from stdin with (warn_on_error);
+WARNING:  invalid input for column b: invalid input syntax for type boolean: "b"
+WARNING:  invalid input for column i: invalid input syntax for type integer: "err"
+WARNING:  invalid input for column i: invalid input syntax for type integer: "bad"
+WARNING:  invalid input for column b: invalid input syntax for type boolean: "z"
+select * from on_error_copytest;
+ i | b
+---+---
+ 1 |
+   | t
+ 2 | f
+   |
+ 3 | f
+ 4 |
+   | t
+   |
+(8 rows)
+
+drop table on_error_copytest;
diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
index 285022e07c..2d804ad3af 100644
--- a/src/test/regress/sql/copy.sql
+++ b/src/test/regress/sql/copy.sql
@@ -268,3 +268,23 @@ a    c    b

 SELECT * FROM header_copytest ORDER BY a;
 drop table header_copytest;
+
+-- "safe" error handling
+create table on_error_copytest(i int, b bool);
+
+copy on_error_copytest from stdin with (null_on_error);
+1    a
+err    1
+2    f
+bad    x
+\.
+
+copy on_error_copytest from stdin with (warn_on_error);
+3    0
+4    b
+err    t
+bad    z
+\.
+
+select * from on_error_copytest;
+drop table on_error_copytest;

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Questions regarding distinct operation implementation
Next
From: David Rowley
Date:
Subject: Re: [PoC] Reducing planning time when tables have many partitions