Re: Error-safe user functions - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Error-safe user functions
Date
Msg-id 20221205234734.sab3fuxy6q2sokjd@alap3.anarazel.de
Whole thread Raw
In response to Re: Error-safe user functions  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Error-safe user functions
List pgsql-hackers
Hi,

On 2022-12-05 16:40:06 -0500, Tom Lane wrote:
> +/*
> + * errsave_start --- begin a "safe" error-reporting cycle
> + *
> + * If "context" isn't an ErrorSaveContext node, this behaves as
> + * errstart(ERROR, domain), and the errsave() macro ends up acting
> + * exactly like ereport(ERROR, ...).
> + *
> + * If "context" is an ErrorSaveContext node, but the node creator only wants
> + * notification of the fact of a safe error without any details, just set
> + * the error_occurred flag in the ErrorSaveContext node and return false,
> + * which will cause us to skip the remaining error processing steps.
> + *
> + * Otherwise, create and initialize error stack entry and return true.
> + * Subsequently, errmsg() and perhaps other routines will be called to further
> + * populate the stack entry.  Finally, errsave_finish() will be called to
> + * tidy up.
> + */
> +bool
> +errsave_start(void *context, const char *domain)

Why is context a void *?


> +{
> +    ErrorSaveContext *escontext;
> +    ErrorData  *edata;
> +
> +    /*
> +     * Do we have a context for safe error reporting?  If not, just punt to
> +     * errstart().
> +     */
> +    if (context == NULL || !IsA(context, ErrorSaveContext))
> +        return errstart(ERROR, domain);

I don't think we should "accept" !IsA(context, ErrorSaveContext) - that
seems likely to hide things like use-after-free.


> +    if (++errordata_stack_depth >= ERRORDATA_STACK_SIZE)
> +    {
> +        /*
> +         * Wups, stack not big enough.  We treat this as a PANIC condition
> +         * because it suggests an infinite loop of errors during error
> +         * recovery.
> +         */
> +        errordata_stack_depth = -1; /* make room on stack */
> +        ereport(PANIC, (errmsg_internal("ERRORDATA_STACK_SIZE exceeded")));
> +    }

This is the fourth copy of this code...



> +/*
> + * errsave_finish --- end a "safe" error-reporting cycle
> + *
> + * If errsave_start() decided this was a regular error, behave as
> + * errfinish().  Otherwise, package up the error details and save
> + * them in the ErrorSaveContext node.
> + */
> +void
> +errsave_finish(void *context, const char *filename, int lineno,
> +               const char *funcname)
> +{
> +    ErrorSaveContext *escontext = (ErrorSaveContext *) context;
> +    ErrorData  *edata = &errordata[errordata_stack_depth];
> +
> +    /* verify stack depth before accessing *edata */
> +    CHECK_STACK_DEPTH();
> +
> +    /*
> +     * If errsave_start punted to errstart, then elevel will be ERROR or
> +     * perhaps even PANIC.  Punt likewise to errfinish.
> +     */
> +    if (edata->elevel >= ERROR)
> +        errfinish(filename, lineno, funcname);

I'd put a pg_unreachable() or such after the errfinish() call.


> +    /*
> +     * Else, we should package up the stack entry contents and deliver them to
> +     * the caller.
> +     */
> +    recursion_depth++;
> +
> +    /* Save the last few bits of error state into the stack entry */
> +    if (filename)
> +    {
> +        const char *slash;
> +
> +        /* keep only base name, useful especially for vpath builds */
> +        slash = strrchr(filename, '/');
> +        if (slash)
> +            filename = slash + 1;
> +        /* Some Windows compilers use backslashes in __FILE__ strings */
> +        slash = strrchr(filename, '\\');
> +        if (slash)
> +            filename = slash + 1;
> +    }
> +
> +    edata->filename = filename;
> +    edata->lineno = lineno;
> +    edata->funcname = funcname;
> +    edata->elevel = ERROR;        /* hide the LOG value used above */
> +
> +    /*
> +     * We skip calling backtrace and context functions, which are more likely
> +     * to cause trouble than provide useful context; they might act on the
> +     * assumption that a transaction abort is about to occur.
> +     */

This seems like a fair bit of duplicated code.


> + * This is the same as InputFunctionCall, but the caller may also pass a
> + * previously-initialized ErrorSaveContext node.  (We declare that as
> + * "void *" to avoid including miscnodes.h in fmgr.h.)

It seems way cleaner to forward declare ErrorSaveContext instead of
using void *.


> If escontext points
> + * to an ErrorSaveContext, any "safe" errors detected by the input function
> + * will be reported by filling the escontext struct.  The caller must
> + * check escontext->error_occurred before assuming that the function result
> + * is meaningful.

I wonder if we shouldn't instead make InputFunctionCallSafe() return a
boolean and return the Datum via a pointer. As callers are otherwise
going to need to do SAFE_ERROR_OCCURRED(escontext) themselves, I think
it should also lead to more concise (and slightly more efficient) code.


> +Datum
> +InputFunctionCallSafe(FmgrInfo *flinfo, char *str,
> +                      Oid typioparam, int32 typmod,
> +                      void *escontext)

Is there a reason not to provide this infrastructure for
ReceiveFunctionCall() as well?


Not that I have a suggestion for a better name, but I don't particularly
like "Safe" denoting non-erroring input function calls. There's too many
interpretations of safe - e.g. safe against privilege escalation issues
or such.



> @@ -252,10 +254,13 @@ record_in(PG_FUNCTION_ARGS)
>              column_info->column_type = column_type;
>          }
>  
> -        values[i] = InputFunctionCall(&column_info->proc,
> -                                      column_data,
> -                                      column_info->typioparam,
> -                                      att->atttypmod);
> +        values[i] = InputFunctionCallSafe(&column_info->proc,
> +                                          column_data,
> +                                          column_info->typioparam,
> +                                          att->atttypmod,
> +                                          escontext);
> +        if (SAFE_ERROR_OCCURRED(escontext))
> +            PG_RETURN_NULL();

It doesn't *quite* seem right to set ->isnull in case of an error. Not
that it has an obvious harm.

Wonder if it's perhaps worth to add VALGRIND_MAKE_MEM_UNDEFINED() calls
to InputFunctionCallSafe() to more easily detect cases where a caller
ignores that an error occured.


> +            if (safe_mode)
> +            {
> +                ErrorSaveContext *es_context = cstate->es_context;
> +
> +                /* Must reset the error_occurred flag each time */
> +                es_context->error_occurred = false;

I'd put that into the if (es_context->error_occurred) path. Likely the
window for store-forwarding issues is smaller than
InputFunctionCallSafe(), but it's trivial to write it differently...


> diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
> index 285022e07c..ff77d27cfc 100644
> --- a/src/test/regress/sql/copy.sql
> +++ b/src/test/regress/sql/copy.sql
> @@ -268,3 +268,23 @@ a    c    b
>  
>  SELECT * FROM header_copytest ORDER BY a;
>  drop table header_copytest;
> +
> +-- "safe" error handling
> +create table on_error_copytest(i int, b bool, ai int[]);
> +
> +copy on_error_copytest from stdin with (null_on_error);
> +1    a    {1,}
> +err    1    {x}
> +2    f    {3,4}
> +bad    x    {,
> +\.
> +
> +copy on_error_copytest from stdin with (warn_on_error);
> +3    0    [3:4]={3,4}
> +4    b    [0:1000]={3,4}
> +err    t    {}
> +bad    z    {"zed"}
> +\.
> +
> +select * from on_error_copytest;
> +drop table on_error_copytest;

Think it'd be good to have a test for a composite type where one of the
columns safely errors out and the other doesn't.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andrey Borodin
Date:
Subject: Re: Transaction timeout
Next
From: Andres Freund
Date:
Subject: Re: [PATCH] Add native windows on arm64 support