Re: Error-safe user functions - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Error-safe user functions
Date
Msg-id 3190059.1665164276@sss.pgh.pa.us
Whole thread Raw
In response to Error-safe user functions  (Nikita Glukhov <n.gluhov@postgrespro.ru>)
Responses Re: Error-safe user functions  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Nikita Glukhov <n.gluhov@postgrespro.ru> writes:
> On 30.09.2022, Tom Lane wrote:
>> I strongly recommend against having a new pg_proc column at all.

> I think the only way to avoid catalog modification (adding new columns
> to pg_proc or pg_type, introducing new function signatures etc.) and
> to avoid adding some additional code to the entry of error-safe
> functions is to bump version of our calling convention.  I simply added
> flag Pg_finfo_record.errorsafe which is set to true when the new
> PG_FUNCTION_INFO_V2_ERRORSAFE() macro is used.

I don't think you got my point at all.

I do not think we need a new pg_proc column (btw, touching pg_proc.dat
is morally equivalent to a pg_proc column), and I do not think we need
a new call-convention version either, because I think that this sort
of thing:

+        /* check whether input function supports returning errors */
+        if (cstate->opts.null_on_error_flags[attnum - 1] &&
+            !func_is_error_safe(in_func_oid))
+            ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("input function for datatype \"%s\" does not support error handling",
+                            format_type_be(att->atttypid))));

is useless.  It does not benefit anybody to pre-emptively throw an error
because you are afraid that some other code might throw an error later.
That just converts "might fail" to "guaranteed to fail" --- how is that
better?

I think what we want is to document options like NULL_ON_ERROR along the
lines of

    If the input data in one of the specified columns is invalid,
    set the column's value to NULL instead of reporting an error.
    This feature will only work if the column datatype's input
    function has been upgraded to support it; otherwise, an invalid
    input value will result in an error anyway.

and just leave it on the heads of extension authors to get their
code moved forward.  (If we fail to get all the core types converted
by the time v16 reaches feature freeze, we'd have to add some docs
about which ones support this; but maybe we will get all that done
and not need documentation effort.)

Some other recommendations:

* The primary work-product from an initial patch of this sort is an
API specification.  Therefore, your 0001 patch ought to be introducing
some prose documentation somewhere (and I don't mean comments in elog.h,
rather a README file or even the SGML docs --- utils/fmgr/README might
be a good spot).  Getting that text right so that people understand
what to do is more important than any single code detail.  You are not
winning any fans by not bothering with code comments such as per-function
header comments, either.

* Submitting 16-part patch series is a good way to discourage people
from reviewing your work.  I'd toss most of the datatype conversions
overboard for the moment, planning to address them later once the core
patch is committed.  The initial patchset only needs to have one or two
data types done as proof-of-concept.

* I'd toss the test module overboard too.  Once you've got COPY using
the feature, that's a perfectly good testbed.  The effort spent on
the test module would have been better spent on making the COPY support
more complete (ie, get rid of the silly restriction to CSV).

* The 0015 and 0016 patches don't seem to belong here either.  It's
impossible to review these when the code is neither commented nor
connected to any use-case.

* I think that the ereturn macro is the right idea, but I don't understand
the rationale for also inventing PG_RETURN_ERROR.  Also, ereturn's
implementation isn't great --- I don't like duplicating the __VA_ARGS__
text, because that will lead to substantial code bloat.  It'd likely
work better to make ereturn very much like ereport, except with a
different finishing function that contains the throw-or-not logic.
As a small nitpick, I think I'd make ereturn's argument order be return
value then edata then ...; it just seems more sensible that way.

* execnodes.h seems like an *extremely* random place to put struct
FuncCallError; that will force inclusion of execnodes.h in many places
that did not need it before.  Possibly fmgr.h is the right place for it?
In general you need to think about avoiding major inclusion bloat
(and I wonder whether the patchset passes cpluspluscheck).  It might
help to treat ereturn's edata argument as just "void *" and confine
references to the FuncCallError struct to the errfinish-replacement
subroutine, ie drop the tests in PG_GET_ERROR_PTR and do that check
inside elog.c.

* I wonder if there's a way to avoid the CopyErrorData and FreeErrorData
steps in use-cases like this --- that's pure overhead, really, for
COPY's purposes, and it's probably not the only use-case that will
think so.  Maybe we could complicate FuncCallError a little and pass
a flag that indicates that we only want to know whether an error
occurred, not what it was exactly.  On the other hand, if you assume
that errors should be rare, maybe that's useless micro-optimization.

Basically, this patch set should be a lot smaller and not have ambitions
beyond "get the API right" and "make one or two datatypes support COPY
NULL_ON_ERROR".  Add more code once that core functionality gets reviewed
and committed.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Melih Mutlu
Date:
Subject: Re: Mingw task for Cirrus CI
Next
From: Tom Lane
Date:
Subject: Re: Avoid mix char with bool type in comparisons