Re: [HACKERS] GSOC'17 project introduction: Parallel COPY executionwith errors handling - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: [HACKERS] GSOC'17 project introduction: Parallel COPY executionwith errors handling
Date
Msg-id CAMsr+YETfrNbrnuiSjSZq8rhWiFiCKAzNz62iBmoehhd+kpgAQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] GSOC'17 project introduction: Parallel COPY executionwith errors handling  (Stephen Frost <sfrost@snowman.net>)
Responses Re: [HACKERS] GSOC'17 project introduction: Parallel COPY executionwith errors handling
List pgsql-hackers
On 23 January 2018 at 15:21, Stephen Frost <sfrost@snowman.net> wrote:
Alexey,

* Alexey Kondratov (kondratov.aleksey@gmail.com) wrote:
> On Fri, Dec 1, 2017 at 1:58 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > On a *very* quick look, please use an enum to return from NextCopyFrom
> > rather than 'int'.  The chunks that change bool to int are very
> > odd-looking.  This would move the comment that explains the value from
> > copy.c to copy.h, obviously.  Also, you seem to be using non-ASCII dashes
> > in the descriptions of those values; please don't.
>
> I will fix it, thank you.
>
> > Or maybe I misunderstood the patch completely.
>
> I hope so. Here is my thoughts how it all works, please correct me,
> where I am wrong:
>
> 1) First, I have simply changed ereport level to WARNING for specific
> validations (extra or missing columns, etc) if INGONE_ERRORS option is
> used. All these checks are inside NextCopyFrom. Thus, this patch
> performs here pretty much the same as before, excepting that it is
> possible to skip bad lines, and this part should be safe as well
>
> 2) About PG_TRY/CATCH. I use it to catch the only one specific
> function call inside NextCopyFrom--it is InputFunctionCall--which is
> used just to parse datatype from the input string. I have no idea how
> WAL write or trigger errors could get here
>
> All of these is done before actually forming a tuple, putting it into
> the heap, firing insert-related triggers, etc. I am not trying to
> catch all errors during the row processing, only input data errors. So
> why is it unsafe?

The issue is that InputFunctionCall is actually calling out to other
functions and they could be allocating memory and doing other things.
What you're missing by using PG_TRY() here without doing a PG_RE_THROW()
is all the cleanup work that AbortTransaction() and friends do- ensuring
there's a valid memory context (cleaning up the ones that might have
been allocated by the input function), releasing locks, resetting user
and security contexts, and a whole slew of other things.

Is all of that always needed for an error thrown by an input function?
Hard to say, really, since input functions can be provided by
extensions.  You might get away with doing this for int4in(), but we
also have to be thinking about extensions like PostGIS which introduce
their own geometry and geography data types that are a great deal more
complicated.


For example, an input function could conceivably take a lwlock, do some work in the heapam, or whatever.

We don't have much in the way of rules about what input functions can or cannot do, so you can't assume much about their behaviour and what must / must not be cleaned up. Nor can you just reset the state in a heavy handed manner like (say) plpgsql does.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Handling better supported channel binding types for SSLimplementations
Next
From: Peter Eisentraut
Date:
Subject: Re: [PATCH] session_replication_role = replica with TRUNCATE