Thread: Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Alexey Kondratov
Date:
On Fri, Dec 1, 2017 at 1:58 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On a *very* quick look, please use an enum to return from NextCopyFrom
> rather than 'int'. The chunks that change bool to int are very
> odd-looking. This would move the comment that explains the value from
> copy.c to copy.h, obviously. Also, you seem to be using non-ASCII dashes
> in the descriptions of those values; please don't.

I will fix it, thank you.

> Or maybe I misunderstood the patch completely.

I hope so. Here are my thoughts on how it all works; please correct me
where I am wrong:

1) First, I have simply changed the ereport level to WARNING for specific
validations (extra or missing columns, etc.) if the IGNORE_ERRORS option
is used. All these checks are inside NextCopyFrom. Thus, this patch
behaves here pretty much the same as before, except that it is possible
to skip bad lines, and this part should be safe as well.

2) About PG_TRY/CATCH. I use it to catch only one specific function call
inside NextCopyFrom -- InputFunctionCall -- which is used just to parse a
datatype from the input string. I have no idea how WAL write or trigger
errors could get here.

All of this is done before actually forming a tuple, putting it into the
heap, firing insert-related triggers, etc. I am not trying to catch all
errors during row processing, only input data errors. So why is it
unsafe?

Best,
Alexey
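The intended control flow can be sketched in rough Python (a stand-in for the C inside NextCopyFrom; all names here are invented for illustration):

```python
# Rough Python analogue of the patch's intended control flow.
# All names are invented for illustration; the real code is C inside
# NextCopyFrom/CopyFrom.

def parse_int(text):
    """Stand-in for an input function such as int4in()."""
    return int(text)  # raises ValueError on bad input

def copy_from(lines, ignore_errors=False):
    """Parse each line; with ignore_errors, skip lines whose *parsing*
    fails, before any tuple is formed or inserted."""
    inserted, skipped = [], []
    for line in lines:
        try:
            value = parse_int(line)      # the InputFunctionCall step
        except ValueError:
            if not ignore_errors:
                raise
            skipped.append(line)         # WARNING + skip the bad line
            continue
        inserted.append(value)           # tuple forming, insert, triggers...
    return inserted, skipped

inserted, skipped = copy_from(["1", "2", "oops", "4"], ignore_errors=True)
print(inserted, skipped)  # [1, 2, 4] ['oops']
```

The objection raised later in the thread is that in the backend the `except` arm cannot be this simple: the C equivalent must also perform the cleanup that normal error recovery would have done.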
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Stephen Frost
Date:
Alexey,

* Alexey Kondratov (kondratov.aleksey@gmail.com) wrote:
> On Fri, Dec 1, 2017 at 1:58 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > On a *very* quick look, please use an enum to return from NextCopyFrom
> > rather than 'int'. The chunks that change bool to int are very
> > odd-looking. This would move the comment that explains the value from
> > copy.c to copy.h, obviously. Also, you seem to be using non-ASCII dashes
> > in the descriptions of those values; please don't.
>
> I will fix it, thank you.
>
> > Or maybe I misunderstood the patch completely.
>
> I hope so. Here are my thoughts on how it all works; please correct me
> where I am wrong:
>
> 1) First, I have simply changed the ereport level to WARNING for specific
> validations (extra or missing columns, etc.) if the IGNORE_ERRORS option
> is used. All these checks are inside NextCopyFrom. Thus, this patch
> behaves here pretty much the same as before, except that it is possible
> to skip bad lines, and this part should be safe as well.
>
> 2) About PG_TRY/CATCH. I use it to catch only one specific function call
> inside NextCopyFrom -- InputFunctionCall -- which is used just to parse
> a datatype from the input string. I have no idea how WAL write or
> trigger errors could get here.
>
> All of this is done before actually forming a tuple, putting it into
> the heap, firing insert-related triggers, etc. I am not trying to
> catch all errors during row processing, only input data errors. So
> why is it unsafe?

The issue is that InputFunctionCall is actually calling out to other
functions and they could be allocating memory and doing other things.
What you're missing by using PG_TRY() here without doing a PG_RE_THROW()
is all the cleanup work that AbortTransaction() and friends do -- ensuring
there's a valid memory context (cleaning up the ones that might have
been allocated by the input function), releasing locks, resetting user
and security contexts, and a whole slew of other things.

Is all of that always needed for an error thrown by an input function?
Hard to say, really, since input functions can be provided by
extensions. You might get away with doing this for int4in(), but we
also have to be thinking about extensions like PostGIS which introduce
their own geometry and geography data types that are a great deal more
complicated.

In the end, this isn't an acceptable approach to this problem. The
approach outlined previously using sub-transactions, which attempted to
insert some number of records and then, on failure, restarted and
inserted up until the failed record and then skipped it, starting over
again with the next batch, seemed pretty reasonable to me. If you have
time to work on that, that'd be great; otherwise we'll probably need to
mark this as returned with feedback.

Thanks!

Stephen
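The batch-and-restart approach described above can be sketched roughly as follows (Python as illustration; `insert_batch` stands in for inserting under a subtransaction, and all names and the batch size are invented):

```python
# Sketch of the batched-subtransaction approach: try to insert a batch
# atomically; on failure, replay the rows before the bad one, skip it,
# and continue with the rest. Names and batch size are illustrative.

class BadRow(Exception):
    pass

def insert_batch(table, rows):
    """Stand-in for inserting under a subtransaction: all-or-nothing.
    Raises BadRow(index) at the first failing row, changing nothing."""
    staged = []
    for i, row in enumerate(rows):
        if row is None:                        # None plays the failing row
            raise BadRow(i)
        staged.append(row)
    table.extend(staged)                       # "commit" the subtransaction

def copy_with_skip(rows, batch_size=3):
    table, skipped = [], []
    pending = list(rows)
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        while batch:
            try:
                insert_batch(table, batch)
                break
            except BadRow as e:
                bad = e.args[0]
                insert_batch(table, batch[:bad])   # rows before the bad one
                skipped.append(batch[bad])         # skip the bad row
                batch = batch[bad + 1:]            # retry the remainder
    return table, skipped

table, skipped = copy_with_skip([1, 2, None, 4, None, 6, 7])
print(table, len(skipped))  # [1, 2, 4, 6, 7] 2
```

The real cost question, raised again later in the thread, is how expensive each retried subtransaction is.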
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Craig Ringer
Date:
On 23 January 2018 at 15:21, Stephen Frost <sfrost@snowman.net> wrote:
> Alexey,
>
> * Alexey Kondratov (kondratov.aleksey@gmail.com) wrote:
> > On Fri, Dec 1, 2017 at 1:58 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > > On a *very* quick look, please use an enum to return from NextCopyFrom
> > > rather than 'int'. The chunks that change bool to int are very
> > > odd-looking. This would move the comment that explains the value from
> > > copy.c to copy.h, obviously. Also, you seem to be using non-ASCII dashes
> > > in the descriptions of those values; please don't.
> >
> > I will fix it, thank you.
> >
> > > Or maybe I misunderstood the patch completely.
> >
> > I hope so. Here are my thoughts on how it all works; please correct me
> > where I am wrong:
> >
> > 1) First, I have simply changed the ereport level to WARNING for specific
> > validations (extra or missing columns, etc.) if the IGNORE_ERRORS option
> > is used. All these checks are inside NextCopyFrom. Thus, this patch
> > behaves here pretty much the same as before, except that it is possible
> > to skip bad lines, and this part should be safe as well.
> >
> > 2) About PG_TRY/CATCH. I use it to catch only one specific function call
> > inside NextCopyFrom -- InputFunctionCall -- which is used just to parse
> > a datatype from the input string. I have no idea how WAL write or
> > trigger errors could get here.
> >
> > All of this is done before actually forming a tuple, putting it into
> > the heap, firing insert-related triggers, etc. I am not trying to
> > catch all errors during row processing, only input data errors. So
> > why is it unsafe?
>
> The issue is that InputFunctionCall is actually calling out to other
> functions and they could be allocating memory and doing other things.
> What you're missing by using PG_TRY() here without doing a PG_RE_THROW()
> is all the cleanup work that AbortTransaction() and friends do -- ensuring
> there's a valid memory context (cleaning up the ones that might have
> been allocated by the input function), releasing locks, resetting user
> and security contexts, and a whole slew of other things.
>
> Is all of that always needed for an error thrown by an input function?
> Hard to say, really, since input functions can be provided by
> extensions. You might get away with doing this for int4in(), but we
> also have to be thinking about extensions like PostGIS which introduce
> their own geometry and geography data types that are a great deal more
> complicated.
For example, an input function could conceivably take an LWLock, do some work in the heapam, or whatever.
We don't have much in the way of rules about what input functions can or cannot do, so you can't assume much about their behaviour and what must / must not be cleaned up. Nor can you just reset the state in a heavy-handed manner like (say) plpgsql does.
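The hazard can be modelled even outside the backend. In this Python sketch (all names invented for illustration; a threading.Lock stands in for an LWLock), the input function takes a lock and then errors out, and a bare catch skips the release that AbortTransaction() would normally perform:

```python
import threading

# Model of the problem: an input function may acquire a resource (an
# LWLock, a memory context, ...) and then error out. In the backend,
# AbortTransaction() releases such resources; a PG_TRY/PG_CATCH that
# swallows the error skips that cleanup. All names are illustrative.

some_lwlock = threading.Lock()

def fancy_input_function(text):
    some_lwlock.acquire()        # e.g. a complex extension type taking a lock
    value = int(text)            # raises on bad input -- lock never released!
    some_lwlock.release()
    return value

def copy_row_naive(text):
    try:
        return fancy_input_function(text)
    except ValueError:
        return None              # "skip the bad row" -- but no cleanup ran

copy_row_naive("oops")
print(some_lwlock.locked())      # True: the lock leaked; the next row
                                 # through this path would self-deadlock
```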
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Peter Eisentraut
Date:
On 1/22/18 21:33, Craig Ringer wrote:
> We don't have much in the way of rules about what input functions can or
> cannot do, so you can't assume much about their behaviour and what must
> / must not be cleaned up. Nor can you just reset the state in a heavy
> handed manner like (say) plpgsql does.

I think one thing to try would be to define a special kind of exception
that can safely be caught and ignored. Then, input functions can
communicate benign parse errors by doing their own cleanup first, then
throwing this exception, and then the COPY subsystem can deal with it.

Anyway, this patch is clearly not doing this, so I'm setting it to
returned with feedback.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Craig Ringer
Date:
On 3 March 2018 at 13:08, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 1/22/18 21:33, Craig Ringer wrote:
> > We don't have much in the way of rules about what input functions can or
> > cannot do, so you can't assume much about their behaviour and what must
> > / must not be cleaned up. Nor can you just reset the state in a heavy
> > handed manner like (say) plpgsql does.
>
> I think one thing to try would be to define a special kind of exception
> that can safely be caught and ignored. Then, input functions can
> communicate benign parse errors by doing their own cleanup first, then
> throwing this exception, and then the COPY subsystem can deal with it.
That makes sense. We'd only use the error code range in question when it was safe to catch without re-throw, and we'd have to enforce rules around using a specific memory context. Of course no LWLocks could be held, but IIRC that's true when throwing anyway, unless you plan to proc_exit() in your handler.
People will immediately ask for it to handle RI errors too, so something similar would be needed there. But frankly, Pg's RI handling for bulk loading desperately needs a major change in how it works to make it efficient anyway; the current model of individual row triggers is horrible for bulk load performance.
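The proposed mechanism might look roughly like this (Python as illustration; the exception name and the cleanup discipline are assumptions, not an existing API):

```python
# Sketch of the "safe-to-catch" error idea: a dedicated exception that an
# input function may raise only after doing its own cleanup. COPY catches
# exactly this type; everything else propagates and aborts as usual.
# All names are invented for illustration.

class SoftParseError(Exception):
    """Raised only by input functions that have already cleaned up
    (freed memory, released locks) and hold no backend state."""

def well_behaved_input(text):
    buf = text.strip()                 # any scratch state...
    if not buf.lstrip("-").isdigit():
        del buf                        # ...released before raising
        raise SoftParseError(text)
    return int(buf)

def copy_from(lines):
    good, skipped = [], []
    for line in lines:
        try:
            good.append(well_behaved_input(line))
        except SoftParseError as e:
            skipped.append(e.args[0])  # benign: safe to catch and ignore
        # anything else (OSError, RuntimeError, ...) propagates -> abort
    return good, skipped

print(copy_from(["1", "x", "-3"]))  # ([1, -3], ['x'])
```

The contract, not the exception type, does the work here: the "soft" class is only safe because the thrower promises it holds nothing needing centralized cleanup.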
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Tom Lane
Date:
Craig Ringer <craig@2ndquadrant.com> writes:
> On 3 March 2018 at 13:08, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> > I think one thing to try would be to define a special kind of exception
> > that can safely be caught and ignored. Then, input functions can
> > communicate benign parse errors by doing their own cleanup first, then
> > throwing this exception, and then the COPY subsystem can deal with it.

> That makes sense. We'd only use the error code range in question when it
> was safe to catch without re-throw, and we'd have to enforce rules around
> using a specific memory context.

I don't think that can possibly work. It would only be safe if, between
the thrower and the catcher, there were no other levels of control
operating according to the normal error-handling rules. But input
functions certainly cannot assume that they are only called by COPY,
so how could they safely throw a "soft error"?

regards, tom lane
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Peter Eisentraut
Date:
On 3/3/18 00:48, Tom Lane wrote:
> I don't think that can possibly work. It would only be safe if, between
> the thrower and the catcher, there were no other levels of control
> operating according to the normal error-handling rules. But input
> functions certainly cannot assume that they are only called by COPY,
> so how could they safely throw a "soft error"?

That assumes that throwing a soft error in a context that does not
handle it specially is not safe. I'd imagine in such situations the
soft error just behaves like a normal exception.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Pavel Stehule
Date:
2018-03-03 15:02 GMT+01:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:
> On 3/3/18 00:48, Tom Lane wrote:
> > I don't think that can possibly work. It would only be safe if, between
> > the thrower and the catcher, there were no other levels of control
> > operating according to the normal error-handling rules. But input
> > functions certainly cannot assume that they are only called by COPY,
> > so how could they safely throw a "soft error"?
>
> That assumes that throwing a soft error in a context that does not
> handle it specially is not safe. I'd imagine in such situations the
> soft error just behaves like a normal exception.
What does this soft error solve? If somebody uses fault-tolerant COPY, they will not be satisfied when only format errors are ignored. Constraint violations, and maybe trigger errors, should be ignored too.
It is true that some other databases raise a soft error on format issues rather than an exception. But isn't it better to use some alternative algorithm, like:
under a subtransaction, try to insert a block of 1000 lines; when one fails, roll back and retry the import in blocks of 100 lines, and so try to find the wrong line?
Or, another way: for this specific case, reduce the cost of subtransactions? This is a special case: subtransactions under one command.
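That block-shrinking retry can be sketched as follows (Python as illustration; the block sizes, names, and failure model are all invented):

```python
# Sketch of the shrinking-block retry: insert in large blocks under a
# subtransaction; when a block fails, roll it back and retry the same
# rows with a block size ten times smaller, down to single rows,
# isolating the bad line cheaply. Sizes and names are illustrative.

def try_block(table, rows):
    """All-or-nothing 'subtransaction': fails if any row is bad."""
    if any(r is None for r in rows):       # None plays the unparsable line
        return False                       # rollback: table untouched
    table.extend(rows)                     # commit
    return True

def load(rows, block_size=1000):
    table, bad = [], []
    def run(chunk, size):
        for i in range(0, len(chunk), size):
            piece = chunk[i:i + size]
            if try_block(table, piece):
                continue
            if size == 1:
                bad.append(piece[0])       # found the wrong line
            else:
                run(piece, max(size // 10, 1))  # retry with smaller blocks
    run(list(rows), block_size)
    return table, bad

table, bad = load([1, 1, 1, None, 2, 2])
print(table, len(bad))  # [1, 1, 1, 2, 2] 1
```

With mostly-clean input this pays one subtransaction per large block; only the blocks containing bad lines pay the extra retries.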
Regards
Pavel
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 3/3/18 00:48, Tom Lane wrote:
> > I don't think that can possibly work. It would only be safe if, between
> > the thrower and the catcher, there were no other levels of control
> > operating according to the normal error-handling rules. But input
> > functions certainly cannot assume that they are only called by COPY,
> > so how could they safely throw a "soft error"?

> That assumes that throwing a soft error in a context that does not
> handle it specially is not safe. I'd imagine in such situations the
> soft error just behaves like a normal exception.

My point is that neither the thrower of the error, nor the catcher of it,
can possibly guarantee that there were no levels of control in between
that were expecting their resource leakages to be cleaned up by normal
error recovery. As a counterexample consider the chain of control

    COPY -> domain_in -> SQL function in CHECK -> int4in

If int4in throws a "soft" error, and COPY catches it and skips error
cleanup, we've likely got problems, depending on what the SQL function
did. This could even break domain_in, although probably any such effort
would've taken care to harden domain_in against the issue. But the
possibility of SQL or PL functions having done nearly-arbitrary things in
between means that most of the backend is at hazard.

You could imagine making something like this work, but it's gonna take
very invasive and bug-inducing changes to our error cleanup rules,
affecting a ton of places that have no connection to the goal.

regards, tom lane
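The chain above can be modelled in miniature (Python as illustration; all names are invented, and a list stands in for the resources AbortTransaction() would release):

```python
# Model of Tom's chain: COPY -> domain_in -> SQL function in CHECK -> int4in.
# In the backend, cleanup is centralized (AbortTransaction releases
# everything registered during the failed command); it is not unwound
# level by level. If COPY catches a "soft" error and skips that cleanup,
# resources taken by *intermediate* levels leak. Names are invented.

open_resources = []            # what AbortTransaction() would release

class SoftError(Exception):
    pass

def int4in(text):
    if not text.isdigit():
        raise SoftError(text)  # int4in itself holds nothing -- "safe"?
    return int(text)

def sql_check_function(text):
    open_resources.append("snapshot")  # the in-between level takes state,
    ok = int4in(text) >= 0             # expecting abort to clean it up
    open_resources.remove("snapshot")
    return ok

def domain_in(text):
    return sql_check_function(text)

def copy_row(text):
    try:
        return domain_in(text)
    except SoftError:
        return None            # skip the row, and skip cleanup

copy_row("oops")
print(open_resources)          # ['snapshot'] -- leaked by the middle level
```

The thrower kept its promise and the catcher caught only the "safe" type; the leak comes from the level in between, which neither of them controls.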