Thread: Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Alexey Kondratov
Date:
On Fri, Dec 1, 2017 at 1:58 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On a *very* quick look, please use an enum to return from NextCopyFrom
> rather than 'int'. The chunks that change bool to int are very
> odd-looking. This would move the comment that explains the value from
> copy.c to copy.h, obviously. Also, you seem to be using non-ASCII dashes
> in the descriptions of those values; please don't.

I will fix it, thank you.

> Or maybe I misunderstood the patch completely.

I hope so. Here are my thoughts on how it all works; please correct me
where I am wrong:

1) First, I have simply changed the ereport level to WARNING for specific
validations (extra or missing columns, etc.) if the IGNORE_ERRORS option
is used. All these checks are inside NextCopyFrom. Thus, this patch
behaves here pretty much the same as before, except that it is possible
to skip bad lines, and this part should be safe as well.

2) About PG_TRY/CATCH. I use it to catch only one specific function call
inside NextCopyFrom -- InputFunctionCall -- which is used just to parse a
datatype from the input string. I have no idea how WAL write or trigger
errors could get here.

All of this is done before actually forming a tuple, putting it into the
heap, firing insert-related triggers, etc. I am not trying to catch all
errors during row processing, only input data errors. So why is it
unsafe?

Best,
Alexey
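The intended control flow can be sketched in rough Python (a stand-in for the C inside NextCopyFrom; all names here are invented for illustration):

```python
# Rough Python analogue of the patch's intended control flow.
# All names are invented for illustration; the real code is C inside
# NextCopyFrom/CopyFrom.

def parse_int(text):
    """Stand-in for an input function such as int4in()."""
    return int(text)  # raises ValueError on bad input

def copy_from(lines, ignore_errors=False):
    """Parse each line; with ignore_errors, skip lines whose *parsing*
    fails, before any tuple is formed or inserted."""
    inserted, skipped = [], []
    for line in lines:
        try:
            value = parse_int(line)      # the InputFunctionCall step
        except ValueError:
            if not ignore_errors:
                raise
            skipped.append(line)         # WARNING + skip the bad line
            continue
        inserted.append(value)           # tuple forming, insert, triggers...
    return inserted, skipped

inserted, skipped = copy_from(["1", "2", "oops", "4"], ignore_errors=True)
print(inserted, skipped)  # [1, 2, 4] ['oops']
```

The objection raised later in the thread is that in the backend the `except` arm cannot be this simple: the C equivalent must also perform the cleanup that normal error recovery would have done.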
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Stephen Frost
Date:
Alexey,

* Alexey Kondratov (kondratov.aleksey@gmail.com) wrote:
> On Fri, Dec 1, 2017 at 1:58 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > On a *very* quick look, please use an enum to return from NextCopyFrom
> > rather than 'int'. The chunks that change bool to int are very
> > odd-looking. This would move the comment that explains the value from
> > copy.c to copy.h, obviously. Also, you seem to be using non-ASCII dashes
> > in the descriptions of those values; please don't.
>
> I will fix it, thank you.
>
> > Or maybe I misunderstood the patch completely.
>
> I hope so. Here are my thoughts on how it all works; please correct me
> where I am wrong:
>
> 1) First, I have simply changed the ereport level to WARNING for specific
> validations (extra or missing columns, etc.) if the IGNORE_ERRORS option
> is used. All these checks are inside NextCopyFrom. Thus, this patch
> behaves here pretty much the same as before, except that it is possible
> to skip bad lines, and this part should be safe as well.
>
> 2) About PG_TRY/CATCH. I use it to catch only one specific function call
> inside NextCopyFrom -- InputFunctionCall -- which is used just to parse
> a datatype from the input string. I have no idea how WAL write or
> trigger errors could get here.
>
> All of this is done before actually forming a tuple, putting it into
> the heap, firing insert-related triggers, etc. I am not trying to
> catch all errors during row processing, only input data errors. So
> why is it unsafe?

The issue is that InputFunctionCall is actually calling out to other
functions and they could be allocating memory and doing other things.
What you're missing by using PG_TRY() here without doing a PG_RE_THROW()
is all the cleanup work that AbortTransaction() and friends do -- ensuring
there's a valid memory context (cleaning up the ones that might have
been allocated by the input function), releasing locks, resetting user
and security contexts, and a whole slew of other things.

Is all of that always needed for an error thrown by an input function?
Hard to say, really, since input functions can be provided by
extensions. You might get away with doing this for int4in(), but we
also have to be thinking about extensions like PostGIS which introduce
their own geometry and geography data types that are a great deal more
complicated.

In the end, this isn't an acceptable approach to this problem. The
approach outlined previously using sub-transactions, which attempted to
insert some number of records and then, on failure, restarted and
inserted up until the failed record and then skipped it, starting over
again with the next batch, seemed pretty reasonable to me. If you have
time to work on that, that'd be great; otherwise we'll probably need to
mark this as returned with feedback.

Thanks!

Stephen
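The batch-and-restart approach described above can be sketched roughly as follows (Python as illustration; `insert_batch` stands in for inserting under a subtransaction, and all names and the batch size are invented):

```python
# Sketch of the batched-subtransaction approach: try to insert a batch
# atomically; on failure, replay the rows before the bad one, skip it,
# and continue with the rest. Names and batch size are illustrative.

class BadRow(Exception):
    pass

def insert_batch(table, rows):
    """Stand-in for inserting under a subtransaction: all-or-nothing.
    Raises BadRow(index) at the first failing row, changing nothing."""
    staged = []
    for i, row in enumerate(rows):
        if row is None:                        # None plays the failing row
            raise BadRow(i)
        staged.append(row)
    table.extend(staged)                       # "commit" the subtransaction

def copy_with_skip(rows, batch_size=3):
    table, skipped = [], []
    pending = list(rows)
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        while batch:
            try:
                insert_batch(table, batch)
                break
            except BadRow as e:
                bad = e.args[0]
                insert_batch(table, batch[:bad])   # rows before the bad one
                skipped.append(batch[bad])         # skip the bad row
                batch = batch[bad + 1:]            # retry the remainder
    return table, skipped

table, skipped = copy_with_skip([1, 2, None, 4, None, 6, 7])
print(table, len(skipped))  # [1, 2, 4, 6, 7] 2
```

The real cost question, raised again later in the thread, is how expensive each retried subtransaction is.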
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Craig Ringer
Date:
On 23 January 2018 at 15:21, Stephen Frost <sfrost@snowman.net> wrote:
> Alexey,
>
> * Alexey Kondratov (kondratov.aleksey@gmail.com) wrote:
> > On Fri, Dec 1, 2017 at 1:58 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > > On a *very* quick look, please use an enum to return from NextCopyFrom
> > > rather than 'int'. The chunks that change bool to int are very
> > > odd-looking. This would move the comment that explains the value from
> > > copy.c to copy.h, obviously. Also, you seem to be using non-ASCII dashes
> > > in the descriptions of those values; please don't.
> >
> > I will fix it, thank you.
> >
> > > Or maybe I misunderstood the patch completely.
> >
> > I hope so. Here are my thoughts on how it all works; please correct me
> > where I am wrong:
> >
> > 1) First, I have simply changed the ereport level to WARNING for specific
> > validations (extra or missing columns, etc.) if the IGNORE_ERRORS option
> > is used. All these checks are inside NextCopyFrom. Thus, this patch
> > behaves here pretty much the same as before, except that it is possible
> > to skip bad lines, and this part should be safe as well.
> >
> > 2) About PG_TRY/CATCH. I use it to catch only one specific function call
> > inside NextCopyFrom -- InputFunctionCall -- which is used just to parse
> > a datatype from the input string. I have no idea how WAL write or
> > trigger errors could get here.
> >
> > All of this is done before actually forming a tuple, putting it into
> > the heap, firing insert-related triggers, etc. I am not trying to
> > catch all errors during row processing, only input data errors. So
> > why is it unsafe?
>
> The issue is that InputFunctionCall is actually calling out to other
> functions and they could be allocating memory and doing other things.
> What you're missing by using PG_TRY() here without doing a PG_RE_THROW()
> is all the cleanup work that AbortTransaction() and friends do -- ensuring
> there's a valid memory context (cleaning up the ones that might have
> been allocated by the input function), releasing locks, resetting user
> and security contexts, and a whole slew of other things.
>
> Is all of that always needed for an error thrown by an input function?
> Hard to say, really, since input functions can be provided by
> extensions. You might get away with doing this for int4in(), but we
> also have to be thinking about extensions like PostGIS which introduce
> their own geometry and geography data types that are a great deal more
> complicated.
For example, an input function could conceivably take an LWLock, do some work in the heapam, or whatever.
We don't have much in the way of rules about what input functions can or cannot do, so you can't assume much about their behaviour and what must / must not be cleaned up. Nor can you just reset the state in a heavy-handed manner like (say) plpgsql does.
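The hazard can be modelled even outside the backend. In this Python sketch (all names invented for illustration; a threading.Lock stands in for an LWLock), the input function takes a lock and then errors out, and a bare catch skips the release that AbortTransaction() would normally perform:

```python
import threading

# Model of the problem: an input function may acquire a resource (an
# LWLock, a memory context, ...) and then error out. In the backend,
# AbortTransaction() releases such resources; a PG_TRY/PG_CATCH that
# swallows the error skips that cleanup. All names are illustrative.

some_lwlock = threading.Lock()

def fancy_input_function(text):
    some_lwlock.acquire()        # e.g. a complex extension type taking a lock
    value = int(text)            # raises on bad input -- lock never released!
    some_lwlock.release()
    return value

def copy_row_naive(text):
    try:
        return fancy_input_function(text)
    except ValueError:
        return None              # "skip the bad row" -- but no cleanup ran

copy_row_naive("oops")
print(some_lwlock.locked())      # True: the lock leaked; the next row
                                 # through this path would self-deadlock
```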
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Peter Eisentraut
Date:
On 1/22/18 21:33, Craig Ringer wrote:
> We don't have much in the way of rules about what input functions can or
> cannot do, so you can't assume much about their behaviour and what must
> / must not be cleaned up. Nor can you just reset the state in a heavy
> handed manner like (say) plpgsql does.

I think one thing to try would be to define a special kind of exception
that can safely be caught and ignored. Then, input functions can
communicate benign parse errors by doing their own cleanup first, then
throwing this exception, and then the COPY subsystem can deal with it.

Anyway, this patch is clearly not doing this, so I'm setting it to
returned with feedback.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Craig Ringer
Date:
On 3 March 2018 at 13:08, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 1/22/18 21:33, Craig Ringer wrote:
> > We don't have much in the way of rules about what input functions can or
> > cannot do, so you can't assume much about their behaviour and what must
> > / must not be cleaned up. Nor can you just reset the state in a heavy
> > handed manner like (say) plpgsql does.
>
> I think one thing to try would be to define a special kind of exception
> that can safely be caught and ignored. Then, input functions can
> communicate benign parse errors by doing their own cleanup first, then
> throwing this exception, and then the COPY subsystem can deal with it.
That makes sense. We'd only use the error code range in question when it was safe to catch without re-throw, and we'd have to enforce rules around using a specific memory context. Of course no LWLocks could be held, but IIRC that's true when throwing anyway, unless you plan to proc_exit() in your handler.
People will immediately ask for it to handle RI errors too, so something similar would be needed there. But frankly, Pg's RI handling for bulk loading desperately needs a major change in how it works to make it efficient anyway; the current model of individual row triggers is horrible for bulk load performance.
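The proposed mechanism might look roughly like this (Python as illustration; the exception name and the cleanup discipline are assumptions, not an existing API):

```python
# Sketch of the "safe-to-catch" error idea: a dedicated exception that an
# input function may raise only after doing its own cleanup. COPY catches
# exactly this type; everything else propagates and aborts as usual.
# All names are invented for illustration.

class SoftParseError(Exception):
    """Raised only by input functions that have already cleaned up
    (freed memory, released locks) and hold no backend state."""

def well_behaved_input(text):
    buf = text.strip()                 # any scratch state...
    if not buf.lstrip("-").isdigit():
        del buf                        # ...released before raising
        raise SoftParseError(text)
    return int(buf)

def copy_from(lines):
    good, skipped = [], []
    for line in lines:
        try:
            good.append(well_behaved_input(line))
        except SoftParseError as e:
            skipped.append(e.args[0])  # benign: safe to catch and ignore
        # anything else (OSError, RuntimeError, ...) propagates -> abort
    return good, skipped

print(copy_from(["1", "x", "-3"]))  # ([1, -3], ['x'])
```

The contract, not the exception type, does the work here: the "soft" class is only safe because the thrower promises it holds nothing needing centralized cleanup.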
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Tom Lane
Date:
Craig Ringer <craig@2ndquadrant.com> writes:
> On 3 March 2018 at 13:08, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> > I think one thing to try would be to define a special kind of exception
> > that can safely be caught and ignored. Then, input functions can
> > communicate benign parse errors by doing their own cleanup first, then
> > throwing this exception, and then the COPY subsystem can deal with it.

> That makes sense. We'd only use the error code range in question when it
> was safe to catch without re-throw, and we'd have to enforce rules around
> using a specific memory context.

I don't think that can possibly work. It would only be safe if, between
the thrower and the catcher, there were no other levels of control
operating according to the normal error-handling rules. But input
functions certainly cannot assume that they are only called by COPY,
so how could they safely throw a "soft error"?

regards, tom lane
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Peter Eisentraut
Date:
On 3/3/18 00:48, Tom Lane wrote:
> I don't think that can possibly work. It would only be safe if, between
> the thrower and the catcher, there were no other levels of control
> operating according to the normal error-handling rules. But input
> functions certainly cannot assume that they are only called by COPY,
> so how could they safely throw a "soft error"?

That assumes that throwing a soft error in a context that does not
handle it specially is not safe. I'd imagine in such situations the
soft error just behaves like a normal exception.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Pavel Stehule
Date:
2018-03-03 15:02 GMT+01:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:
> On 3/3/18 00:48, Tom Lane wrote:
> > I don't think that can possibly work. It would only be safe if, between
> > the thrower and the catcher, there were no other levels of control
> > operating according to the normal error-handling rules. But input
> > functions certainly cannot assume that they are only called by COPY,
> > so how could they safely throw a "soft error"?
>
> That assumes that throwing a soft error in a context that does not
> handle it specially is not safe. I'd imagine in such situations the
> soft error just behaves like a normal exception.
What does this soft error solve? If somebody uses fault-tolerant COPY, they will not be satisfied when only format errors are ignored. Constraint violations, and maybe trigger errors, should be ignored too.
It is true that some other databases raise a soft error on format issues rather than an exception. But isn't it better to use some alternative algorithm, like:
under a subtransaction, try to insert a block of 1000 lines; when one fails, roll back and retry the import in blocks of 100 lines, and so try to find the wrong line?
Or, another way: for this specific case, reduce the cost of subtransactions? This is a special case: subtransactions under one command.
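That block-shrinking retry can be sketched as follows (Python as illustration; the block sizes, names, and failure model are all invented):

```python
# Sketch of the shrinking-block retry: insert in large blocks under a
# subtransaction; when a block fails, roll it back and retry the same
# rows with a block size ten times smaller, down to single rows,
# isolating the bad line cheaply. Sizes and names are illustrative.

def try_block(table, rows):
    """All-or-nothing 'subtransaction': fails if any row is bad."""
    if any(r is None for r in rows):       # None plays the unparsable line
        return False                       # rollback: table untouched
    table.extend(rows)                     # commit
    return True

def load(rows, block_size=1000):
    table, bad = [], []
    def run(chunk, size):
        for i in range(0, len(chunk), size):
            piece = chunk[i:i + size]
            if try_block(table, piece):
                continue
            if size == 1:
                bad.append(piece[0])       # found the wrong line
            else:
                run(piece, max(size // 10, 1))  # retry with smaller blocks
    run(list(rows), block_size)
    return table, bad

table, bad = load([1, 1, 1, None, 2, 2])
print(table, len(bad))  # [1, 1, 1, 2, 2] 1
```

With mostly-clean input this pays one subtransaction per large block; only the blocks containing bad lines pay the extra retries.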
Regards
Pavel
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 3/3/18 00:48, Tom Lane wrote:
> > I don't think that can possibly work. It would only be safe if, between
> > the thrower and the catcher, there were no other levels of control
> > operating according to the normal error-handling rules. But input
> > functions certainly cannot assume that they are only called by COPY,
> > so how could they safely throw a "soft error"?

> That assumes that throwing a soft error in a context that does not
> handle it specially is not safe. I'd imagine in such situations the
> soft error just behaves like a normal exception.

My point is that neither the thrower of the error, nor the catcher of it,
can possibly guarantee that there were no levels of control in between
that were expecting their resource leakages to be cleaned up by normal
error recovery. As a counterexample consider the chain of control

    COPY -> domain_in -> SQL function in CHECK -> int4in

If int4in throws a "soft" error, and COPY catches it and skips error
cleanup, we've likely got problems, depending on what the SQL function
did. This could even break domain_in, although probably any such effort
would've taken care to harden domain_in against the issue. But the
possibility of SQL or PL functions having done nearly-arbitrary things in
between means that most of the backend is at hazard.

You could imagine making something like this work, but it's gonna take
very invasive and bug-inducing changes to our error cleanup rules,
affecting a ton of places that have no connection to the goal.

regards, tom lane
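The chain above can be modelled in miniature (Python as illustration; all names are invented, and a list stands in for the resources AbortTransaction() would release):

```python
# Model of Tom's chain: COPY -> domain_in -> SQL function in CHECK -> int4in.
# In the backend, cleanup is centralized (AbortTransaction releases
# everything registered during the failed command); it is not unwound
# level by level. If COPY catches a "soft" error and skips that cleanup,
# resources taken by *intermediate* levels leak. Names are invented.

open_resources = []            # what AbortTransaction() would release

class SoftError(Exception):
    pass

def int4in(text):
    if not text.isdigit():
        raise SoftError(text)  # int4in itself holds nothing -- "safe"?
    return int(text)

def sql_check_function(text):
    open_resources.append("snapshot")  # the in-between level takes state,
    ok = int4in(text) >= 0             # expecting abort to clean it up
    open_resources.remove("snapshot")
    return ok

def domain_in(text):
    return sql_check_function(text)

def copy_row(text):
    try:
        return domain_in(text)
    except SoftError:
        return None            # skip the row, and skip cleanup

copy_row("oops")
print(open_resources)          # ['snapshot'] -- leaked by the middle level
```

The thrower kept its promise and the catcher caught only the "safe" type; the leak comes from the level in between, which neither of them controls.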