Re: COPY enhancements - Mailing list pgsql-hackers

From Robert Haas
Subject Re: COPY enhancements
Msg-id 603c8f070910080832o3b83a332p63575301a44c4c23@mail.gmail.com
In response to Re: COPY enhancements  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Oct 8, 2009 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Lest there be any unclarity, I am NOT trying to shoot down this
>> feature with my laser-powered bazooka.
>
> Well, if you need somebody to do that

Well, I'm trying not to demoralize people who have put in hard work,
however much it may not be usable.  Still, your points are well taken.
I did raise some of them (with a lot less technical detail) in my
review of last night.

> So as far as I can see, the only form of COPY error handling that
> wouldn't be a cruel joke is to run a separate subtransaction for each
> row, and roll back the subtransaction on error.  Of course the problems
> with that are (a) speed, (b) the 2^32 limit on command counter IDs
> would mean a max of 2^32 rows per COPY, which is uncomfortably small
> these days.  Previous discussions of the problem have mentioned trying
> to batch multiple rows per subtransaction to alleviate both issues.
> Not easy of course, but that's why it's not been done yet.  With a
> patch like this you'd also have (c) how to avoid rolling back the
> insertions into the logging table.

Yeah.  I think it's going to be hard to make this work without having
standalone transactions.  One idea would be to start a subtransaction,
insert tuples until one fails, then roll back the subtransaction and
start a new one, and continue on until the error limit is reached.  At
the end, if the number of rollbacks is > 0, then roll back the final
subtransaction also.  This wouldn't have the property of getting the
error-free data into the table, but at least it would let you report
all the errors in a single pass, hopefully without being gratingly
slow.  Subcommitting every single row is going to be really painful,
especially after Hot Standby goes in and we have to issue a WAL record
after every 64 subtransactions (AIUI).
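
To make the control flow concrete, here is a rough sketch of that loop.
It is not backend code: it uses Python's sqlite3 module, with a SAVEPOINT
standing in for a subtransaction, and the table name "target", its
columns, and the function names are all made up for illustration.

```python
import sqlite3


def copy_report_errors(conn, rows, error_limit):
    """Load rows in one pass and collect (line_number, message) per failure.

    Nothing is kept if any row failed, mirroring the idea described above.
    """
    errors = []
    conn.execute("BEGIN")
    conn.execute("SAVEPOINT batch")  # stands in for a subtransaction
    for lineno, row in enumerate(rows, start=1):
        try:
            # "target" and its columns are hypothetical.
            conn.execute("INSERT INTO target (id, val) VALUES (?, ?)", row)
        except sqlite3.IntegrityError as exc:
            errors.append((lineno, str(exc)))
            # Discard everything inserted since the savepoint. In SQLite the
            # savepoint stays open afterwards, so this also "starts a new one".
            conn.execute("ROLLBACK TO batch")
            if len(errors) >= error_limit:
                break
    if errors:
        conn.execute("ROLLBACK")  # at least one failure: keep nothing
    else:
        conn.execute("COMMIT")
    return errors


def demo():
    # isolation_level=None gives manual transaction control.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT NOT NULL)")
    rows = [(1, "a"), (2, None), (3, "c"), (4, None), (5, "e")]
    errors = copy_report_errors(conn, rows, error_limit=10)
    remaining = conn.execute("SELECT count(*) FROM target").fetchone()[0]
    return errors, remaining
```

One thing the sketch makes visible: because rows before a failure are
discarded, a later row that would have collided with one of them (say,
on a unique key) will not be reported in that pass.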

Another possible approach, which isn't perfect either, is the idea of
allowing COPY to generate a single column of output of type text[].
That greatly reduces the number of possible error cases, and at least
gets the data into the DB where you can hack on it.  But it's still
going to be painful for some use cases.
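
As an illustration of the text[] idea only (this is plain Python, not a
proposed COPY implementation), the first pass below just splits each
line into strings, and a second pass does the type conversion where
failures can be handled row by row.  The function names, the tab
delimiter, and the converter list are my assumptions.

```python
import csv
import io


def stage_raw(text, delimiter="\t"):
    """Split each input line into a list of strings; no type conversion yet."""
    return [fields for fields in csv.reader(io.StringIO(text), delimiter=delimiter)]


def convert(staged, converters):
    """Second pass: apply per-column converters, separating good rows from bad."""
    good, bad = [], []
    for lineno, fields in enumerate(staged, start=1):
        try:
            if len(fields) != len(converters):
                raise ValueError(
                    f"expected {len(converters)} fields, got {len(fields)}"
                )
            good.append(tuple(conv(f) for conv, f in zip(converters, fields)))
        except ValueError as exc:
            # Keep the raw fields so the bad row can be inspected or fixed later.
            bad.append((lineno, fields, str(exc)))
    return good, bad
```

The staging step can only fail on malformed input framing, which is the
sense in which it cuts down the error cases; everything type-related is
deferred to the second pass.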

...Robert

