On Wed, 2005-06-01 at 10:35 -0700, Alon Goldshuv wrote:
> I have been working on improving the COPY command performance
> Around 40% for 15 column (mixed types) table.
> Around 90% for 1 column table.
That's very cool.
> 2) A modified command syntax for introducing a direct single row error
> handling. By direct I mean - a row that if rejected from within the COPY
> command context does not throw an error and roll back the whole transaction.
> Instead the error is caught and recorded elsewhere, maybe in some error
> table, with some more information that can later on be retrieved. The
> following rows continue to be processed. This way there is barely any error
> handling overhead. Having a recursive row isolation into smaller batches is
> extremely expensive for non-small data sets. It's not an option for serious
> users.
Can we call this the ERRORTABLE clause?
> 5) allow an ERRORLIMIT to allow control of aborting a load after a certain
> number of errors (and a pre-requisite for this is point number 2 above).
The default would be ERRORLIMIT 0, which gives backwards
compatibility.
2) and 5) seem critical for combined usability & performance with real
world data.
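
To make the semantics of 2) and 5) concrete, here is a rough sketch in
Python rather than in backend code. It is only an illustration, not the
patch: load(), parse_row(), LoadAborted and error_limit are hypothetical
names, the error "table" is just an in-memory list, and I am assuming the
load aborts once the number of rejected rows exceeds the limit, so that a
limit of 0 aborts on the first bad row as COPY does today.

```python
from dataclasses import dataclass, field


class LoadAborted(Exception):
    """Raised when more rows are rejected than the error limit allows."""


@dataclass
class LoadResult:
    loaded: list = field(default_factory=list)  # rows accepted
    errors: list = field(default_factory=list)  # (line_no, raw_line, reason)


def parse_row(line, ncols):
    """Hypothetical row parser: expects ncols comma-separated integers."""
    parts = line.split(",")
    if len(parts) != ncols:
        raise ValueError(f"expected {ncols} columns, got {len(parts)}")
    return [int(p) for p in parts]


def load(lines, ncols, error_limit=0):
    """Load rows one by one, recording rejected rows instead of failing.

    Assumption: the load aborts as soon as the number of rejected rows
    exceeds error_limit, so error_limit=0 aborts on the first bad row.
    """
    result = LoadResult()
    for line_no, line in enumerate(lines, start=1):
        try:
            result.loaded.append(parse_row(line, ncols))
        except ValueError as exc:
            # Stand-in for the error table: keep the row and the reason.
            result.errors.append((line_no, line, str(exc)))
            if len(result.errors) > error_limit:
                raise LoadAborted(
                    f"error limit {error_limit} exceeded at line {line_no}"
                ) from exc
    return result
```

With error_limit=2, a four-line input containing two bad lines loads the
two good rows and leaves the two rejects in result.errors; with the
default of 0 the same input raises LoadAborted at the first bad line.
Each row is parsed exactly once either way, which is the point of
avoiding recursive batch isolation.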
I'm not clear from all of those options whether we still need a LOAD
command, based upon other issues/comments raised on this thread.
However, there are some other arguments for why it might be a good idea
to have a LOAD DATA command separate from COPY. Certainly long term
features would be easier to add with two commands. Trying to maintain
backwards compatibility just because we use COPY seems like an uphill
struggle and is going to mean we have to handle sensible new additions
as options so we don't break existing applications. The most important
of those arguments is the lock type held.
[Oracle compatibility isn't one of them, even if it did provide the
command name.]
But things will be clearer when we see the patch.
Best Regards, Simon Riggs