Re: VLDB Features - Mailing list pgsql-hackers

From Tom Lane
Subject Re: VLDB Features
Date
Msg-id 16876.1197680636@sss.pgh.pa.us
Whole thread Raw
In response to Re: VLDB Features  (Neil Conway <neilc@samurai.com>)
Responses Re: VLDB Features  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
Neil Conway <neilc@samurai.com> writes:
> One approach would be to essentially implement the pg_bulkloader
> approach inside the backend. That is, begin by doing a subtransaction
> for every k rows (with k = 1000, say). If you get any errors, then
> either repeat the process with k/2 until you locate the individual
> row(s) causing the trouble, or perhaps just immediately switch to k = 1.
> Fairly ugly though, and would be quite slow for data sets with a high
> proportion of erroneous data.

You could make it self-tuning, perhaps: initially, or after an error,
set k = 1, and increase k after a successful set of rows.

> Another approach would be to distinguish between errors that require a
> subtransaction to recover to a consistent state, and less serious errors
> that don't have this requirement (e.g. invalid input to a data type
> input function). If all the errors that we want to tolerate during a
> bulk load fall into the latter category, we can do without
> subtransactions.

I think such an approach is doomed to hopeless unreliability.  There is
no concept of an error that doesn't require a transaction abort in the
system now, and that doesn't seem to me like something that can be
successfully bolted on after the fact.  Also, there's a lot of
bookkeeping (eg buffer pins) that has to be cleaned up regardless of the
exact nature of the error, and all those mechanisms are hung off
transactions.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Neil Conway
Date:
Subject: Re: VLDB Features
Next
From: Josh Berkus
Date:
Subject: Re: VLDB Features