Re: error tolerant inserting of data - Mailing list pgsql-general

From: Jules Alberts
Subject: Re: error tolerant inserting of data
Msg-id: 20030306080956.10B011CB1DB@koshin.dsl.systemec.nl
In response to: Re: error tolerant inserting of data ("Nigel J. Andrews" <nandrews@investsystems.co.uk>)
List: pgsql-general
On 5 Mar 2003 (14:36), Nigel J. Andrews <nandrews@investsystems.co.uk> wrote:
> On Wed, 5 Mar 2003, Jules Alberts wrote:
> > Hello everybody,
> > I'm working on a new pg database that will contain all data that's
> > currently in our old system. The data is already in the db, in tables
> > with names like import_table1, import_table2, etc.; now I have to
> > transfer them to the real tables. The real tables have some
> > constraints, like trigger functions to prevent overlapping date-
> > periods.
> >
> > I have written pl/pgsql code to do the importing bit, but there is one
> > big drawback: as soon as anything in a function (or in a FOR ... SELECT
> > ... LOOP) goes wrong, everything is rolled back. Normally this
> > behaviour would be fine, but not in this situation, because my old data
> > contains rows that sometimes conflict with the constraints.
> >
> > What I want is for the conflicting rows to simply be refused (not
> > imported) and for everything that's OK to be imported. What is the best way
> > to do this?
>
> I think the only way is to do the conflict testing in your code and not try the
> insert if it's going to fail.

Yes, I tried that and it works, but it is _very_ slow.
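
This is roughly what I tried (a stripped-down sketch: real_table1, emp_id and
the date columns are made up here, the real tables have more columns and more
constraints):

CREATE OR REPLACE FUNCTION import_one_table() RETURNS integer AS '
DECLARE
    rec      RECORD;
    imported integer := 0;
BEGIN
    FOR rec IN SELECT * FROM import_table1 LOOP
        -- only insert when no existing period overlaps, so the trigger
        -- never raises an error and the whole import is not rolled back
        PERFORM 1
           FROM real_table1 r
          WHERE r.emp_id = rec.emp_id
            AND (r.date_from, r.date_until)
                OVERLAPS (rec.date_from, rec.date_until);
        IF NOT FOUND THEN
            INSERT INTO real_table1 (emp_id, date_from, date_until)
                 VALUES (rec.emp_id, rec.date_from, rec.date_until);
            imported := imported + 1;
        END IF;
    END LOOP;
    RETURN imported;
END;
' LANGUAGE 'plpgsql';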

> It'll slow the process down, but if you think in terms of issuing a begin at
> the start of an application and you are hoping for a commit at the end, then
> obviously you have to protect against situations that you know will give rise
> to an error. For example, if you have some data from a form that a user
> completes, then when you come to save that data you may not know just from
> that data whether a foreign key constraint will pass, so you have to check
> that before trying an insert, otherwise your transaction will abort.

I know, and that's OK once the system goes live. But right now I'm
importing data, and all I want is for the good data to be imported as
swiftly as possible while the bad data is dropped. The data is
somewhere between 500 MB and 1 GB, and the transition from the old system
to the new one will happen during a single weekend (big bang).

If I do the error checking before actually importing the data, one weekend
will be _way_ too short :-(
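
The only thing I can think of to speed that up, while still doing the conflict
test before the insert, is to do the check set-wise instead of row by row,
roughly like this (made-up names again, and it assumes the overlap trigger is
the only constraint that can fail and that the import rows don't overlap each
other):

INSERT INTO real_table1 (emp_id, date_from, date_until)
SELECT i.emp_id, i.date_from, i.date_until
  FROM import_table1 i
 WHERE NOT EXISTS (
           SELECT 1
             FROM real_table1 r
            WHERE r.emp_id = i.emp_id
              AND (r.date_from, r.date_until)
                  OVERLAPS (i.date_from, i.date_until));

That trades the per-row check for one big anti-join, but I have no idea yet
whether it will be fast enough for this amount of data.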
