Home > mailing lists

Re: Using the database to validate data - Mailing list pgsql-general

From	Zdeněk Bělehrádek
Subject	Re: Using the database to validate data
Date	July 24, 2015 12:18:00
Msg-id	2321692.svzMfXlGnE@blue Whole thread
In response to	Using the database to validate data (JPLapham <lapham@jandr.org>)
Responses	Re: Using the database to validate data Re: Using the database to validate data
List	pgsql-general

Tree view

What about creating a SAVEPOINT before each INSERT, and if the INSERT returns
an error, then ROLLBACK TO SAVEPOINT? This way you will have all the
insertable data in your table, and you can still ROLLBACK the whole
transaction, or COMMIT it if there were no errors.

It will probably be quite slow, but if you have only thousands of lines, it
should be fast enough for your usecase IMHO.

-- Zdeněk Bělehrádek

> Hello,
>
> I have an application that occasionally performs large batch inserts of user
> hand-generated data. Input is a tab delimited file with typically hundreds
> to a thousand lines of data.
>
> Because the data is generated by hand, there are always many
> transaction-stopping errors in a typical input run. For example, missing
> datum in a NOT NULL column, a duplicate value in a UNIQUE column, data type
> mismatch, FOREIGN KEY reference to something non-existing, etc. Of course,
> we chose PostgreSQL exactly because of these problems, because of the robust
> transactional control, rollback on errors, etc.
>
> My question is the following. I would like to *test* the data input for
> integrity in such a way that I can create a report to the user informing
> them of exactly where in their input file to correct the problems.
>
> IDEA 1: My first attempt at this was to simply slurp the data into the
> database, collect the errors, and then rollback. Of course (as I now know),
> this doesn't work because after the first problem, the database reports,
> "current transaction is aborted, commands ignored until end of transaction
> block". This means that I can only report to the user the location of the
> first problem, and then they run the data again, and keep looping through
> the process until the data is good, a huge waste of time.
>
> IDEA 2: My second idea on how to do this was to ROLLBACK after each INSERT.
> This allows me to check for things like NOT NULL and data type issues, but
> not violations of UNIQUE within the new data.
>
> IDEA 3: Write my own data pre-conditioner. Ugh, what a nightmare, I feel
> like I'm writing my own database! Checking for FKEY constraints, UNIQUE, etc
> is not trivial. It seems ridiculous to do this when I have the *actual*
> database available to test against!
>
> Has anyone dealt with this kind of issue before? What are your opinions on
> best practice for this? Of course I do not want to actually COMMIT until the
> data is perfect!
>
> Thanks for your time!
> -Jon
>
>
>
> --
> View this message in context:
> http://postgresql.nabble.com/Using-the-database-to-validate-data-tp5859046.
> html Sent from the PostgreSQL - general mailing list archive at Nabble.com.

pgsql-general by date:

From: Andrew Sullivan
Date: 24 July 2015, 10:09:29
Subject: Re: Delete rule does not prevent truncate

From: Adrian Klaver
Date: 24 July 2015, 13:37:13
Subject: Re: Delete rule does not prevent truncate

Re: Using the database to validate data - Mailing list pgsql-general

Previous

Next