Home > mailing lists

Using the database to validate data - Mailing list pgsql-general

From	JPLapham
Subject	Using the database to validate data
Date	July 23, 2015 17:53:51
Msg-id	1437656152535-5859046.post@n5.nabble.com Whole thread Raw
Responses	Re: Using the database to validate data Re: Using the database to validate data Re: Using the database to validate data
List	pgsql-general

Tree view

Hello,

I have an application that occasionally performs large batch inserts of user
hand-generated data. Input is a tab delimited file with typically hundreds
to a thousand lines of data.

Because the data is generated by hand, there are always many
transaction-stopping errors in a typical input run. For example, missing
datum in a NOT NULL column, a duplicate value in a UNIQUE column, data type
mismatch, FOREIGN KEY reference to something non-existing, etc. Of course,
we chose PostgreSQL exactly because of these problems, because of the robust
transactional control, rollback on errors, etc.

My question is the following. I would like to *test* the data input for
integrity in such a way that I can create a report to the user informing
them of exactly where in their input file to correct the problems.

IDEA 1: My first attempt at this was to simply slurp the data into the
database, collect the errors, and then rollback. Of course (as I now know),
this doesn't work because after the first problem, the database reports,
"current transaction is aborted, commands ignored until end of transaction
block". This means that I can only report to the user the location of the
first problem, and then they run the data again, and keep looping through
the process until the data is good, a huge waste of time.

IDEA 2: My second idea on how to do this was to ROLLBACK after each INSERT.
This allows me to check for things like NOT NULL and data type issues, but
not violations of UNIQUE within the new data.

IDEA 3: Write my own data pre-conditioner. Ugh, what a nightmare, I feel
like I'm writing my own database! Checking for FKEY constraints, UNIQUE, etc
is not trivial. It seems ridiculous to do this when I have the *actual*
database available to test against!

Has anyone dealt with this kind of issue before? What are your opinions on
best practice for this? Of course I do not want to actually COMMIT until the
data is perfect!

Thanks for your time!
-Jon

--
View this message in context: http://postgresql.nabble.com/Using-the-database-to-validate-data-tp5859046.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

pgsql-general by date:

From: Tim Smith
Date: 23 July 2015, 17:53:45
Subject: Re: Delete rule does not prevent truncate

From: Adrian Klaver
Date: 23 July 2015, 18:02:06
Subject: Re: Using the database to validate data

Using the database to validate data - Mailing list pgsql-general

Previous

Next