Re: Using the database to validate data - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Using the database to validate data
Date
Msg-id 55B12C1E.9080500@aklaver.com
Whole thread Raw
In response to Using the database to validate data  (JPLapham <lapham@jandr.org>)
Responses Re: Using the database to validate data  (Jon Lapham <lapham@jandr.org>)
List pgsql-general
On 07/23/2015 05:55 AM, JPLapham wrote:
> Hello,
>
> I have an application that occasionally performs large batch inserts of user
> hand-generated data. Input is a tab delimited file with typically hundreds
> to a thousand lines of data.
>
> Because the data is generated by hand, there are always many
> transaction-stopping errors in a typical input run. For example, missing
> datum in a NOT NULL column, a duplicate value in a UNIQUE column, data type
> mismatch, FOREIGN KEY reference to something non-existing, etc. Of course,
> we chose PostgreSQL exactly because of these problems, because of the robust
> transactional control, rollback on errors, etc.
>
> My question is the following. I would like to *test* the data input for
> integrity in such a way that I can create a report to the user informing
> them of exactly where in their input file to correct the problems.
>
> IDEA 1: My first attempt at this was to simply slurp the data into the
> database, collect the errors, and then rollback. Of course (as I now know),
> this doesn't work because after the first problem, the database reports,
> "current transaction is aborted, commands ignored until end of transaction
> block". This means that I can only report to the user the location of the
> first problem, and then they run the data again, and keep looping through
> the process until the data is good, a huge waste of time.
>
> IDEA 2: My second idea on how to do this was to ROLLBACK after each INSERT.
> This allows me to check for things like NOT NULL and data type issues, but
> not violations of UNIQUE within the new data.
>
> IDEA 3: Write my own data pre-conditioner. Ugh, what a nightmare, I feel
> like I'm writing my own database! Checking for FKEY constraints, UNIQUE, etc
> is not trivial. It seems ridiculous to do this when I have the *actual*
> database available to test against!

That has already been done:

http://pgloader.io/


>
> Has anyone dealt with this kind of issue before? What are your opinions on
> best practice for this? Of course I do not want to actually COMMIT until the
> data is perfect!
>
> Thanks for your time!
> -Jon
>
>
>
> --
> View this message in context: http://postgresql.nabble.com/Using-the-database-to-validate-data-tp5859046.html
> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: JPLapham
Date:
Subject: Using the database to validate data
Next
From: Dane Foster
Date:
Subject: Re: Q: text query search and