Re: Finding Errors in .csv Input Data - Mailing list pgsql-general

From David Johnston
Subject Re: Finding Errors in .csv Input Data
Date
Msg-id 010f01cbd30e$c4a6cde0$4df469a0$@yahoo.com
Whole thread Raw
In response to Re: Finding Errors in .csv Input Data  (Rich Shepard <rshepard@appl-ecosys.com>)
Responses Re: Finding Errors in .csv Input Data  (Rich Shepard <rshepard@appl-ecosys.com>)
List pgsql-general
Rich,

The data and table structure provided do not seem to correlate.

Regardless, if you changed the delimiter to "|" from "," it is possible that
you converted an embedded "," in one of the textual fields into a "|" when
you should not have.

For Instance:

Value1,value2,"value, with comma",value4

Thus became

Value1|value2|value|with comma|value4

Giving you additional fields that should not be present.

I suggest opening up a testing file (one with the 80 malformed records and
10 to 20 good/control records) in an Excel or equivalent spreadsheet and
import/text-to-columns using the "|" delimiter.  You will be able to quickly
see rows with extra columns and specifically where those extras are
originating.  Then you can decide on how to fix the problem.  I would
suggest manually changing each incorrect "|" into a "," as a form of
self-punishment for improper data conversion - but whatever works for you.

In the odd chance you do not see extra columns in the spreadsheet you can
also scan down the columns and see if you recognize any pattern of
differences between the 80 failing records and the 10 successful records
that might point you further in the correct direction.

If THAT fails you might want to see if anyone will receive the testing file
and "CREATE TABLE" and try to independently diagnose the cause of the
problem.  That or do the Access export and skip your conversion routine and
directly import the CSV into Postgres.

David J.

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Rich Shepard
Sent: Tuesday, February 22, 2011 9:59 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Finding Errors in .csv Input Data

On Tue, 22 Feb 2011, Adrian Klaver wrote:

> We are going to need to see at least a sample of the actual data in
> one.csv that is causing the problem. You have an off by two error as
> you suggest, but that could actually have happened earlier in the row.
> For instance the well_finish_date would insert into lot_no because
> lot_no is TEXT and the date value at this point is just text. Same
> with date_cmplt_acc and block_no.

Adrian/Andy,

   The data came out of Access as comma-and-quote csv. I massaged it in
emacs and sed to change the delimiter to a pipe rather than a comma and
removed the double quotes.
Rich

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


pgsql-general by date:

Previous
From: mezgani ali
Date:
Subject: Re: Mysql to Postgresql
Next
From: Adarsh Sharma
Date:
Subject: Backup Fails