Re: Finding Errors in .csv Input Data - Mailing list pgsql-general

From Andy Colson
Subject Re: Finding Errors in .csv Input Data
Msg-id 4D6461FA.5080405@squeakycode.net
In response to Finding Errors in .csv Input Data  (Rich Shepard <rshepard@appl-ecosys.com>)
Responses Re: Finding Errors in .csv Input Data  (Andy Colson <andy@squeakycode.net>)
List pgsql-general
On 02/22/2011 07:10 PM, Rich Shepard wrote:
> I'm sure many of you have solved this problem in the past and can offer
> solutions that will work for me. The context is a 73-column postgres table
> of data that was originally in an Access .mdb file. A colleague loaded the
> file into Access and wrote a .csv file for me to use since we have nothing
> Microsoft here. There are 110,752 rows in the file/table. After a lot of
> cleaning with emacs and sed, the copy command accepted all but 80 rows of
> data. Now I need to figure out why postgres reports them as having too many
> columns.
>
> Starting to work with a single row, I first compared by cleaned row to the
> raw .csv from the Access output. They match column-by-column. Then I copied
> the schema to a text file and started comparing the .csv data
> column-by-column. While this looks OK to me, postgres doesn't like it. For
> example, I get this error message:
>
> nevada=# \copy water_well from 'one.csv' with delimiter '|' null '' CSV;
> ERROR: value too long for type character(1)
> CONTEXT: COPY water_well, line 1, column gravel_packed: "106"
>
> Yet, the column comparison for gravel_packed and surrounding attributes
> does not show this:
>
> lot_no TEXT, |
> block_no TEXT, |
> well_finish_date DATE, 11/15/1948|
> date_cmplt_acc CHAR(1), D|
> gravel_packed CHAR(1), |
> depth_seal INTEGER, |
> depth_drilled INTEGER, 106|
> depth_bedrock INTEGER, |
> aquifer_desc TEXT, |
>
> Notice the NULL value for gravel_packed, while the "106" is for
> depth_drilled, two columns later.
>
> I'm stymied and wonder if there's a tool I can use to fix these 80 rows so
> the copy command will accept them.
>
> Rich
>

Can we see a few lines of one.csv?  If we ignore the CONTEXT line and just look at the error "value too long for
char(1)", it looks like there are only two columns to think about: date_cmplt_acc and gravel_packed.  Are you sure
there are no extra spaces, tabs, or other weirdness in the file for those two columns?
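One quick way to spot that kind of weirdness is to count fields per row before handing the file to COPY.  Something like this sketch would do it (untested against your file; the 73-column count and '|' delimiter come from your \copy command, the sample rows are made up):

```python
# Sketch of a per-row checker: flags rows whose field count differs from
# the expected column count, and fields with stray whitespace.
import csv
import io

EXPECTED_COLS = 73  # water_well has 73 columns

def check_row(line, expected=EXPECTED_COLS):
    """Return a list of problems found in one '|'-delimited row."""
    # csv.reader honors quoting, so a quoted field may contain '|'
    fields = next(csv.reader(io.StringIO(line), delimiter="|"))
    problems = []
    if len(fields) != expected:
        problems.append(f"expected {expected} fields, got {len(fields)}")
    for i, f in enumerate(fields):
        if f != f.strip():
            problems.append(f"field {i} has stray whitespace: {f!r}")
    return problems

# Illustration with a 5-column layout:
print(check_row("a|b|c|d|e", expected=5))     # []
print(check_row("a|b |c|d|e|f", expected=5))  # extra field + whitespace
```

Run it over the 80 rejected rows and it should tell you exactly where the extra delimiter or junk character is hiding.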

You might also consider dumping out insert statements.  It might be a little slower, but it's simpler to debug, since a bad row fails on its own instead of aborting the whole COPY.

Did Access dump out comma-separated with quoted fields?  I've done this on several occasions and never had to do any
cleaning. But... I also use perl to split the csv and fire off inserts/copys/updates/etc. 
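For the record, the same idea in Python looks roughly like this (a sketch only: the table name and sample line are made up, and for real use you'd want parameterized queries through a database driver rather than string-built SQL):

```python
# Sketch: parse a '|'-delimited file and emit one INSERT per row, so a
# bad row fails alone and identifies itself. Empty fields become NULL.
import csv
import io

def make_inserts(text, table, delimiter="|"):
    """Yield one INSERT statement per row of the delimited text."""
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        vals = ", ".join(
            "NULL" if f == "" else "'" + f.replace("'", "''") + "'"
            for f in row
        )
        yield f"INSERT INTO {table} VALUES ({vals});"

# Made-up 4-column fragment echoing the columns discussed above:
sample = "11/15/1948|D||106\n"
for stmt in make_inserts(sample, "water_well"):
    print(stmt)
# INSERT INTO water_well VALUES ('11/15/1948', 'D', NULL, '106');
```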

-Andy
