Re: Finding Errors in .csv Input Data - Mailing list pgsql-general
| From | Andy Colson |
|---|---|
| Subject | Re: Finding Errors in .csv Input Data |
| Date | |
| Msg-id | 4D6464C5.6040400@squeakycode.net |
| In response to | Re: Finding Errors in .csv Input Data (Andy Colson <andy@squeakycode.net>) |
| List | pgsql-general |
On 02/22/2011 07:25 PM, Andy Colson wrote:
> On 02/22/2011 07:10 PM, Rich Shepard wrote:
>> I'm sure many of you have solved this problem in the past and can offer
>> solutions that will work for me. The context is a 73-column postgres table
>> of data that was originally in an Access .mdb file. A colleague loaded the
>> file into Access and wrote a .csv file for me to use since we have nothing
>> Microsoft here. There are 110,752 rows in the file/table. After a lot of
>> cleaning with emacs and sed, the copy command accepted all but 80 rows of
>> data. Now I need to figure out why postgres reports them as having too
>> many columns.
>>
>> Starting to work with a single row, I first compared my cleaned row to the
>> raw .csv from the Access output. They match column-by-column. Then I
>> copied the schema to a text file and started comparing the .csv data
>> column-by-column. While this looks OK to me, postgres doesn't like it. For
>> example, I get this error message:
>>
>> nevada=# \copy water_well from 'one.csv' with delimiter '|' null '' CSV;
>> ERROR: value too long for type character(1)
>> CONTEXT: COPY water_well, line 1, column gravel_packed: "106"
>>
>> Yet, the column comparison for gravel_packed and surrounding attributes
>> does not show this:
>>
>> lot_no TEXT, |
>> block_no TEXT, |
>> well_finish_date DATE, 11/15/1948|
>> date_cmplt_acc CHAR(1), D|
>> gravel_packed CHAR(1), |
>> depth_seal INTEGER, |
>> depth_drilled INTEGER, 106|
>> depth_bedrock INTEGER, |
>> aquifer_desc TEXT, |
>>
>> Notice the NULL value for gravel_packed, while the "106" is for
>> depth_drilled, two columns later.
>>
>> I'm stymied and wonder if there's a tool I can use to fix these 80 rows so
>> the copy command will accept them.
>>
>> Rich
>
> Can we see a few lines of one.csv? If we ignore the CONTEXT line and just
> look at the error "too long for char(1)", it looks like there are only two
> columns to think about: date_cmplt_acc and gravel_packed. Are you sure
> there are no extra spaces, tabs, or other weirdness in the file for those
> two columns?
>
> You might also consider dumping out insert statements. It might be a
> little slower, but simpler to debug.
>
> Did Access dump out comma separated with quoted fields? I've done this on
> several occasions and never had to do any cleaning. But... I also use perl
> to split the csv and fire off inserts/copies/updates/etc.
>
> -Andy

Hum... and one more idea: you could change your fields to all be 'text',
then do the import. Then clean up the data with a few queries like:

select * from water_well where length(date_cmplt_acc) > 1

And I see that you have 73 columns, not just the few you posted. Well, I
suppose... one of these days I'll start paying more attention :-)

-Andy
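Since postgres reports the 80 bad rows as having too many columns, one way to
spot them is to count delimiters per line inside the database. This is only a
sketch, not something from the thread: the raw_lines table, the E'\x01'
delimiter trick, and the assumption that no quoted field contains a literal
pipe are all hypothetical, and text-mode backslash handling may need care with
sed-cleaned data.

-- Hypothetical diagnostic: stage each raw line of the file whole into a
-- one-column table, then count pipe delimiters per line in SQL.
CREATE TABLE raw_lines (line text);

-- Text-mode load with a delimiter byte assumed never to occur in the
-- data (E'\x01'), so each line arrives as a single value:
\copy raw_lines from 'one.csv' with delimiter E'\x01'

-- A well-formed 73-column row has exactly 72 pipes (assuming no quoted
-- field contains a literal '|'):
SELECT line
FROM   raw_lines
WHERE  length(line) - length(replace(line, '|', '')) <> 72;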
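To make the all-text suggestion concrete, a minimal sketch: the staging table
name is invented here, and only the nine columns Rich posted are shown, so in
practice the table would need all 73 columns before the \copy would succeed.

-- Minimal sketch of the all-TEXT staging idea: every column accepts any
-- value, so the import succeeds and bad rows can be queried afterward.
-- Only the nine columns Rich posted are shown; the real table has 73.
CREATE TABLE water_well_staging (
    lot_no           text,
    block_no         text,
    well_finish_date text,
    date_cmplt_acc   text,
    gravel_packed    text,
    depth_seal       text,
    depth_drilled    text,
    depth_bedrock    text,
    aquifer_desc     text
);

\copy water_well_staging from 'one.csv' with delimiter '|' null '' CSV

-- Values that will not fit the real CHAR(1) columns:
SELECT * FROM water_well_staging WHERE length(date_cmplt_acc) > 1;
SELECT * FROM water_well_staging WHERE length(gravel_packed) > 1;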