Thread: Finding Errors in .csv Input Data

Finding Errors in .csv Input Data

From
Rich Shepard
Date:
   I'm sure many of you have solved this problem in the past and can offer
solutions that will work for me. The context is a 73-column postgres table
of data that was originally in an Access .mdb file. A colleague loaded the
file into Access and wrote a .csv file for me to use since we have nothing
Microsoft here. There are 110,752 rows in the file/table. After a lot of
cleaning with emacs and sed, the copy command accepted all but 80 rows of
data. Now I need to figure out why postgres reports them as having too many
columns.

   Starting to work with a single row, I first compared my cleaned row to the
raw .csv from the Access output. They match column-by-column. Then I copied
the schema to a text file and started comparing the .csv data
column-by-column. While this looks OK to me, postgres doesn't like it. For
example, I get this error message:

nevada=# \copy water_well from 'one.csv' with delimiter '|' null '' CSV;
ERROR:  value too long for type character(1)
CONTEXT:  COPY water_well, line 1, column gravel_packed: "106"

   Yet, the column comparison for gravel_packed and surrounding attributes
does not show this:

lot_no TEXT,                    |
block_no TEXT,                  |
well_finish_date DATE,          11/15/1948|
date_cmplt_acc CHAR(1),         D|
gravel_packed CHAR(1),          |
depth_seal INTEGER,             |
depth_drilled INTEGER,          106|
depth_bedrock INTEGER,          |
aquifer_desc TEXT,              |

   Notice the NULL value for gravel_packed, while the "106" is for
depth_drilled, two columns later.

   I'm stymied and wonder if there's a tool I can use to fix these 80 rows so
the copy command will accept them.

Rich

Re: Finding Errors in .csv Input Data

From
Andy Colson
Date:
On 02/22/2011 07:10 PM, Rich Shepard wrote:
> I'm sure many of you have solved this problem in the past and can offer
> solutions that will work for me. The context is a 73-column postgres table
> of data that was originally in an Access .mdb file. A colleague loaded the
> file into Access and wrote a .csv file for me to use since we have nothing
> Microsoft here. There are 110,752 rows in the file/table. After a lot of
> cleaning with emacs and sed, the copy command accepted all but 80 rows of
> data. Now I need to figure out why postgres reports them as having too many
> columns.
>
> Starting to work with a single row, I first compared my cleaned row to the
> raw .csv from the Access output. They match column-by-column. Then I copied
> the schema to a text file and started comparing the .csv data
> column-by-column. While this looks OK to me, postgres doesn't like it. For
> example, I get this error message:
>
> nevada=# \copy water_well from 'one.csv' with delimiter '|' null '' CSV;
> ERROR: value too long for type character(1)
> CONTEXT: COPY water_well, line 1, column gravel_packed: "106"
>
> Yet, the column comparison for gravel_packed and surrounding attributes
> does not show this:
>
> lot_no TEXT, |
> block_no TEXT, |
> well_finish_date DATE, 11/15/1948|
> date_cmplt_acc CHAR(1), D|
> gravel_packed CHAR(1), |
> depth_seal INTEGER, |
> depth_drilled INTEGER, 106|
> depth_bedrock INTEGER, |
> aquifer_desc TEXT, |
>
> Notice the NULL value for gravel_packed, while the "106" is for
> depth_drilled, two columns later.
>
> I'm stymied and wonder if there's a tool I can use to fix these 80 rows so
> the copy command will accept them.
>
> Rich
>

Can we see a few lines of one.csv?  If we ignore the CONTEXT line and just look at the error "too long for char(1)",
it looks like there are only two columns to think about: date_cmplt_acc and gravel_packed.  Are you sure there are no
extra spaces, tabs, or other weirdness in the file for those two columns?

You might also consider dumping out insert statements.  Might be a little slower, but simpler to debug.
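
For example, something along these lines (a rough Python sketch, untested; the filename is only a placeholder,
water_well is your table) turns each pipe-delimited row into its own INSERT, so a bad row fails on a single
statement instead of killing the whole COPY:

with open('water_well_pipe.csv') as f:
    for line in f:
        fields = line.rstrip('\n').split('|')
        vals = []
        for v in fields:
            if v == '':
                vals.append('NULL')                            # empty field -> NULL
            else:
                vals.append("'" + v.replace("'", "''") + "'")  # escape embedded quotes
        print('INSERT INTO water_well VALUES (%s);' % ', '.join(vals))

Feed the output to psql and the error message points at the exact statement that blew up.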

Did Access dump out comma-separated with quoted fields?  I've done this on several occasions and never had to do any
cleaning.  But... I also use perl to split the csv and fire off inserts/copies/updates/etc.

-Andy

Re: Finding Errors in .csv Input Data

From
Andy Colson
Date:
On 02/22/2011 07:25 PM, Andy Colson wrote:
> On 02/22/2011 07:10 PM, Rich Shepard wrote:
>> I'm sure many of you have solved this problem in the past and can offer
>> solutions that will work for me. The context is a 73-column postgres table
>> of data that was originally in an Access .mdb file. A colleague loaded the
>> file into Access and wrote a .csv file for me to use since we have nothing
>> Microsoft here. There are 110,752 rows in the file/table. After a lot of
>> cleaning with emacs and sed, the copy command accepted all but 80 rows of
>> data. Now I need to figure out why postgres reports them as having too many
>> columns.
>>
>> Starting to work with a single row, I first compared my cleaned row to the
>> raw .csv from the Access output. They match column-by-column. Then I copied
>> the schema to a text file and started comparing the .csv data
>> column-by-column. While this looks OK to me, postgres doesn't like it. For
>> example, I get this error message:
>>
>> nevada=# \copy water_well from 'one.csv' with delimiter '|' null '' CSV;
>> ERROR: value too long for type character(1)
>> CONTEXT: COPY water_well, line 1, column gravel_packed: "106"
>>
>> Yet, the column comparison for gravel_packed and surrounding attributes
>> does not show this:
>>
>> lot_no TEXT, |
>> block_no TEXT, |
>> well_finish_date DATE, 11/15/1948|
>> date_cmplt_acc CHAR(1), D|
>> gravel_packed CHAR(1), |
>> depth_seal INTEGER, |
>> depth_drilled INTEGER, 106|
>> depth_bedrock INTEGER, |
>> aquifer_desc TEXT, |
>>
>> Notice the NULL value for gravel_packed, while the "106" is for
>> depth_drilled, two columns later.
>>
>> I'm stymied and wonder if there's a tool I can use to fix these 80 rows so
>> the copy command will accept them.
>>
>> Rich
>>
>
> Can we see a few lines of one.csv? If we ignore the CONTEXT line and just look at the error "too long for char(1)",
> it looks like there are only two columns to think about: date_cmplt_acc and gravel_packed. Are you sure there are no
> extra spaces, tabs, or other weirdness in the file for those two columns?
>
> You might also consider dumping out insert statements. Might be a little slower, but simpler to debug.
>
> Did Access dump out comma-separated with quoted fields? I've done this on several occasions and never had to do any
> cleaning. But... I also use perl to split the csv and fire off inserts/copies/updates/etc.
>
> -Andy
>

Hum... and another also: you could change your fields to all be 'text', then do the import.  Then clean up the data
with a few:

select * from water_well where length(date_cmplt_acc) > 1;


And I see that you have 73 columns, not just the few you posted.  Well, I suppose... one of these days, I'll start
paying more attention :-)

-Andy

Re: Finding Errors in .csv Input Data

From
Adrian Klaver
Date:
On Tuesday, February 22, 2011 5:10:34 pm Rich Shepard wrote:
>    I'm sure many of you have solved this problem in the past and can offer
> solutions that will work for me. The context is a 73-column postgres table
> of data that was originally in an Access .mdb file. A colleague loaded the
> file into Access and wrote a .csv file for me to use since we have nothing
> Microsoft here. There are 110,752 rows in the file/table. After a lot of
> cleaning with emacs and sed, the copy command accepted all but 80 rows of
> data. Now I need to figure out why postgres reports them as having too many
> columns.
>
>    Starting to work with a single row, I first compared my cleaned row to
> the raw .csv from the Access output. They match column-by-column. Then I
> copied the schema to a text file and started comparing the .csv data
> column-by-column. While this looks OK to me, postgres doesn't like it. For
> example, I get this error message:
>
> nevada=# \copy water_well from 'one.csv' with delimiter '|' null '' CSV;
> ERROR:  value too long for type character(1)
> CONTEXT:  COPY water_well, line 1, column gravel_packed: "106"
>
>    Yet, the column comparison for gravel_packed and surrounding attributes
> does not show this:
>
> lot_no TEXT,                    |
> block_no TEXT,                  |
> well_finish_date DATE,          11/15/1948|
> date_cmplt_acc CHAR(1),         D|
> gravel_packed CHAR(1),          |
> depth_seal INTEGER,             |
> depth_drilled INTEGER,          106|
> depth_bedrock INTEGER,          |
> aquifer_desc TEXT,              |
>
>    Notice the NULL value for gravel_packed, while the "106" is for
> depth_drilled, two columns later.
>
>    I'm stymied and wonder if there's a tool I can use to fix these 80 rows
> so the copy command will accept them.
>
> Rich

We are going to need to see at least a sample of the actual data in one.csv that
is causing the problem. You have an off-by-two error, as you suggest, but that
could actually have happened earlier in the row. For instance, the
well_finish_date would insert into lot_no, because lot_no is TEXT and the date
value at this point is just text. Same with date_cmplt_acc and block_no.
--
Adrian Klaver
adrian.klaver@gmail.com

Re: Finding Errors in .csv Input Data

From
Rich Shepard
Date:
On Tue, 22 Feb 2011, Adrian Klaver wrote:

> We are going to need to see at least a sample of the actual data in
> one.csv that is causing the problem. You have an off by two error as you
> suggest, but that could actually have happened earlier in the row. For
> instance the well_finish_date would insert into lot_no because lot_no is
> TEXT and the date value at this point is just text. Same with
> date_cmplt_acc and block_no.

Adrian/Andy,

   The data came out of Access as comma-and-quote csv. I massaged it in emacs
and sed to change the delimiter to a pipe rather than a comma and removed
the double quotes.

   I cannot assume that each of the 80 problem rows suffers from the defect in
the same place, so if there's a generic process, I can apply it row-by-row.
After all, 80 problem rows out of 110,752 is not bad.

   Here are the schema and the first row from one.csv. Because I don't know
whether attachments are stripped off before the message is distributed to the
list subscribers, I'll just include both here.

DDL:

   The column names are original (except for 'ref', which seems to be a
reserved word), but I modified the data types.

CREATE TABLE water_well (
   sequence_no TEXT PRIMARY KEY,
   well_log TEXT,
   app VARCHAR(20),
   notice_of_intent VARCHAR(6),
   waiver_no VARCHAR(30),
   date_log_rcvd DATE,
   date_log_rcvd_acc CHAR(1),
   site_type CHAR(1),
   work_type CHAR(1),
   work_type_rmks TEXT,
   proposed_use CHAR(1),
   drilling_method CHAR(1),
   sc TEXT,
   ha TEXT,
   twn VARCHAR(3),
   legal_twn VARCHAR(3),
   rng VARCHAR(3),
   legal_rng VARCHAR(3),
   sec TEXT,
   sec_quarters TEXT,
   legal_quarters TEXT,
   quarters_seq TEXT,
   ref TEXT,
   latitude NUMERIC(9,6),
   longitude NUMERIC(9,6),
   lat_long_src VARCHAR(5),
   lat_long_acc CHAR(1),
   owner_current TEXT,
   owner_address TEXT,
   owner_no TEXT,
   parcel_no TEXT,
   subdivision_name TEXT,
   lot_no TEXT,
   block_no TEXT,
   well_finish_date DATE,
   date_cmplt_acc CHAR(1),
   gravel_packed CHAR(1),
   depth_seal INTEGER,
   depth_drilled INTEGER,
   depth_bedrock INTEGER,
   aquifer_desc TEXT,
   depth_cased INTEGER,
   csng_diameter FLOAT,
   csng_reductions INTEGER,
   top_perf INTEGER,
   bottom_perf INTEGER,
   perf_intervals INTEGER,
   static_wl FLOAT,
   temperature FLOAT,
   yield FLOAT,
   drawdown FLOAT,
   hours_pumped FLOAT,
   test_method CHAR(1),
   qual_const_data CHAR(1),
   qual_lith_data CHAR(1),
   remarks TEXT,
   remarks_additional TEXT,
   contractor_lic_no VARCHAR(8),
   contractor_name TEXT,
   contractor_address TEXT,
   contractor_drlr_no VARCHAR(6),
   driller_lic_no VARCHAR(6),
   source_agency TEXT,
   user_id TEXT,
   date_entry DATE,
   update_user_id VARCHAR(16),
   date_update DATE,
   edit_status VARCHAR(16),
   well_start_date DATE,
   gravel_pack_top INTEGER,
   gravel_pack_bot INTEGER,
   utm_x NUMERIC(13,6),
   utm_y NUMERIC(13,6)
);

   Here's one.csv:

68670|724||0||11/27/1948|D|N|N||H|C|32031|087|N18|18N|E20|20E|07||||MD|39.44|119.77|NV003|M|KAIPER, R L|||SIERRA MANOR||11/15/1948|D|||106|||106|6.62|0|60|102|1|12.00||30.00|||B|G|G|AIR COMPRESSOR TESTED 30 GPM ALSO||3|MEL MEYER|RT 1 BOX 10 RENO|||3|NV003|JSWINGHOLM|1/16/2003|||F|11/11/1948|||261013.36|4369139.23

   I hope you're able to see what I keep missing as the source of the
problem.

Rich

Re: Finding Errors in .csv Input Data

From
"David Johnston"
Date:
Rich,

The data and table structure provided do not seem to correlate.

Regardless, if you changed the delimiter to "|" from "," it is possible that
you converted an embedded "," in one of the textual fields into a "|" when
you should not have.

For Instance:

Value1,value2,"value, with comma",value4

Thus became

Value1|value2|value|with comma|value4

Giving you additional fields that should not be present.
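
A conversion that respects the quoting avoids that trap. For instance (a sketch
only, untested; the filenames are placeholders), Python's csv module keeps an
embedded "," inside its field while switching the delimiter:

import csv

with open('access_export.csv', newline='') as src, \
     open('water_well_pipe.csv', 'w', newline='') as dst:
    reader = csv.reader(src)                 # understands "value, with comma"
    writer = csv.writer(dst, delimiter='|')  # quotes a field only when needed
    for row in reader:
        writer.writerow(row)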

I suggest opening up a testing file (one with the 80 malformed records and
10 to 20 good/control records) in an Excel or equivalent spreadsheet and
import/text-to-columns using the "|" delimiter.  You will be able to quickly
see rows with extra columns and specifically where those extras are
originating.  Then you can decide on how to fix the problem.  I would
suggest manually changing each incorrect "|" into a "," as a form of
self-punishment for improper data conversion - but whatever works for you.
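
If you would rather not use a spreadsheet, the same check takes a few lines of
Python (a sketch only, untested; the filename is a placeholder):

EXPECTED = 73   # number of columns in water_well

with open('water_well_pipe.csv') as f:
    for lineno, line in enumerate(f, start=1):
        n = line.rstrip('\n').count('|') + 1
        if n != EXPECTED:
            print('line %d has %d fields (expected %d)' % (lineno, n, EXPECTED))

Any line it prints is one of the 80 problem rows, and the field count tells you
how far off that row is.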

On the off chance you do not see extra columns in the spreadsheet you can
also scan down the columns and see if you recognize any pattern of
differences between the 80 failing records and the 10 successful records
that might point you further in the correct direction.

If THAT fails you might want to see if anyone will receive the testing file
and "CREATE TABLE" and try to independently diagnose the cause of the
problem.  That or do the Access export and skip your conversion routine and
directly import the CSV into Postgres.

David J.

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Rich Shepard
Sent: Tuesday, February 22, 2011 9:59 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Finding Errors in .csv Input Data

On Tue, 22 Feb 2011, Adrian Klaver wrote:

> We are going to need to see at least a sample of the actual data in
> one.csv that is causing the problem. You have an off by two error as
> you suggest, but that could actually have happened earlier in the row.
> For instance the well_finish_date would insert into lot_no because
> lot_no is TEXT and the date value at this point is just text. Same
> with date_cmplt_acc and block_no.

Adrian/Andy,

   The data came out of Access as comma-and-quote csv. I massaged it in
emacs and sed to change the delimiter to a pipe rather than a comma and
removed the double quotes.
Rich



Re: Finding Errors in .csv Input Data

From
Rich Shepard
Date:
On Tue, 22 Feb 2011, David Johnston wrote:

> The data and table structure provided do not seem to correlate.

David,

   That's the problem. However, they should match since they came from the
same .mdb file.

> Regardless, if you changed the delimiter to "|" from "," it is possible that
> you converted an embedded "," in one of the textual fields into a "|" when
> you should not have.

   This is why I worked out a processing flow that avoided this problem in
_almost_ every case. The 80 remaining rows do not appear to have this
problem. At least, it's not immediately obvious that they do, which is why I
want to learn how to identify where (and why) the data get out of sync with
the schema.

> I suggest opening up a testing file (one with the 80 malformed records and
> 10 to 20 good/control records) in an Excel or equivalent spreadsheet and
> import/text-to-columns using the "|" delimiter.  You will be able to
> quickly see rows with extra columns and specifically where those extras
> are originating.  Then you can decide on how to fix the problem.  I would
> suggest manually changing each incorrect "|" into a "," as a form of
> self-punishment for improper data conversion - but whatever works for you.

   Each of the 80 rows has at least one extra column.

   Here is the same row, first from the original .csv export and then after
the transformation:


68670,724,"",0,,11/27/1948,"D","N","N","","H","C","32031","087","N18","18N","E20","20E","07","","","","MD",39.44,119.77,"NV003","M","KAIPER, R L","","","","SIERRA MANOR","","",11/15/1948,"D","",,106,,"",106,6.62,0,60,102,1,12.00,,30.00,,,"B","G","G","AIR COMPRESSOR TESTED 30 GPM ALSO","","3","MEL MEYER","RT 1 BOX 10 RENO,",,3,"NV003","JSWINGHOLM",1/16/2003,"",,"F",11/11/1948,,,261013.36,4369139.23

68670|724||0||11/27/1948|D|N|N||H|C|32031|087|N18|18N|E20|20E|07||||MD|39.44|119.77|NV003|M|KAIPER, R L|||SIERRA MANOR||11/15/1948|D|||106|||106|6.62|0|60|102|1|12.00||30.00|||B|G|G|AIR COMPRESSOR TESTED 30 GPM ALSO||3|MEL MEYER|RT 1 BOX 10 RENO|||3|NV003|JSWINGHOLM|1/16/2003|||F|11/11/1948|||261013.36|4369139.23

   When I compared the two, I saw no difference. Why the latter does not want
to fit the schema, I don't know.

   I can send the schema again and the whole 80-row file to anyone willing to
help me find the problems.

Rich


Re: Finding Errors in .csv Input Data

From
"David Johnston"
Date:
Why does your transformed data repeat the first 5 columns of the original
CSV?
Why do values within a given field end up on different lines (e.g., SIERRA
MANOR, 39.44)?

Basically, your transform output as presented makes no sense to me; I don't
even see how you could import any records into Postgres if they have that
structure.

Why can you not just import the CSV as generated by Access?

Ignoring the repeating "header" and the line wrapping, the following
differences appear:
Missing a null/blank field between "KAIPER, R L" and "SIERRA MANOR"
Missing a null/blank field between "SIERRA MANOR" and "11/15/1948"

Thus everything after "SIERRA MANOR" is off-by-two in column position
("SIERRA MANOR" itself is off by one), which seems to match what you are
seeing during the import.  You can add a "|" in those two locations and the
record should then import.
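
For this record that stretch would then read something like (only the relevant
part shown):

...|NV003|M|KAIPER, R L||||SIERRA MANOR|||11/15/1948|D|||106|...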

Repeat for all other records.

David J.

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Rich Shepard
Sent: Wednesday, February 23, 2011 9:11 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Finding Errors in .csv Input Data
   Each of the 80 rows has at least one extra column.

   Here are the same two rows from the original .csv export and the
transformation:

68670,724,"",0,,11/27/1948,"D","N","N","","H","C","32031","087","N18","18N",
"E20","20E","07","","","","MD",39.44,119.77,"NV003","M","KAIPER, R
L","","","","SIERRA
MANOR","","",11/15/1948,"D","",,106,,"",106,6.62,0,60,102,1,12.00,,30.00,,,"
B","G","G","AIR COMPRESSOR TESTED 30 GPM ALSO","","3","MEL MEYER","RT 1 BOX
10RENO,",,3,"NV003","JSWINGHOLM",1/16/2003,"",,"F",11/11/1948,,,261013.36,43
69139.23

68670|724||0||11/27/1948|D|N|N||H|C|32031|087|N18|18N|E20|20E|07||||MD|3
68670|724||0||9.44|119.77|NV003|M|KAIPER, R L|||SIERRA
68670|724||0||MANOR||11/15/1948|D|||106|||106|6.62|0|60|102|1|12.00||30.
68670|724||0||00|||B|G|G|AIR COMPRESSOR TESTED 30 GPM ALSO||3|MEL
68670|724||0||MEYER|RT 1 BOX 10
68670|724||0||RENO|||3|NV003|JSWINGHOLM|1/16/2003|||F|11/11/1948|||26101
68670|724||0||3.36|4369139.23

   When I compared the two I saw no difference. Why the latter does not want
to fit the schema I don't know.

   I can send the schema again and the whole 80-row file to anyone willing
to help me find the problems.

Rich




Re: Finding Errors in .csv Input Data

From
Adrian Klaver
Date:
On Wednesday, February 23, 2011 6:11:16 am Rich Shepard wrote:
> On Tue, 22 Feb 2011, David Johnston wrote:
> > The data and table structure provided do not seem to correlate.
>
> David,
>
>    That's the problem. However, they should match since they came from the
> same .mdb file.
>
> > Regardless, if you changed the delimiter to "|" from "," it is possible
> > that you converted an embedded "," in one of the textual fields into a
> > "|" when you should not have.
>
>    This is why I worked out a processing flow that avoided this problem in
> _almost_ every case. The 80 remaining rows do not appear to have this
> problem. At least, it's not immediately obvious they do which is why I want
> to learn how to identify where (and why) the data get out of sync with the
> schema.
>
> > I suggest opening up a testing file (one with the 80 malformed records
> > and 10 to 20 good/control records) in an Excel or equivalent spreadsheet
> > and import/text-to-columns using the "|" delimiter.  You will be able to
> > quickly see rows with extra columns and specifically where those extras
> > are originating.  Then you can decide on how to fix the problem.  I
> > would suggest manually changing each incorrect "|" into a "," as a form
> > of self-punishment for improper data conversion - but whatever works for
> > you.
>
>    Each of the 80 rows has at least one extra column.
>
>    Here are the same two rows from the original .csv export and the
> transformation:
>
> 68670,724,"",0,,11/27/1948,"D","N","N","","H","C","32031","087","N18","18N"
> ,"E20","20E","07","","","","MD",39.44,119.77,"NV003","M","KAIPER, R
> L","","","","SIERRA
> MANOR","","",11/15/1948,"D","",,106,,"",106,6.62,0,60,102,1,12.00,,30.00,,
> ,"B","G","G","AIR COMPRESSOR TESTED 30 GPM ALSO","","3","MEL MEYER","RT 1
> BOX
> 10RENO,",,3,"NV003","JSWINGHOLM",1/16/2003,"",,"F",11/11/1948,,,261013.36,
> 4369139.23
>
> 68670|724||0||11/27/1948|D|N|N||H|C|32031|087|N18|18N|E20|20E|07||||MD|39.4
> 4|119.77|NV003|M|KAIPER, R L|||SIERRA
> MANOR||11/15/1948|D|||106|||106|6.62|0|60|102|1|12.00||30.00|||B|G|G|AIR
> COMPRESSOR TESTED 30 GPM ALSO||3|MEL MEYER|RT 1 BOX 10
> RENO|||3|NV003|JSWINGHOLM|1/16/2003|||F|11/11/1948|||261013.36|4369139.23
>
>    When I compared the two I saw no difference. Why the latter does not
> want to fit the schema I don't know.
>
>    I can send the schema again and the whole 80-row file to anyone willing
> to help me find the problems.
>
> Rich

As a quick test, try quoting KAIPER, R L --> "KAIPER, R L". If that does not
work, you can send me the 80-line csv file and I will see if anything stands out.
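
Another quick check that does not rely on eyeballing: parse both versions of
the row and compare them field by field. A sketch (untested; the filenames are
placeholders, each file holding just that one record):

import csv

with open('row_original.csv', newline='') as f:
    original = next(csv.reader(f))                # respects the "..." quoting
with open('row_transformed.csv') as f:
    transformed = next(f).rstrip('\n').split('|')

print('%d vs %d fields' % (len(original), len(transformed)))
for i, (a, b) in enumerate(zip(original, transformed), start=1):
    if a != b:
        print('first difference at field %d: %r vs %r' % (i, a, b))
        break

The first mismatched field is where the row starts drifting away from the
schema.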

--
Adrian Klaver
adrian.klaver@gmail.com

Re: Finding Errors in .csv Input Data

From
Rich Shepard
Date:
On Wed, 23 Feb 2011, David Johnston wrote:

> Why can you not just import the CSV as generated by Access?

   I don't want additional quotation marks on all text.

   Loading the 80 rows in OO.o Calc does let me see the problems and I'm
fixing them one-by-one. I didn't think of doing this.

Thanks,

Rich

Re: Finding Errors in .csv Input Data

From
Joe Conway
Date:
On 02/23/2011 09:23 AM, Rich Shepard wrote:
> On Wed, 23 Feb 2011, David Johnston wrote:
>
>> Why can you not just import the CSV as generated by Access?
>
>   I don't want additional quotation marks on all text.

I haven't followed this thread too closely, so maybe I'm missing
something, but isn't this essentially what you need to do?

# cat /tmp/test.csv
1,"testing 123,456","hello world",42

create table testcsv(id int, f1 text, f2 text, f3 int);
copy testcsv from '/tmp/test.csv' with csv delimiter ',' quote '"';
select * from testcsv;
 id |       f1        |     f2      | f3
----+-----------------+-------------+----
  1 | testing 123,456 | hello world | 42
(1 row)

HTH,

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support



Re: Finding Errors in .csv Input Data

From
Dimitri Fontaine
Date:
Rich Shepard <rshepard@appl-ecosys.com> writes:
>   I'm sure many of you have solved this problem in the past and can offer
> solutions that will work for me. The context is a 73-column postgres table
> of data that was originally in an Access .mdb file. A colleague loaded the
> file into Access and wrote a .csv file for me to use since we have nothing
> Microsoft here. There are 110,752 rows in the file/table. After a lot of
> cleaning with emacs and sed, the copy command accepted all but 80 rows of
> data. Now I need to figure out why postgres reports them as having too many
> columns.

Did you try pgloader yet?

  http://pgloader.projects.postgresql.org/

  http://pgfoundry.org/projects/pgloader/
  https://github.com/dimitri/pgloader
  http://packages.debian.org/sid/pgloader

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Finding Errors in .csv Input Data

From
Rich Shepard
Date:
On Sun, 6 Mar 2011, Dimitri Fontaine wrote:

> Did you try pgloader yet?

   Nope.

   I did determine the reasons and fixed them, so all rows now read into the
table.

Thanks for the suggestion,

Rich