Re: Finding Errors in .csv Input Data - Mailing list pgsql-general

From Rich Shepard
Subject Re: Finding Errors in .csv Input Data
Date
Msg-id alpine.LNX.2.00.1102221853060.32123@salmo.appl-ecosys.com
Whole thread Raw
In response to Re: Finding Errors in .csv Input Data  (Adrian Klaver <adrian.klaver@gmail.com>)
Responses Re: Finding Errors in .csv Input Data
List pgsql-general
On Tue, 22 Feb 2011, Adrian Klaver wrote:

> We are going to need to see at least a sample of the actual data in
> one.csv that is causing the problem. You have an off by two error as you
> suggest, but that could actually have happened earlier in the row. For
> instance the well_finish_date would insert into lot_no because lot_no is
> TEXT and the date value at this point is just text. Same with
> date_cmplt_acc and block_no.

Adrian/Andy,

   The data came out of Access as comma-and-quote csv. I massaged it in emacs
and sed to change the delimiter to a pipe rather than a comma and removed
the double quotes.

   I cannot assume that each of the 80 problem rows suffer from the defect in
the same place, so if there's a generic process I can apply it row-by-row.
After all, 80 problem rows out of 110,752 is not bad.

   Here are the schema and the first row, in one.csv. Because I don't know if
attachments are stripped off before the message is allowed to be distributed
to the list subscribers, I'll just include both here.

DDL:

   The column names are original (except for 'ref' that seems to be a
reserved word), but I modifed the data types.

CREATE TABLE water_well (
   sequence_no TEXT PRIMARY KEY,
   well_log TEXT,
   app VARCHAR(20),
   notice_of_intent VARCHAR(6),
   waiver_no VARCHAR(30),
   date_log_rcvd DATE,
   date_log_rcvd_acc CHAR(1),
   site_type CHAR(1),
   work_type CHAR(1),
   work_type_rmks TEXT,
   proposed_use CHAR(1),
   drilling_method CHAR(1),
   sc TEXT,
   ha TEXT,
   twn VARCHAR(3),
   legal_twn VARCHAR(3),
   rng VARCHAR(3),
   legal_rng VARCHAR(3),
   sec TEXT,
   sec_quarters TEXT,
   legal_quarters TEXT,
   quarters_seq TEXT,
   ref TEXT,
   latitude NUMERIC(9,6),
   longitude NUMERIC(9,6),
   lat_long_src VARCHAR(5),
   lat_long_acc CHAR(1),
   owner_current TEXT,
   owner_address TEXT,
   owner_no TEXT,
   parcel_no TEXT,
   subdivision_name TEXT,
   lot_no TEXT,
   block_no TEXT,
   well_finish_date DATE,
   date_cmplt_acc CHAR(1),
   gravel_packed CHAR(1),
   depth_seal INTEGER,
   depth_drilled INTEGER,
   depth_bedrock INTEGER,
   aquifer_desc TEXT,
   depth_cased INTEGER,
   csng_diameter FLOAT,
   csng_reductions INTEGER,
   top_perf INTEGER,
   bottom_perf INTEGER,
   perf_intervals INTEGER,
   static_wl FLOAT,
   temperature FLOAT,
   yield FLOAT,
   drawdown FLOAT,
   hours_pumped FLOAT,
   test_method CHAR(1),
   qual_const_data CHAR(1),
   qual_lith_data CHAR(1),
   remarks TEXT,
   remarks_additional TEXT,
   contractor_lic_no VARCHAR(8),
   contractor_name TEXT,
   contractor_address TEXT,
   contractor_drlr_no VARCHAR(6),
   driller_lic_no VARCHAR(6),
   source_agency TEXT,
   user_id TEXT,
   date_entry DATE,
   update_user_id VARCHAR(16),
   date_update DATE,
   edit_status VARCHAR(16),
   well_start_date DATE,
   gravel_pack_top INTEGER,
   gravel_pack_bot INTEGER,
   utm_x NUMERIC(13,6),
   utm_y NUMERIC(13,6)
);

   Here's one.csv:

68670|724||0||11/27/1948|D|N|N||H|C|32031|087|N18|18N|E20|20E|07||||MD|39.44|119.77|NV003|M|KAIPER, R L|||SIERRA
MANOR||11/15/1948|D|||106|||106|6.62|0|60|102|1|12.00||30.00|||B|G|G|AIRCOMPRESSOR TESTED 30 GPM ALSO||3|MEL MEYER|RT 1
BOX10 RENO|||3|NV003|JSWINGHOLM|1/16/2003|||F|11/11/1948|||261013.36|4369139.23 

   I hope you're able to see what I keep missing as the source of the
problem.

Rich

pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Finding Errors in .csv Input Data
Next
From: Jaime Crespo Rincón
Date:
Subject: Re: Mysql to Postgresql