Re: Finding Errors in .csv Input Data - Mailing list pgsql-general
From | Rich Shepard |
---|---|
Subject | Re: Finding Errors in .csv Input Data |
Date | |
Msg-id | alpine.LNX.2.00.1102221853060.32123@salmo.appl-ecosys.com Whole thread Raw |
In response to | Re: Finding Errors in .csv Input Data (Adrian Klaver <adrian.klaver@gmail.com>) |
Responses |
Re: Finding Errors in .csv Input Data
|
List | pgsql-general |
On Tue, 22 Feb 2011, Adrian Klaver wrote: > We are going to need to see at least a sample of the actual data in > one.csv that is causing the problem. You have an off by two error as you > suggest, but that could actually have happened earlier in the row. For > instance the well_finish_date would insert into lot_no because lot_no is > TEXT and the date value at this point is just text. Same with > date_cmplt_acc and block_no. Adrian/Andy, The data came out of Access as comma-and-quote csv. I massaged it in emacs and sed to change the delimiter to a pipe rather than a comma and removed the double quotes. I cannot assume that each of the 80 problem rows suffer from the defect in the same place, so if there's a generic process I can apply it row-by-row. After all, 80 problem rows out of 110,752 is not bad. Here are the schema and the first row, in one.csv. Because I don't know if attachments are stripped off before the message is allowed to be distributed to the list subscribers, I'll just include both here. DDL: The column names are original (except for 'ref' that seems to be a reserved word), but I modifed the data types. CREATE TABLE water_well ( sequence_no TEXT PRIMARY KEY, well_log TEXT, app VARCHAR(20), notice_of_intent VARCHAR(6), waiver_no VARCHAR(30), date_log_rcvd DATE, date_log_rcvd_acc CHAR(1), site_type CHAR(1), work_type CHAR(1), work_type_rmks TEXT, proposed_use CHAR(1), drilling_method CHAR(1), sc TEXT, ha TEXT, twn VARCHAR(3), legal_twn VARCHAR(3), rng VARCHAR(3), legal_rng VARCHAR(3), sec TEXT, sec_quarters TEXT, legal_quarters TEXT, quarters_seq TEXT, ref TEXT, latitude NUMERIC(9,6), longitude NUMERIC(9,6), lat_long_src VARCHAR(5), lat_long_acc CHAR(1), owner_current TEXT, owner_address TEXT, owner_no TEXT, parcel_no TEXT, subdivision_name TEXT, lot_no TEXT, block_no TEXT, well_finish_date DATE, date_cmplt_acc CHAR(1), gravel_packed CHAR(1), depth_seal INTEGER, depth_drilled INTEGER, depth_bedrock INTEGER, aquifer_desc TEXT, depth_cased INTEGER, csng_diameter FLOAT, csng_reductions INTEGER, top_perf INTEGER, bottom_perf INTEGER, perf_intervals INTEGER, static_wl FLOAT, temperature FLOAT, yield FLOAT, drawdown FLOAT, hours_pumped FLOAT, test_method CHAR(1), qual_const_data CHAR(1), qual_lith_data CHAR(1), remarks TEXT, remarks_additional TEXT, contractor_lic_no VARCHAR(8), contractor_name TEXT, contractor_address TEXT, contractor_drlr_no VARCHAR(6), driller_lic_no VARCHAR(6), source_agency TEXT, user_id TEXT, date_entry DATE, update_user_id VARCHAR(16), date_update DATE, edit_status VARCHAR(16), well_start_date DATE, gravel_pack_top INTEGER, gravel_pack_bot INTEGER, utm_x NUMERIC(13,6), utm_y NUMERIC(13,6) ); Here's one.csv: 68670|724||0||11/27/1948|D|N|N||H|C|32031|087|N18|18N|E20|20E|07||||MD|39.44|119.77|NV003|M|KAIPER, R L|||SIERRA MANOR||11/15/1948|D|||106|||106|6.62|0|60|102|1|12.00||30.00|||B|G|G|AIRCOMPRESSOR TESTED 30 GPM ALSO||3|MEL MEYER|RT 1 BOX10 RENO|||3|NV003|JSWINGHOLM|1/16/2003|||F|11/11/1948|||261013.36|4369139.23 I hope you're able to see what I keep missing as the source of the problem. Rich
pgsql-general by date: