Re: [GENERAL] copy command -- foiled by pg_atoi - Mailing list pgsql-general
From | Ed Loehr |
---|---|
Subject | Re: [GENERAL] copy command -- foiled by pg_atoi |
Date | |
Msg-id | 385E95D9.CAD4F50D@austin.rr.com Whole thread Raw |
In response to | copy command -- foiled by pg_atoi (Mike Beller <beller@tradeworx.com>) |
List | pgsql-general |
I have found that judicious placement of a few queries (selects, intentional errors, etc.) within a long sequence of inserts will help segment them for identification of offending lines. Hokie, but it helps me. Cheers, Ed Loehr Thomas Reinke wrote: > I've run into this a few times as well. My strategy to "hunt" down > the offending line has been to do a "bisection" algorithm. > > Essentially, cut your file in half. If the first half loads ok, > you know the problem is in the second half, and vice versa. > Now cut the offending half in half again. Do the same. > With 23,000 rows, you will do this sequence no more than 15 > times, and you will have narrowed down the offender (assuming > 23,000 rows). With 2 million rows (something we've had to contend > with in the past), you'll nail it down in 21 iterations. > > Not pretty to say the least, but workable. > > Cheers, Thomas > > Jose Soares wrote: > > > > This is also my problem. I'm getting '@!?àù§èé+*_|!&/%§¸' to load a > > table with more than 23,000 rows > > because I don't know in which line I have to look for the the error. > > It's obvious that psql reads the input file line by line, and of course > > psql knows in wich line the error > > occurred. This information would be very important to correct any syntax > > error in the file. > > > > Comments? > > > > Jose' > > > > Mike Beller wrote: > > > > > > Hi > > > > > > I'm using postgresql on RedHat/i386 from the RH6.1 RPMs > > > (Linux 2.2.5-15 PostgreSQL-6.5.2). I've been evaluating it, > > > and am quite excited about the possibilty of using this outstanding > > > open-source code in my company's systems. > > > > > > I noticed the following and want to submit it for your consideration: > > > > > > When doing a 'copy' from a big text file, if there is a record, say > > > half way down, which has an unparseable integer field (say 'X' > > > in a column that shoudl be an integer) one gets the following error: > > > > > > ERROR: pg_atoi: error in "X": can't parse "X" > > > > > > However, I don't get a line number reference where the error > > > occurred. This is a problem when you have half a million lines > > > in your file, some of which may contain legitimate X's as > > > well as non-legit ones. I don't even know which attribute failed. > > > > > > Simple example: (imagine a million rows with 12 columns and > > > you're trying to find the problem:) > > > > > > ------- > > > create table foo (x int); > > > copy foo from stdin; > > > 1 > > > 2 > > > 3 > > > X > > > 4 > > > \. > > > ----------- > > > > > > I noticed that in pg_atoi there is this code: > > > > > > if (errno) /* strtol must set ERANGE */ > > > elog(ERROR, "pg_atoi: error reading \"%s\": %m", s); > > > if (badp && *badp && (*badp != c)) > > > elog(ERROR, "pg_atoi: error in \"%s\": can\'t parse > > > \"%s\"",s, badp); > > > > > > This is called (via int4in or some such) from copy.c: > > > > > > values[i] = (Datum) (*fmgr_faddr(&in_functions[i]))(string, > > > elements[i],typmod[i]); > > > /* > > > * Sanity check - by reference attributes cannot > > > * return NULL > > > */ > > > if (!PointerIsValid(values[i]) && > > > !(rel->rd_att->attrs[i]->attbyval)) > > > elog(ERROR, "copy from line %d: Bad file format", > > > lineno); > > > > > > So the lineno information is there, it's just not available when pg_atoi > > > logs its error and bails out. > > > > > > Is it possible for pg_atoi to return an invalid pointer instead > > > of completely bailing out, thus allowing the 'sanity check' code > > > to print the line number? I have no idea how many things this > > > might break... > > > > > > Regards > > > > > > Mike Beller > > > CTO > > > Tradeworx.com > > > > > > ************ > > > > ************ > > -- > ------------------------------------------------------------ > Thomas Reinke Tel: (905) 331-2260 > Director of Technology Fax: (905) 331-2504 > E-Soft Inc. http://www.e-softinc.com > > ************
pgsql-general by date: