Re: [GENERAL] copy command -- foiled by pg_atoi - Mailing list pgsql-general

From Thomas Reinke
Subject Re: [GENERAL] copy command -- foiled by pg_atoi
Date
Msg-id 385E671B.C93F49D6@e-softinc.com
Whole thread Raw
In response to copy command -- foiled by pg_atoi  (Mike Beller <beller@tradeworx.com>)
List pgsql-general
I've run into this a few times as well. My strategy to "hunt" down
the offending line has been to do a "bisection" algorithm.

Essentially, cut your file in half. If the first half loads ok,
you know the problem is in the second half, and vice versa.
Now cut the offending half in half again. Do the same.
With 23,000 rows, you will do this sequence no more than 15
times, and you will have narrowed down the offender (assuming
23,000 rows). With 2 million rows (something we've had to contend
with in the past), you'll nail it down in 21 iterations.

Not pretty to say the least, but workable.

Cheers, Thomas

Jose Soares wrote:
>
> This is also my problem. I'm getting '@!?àù§èé+*_|!&/%§¸' to load a
> table with more than 23,000 rows
> because I don't know in which line I have to look for the the error.
> It's obvious that psql reads the input file line by line, and of course
> psql knows in wich line the error
> occurred. This information would be very important to correct any syntax
> error in the file.
>
> Comments?
>
> Jose'
>
> Mike Beller wrote:
> >
> > Hi
> >
> > I'm using postgresql on RedHat/i386 from the RH6.1 RPMs
> > (Linux 2.2.5-15 PostgreSQL-6.5.2).  I've been evaluating it,
> > and am quite excited about the possibilty of using this outstanding
> > open-source code in my company's systems.
> >
> > I noticed the following and want to submit it for your consideration:
> >
> > When doing a 'copy' from a big text file, if there is a record, say
> > half way down, which has an unparseable integer field (say 'X'
> > in a column that shoudl be an integer) one gets the following error:
> >
> > ERROR:  pg_atoi: error in "X": can't parse "X"
> >
> > However, I don't get a line number reference where the error
> > occurred.  This is a problem when you have half a million lines
> > in your file, some of which  may contain legitimate X's as
> > well as non-legit ones.  I don't even know which attribute failed.
> >
> > Simple example:  (imagine a million rows with 12 columns and
> > you're trying to find the problem:)
> >
> > -------
> > create table foo (x int);
> > copy foo from stdin;
> > 1
> > 2
> > 3
> > X
> > 4
> > \.
> > -----------
> >
> > I noticed that in pg_atoi there is this code:
> >
> >      if (errno)                         /* strtol must set ERANGE */
> >             elog(ERROR, "pg_atoi: error reading \"%s\": %m", s);
> >      if (badp && *badp && (*badp != c))
> >             elog(ERROR, "pg_atoi: error in \"%s\": can\'t parse
> > \"%s\"",s, badp);
> >
> > This is called (via int4in or some such) from copy.c:
> >
> >     values[i] = (Datum) (*fmgr_faddr(&in_functions[i]))(string,
> > elements[i],typmod[i]);
> >            /*
> >           * Sanity check - by reference attributes cannot
> >            * return NULL
> >           */
> >           if (!PointerIsValid(values[i]) &&
> > !(rel->rd_att->attrs[i]->attbyval))
> >                   elog(ERROR, "copy from line %d: Bad file format",
> > lineno);
> >
> > So the lineno information is there, it's just not available when pg_atoi
> > logs its error and bails out.
> >
> > Is it possible for pg_atoi to return an invalid pointer instead
> > of completely bailing out, thus allowing the 'sanity check' code
> > to print the line number?  I have no idea how many things this
> > might break...
> >
> > Regards
> >
> > Mike Beller
> > CTO
> > Tradeworx.com
> >
> > ************
>
> ************

--
------------------------------------------------------------
Thomas Reinke                            Tel: (905) 331-2260
Director of Technology                   Fax: (905) 331-2504
E-Soft Inc.                         http://www.e-softinc.com

pgsql-general by date:

Previous
From: "Moray McConnachie"
Date:
Subject: 0x44
Next
From: Bruce Momjian
Date:
Subject: Re: [GENERAL] recovering a "lost" database