Re: [GENERAL] copy command -- foiled by pg_atoi - Mailing list pgsql-general

From Ed Loehr
Subject Re: [GENERAL] copy command -- foiled by pg_atoi
Date
Msg-id 385E95D9.CAD4F50D@austin.rr.com
Whole thread Raw
In response to copy command -- foiled by pg_atoi  (Mike Beller <beller@tradeworx.com>)
List pgsql-general
I have found that judicious placement of a few queries (selects, intentional
errors, etc.) within a long sequence of inserts will help segment them for
identification of offending lines.  Hokie, but it helps me.

Cheers,
Ed Loehr

Thomas Reinke wrote:

> I've run into this a few times as well. My strategy to "hunt" down
> the offending line has been to do a "bisection" algorithm.
>
> Essentially, cut your file in half. If the first half loads ok,
> you know the problem is in the second half, and vice versa.
> Now cut the offending half in half again. Do the same.
> With 23,000 rows, you will do this sequence no more than 15
> times, and you will have narrowed down the offender (assuming
> 23,000 rows). With 2 million rows (something we've had to contend
> with in the past), you'll nail it down in 21 iterations.
>
> Not pretty to say the least, but workable.
>
> Cheers, Thomas
>
> Jose Soares wrote:
> >
> > This is also my problem. I'm getting '@!?àù§èé+*_|!&/%§¸' to load a
> > table with more than 23,000 rows
> > because I don't know in which line I have to look for the the error.
> > It's obvious that psql reads the input file line by line, and of course
> > psql knows in wich line the error
> > occurred. This information would be very important to correct any syntax
> > error in the file.
> >
> > Comments?
> >
> > Jose'
> >
> > Mike Beller wrote:
> > >
> > > Hi
> > >
> > > I'm using postgresql on RedHat/i386 from the RH6.1 RPMs
> > > (Linux 2.2.5-15 PostgreSQL-6.5.2).  I've been evaluating it,
> > > and am quite excited about the possibilty of using this outstanding
> > > open-source code in my company's systems.
> > >
> > > I noticed the following and want to submit it for your consideration:
> > >
> > > When doing a 'copy' from a big text file, if there is a record, say
> > > half way down, which has an unparseable integer field (say 'X'
> > > in a column that shoudl be an integer) one gets the following error:
> > >
> > > ERROR:  pg_atoi: error in "X": can't parse "X"
> > >
> > > However, I don't get a line number reference where the error
> > > occurred.  This is a problem when you have half a million lines
> > > in your file, some of which  may contain legitimate X's as
> > > well as non-legit ones.  I don't even know which attribute failed.
> > >
> > > Simple example:  (imagine a million rows with 12 columns and
> > > you're trying to find the problem:)
> > >
> > > -------
> > > create table foo (x int);
> > > copy foo from stdin;
> > > 1
> > > 2
> > > 3
> > > X
> > > 4
> > > \.
> > > -----------
> > >
> > > I noticed that in pg_atoi there is this code:
> > >
> > >      if (errno)                         /* strtol must set ERANGE */
> > >             elog(ERROR, "pg_atoi: error reading \"%s\": %m", s);
> > >      if (badp && *badp && (*badp != c))
> > >             elog(ERROR, "pg_atoi: error in \"%s\": can\'t parse
> > > \"%s\"",s, badp);
> > >
> > > This is called (via int4in or some such) from copy.c:
> > >
> > >     values[i] = (Datum) (*fmgr_faddr(&in_functions[i]))(string,
> > > elements[i],typmod[i]);
> > >            /*
> > >           * Sanity check - by reference attributes cannot
> > >            * return NULL
> > >           */
> > >           if (!PointerIsValid(values[i]) &&
> > > !(rel->rd_att->attrs[i]->attbyval))
> > >                   elog(ERROR, "copy from line %d: Bad file format",
> > > lineno);
> > >
> > > So the lineno information is there, it's just not available when pg_atoi
> > > logs its error and bails out.
> > >
> > > Is it possible for pg_atoi to return an invalid pointer instead
> > > of completely bailing out, thus allowing the 'sanity check' code
> > > to print the line number?  I have no idea how many things this
> > > might break...
> > >
> > > Regards
> > >
> > > Mike Beller
> > > CTO
> > > Tradeworx.com
> > >
> > > ************
> >
> > ************
>
> --
> ------------------------------------------------------------
> Thomas Reinke                            Tel: (905) 331-2260
> Director of Technology                   Fax: (905) 331-2504
> E-Soft Inc.                         http://www.e-softinc.com
>
> ************


pgsql-general by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [GENERAL] recovering a "lost" database
Next
From: "Fabiano Ralo Monteiro"
Date:
Subject: PostgreSQL rows limit