Thread: how to continue after error in batch mode with psql

how to continue after error in batch mode with psql

From
Urs Rau
Date:
I have a nightly process that pulls data into PostgreSQL 8.3.10 from a
Progress server running under 8.x. Unfortunately, the Progress database
does not enforce record size/length limitations: it does not care if a
field declared as 20 characters contains 21, or 100 for that matter.

We have a simple pipeline that takes the Progress data export dump and
imports it into PostgreSQL:

su - postgres -c "cat test.csv | psql -X -q test -c \"COPY t_test FROM
stdin WITH DELIMITER AS ',' NULL AS '?' CSV QUOTE AS '\\\"' ESCAPE AS
'\\\"'\""

If any of the fields are over-length, we get an error message that
tells us which row needs fixing:

ERROR:  value too long for type character varying(20)
CONTEXT:  COPY t_test, line 2, column t_comment_c: "'comment 3 that is
too long'"

But how do we get psql to continue after an error, import as much as
possible of the remaining data, and give us error messages about all
lines with errors?

So I want:
ERROR:  value too long for type character varying(20)
CONTEXT:  COPY t_test, line 2, column t_comment_c: "'comment 3 that is
too long'"
ERROR:  value too long for type character varying(20)
CONTEXT:  COPY t_test, line 4, column t_comment_c: "'comment 5 that is
too long'"
ERROR:  value too long for type character varying(20)
CONTEXT:  COPY t_test, line 6, column t_comment_c: "'comment 7 that is
too long'"
ERROR:  value too long for type character varying(20)
CONTEXT:  COPY t_test, line 8, column t_comment_c: "'comment 9 that is
too long'"


I have tried adding the following options to psql, but none of them made
any difference:

-v ON_ERROR_STOP
-v ON_ERROR_STOP=0
--set ON_ERROR_STOP
--set ON_ERROR_STOP=0


Here is my test database definition, followed by my test.csv data:

SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
--
-- Name: t_test; Type: TABLE; Schema: public; Owner: postgres; Tablespace:
--
CREATE TABLE t_test (
    t_record_i integer DEFAULT 0 NOT NULL,
    t_comment_c character varying(20)
);
ALTER TABLE public.t_test OWNER TO postgres;
--
-- Data for Name: t_test; Type: TABLE DATA; Schema: public; Owner: postgres
--
INSERT INTO t_test VALUES (1, 'comment 1');
--
-- Name: t_table_pkey; Type: CONSTRAINT; Schema: public; Owner: postgres; Tablespace:
--
ALTER TABLE ONLY t_test
    ADD CONSTRAINT t_table_pkey PRIMARY KEY (t_record_i);
--
-- Name: t_test; Type: ACL; Schema: public; Owner: postgres
--
REVOKE ALL ON TABLE t_test FROM PUBLIC;
REVOKE ALL ON TABLE t_test FROM postgres;
GRANT ALL ON TABLE t_test TO postgres;

and my test.csv data:
2,'comment 2'
3,'comment 3 that is too long'
4,'comment 4'
5,'comment 5 that is too long'
6,'comment 6'
7,'comment 7 that is too long'
8,'comment 8'
9,'comment 9 that is too long'
10,'comment 10'


What I am hoping for is an error message about rows 3, 5, 7 and 9 (and,
if I am lucky, their data truncated to the defined length), and
definitely all the data for rows 2, 4, 6, 8 and 10 in the t_test table
in the test database.

What does it take to get this working? I am a newbie; am I barking up
the wrong tree by trying to unset ON_ERROR_STOP?

Many thanks for any hints.

Regards,


--
Urs Rau


Re: how to continue after error in batch mode with psql

From
Jasen Betts
Date:
On 2010-04-28, Urs Rau <urs.rau@uk.om.org> wrote:
> I have a nightly process that pulls data into PostgreSQL 8.3.10 from a
> Progress server running under 8.x. Unfortunately, the Progress database
> does not enforce record size/length limitations: it does not care if a
> field declared as 20 characters contains 21, or 100 for that matter.
>
> We have a simple pipeline that takes the Progress data export dump and
> imports it into PostgreSQL:
>
> su - postgres -c "cat test.csv | psql -X -q test -c \"COPY t_test FROM
> stdin WITH DELIMITER AS ',' NULL AS '?' CSV QUOTE AS '\\\"' ESCAPE AS
> '\\\"'\""
>
> If any of the fields are over-length, we do get a error message that
> tells us which row needs fixing.
>
> ERROR:  value too long for type character varying(20)
> CONTEXT:  COPY t_test, line 2, column t_comment_c: "'comment 3 that is
> too long'"
>
> But how do we get psql to run through and continue after an error and
> import as much as possible of the rest of the import data and give us
> error messages about all lines with errors?

Why not just copy the whole lot into a temp table (like the target
table, but with text fields instead of fixed-width varchar), then do
one select on it to find the over-length lines and another to insert
the good data into the real target table?
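A minimal sketch of that approach in SQL, using the t_test definition from the question (the staging-table name t_test_stage and the \copy options are illustrative, not tested against the poster's setup):

```sql
-- Staging table: same shape as t_test, but unconstrained text
-- instead of varchar(20), so COPY never rejects a row.
CREATE TEMP TABLE t_test_stage (
    t_record_i integer,
    t_comment_c text
);

-- In psql, load the raw file into the staging table:
--   \copy t_test_stage FROM 'test.csv' WITH DELIMITER AS ',' NULL AS '?' CSV

-- One select reports all over-length lines in a single pass ...
SELECT t_record_i, t_comment_c
  FROM t_test_stage
 WHERE length(t_comment_c) > 20;

-- ... and another inserts only the good rows into the real table
-- (NULL comments are fine, so keep them explicitly).
INSERT INTO t_test (t_record_i, t_comment_c)
SELECT t_record_i, t_comment_c
  FROM t_test_stage
 WHERE t_comment_c IS NULL OR length(t_comment_c) <= 20;
```

Unlike a per-line filter in front of psql, this reports every bad row in one run and loads every good row, which is exactly what the question asked for.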


Re: how to continue after error in batch mode with psql

From
jr
Date:
hi Urs,

this is perhaps not what you meant but...

I'd use a simple script to filter out the offending lines before they go
into the database, like the following written in Tcl (but you could
write this in almost any language):

----------<snip>----------
#!/usr/bin/tclsh
# Append rejected lines to badlines.txt; pass good lines through on stdout.
set fp [open badlines.txt a]
while {0 <= [gets stdin line]} {
   # The 20-character limit applies to the comment field (everything
   # after the first comma, minus the surrounding quotes), not to the
   # whole CSV line.
   set comment [string trim [join [lrange [split $line ,] 1 end] ,] ']
   set n [string length $comment]
   if {$n < 1 ||  $n > 20} {
     puts $fp $line
   } else {
     puts stdout $line
   }
}
close $fp
exit 0
----------<snip>----------

now your pipeline becomes:

su - postgres -c "cat test.csv | badlines | psql -X -q test -c \"COPY
t_test FROM stdin WITH DELIMITER AS ',' NULL AS '?' CSV QUOTE AS '\\\"'
ESCAPE AS '\\\"'\""

and you can process the lines in 'badlines.txt' later, at your leisure.

HTH

--

regards, jr.  (jr@tailorware.org.uk)

Re: how to continue after error in batch mode with psql

From
"Urs Rau (UK)"
Date:
Hi Jasen,

Jasen Betts wrote:

>
> why not just copy the whole lot into a temp table (like the target
> table but with text fields instead of fixed-width varchar)
> and then do a select on that to find the over-length lines and another
> select to insert the good data into the real target table.
>

Thanks, we did that. In fact the PostgreSQL db schema is generated from
an XML file, so we changed the affected fields to text and made them
bigger in the PostgreSQL schema. Now the nightly mirroring runs through
without any field overruns stopping it.

Regards,

--
Urs Rau


Re: how to continue after error in batch mode with psql

From
"Urs Rau (UK)"
Date:
Hi jr


jr wrote:
> hi Urs,
>
> this is perhaps not what you meant but...
>


Actually, this in-line-filter idea is exactly what I had in mind as the
worst-case, laborious workaround when I wrote my email. Laborious
because there are 350+ tables that we fetch each night, and I did not
fancy writing filters for, and parsing, 350+ tables on each run.

The idea of a temp table was the sort of hint I hoped to get off the
list, and I got it. ;-) Having run this for a few days now, we are
starting to realise that maybe we still have to run the data through an
in-line filter, as we keep getting new errors and overruns in new
places. So we might have to generate 350+ in-line filters and run them
each night.

BTW, there is a really neat trick that I stumbled across recently that
we plan to use for logging the actions of the filter. It's a technique
that allows us to have the stdout and stderr of the filter in a log
file, but also get the stderr out separately, so 'cron', which runs the
filter nightly, can email us the errors or failures.

(((filter-progress-db.sh | tee -a /var/log/progress-db-mirror) \
3>&1 1>&2 2>&3 | tee -a /var/log/progress-db-mirror) \
3>&1 1>&2 2>&3) >/dev/null
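
The trick works by swapping stdout and stderr twice around the two tees. It can be sketched with a stand-in command in place of the real filter (the run_filter function and the temp-file names here are made up for illustration):

```shell
#!/bin/sh
# Demo of the double stdout/stderr swap from the article, with a
# stand-in for filter-progress-db.sh (hypothetical command and files).
log=$(mktemp)
errfile=$(mktemp)

# Stand-in for the nightly filter: one line to stdout, one to stderr.
run_filter() { echo "normal output"; echo "something failed" >&2; }

# First tee logs stdout; the 3>&1 1>&2 2>&3 swap then turns the old
# stderr into stdout so the second tee can log it too; the final swap
# restores the streams, so only stderr survives past >/dev/null.
( ( (run_filter | tee -a "$log") \
    3>&1 1>&2 2>&3 | tee -a "$log") \
    3>&1 1>&2 2>&3 ) >/dev/null 2>"$errfile"
```

Both lines end up in the log via the two tees, while only the stderr line survives on the outer streams; when cron runs this, that surviving stderr is what gets mailed.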


When the above command is run from cron, it will log everything to the
file and if anything goes wrong, stderr will go to the console, which
gets e-mailed to an administrator. If desired, you could also log stdout
and stderr to two distinct files.

It's taken from


http://www.enterprisenetworkingplanet.com/linux_unix/article.php/3870976/Improve-Your-Unix-Logging-with-Advanced-IO-Redirection.htm



> I'd use a simple script to filter out the offending lines before they go
> into the database, like the following written in Tcl (but you could
> write this in almost any language):
>

> ----------<snip>----------
> #!/usr/bin/tclsh

I guess in my case this would have to be perl. ;-)

> HTH

Thanks, yes it did. I like this list.

Regards,


--
Urs Rau


Re: how to continue after error in batch mode with psql

From
Mladen Gogala
Date:
Urs, psql is not a programming or reporting tool. Use Perl or Python
instead.

Urs Rau wrote:
> I have a nightly process that pulls data into PostgreSQL 8.3.10 from a
> Progress server running under 8.x. [...]
>
> But how do we get psql to run through and continue after an error and
> import as much as possible of the rest of the import data and give us
> error messages about all lines with errors?


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com