Thread: ignore errors for COPY
Hello I have postrges 8.3.12 and I have the following issue: I have a table create table test( table_id integer, datetime timestamp, MMSI integer, lat real, lng real, ); and I bulk insert data to this table with COPY. A tiny portion of the data in the file are wrong. For example one date is "2009-93-29 05:27:08" which obviously has a wrong month (93). COPY encounters this row and stop the insertion with ERROR: date/time field value out of range: "2009-93-29 05:27:08" Is there a way I can "turn" this error into a warning (or suppress the error) and make COPY simply to skip this row? I have a big amount of data (~100G) so iterating through them to find all the possible wrong timestamp, reals, and integers will be quite tedious and time consuming. Regards Vangelis
Le 11/11/2010 13:01, Vangelis Katsikaros a écrit : > Hello > > I have postrges 8.3.12 and I have the following issue: > > I have a table > create table test( > table_id integer, > datetime timestamp, > MMSI integer, > lat real, > lng real, > ); > > and I bulk insert data to this table with COPY. > > A tiny portion of the data in the file are wrong. For example one date > is "2009-93-29 05:27:08" which obviously has a wrong month (93). COPY > encounters this row and stop the insertion with > ERROR: date/time field value out of range: "2009-93-29 05:27:08" > > Is there a way I can "turn" this error into a warning (or suppress the > error) and make COPY simply to skip this row? > Nope. > I have a big amount of data (~100G) so iterating through them to find > all the possible wrong timestamp, reals, and integers will be quite > tedious and time consuming. > You should better look at pgloader which will use COPY to put your data in your table and found the lines in error. Of course, it takes time to detect lines in error. But at least, all "good" lines will be in your table, and all "bad" lines will be in a file, so that you can modify them to inject later. -- Guillaume http://www.postgresql.fr http://dalibo.com
On Thu, Nov 11, 2010 at 8:05 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote: > Le 11/11/2010 13:01, Vangelis Katsikaros a écrit : >> Hello >> >> I have postrges 8.3.12 and I have the following issue: >> >> I have a table >> create table test( >> table_id integer, >> datetime timestamp, >> MMSI integer, >> lat real, >> lng real, >> ); >> >> and I bulk insert data to this table with COPY. >> >> A tiny portion of the data in the file are wrong. For example one date >> is "2009-93-29 05:27:08" which obviously has a wrong month (93). COPY >> encounters this row and stop the insertion with >> ERROR: date/time field value out of range: "2009-93-29 05:27:08" >> >> Is there a way I can "turn" this error into a warning (or suppress the >> error) and make COPY simply to skip this row? >> > > Nope. > >> I have a big amount of data (~100G) so iterating through them to find >> all the possible wrong timestamp, reals, and integers will be quite >> tedious and time consuming. >> > > You should better look at pgloader which will use COPY to put your data > in your table and found the lines in error. Of course, it takes time to > detect lines in error. But at least, all "good" lines will be in your > table, and all "bad" lines will be in a file, so that you can modify > them to inject later. > > > -- > Guillaume > http://www.postgresql.fr > http://dalibo.com > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > olé!!! Maybe you could import it as a text column and then deal with the conversion in the DB....i do that sometimes....never had 100GB of data to work with though.... Rhys
On 11/11/2010 03:05 PM, Guillaume Lelarge wrote: > > You should better look at pgloader which will use COPY to put your data > in your table and found the lines in error. Of course, it takes time to > detect lines in error. But at least, all "good" lines will be in your > table, and all "bad" lines will be in a file, so that you can modify > them to inject later. > > Hi Thanks for the answer, indeed it solves the issue. Now, I would like to ask a second question (sorry for using the same thread :) I would like to create a specific database on a another location (not in "data_directory = '/var/lib/postgresql/8.3/main'" ). I was wondering if I have to run initdb -D /other/path/ of if there is another way (for example in mysql I can do the same thing by creating a symlink to the other location inside "datadir = /var/lib/mysql" ) Regards Vangelis
Le 11/11/2010 17:46, Vangelis Katsikaros a écrit : > On 11/11/2010 03:05 PM, Guillaume Lelarge wrote: >> >> You should better look at pgloader which will use COPY to put your data >> in your table and found the lines in error. Of course, it takes time to >> detect lines in error. But at least, all "good" lines will be in your >> table, and all "bad" lines will be in a file, so that you can modify >> them to inject later. >> >> > > Hi > > Thanks for the answer, indeed it solves the issue. > Great. > Now, I would like to ask a second question (sorry for using the same > thread :) > > I would like to create a specific database on a another location (not in > "data_directory = '/var/lib/postgresql/8.3/main'" ). I was wondering if > I have to run > initdb -D /other/path/ > > of if there is another way > To create another database in another location, you first need to create a tablespace, and then create a database in that tablespace. For example, in psql: CREATE TABLESPACE otherlocation LOCATION '/some/specific/directory'; CREATE DATABASE newdb TABLESPACE otherlocation; > (for example in mysql I can do the same thing by creating a symlink to > the other location inside "datadir = /var/lib/mysql" ) > Don't know MySQL, so can't say :) -- Guillaume http://www.postgresql.fr http://dalibo.com
On 11/11/2010 08:15 PM, Guillaume Lelarge wrote: >> >> I would like to create a specific database on a another location (not in >> "data_directory = '/var/lib/postgresql/8.3/main'" ). I was wondering if >> I have to run >> initdb -D /other/path/ >> >> of if there is another way >> > > To create another database in another location, you first need to create > a tablespace, and then create a database in that tablespace. For > example, in psql: > > CREATE TABLESPACE otherlocation LOCATION '/some/specific/directory'; > CREATE DATABASE newdb TABLESPACE otherlocation; Hey thanks again, that did the job :)