Re: How to skip duplicate records while copying from CSV to table in Postgresql using "COPY" - Mailing list pgsql-general

From Arup Rakshit
Subject Re: How to skip duplicate records while copying from CSV to table in Postgresql using "COPY"
Date
Msg-id 10543895.DrSTaAQAv7@linux-wzza.aruprakshit
In response to Re: How to skip duplicate records while copying from CSV to table in Postgresql using "COPY"  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
On Sunday, May 24, 2015 07:52:43 AM you wrote:
> >>
> >> Is it if the entire row is duplicated?
> >
> > It is entire row.
>
> So, Olivers second solution.
>

I have done this:

columns_t1 = self.singleton_class.fields.map { |f| "t1.#{f}" }.join(",")
columns_t2 = self.singleton_class.fields.map { |f| "t2.#{f}" }.join(",")
ActiveRecord::Base.transaction do
  conn = ActiveRecord::Base.connection
  # Staging table with the target's columns (this also copies the target's current rows).
  conn.execute "CREATE TEMP TABLE tmp_table AS SELECT * FROM #{table.strip};"
  # Load the tab-delimited file into the staging table.
  conn.execute "COPY tmp_table (#{self.singleton_class.fields.join(',')}) " \
               "FROM '#{source_file}' CSV HEADER DELIMITER '\t' QUOTE '|';"
  # Insert only the distinct rows that are not already present in the target.
  conn.execute "INSERT INTO #{table.strip} (#{self.singleton_class.fields.join(',')}) " \
               "SELECT DISTINCT #{columns_t1} FROM tmp_table t1 " \
               "WHERE NOT EXISTS (SELECT 1 FROM #{table.strip} t2 " \
               "WHERE (#{columns_t2}) IS NOT DISTINCT FROM (#{columns_t1}));"
  conn.execute "DROP TABLE IF EXISTS tmp_table;"
end

The SQL is wrapped inside the ActiveRecord ORM, as you can see above, but I hope you get the idea. I am not sure
whether this is the correct way to do it, or how it will affect performance.
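
To make the generated statements easier to inspect, here is a minimal sketch that only builds the SQL strings and does
not touch the database. The helper name dedup_copy_sql, the table "items", its columns, and the file path are all
hypothetical. The WITH NO DATA clause is an assumption that an empty staging table is wanted; it differs from the code
above, which copies the target's existing rows into tmp_table first. Identifiers and the path are interpolated without
quoting, so this is only suitable for trusted input.

```ruby
# Hypothetical helper: returns the SQL statements for the staging-table
# approach as plain strings, in the order they would be executed.
def dedup_copy_sql(table, fields, source_file)
  cols    = fields.join(", ")
  t1_cols = fields.map { |f| "t1.#{f}" }.join(", ")
  t2_cols = fields.map { |f| "t2.#{f}" }.join(", ")

  [
    # Assumption: create the staging table empty instead of copying rows.
    "CREATE TEMP TABLE tmp_table AS SELECT * FROM #{table} WITH NO DATA",
    # Load the tab-delimited file; E'\t' is PostgreSQL escape syntax for a tab.
    "COPY tmp_table (#{cols}) FROM '#{source_file}' " \
    "CSV HEADER DELIMITER E'\\t' QUOTE '|'",
    # Insert distinct staged rows that have no identical row in the target.
    # IS NOT DISTINCT FROM treats NULLs as equal, unlike plain "=".
    "INSERT INTO #{table} (#{cols}) " \
    "SELECT DISTINCT #{t1_cols} FROM tmp_table t1 " \
    "WHERE NOT EXISTS (SELECT 1 FROM #{table} t2 " \
    "WHERE (#{t2_cols}) IS NOT DISTINCT FROM (#{t1_cols}))",
    "DROP TABLE IF EXISTS tmp_table"
  ]
end

# Print the statements for a made-up two-column table.
puts dedup_copy_sql("items", %w[id name], "/tmp/items.csv")
```

Each string could then be passed to conn.execute inside the same transaction block as before.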

The application can run on different operating systems, so I cannot rely on Unix commands.

--
================
Regards,
Arup Rakshit
================
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as
possible, you are, by definition, not smart enough to debug it.

--Brian Kernighan

