Home > mailing lists

Removing duplicate rows - Mailing list pgsql-sql

From	Paul Lambert
Subject	Removing duplicate rows
Date	February 6, 2007 18:16:07
Msg-id	45C8FDE6.6050603@autoledgers.com.au Whole thread Raw
Responses	Re: Removing duplicate rows
List	pgsql-sql

Tree view

I have some tables which I need to load from flat data files that can 
contain upwards of five million rows.

It's possible (owing to the structure of the system the data is 
originating from) that on occasion the data extract that generates these 
flat files can include two records that have the same values in the 
fields that make up the primary key of the table in my Postgres 
database, thus throwing an error when I attempt to load.

I'm assuming the best way to get around this is to load the data into a 
temporary table with "copy from" and then do a "select distinct into" my 
real table.

The question I have is the best way to set up this temporary table in my  reload script. (Having 14 servers running
Postgres,each with 37 
 
tables, I don't want to be creating temporary tables manually each time 
I need to reload the databases)

As the table definitions get updated frequently, I'd like to put in my 
script to create the table as a copy of the existing 'real' table. I'm 
assuming "create table as" would be what I need to use, but what would I 
use as the select sql - i.e. how do I use select to pull the definition 
of a table.

Apologies if this is a stupid question, I'm still fairly new to Postgres 
so I'm not sure what system tables are available for pulling out this 
kind of information.

Thanks in advance for any assistance you can offer.

Regards,
Paul.

-- 
Paul Lambert
Database Administrator
AutoLedgers

pgsql-sql by date:

From: Richard Huxton
Date: 06 February 2007, 10:45:20
Subject: Re: Index ANDing & Index ORing

From: Andrew Sullivan
Date: 06 February 2007, 18:22:48
Subject: Re: Removing duplicate rows

Removing duplicate rows - Mailing list pgsql-sql

Previous

Next