How to "unique-ify" HUGE table? - Mailing list pgsql-performance

From: Kynn Jones
Subject: How to "unique-ify" HUGE table?
Date:
Msg-id: c2350ba40812230925jb50fed7h3dc58cc311888c7f@mail.gmail.com
Responses:
  Re: How to "unique-ify" HUGE table?  ("Scott Marlowe" <scott.marlowe@gmail.com>)
  Re: How to "unique-ify" HUGE table?  ("D'Arcy J.M. Cain" <darcy@druid.net>)
  Re: How to "unique-ify" HUGE table?  ("George Pavlov" <gpavlov@mynewplace.com>)
List: pgsql-performance

Hi everyone!

I have a very large two-column table (about 500 million rows) from which I want to remove duplicate rows.

I have tried many approaches, but they all take forever.

The table has two short TEXT columns.  It is a temporary table generated from a query:

CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;

Initially I tried

CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;

but after waiting for nearly an hour I aborted the query and reran it without the DISTINCT clause.
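
One variant that could be compared against the DISTINCT form is a GROUP BY over both columns. This is only a sketch under assumptions: in PostgreSQL releases before 8.4, SELECT DISTINCT was always implemented with a sort, whereas GROUP BY may be planned as a hash aggregate when the planner expects the groups to fit in work_mem. The table name source_rows and its sample rows are hypothetical stand-ins for the elided source query, and the work_mem value is an arbitrary illustration, not a recommendation.

```sql
-- Hypothetical stand-in for the elided source query (not from the original post).
CREATE TEMP TABLE source_rows (x text, y text);
INSERT INTO source_rows VALUES ('a', 'b'), ('a', 'b'), ('c', 'd');

-- Assumed value for illustration only; applies to the current session.
SET work_mem = '512MB';

-- Check which plan the planner picks (sort-based or hash-based) before running it.
EXPLAIN SELECT x, y FROM source_rows GROUP BY x, y;

-- Build the de-duplicated table; GROUP BY on all selected columns
-- returns the same rows as SELECT DISTINCT x, y.
CREATE TEMP TABLE huge_table AS
SELECT x, y FROM source_rows GROUP BY x, y;
```

Whether this is any faster depends on the number of distinct (x, y) pairs and the memory available, so the EXPLAIN output is worth reading before committing to a run on the full data.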

Everything takes forever with this monster!  It's uncanny.  Even printing it out to a file takes forever, let alone creating an index for it.

Any words of wisdom on how to speed this up would be appreciated.

TIA!

Kynn