Re: How to "unique-ify" HUGE table? - Mailing list pgsql-performance

From	Scott Marlowe
Subject	Re: How to "unique-ify" HUGE table?
Date	December 23, 2008 13:34:32
Msg-id	dcc563d10812230934y613ec899sffe5483e87c93cfd@mail.gmail.com
In response to	How to "unique-ify" HUGE table? ("Kynn Jones" <kynnjo@gmail.com>)
List	pgsql-performance

On Tue, Dec 23, 2008 at 10:25 AM, Kynn Jones <kynnjo@gmail.com> wrote:
> Hi everyone!
> I have a very large 2-column table (about 500M records) from which I want to
> remove duplicate records.
> I have tried many approaches, but they all take forever.
> The table's definition consists of two short TEXT columns.  It is a
> temporary table generated from a query:
>
> CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;
> Initially I tried
> CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;
> but after waiting for nearly an hour I aborted the query, and repeated it
> after getting rid of the DISTINCT clause.
> Everything takes forever with this monster!  It's uncanny.  Even printing it
> out to a file takes forever, let alone creating an index for it.
> Any words of wisdom on how to speed this up would be appreciated.

Did you try cranking up work_mem to something that's a large
percentage (25 to 50%) of total memory?
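
For illustration, a minimal sketch of what that could look like, raising
work_mem for the current session only and then retrying the DISTINCT. The
'2GB' value and the table name huge_table_uniq are assumptions for the
example, not taken from the original post; pick a value that fits the
machine. work_mem applies per sort operation in each backend, so it is
usually safer to raise it in one session than in postgresql.conf.

```sql
-- Raise the sort memory for this session only (the value is an assumption;
-- size it to roughly 25 to 50% of the machine's RAM as suggested above).
SET work_mem = '2GB';

-- Retry the de-duplication with the larger sort budget.
-- huge_table_uniq is a hypothetical name for the result table.
CREATE TEMP TABLE huge_table_uniq AS
SELECT DISTINCT x, y
FROM huge_table;

-- Return to the configured default for the rest of the session.
RESET work_mem;
```

Running EXPLAIN on the SELECT DISTINCT before and after the change should
show whether the plan or the sort method is any different.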
