Home > mailing lists

Re: Best way to delete unreferenced rows? - Mailing list pgsql-performance

From	Craig James
Subject	Re: Best way to delete unreferenced rows?
Date	June 11, 2007 12:41:37
Msg-id	466D6DDE.7030006@emolecules.com Whole thread
In response to	Re: Best way to delete unreferenced rows? ("Tyrrill, Ed" <tyrrill_ed@emc.com>)
Responses	Re: Best way to delete unreferenced rows? Re: Best way to delete unreferenced rows?
List	pgsql-performance

Tree view

Tyrrill, Ed wrote:
> QUERY PLAN
>
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> -------------------
>  Merge Left Join  (cost=38725295.93..42505394.70 rows=13799645 width=8)
> (actual time=6503583.342..8220629.311 rows=93524 loops=1)
>    Merge Cond: ("outer".record_id = "inner".record_id)
>    Filter: ("inner".record_id IS NULL)
>    ->  Index Scan using backupobjects_pkey on backupobjects
> (cost=0.00..521525.10 rows=13799645 width=8) (actual
> time=15.955..357813.621 rows=13799645 loops=1)
>    ->  Sort  (cost=38725295.93..39262641.69 rows=214938304 width=8)
> (actual time=6503265.293..7713657.750 rows=214938308 loops=1)
>          Sort Key: backup_location.record_id
>          ->  Seq Scan on backup_location  (cost=0.00..3311212.04
> rows=214938304 width=8) (actual time=11.175..1881179.825 rows=214938308
> loops=1)
>  Total runtime: 8229178.269 ms
> (8 rows)
>
> I ran vacuum analyze after the last time any inserts, deletes, or
> updates were done, and before I ran the query above.  I've attached my
> postgresql.conf.  The machine has 4 GB of RAM.

I thought maybe someone with more expertise than me might answer this, but since they haven't I'll just make a comment.
It looks to me like the sort of 214 million rows is what's killing you.  I suppose you could try to increase the sort
memory,but that's a lot of memory.  It seems to me an index merge of a relation this large would be faster, but that's
atopic for the experts. 

On a theoretical level, the problem is that it's sorting the largest table.  Perhaps you could re-cast the query so
thatit only has to sort the smaller table, something like 

   select a.id from a where a.id not in (select distinct b.id from b)

where "b" is the smaller table.  There's still no guarantee that it won't do a sort on "a", though.  In fact one of the
cleverthings about Postgres is that it can convert a query like the one above into a regular join, unless you do
somethinglike "select ... offset 0" which blocks the optimizer from doing the rearrangement. 

But I think the first approach is to try to tune for a better plan using your original query.

Craig

pgsql-performance by date:

From: Christo Du Preez
Date: 11 June 2007, 12:36:19
Subject: test / live environment, major performance difference

From: Markus Schiltknecht
Date: 11 June 2007, 12:46:18
Subject: Re: dbt2 NOTPM numbers

Re: Best way to delete unreferenced rows? - Mailing list pgsql-performance

Previous

Next