Re: Best way to delete unreferenced rows? - Mailing list pgsql-performance

From Craig James
Subject Re: Best way to delete unreferenced rows?
Date
Msg-id 46688BC5.3020006@emolecules.com
Whole thread Raw
In response to Best way to delete unreferenced rows?  ("Tyrrill, Ed" <tyrrill_ed@emc.com>)
Responses Re: Best way to delete unreferenced rows?
List pgsql-performance
Tyrrill, Ed wrote:

> I have a table, let's call it A, whose primary key, a_id, is referenced
> in a second table, let's call it B.  For each unique A.a_id there are
> generally many rows in B with the same a_id.  My problem is that I want
> to delete a row in A when the last row in B that references it is
> deleted.  Right now I just query for rows in A that aren't referenced by
> B, and that worked great when the tables were small, but it takes over
> an hour now that the tables have grown larger (over 200 million rows in
> B and 14 million in A).  The delete has to do a sequential scan of both
> tables since I'm looking for what's not in the indexes.
>
> I was going to try creating a trigger after delete on B for each row to
> check for more rows in B with the same a_id, and delete the row in A if
> none found.  In general I will be deleting 10's of millions of rows from
> B and 100's of thousands of rows from A on a daily basis.  What do you
> think?  Does anyone have any other suggestions on different ways to
> approach this?

Essentially what you're doing is taking the one-hour job and spreading out in little chunks over thousands of queries.
Ifyou have 10^7 rows in B and 10^5 rows in A, then on average you have 100 references from B to A.  That means that 99%
ofthe time, your trigger will scan B and find that there's nothing to do.  This could add a lot of overhead to your
ordinarytransactions, costing a lot more in the long run than just doing the once-a-day big cleanout. 

You didn't send the specifics of the query you're using, along with an EXPLAIN ANALYZE of it in operation.  It also be
thatyour SQL is not optimal, and that somebody could suggest a more efficient query. 

It's also possible that it's not the sequential scans that are the problem, but rather that it just takes a long time
todelete 100,000 rows from table A because you have a lot of indexes. Or it could be a combination of performance
problems.

You haven't given us enough information to really analyze your problem.  Send more details!

Craig

pgsql-performance by date:

Previous
From: "Tyrrill, Ed"
Date:
Subject: Best way to delete unreferenced rows?
Next
From: Mark Kirkwood
Date:
Subject: Re: Weird 8.2.4 performance