Thread: Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)
Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)
From
Steve T
Date:
Postgresql 8.1
I have a large'ish table containing supplier products. One of the suppliers is no longer being used, and I want to delete all the products for that supplier that haven't as yet been used.
To do this I have created a smaller table of unused products and then delete from the main table where the product is in the sub table - ie:
create temp table _sub_recs as
select distinct sprod.recNo
from supplierProduct as Sprod
left outer join sourcedetupgr as srcu
on sprod.recno = srcu.supplierproductrecno
where sprod.supplierrecno = 1
and srcu.recno is null;
create unique index _sub_recs1 on _sub_recs (recno);
vacuum analyze verbose _sub_recs;
explain delete from supplierProduct
where supplierrecno = 1
and recNo in
(select recNo from _sub_recs);
Which gives:
QUERY PLAN
-------------------------------------------------------------------------------
Hash IN Join (cost=7594.88..42240.95 rows=260517 width=6)
Hash Cond: ("outer".recno = "inner".recno)
-> Seq Scan on supplierproduct (cost=0.00..19062.44 rows=423684 width=10)
Filter: (supplierrecno = 1)
-> Hash (cost=5256.70..5256.70 rows=364870 width=4)
-> Seq Scan on _sub_recs (cost=0.00..5256.70 rows=364870 width=4)
(6 rows)
The problem is that the query, when run in earnest, disappears off into the sunset for hours. How can I see how it is progressing (is the only way setting up a procedure and doing explicit transactions for each single delete)? I also tried 'delete .... using' - but that gave the same plan.
Is is better to use the IN format above or EXISTS?
All the above is being run directly in psql.
I have a large'ish table containing supplier products. One of the suppliers is no longer being used, and I want to delete all the products for that supplier that haven't as yet been used.
To do this I have created a smaller table of unused products and then delete from the main table where the product is in the sub table - ie:
create temp table _sub_recs as
select distinct sprod.recNo
from supplierProduct as Sprod
left outer join sourcedetupgr as srcu
on sprod.recno = srcu.supplierproductrecno
where sprod.supplierrecno = 1
and srcu.recno is null;
create unique index _sub_recs1 on _sub_recs (recno);
vacuum analyze verbose _sub_recs;
explain delete from supplierProduct
where supplierrecno = 1
and recNo in
(select recNo from _sub_recs);
Which gives:
QUERY PLAN
-------------------------------------------------------------------------------
Hash IN Join (cost=7594.88..42240.95 rows=260517 width=6)
Hash Cond: ("outer".recno = "inner".recno)
-> Seq Scan on supplierproduct (cost=0.00..19062.44 rows=423684 width=10)
Filter: (supplierrecno = 1)
-> Hash (cost=5256.70..5256.70 rows=364870 width=4)
-> Seq Scan on _sub_recs (cost=0.00..5256.70 rows=364870 width=4)
(6 rows)
The problem is that the query, when run in earnest, disappears off into the sunset for hours. How can I see how it is progressing (is the only way setting up a procedure and doing explicit transactions for each single delete)? I also tried 'delete .... using' - but that gave the same plan.
Is is better to use the IN format above or EXISTS?
All the above is being run directly in psql.
Re: Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)
From
Alan Hodgson
Date:
On Thursday 27 August 2009, Steve T <steve@retsol.co.uk> wrote: > explain delete from supplierProduct > where supplierrecno = 1 > and recNo in > (select recNo from _sub_recs); > > The problem is that the query, when run in earnest, disappears off into > the sunset for hours. How can I see how it is progressing (is the only > way setting up a procedure and doing explicit transactions for each > single delete)? I also tried 'delete .... using' - but that gave the > same plan. > > Is is better to use the IN format above or EXISTS? > "exists" in 8.1, I believe. "in" in later versions. > All the above is being run directly in psql. Do you have other foreign keys into supplierProduct on other tables where the relevant columns aren't indexed? Slow deletes are usually caused by that.
Re: Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)
From
Tom Lane
Date:
Alan Hodgson <ahodgson@simkin.ca> writes: > Do you have other foreign keys into supplierProduct on other tables where > the relevant columns aren't indexed? Slow deletes are usually caused by > that. Yeah. The query plan itself looks perfectly reasonable, so I'm suspecting the problem is something happening subsequent to the row deletions --- like foreign key constraint checking. regards, tom lane
Re: Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)
From
Steve T
Date:
Looks like that's the cause.
I had wrongly assumed that the constraint itself caused the foreign table to be indexed.
On Thu, 2009-08-27 at 09:34 -0700, Alan Hodgson wrote:
I had wrongly assumed that the constraint itself caused the foreign table to be indexed.
On Thu, 2009-08-27 at 09:34 -0700, Alan Hodgson wrote:
On Thursday 27 August 2009, Steve T <steve@retsol.co.uk> wrote: > explain delete from supplierProduct > where supplierrecno = 1 > and recNo in > (select recNo from _sub_recs); > > The problem is that the query, when run in earnest, disappears off into > the sunset for hours. How can I see how it is progressing (is the only > way setting up a procedure and doing explicit transactions for each > single delete)? I also tried 'delete .... using' - but that gave the > same plan. > > Is is better to use the IN format above or EXISTS? > "exists" in 8.1, I believe. "in" in later versions. > All the above is being run directly in psql. Do you have other foreign keys into supplierProduct on other tables where the relevant columns aren't indexed? Slow deletes are usually caused by that.
|
Re: Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)
From
Steve T
Date:
Tom/Alan,
Thanks. 300,000+ rows deleted in under 10 minutes (may have been a lot less - I went to get a coffee, came back and it was done).
Thanks again.
PS Is there a reason that PostgreSql doesn't automatically create the foreign key indexes?
On Thu, 2009-08-27 at 13:17 -0400, Tom Lane wrote:
Thanks. 300,000+ rows deleted in under 10 minutes (may have been a lot less - I went to get a coffee, came back and it was done).
Thanks again.
PS Is there a reason that PostgreSql doesn't automatically create the foreign key indexes?
On Thu, 2009-08-27 at 13:17 -0400, Tom Lane wrote:
Alan Hodgson <ahodgson@simkin.ca> writes: > Do you have other foreign keys into supplierProduct on other tables where > the relevant columns aren't indexed? Slow deletes are usually caused by > that. Yeah. The query plan itself looks perfectly reasonable, so I'm suspecting the problem is something happening subsequent to the row deletions --- like foreign key constraint checking. regards, tom lane
|
Re: Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)
From
Tom Lane
Date:
Steve T <steve@retsol.co.uk> writes: > PS Is there a reason that PostgreSql doesn't automatically create the > foreign key indexes? Sometimes you don't need/want them, or they might already exist because of other constraints you made. It is a bit of a gotcha for newbies, but we've concluded that forcing such indexes to be made would be a net loss. regards, tom lane
Re: Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)
From
"Eric Comeau"
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> wrote in message news:20904.1251397138@sss.pgh.pa.us... > Steve T <steve@retsol.co.uk> writes: >> PS Is there a reason that PostgreSql doesn't automatically create the >> foreign key indexes? > > Sometimes you don't need/want them, or they might already exist because > of other constraints you made. It is a bit of a gotcha for newbies, > but we've concluded that forcing such indexes to be made would be a > net loss. > I would like to run a utility to identify any missing indexes for foreign-keys so I don't suffer the same faith. Just wondering if there is something built-in PostgreSQL or pgAdmin already? Or should I follow these blogs/posts: http://mlawire.blogspot.com/2009/08/postgresql-indexes-on-foreign-keys.html http://archives.postgresql.org/pgsql-general/2008-12/msg00880.php Thanks, Eric