Thread: Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)

Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)

From

Steve T

Date:

27 August 2009, 13:28:38

Postgresql 8.1

I have a large'ish table containing supplier products. One of the suppliers is no longer being used, and I want to delete all the products for that supplier that haven't as yet been used.
To do this I have created a smaller table of unused products and then delete from the main table where the product is in the sub table - ie:

create temp table _sub_recs as
    select distinct sprod.recNo
        from supplierProduct as Sprod
                 left outer join sourcedetupgr as srcu
                 on sprod.recno = srcu.supplierproductrecno
        where sprod.supplierrecno = 1
        and srcu.recno is null;

create unique index _sub_recs1 on _sub_recs (recno);

vacuum analyze verbose _sub_recs;

explain delete from supplierProduct
                where supplierrecno = 1
                and recNo in
                   (select recNo from _sub_recs);

Which gives:
                                  QUERY PLAN
-------------------------------------------------------------------------------
Hash IN Join (cost=7594.88..42240.95 rows=260517 width=6)
   Hash Cond: ("outer".recno = "inner".recno)
   -> Seq Scan on supplierproduct (cost=0.00..19062.44 rows=423684 width=10)
         Filter: (supplierrecno = 1)
   -> Hash (cost=5256.70..5256.70 rows=364870 width=4)
         -> Seq Scan on _sub_recs (cost=0.00..5256.70 rows=364870 width=4)
(6 rows)

The problem is that the query, when run in earnest, disappears off into the sunset for hours. How can I see how it is progressing (is the only way setting up a procedure and doing explicit transactions for each single delete)? I also tried 'delete .... using' - but that gave the same plan.

Is is better to use the IN format above or EXISTS?

All the above is being run directly in psql.

Re: Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)

From

Alan Hodgson

Date:

27 August 2009, 13:57:37

On Thursday 27 August 2009, Steve T <steve@retsol.co.uk> wrote:
> explain  delete from supplierProduct
>                 where supplierrecno = 1
>                 and recNo in
>                    (select recNo from _sub_recs);
>
> The problem is that the query, when run in earnest, disappears off into
> the sunset for hours. How can I see how it is progressing (is the only
> way setting up a procedure and doing explicit transactions for each
> single delete)? I also tried 'delete .... using' - but that gave the
> same plan.
>
> Is is better to use the IN format above or EXISTS?
>

"exists" in 8.1, I believe. "in" in later versions.

> All the above is being run directly in psql.

Do you have other foreign keys into supplierProduct on other tables where
the relevant columns aren't indexed? Slow deletes are usually caused by
that.

Re: Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)

From

Tom Lane

Date:

27 August 2009, 14:18:31

Alan Hodgson <ahodgson@simkin.ca> writes:
> Do you have other foreign keys into supplierProduct on other tables where
> the relevant columns aren't indexed? Slow deletes are usually caused by
> that.

Yeah.  The query plan itself looks perfectly reasonable, so I'm
suspecting the problem is something happening subsequent to the row
deletions --- like foreign key constraint checking.

            regards, tom lane

Re: Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)

From

Steve T

Date:

27 August 2009, 14:39:07

Looks like that's the cause.
I had wrongly assumed that the constraint itself caused the foreign table to be indexed.

On Thu, 2009-08-27 at 09:34 -0700, Alan Hodgson wrote:

On Thursday 27 August 2009, Steve T <steve@retsol.co.uk> wrote:
> explain  delete from supplierProduct
>                 where supplierrecno = 1
>                 and recNo in
>                    (select recNo from _sub_recs);
>
> The problem is that the query, when run in earnest, disappears off into
> the sunset for hours. How can I see how it is progressing (is the only
> way setting up a procedure and doing explicit transactions for each
> single delete)? I also tried 'delete .... using' - but that gave the
> same plan.
>
> Is is better to use the IN format above or EXISTS?
>

"exists" in 8.1, I believe. "in" in later versions.

> All the above is being run directly in psql.

Do you have other foreign keys into supplierProduct on other tables where 
the relevant columns aren't indexed? Slow deletes are usually caused by 
that.

Steve Tucknott
ReTSol Ltd

DDI: 01323 488548

Re: Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)

From

Steve T

Date:

27 August 2009, 15:13:06

Tom/Alan,
Thanks. 300,000+ rows deleted in under 10 minutes (may have been a lot less - I went to get a coffee, came back and it was done).

Thanks again.

PS Is there a reason that PostgreSql doesn't automatically create the foreign key indexes?

On Thu, 2009-08-27 at 13:17 -0400, Tom Lane wrote:

Alan Hodgson <ahodgson@simkin.ca> writes:
> Do you have other foreign keys into supplierProduct on other tables where 
> the relevant columns aren't indexed? Slow deletes are usually caused by 
> that.

Yeah.  The query plan itself looks perfectly reasonable, so I'm
suspecting the problem is something happening subsequent to the row
deletions --- like foreign key constraint checking.

			regards, tom lane

Steve Tucknott
ReTSol Ltd

DDI: 01323 488548

Re: Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)

From

Tom Lane

Date:

27 August 2009, 15:19:07

Steve T <steve@retsol.co.uk> writes:
> PS Is there a reason that PostgreSql doesn't automatically create the
> foreign key indexes?

Sometimes you don't need/want them, or they might already exist because
of other constraints you made.  It is a bit of a gotcha for newbies,
but we've concluded that forcing such indexes to be made would be a
net loss.

            regards, tom lane

Re: Speeding up 'bulk' delete (and/or seeing what is going on while the delete is being processed)

From

"Eric Comeau"

Date:

28 August 2009, 08:58:24

"Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
news:20904.1251397138@sss.pgh.pa.us...
> Steve T <steve@retsol.co.uk> writes:
>> PS Is there a reason that PostgreSql doesn't automatically create the
>> foreign key indexes?
>
> Sometimes you don't need/want them, or they might already exist because
> of other constraints you made.  It is a bit of a gotcha for newbies,
> but we've concluded that forcing such indexes to be made would be a
> net loss.
>

I would like to run a utility to identify any missing indexes for
foreign-keys so I don't suffer the same faith.

Just wondering if there is something built-in PostgreSQL or pgAdmin already?

Or should I follow these blogs/posts:

   http://mlawire.blogspot.com/2009/08/postgresql-indexes-on-foreign-keys.html
   http://archives.postgresql.org/pgsql-general/2008-12/msg00880.php

Thanks,
Eric