Thread: Poor Delete performance

Poor Delete performance

From: Bill Huff
Date: Mon, 12 Mar 2001
  I am having some performance issues with deletion.

  It appears that deleting records from a table with a significant
  number of rows (in this case 1.3 million) takes about 1 hour per
  100K rows deleted if deleting more than 400K at a time.  This sounds
  way too slow to me.  If I break it up into smaller deletes then I can
  usually get about 5K rows per minute, but even that seems awfully slow
  to me.
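
  To be concrete, by "smaller deletes" I mean something along the lines
  of the sketch below (table and column names are invented for
  illustration, and the IN (... LIMIT ...) form may want a newer release
  than mine):

      -- delete in chunks of roughly 5K rows instead of all at once
      DELETE FROM log_entries
       WHERE id IN (SELECT id
                      FROM log_entries
                     WHERE logged_at < '2001-03-01'
                     LIMIT 5000);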

  The machine is a pretty beefy box (PIII 800MHz, 256MB, SCSI III)
  and is basically doing nothing else.  When performing the delete, the
  system goes to 100% utilization (>95% of it is the postmaster) and stays that
  way until the delete finishes.  My guess is that the overhead of MVCC
  tracking all of the rows deleted so that a rollback can be performed
  is killing me here, but I can see no way to get around it.

  I have searched the on-line archives for all the postgres lists and
  checked the FAQ and I can't find anything that gives any insight into
  increasing delete performance.

  Any information whatsoever would be greatly appreciated.

--
Bill

--
     _____
    / ___/___       | Bill Huff / bhuff@colltech.com
   / /__  __/       | Voice: (512) 263-0770 x 262
  / /__/ /          | Fax:   (512) 263-8921
  \___/ /ollective  | Pager: 1-800-946-4646 # 1406217
      \/echnologies |------[ http://www.colltech.com ] ------

Re: Poor Delete performance

From: Tom Lane
Date:
Bill Huff <bhuff@colltech.com> writes:
>   It appears that deleting records from a table with a significant
>   number of rows (in this case 1.3 million) takes about 1 hour per
>   100K rows deleted if deleting more than 400K at a time.  This sounds
>   way too slow to me.

Me too.  What PG version is this?  What's the query exactly, and what
does EXPLAIN show for it?  What is the full table definition ("pg_dump
-s -t tablename dbname" is the most reliable way to present the table)?
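
For instance (with made-up names standing in for your real table and
WHERE clause), something along these lines would tell us a lot:

    EXPLAIN DELETE FROM log_entries WHERE logged_at < '2001-03-01';
    -- plus, from the shell, a schema-only dump of just that table:
    --   pg_dump -s -t log_entries yourdb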

>   I have searched the on-line archives for all the postgres lists and
>   checked the FAQ and I can't find anything that gives any insight into
>   increasing delete performance.

DELETE (and UPDATE) are pretty much like SELECT as far as performance
considerations go.  Selecting the tuples to delete/update is the primary
performance issue.  So the docs about performance generally just talk
about SELECT to keep things simple.
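
In other words, if the equivalent SELECT is slow then the DELETE will be
at least as slow.  A quick sanity check (made-up names again) is to look
at the plan for the same WHERE clause and make sure it can use an index:

    EXPLAIN SELECT * FROM log_entries WHERE logged_at < '2001-03-01';
    -- a sequential scan here, where you expected an index scan, suggests
    -- adding an index on the filtered column:
    CREATE INDEX log_entries_logged_at_idx ON log_entries (logged_at);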

            regards, tom lane

Re: Poor Delete performance

From: Bill Huff
Date:
  There is a foreign key constraint, but that is enforced by an
  INSERT/UPDATE trigger.  I don't see why that should cause any problems.
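
  For what it's worth, the setup is shaped roughly as below (names
  invented for illustration; as I understand it, the REFERENCES clause is
  implemented behind the scenes as implicit RI_FKey_* triggers rather
  than as a separate constraint mechanism):

      CREATE TABLE parents  (id integer PRIMARY KEY);
      CREATE TABLE children (
          parent_id integer REFERENCES parents (id),
          payload   text
      );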

--
Bill

On Mon, Mar 12, 2001 at 08:27:51AM -0800, Stephan Szabo wrote:
>
> As an outside chance, does the table in question have any constraints
> defined on it?
>
> On Mon, 12 Mar 2001, Bill Huff wrote:
>
> >   I am having some performance issues with deletion.
> >
> >   It appears that deleting records from a table with a significant
> >   number of rows (in this case 1.3 million) takes about 1 hour per
> >   100K rows deleted if deleting more than 400K at a time.  This sounds
> >   way too slow to me.  If I break it up into smaller deletes then I can
> >   usually get about 5K rows per minute, but even that seems awfully slow
> >   to me.
> >
> >   The machine is a pretty beefy box (PIII 800MHz, 256MB, SCSI III)
> >   and is basically doing nothing else.  When performing the delete, the
> >   system goes to 100% utilization (>95% of it is the postmaster) and stays that
> >   way until the delete finishes.  My guess is that the overhead of MVCC
> >   tracking all of the rows deleted so that a rollback can be performed
> >   is killing me here, but I can see no way to get around it.
> >
> >   I have searched the on-line archives for all the postgres lists and
> >   checked the FAQ and I can't find anything that gives any insight into
> >   increasing delete performance.
> >
> >   Any information whatsoever would be greatly appreciated.

--
     _____
    / ___/___       | Bill Huff / bhuff@colltech.com
   / /__  __/       | Voice: (512) 263-0770 x 262
  / /__/ /          | Fax:   (512) 263-8921
  \___/ /ollective  | Pager: 1-800-946-4646 # 1406217
      \/echnologies |------[ http://www.colltech.com ] ------

Re: Poor Delete performance

From: Stephan Szabo
Date: Mon, 12 Mar 2001 08:27:51 -0800
As an outside chance, does the table in question have any constraints
defined on it?
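
A quick way to check (substituting your real table name) is to list the
triggers attached to the table:

    SELECT tgname
      FROM pg_trigger
     WHERE tgrelid = (SELECT oid FROM pg_class
                       WHERE relname = 'log_entries');

A foreign key constraint shows up there as implicitly created
RI_ConstraintTrigger_* entries.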

On Mon, 12 Mar 2001, Bill Huff wrote:

>   I am having some performance issues with deletion.
>
>   It appears that deleting records from a table with a significant
>   number of rows (in this case 1.3 million) takes about 1 hour per
>   100K rows deleted if deleting more than 400K at a time.  This sounds
>   way too slow to me.  If I break it up into smaller deletes then I can
>   usually get about 5K rows per minute, but even that seems awfully slow
>   to me.
>
>   The machine is a pretty beefy box (PIII 800MHz, 256MB, SCSI III)
>   and is basically doing nothing else.  When performing the delete, the
>   system goes to 100% utilization (>95% of it is the postmaster) and stays that
>   way until the delete finishes.  My guess is that the overhead of MVCC
>   tracking all of the rows deleted so that a rollback can be performed
>   is killing me here, but I can see no way to get around it.
>
>   I have searched the on-line archives for all the postgres lists and
>   checked the FAQ and I can't find anything that gives any insight into
>   increasing delete performance.
>
>   Any information whatsoever would be greatly appreciated.


Re: Poor Delete performance

From: Tom Lane
Date:
Bill Huff <bhuff@colltech.com> writes:
>   There is a foreign key constraint, but that is enforced by an
>   INSERT/UPDATE trigger.  I don't see why that should cause any problems.

Ah, I believe I see the problem: it's the inefficient implementation of
AFTER EVENT triggers.  The existence of either an AFTER UPDATE or AFTER
DELETE trigger causes trigger.c to add an event to its event queue for
every operation on the affected table.  And every addition of an event
searches that queue --- with a linear scan.  Thus operations on such a
table incur O(N^2) search cost if N tuples are affected in one
operation.
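
To put rough numbers on it: queuing N events this way costs on the order
of N*(N-1)/2 queue-entry comparisons in total.  For N = 400,000 deleted
rows that is around 8 * 10^10 comparisons, which is quite enough to
account for runtimes measured in hours.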

This needs to be improved (if we can't get rid of the lookup completely,
maybe use a hash table instead of sequential scan?) but it's much too
late to consider fixing it for 7.1 :-(.

Actually, though, it might be even stupider than that: it looks like
the queue should only be searched if the tuple being deleted was
inserted/modified earlier in the same transaction.  Assuming that that
doesn't apply to Bill's case, the only thing I can see that could be
causing O(N^2) behavior is the lappend() in deferredTriggerAddEvent.
That's simple enough that we *could* fix it for 7.1 ...

            regards, tom lane

RE: Poor Delete performance

From: Matthew
Date: Mon, 12 Mar 2001 12:23:41 -0600
    [snip]

> This needs to be improved (if we can't get rid of the lookup completely,
> maybe use a hash table instead of sequential scan?) but it's much too
> late to consider fixing it for 7.1 :-(.
>
> Actually, though, it might be even stupider than that: it looks like
> the queue should only be searched if the tuple being deleted was
> inserted/modified earlier in the same transaction.  Assuming that that
> doesn't apply to Bill's case, the only thing I can see that could be
> causing O(N^2) behavior is the lappend() in deferredTriggerAddEvent.
> That's simple enough that we *could* fix it for 7.1 ...
>
    This would be a welcome improvement.  I have for a long time noticed
very slow delete performance when deleting a large number of records in one
command.  I can give more detail if so desired.

Re: Poor Delete performance

From: Bill Huff
Date:
  Just for everyone's information: I removed the foreign key constraint
  that was on the table and was able to delete about 190K records in
  just over 5 seconds.  That is much more like it.  B^)

  This does appear to be an interesting *feature* with the way that
  constraints are handled.
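
  In case it is useful to anyone else, the workaround amounts to the
  sketch below (names invented; the ALTER TABLE ... DROP CONSTRAINT form
  is from newer releases -- on 7.x the implicitly created RI triggers
  would have to be dropped instead).  Note that re-adding the constraint
  re-checks all the remaining rows, so this only pays off for large
  deletes:

      BEGIN;
      ALTER TABLE children DROP CONSTRAINT children_parent_id_fkey;
      DELETE FROM children WHERE created < '2001-03-01';
      ALTER TABLE children
          ADD CONSTRAINT children_parent_id_fkey
          FOREIGN KEY (parent_id) REFERENCES parents (id);
      COMMIT;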

--
Bill

On Mon, Mar 12, 2001 at 12:23:41PM -0600, Matthew wrote:
>     [snip]
>
> > This needs to be improved (if we can't get rid of the lookup completely,
> > maybe use a hash table instead of sequential scan?) but it's much too
> > late to consider fixing it for 7.1 :-(.
> >
> > Actually, though, it might be even stupider than that: it looks like
> > the queue should only be searched if the tuple being deleted was
> > inserted/modified earlier in the same transaction.  Assuming that that
> > doesn't apply to Bill's case, the only thing I can see that could be
> > causing O(N^2) behavior is the lappend() in deferredTriggerAddEvent.
> > That's simple enough that we *could* fix it for 7.1 ...
> >
>     This would be a welcome improvement.  I have for a long time noticed
> very slow delete performance when deleting a large number of records in one
> command.  I can give more detail if so desired.

--
     _____
    / ___/___       | Bill Huff / bhuff@colltech.com
   / /__  __/       | Voice: (512) 263-0770 x 262
  / /__/ /          | Fax:   (512) 263-8921
  \___/ /ollective  | Pager: 1-800-946-4646 # 1406217
      \/echnologies |------[ http://www.colltech.com ] ------

Re: Poor Delete performance

From: Tom Lane
Date:
Matthew <matt@ctlno.com> writes:
>     This would be a welcome improvement.  I have for a long time noticed
> very slow delete performance when deleting a large number of records in one
> command.  I can give more detail if so desired.

Is this in a table that contains foreign key references?  (Or, more
generally, has any AFTER UPDATE or AFTER DELETE triggers?)

            regards, tom lane

Re: Poor Delete performance

From: Tom Lane
Date:
I've applied a patch for this problem to current sources.  The
improvement was more than I expected --- a test case involving
deleting 80000 tuples from a foreign-key-referencing table dropped
from ~15min to ~8sec.  Insertion of a large number of tuples in one
transaction sped up quite a bit too.

If you are in a hurry for a fix, you could probably adapt the diff
to 7.0.3 sources:
http://www.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/commands/trigger.c.diff?r1=1.86&r2=1.87&f=c

            regards, tom lane