Thread: Poor Delete performance
I am having some performance issues with deletion. It appears that in deleting records from a table with a significant number of rows (in this case 1.3 million), it takes about 1 hour per 100K rows deleted if deleting more than 400K at a time. This sounds way too slow to me. If I break it up into smaller deletes then I can usually get about 5K rows per minute, but even that seems awfully slow to me.

The machine is a pretty beefy box (PIII 800MHz, 256MB, SCSI III) and is basically doing nothing else. When performing the delete, the system goes to 100% utilization (>95% of it postmaster) and stays that way until the delete finishes. My guess is that the overhead of MVCC tracking all of the deleted rows so that a rollback can be performed is killing me here, but I can see no way to get around it.

I have searched the on-line archives for all the Postgres lists and checked the FAQ, and I can't find anything that gives any insight into increasing delete performance.

Any information whatsoever would be greatly appreciated.

-- 
Bill Huff <bhuff@colltech.com>
Voice: (512) 263-0770 x 262 | Fax: (512) 263-8921 | Pager: 1-800-946-4646 # 1406217
Collective Technologies -- http://www.colltech.com
Bill Huff <bhuff@colltech.com> writes:
> It appears that in deleting records from a table with a significant
> number of rows (in this case 1.3 million), it takes about 1 hour per
> 100K rows deleted if deleting more than 400K at a time. This sounds
> way too slow to me.

Me too. What PG version is this? What's the query exactly, and what does EXPLAIN show for it? What is the full table definition ("pg_dump -s -t tablename dbname" is the most reliable way to present the table)?

> I have searched the on-line archives for all the Postgres lists and
> checked the FAQ, and I can't find anything that gives any insight into
> increasing delete performance.

DELETE (and UPDATE) are pretty much like SELECT as far as performance considerations go. Selecting the tuples to delete/update is the primary performance issue, so the docs about performance generally just talk about SELECT to keep things simple.

			regards, tom lane
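In that spirit, the selection step of a slow DELETE can be examined exactly as one would examine a SELECT. The table and column names below are hypothetical, purely for illustration:

```sql
-- Hypothetical schema: inspect the plan for the rows the DELETE must find.
EXPLAIN DELETE FROM big_table WHERE created < '2001-01-01';

-- If the plan shows a sequential scan over all 1.3M rows, an index on the
-- filter column lets the delete locate only the matching tuples:
CREATE INDEX big_table_created_idx ON big_table (created);
```

(As the thread goes on to show, row selection turns out not to be Bill's bottleneck, but it is the first thing to rule out.)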
There is a foreign key constraint, but that is accomplished by an INSERT/UPDATE trigger. I don't see why that should cause any problems.

-- Bill

On Mon, Mar 12, 2001 at 08:27:51AM -0800, Stephan Szabo wrote:
> As an outside chance, does the table in question have any constraints
> defined on it?
> [snip]
As an outside chance, does the table in question have any constraints defined on it?

On Mon, 12 Mar 2001, Bill Huff wrote:
> I am having some performance issues with deletion. It appears that in
> deleting records from a table with a significant number of rows (in
> this case 1.3 million), it takes about 1 hour per 100K rows deleted if
> deleting more than 400K at a time.
> [snip]
Bill Huff <bhuff@colltech.com> writes:
> There is a foreign key constraint, but that is accomplished by an
> INSERT/UPDATE trigger. I don't see why that should cause any problems.

Ah, I believe I see the problem: it's the inefficient implementation of AFTER EVENT triggers. The existence of either an AFTER UPDATE or AFTER DELETE trigger causes trigger.c to add an event to its event queue for every operation on the affected table. And every addition of an event searches that queue --- with a linear scan. Thus operations on such a table incur O(N^2) search cost if N tuples are affected in one operation.

This needs to be improved (if we can't get rid of the lookup completely, maybe use a hash table instead of a sequential scan?) but it's much too late to consider fixing it for 7.1 :-(.

Actually, though, it might be even stupider than that: it looks like the queue should only be searched if the tuple being deleted was inserted/modified earlier in the same transaction. Assuming that that doesn't apply to Bill's case, the only thing I can see that could be causing O(N^2) behavior is the lappend() in deferredTriggerAddEvent. That's simple enough that we *could* fix it for 7.1 ...

			regards, tom lane
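Tom's O(N^2) explanation can be illustrated with a toy model. This is not the actual trigger.c code; it merely mimics an lappend() on a cons-style list that has no tail pointer, so every enqueue walks the entire list:

```python
# Toy model of the deferred-trigger event queue described above (assumed
# behavior, not actual PostgreSQL code): appending the i-th event to a
# singly linked list without a tail pointer walks past i-1 nodes, so
# queueing n events costs about n^2/2 pointer hops in total.

class Node:
    def __init__(self, event):
        self.event = event
        self.next = None

def lappend_walk(head, event):
    """Append by walking from head to tail; return (head, nodes walked)."""
    node = Node(event)
    if head is None:
        return node, 0
    steps, p = 0, head
    while p.next is not None:
        p = p.next
        steps += 1
    p.next = node
    return head, steps

def queue_events(n):
    """Queue n events; return total pointer hops spent finding the tail."""
    head, total = None, 0
    for i in range(n):
        head, steps = lappend_walk(head, i)
        total += steps
    return total

# Remembering the tail (in C, keeping a pointer to the last cons cell)
# makes each append O(1), which is the simple kind of fix Tom alludes to.
```

For 1000 events this walk-to-tail scheme already spends roughly half a million pointer hops just finding the end of the queue, which is why the cost per deleted row climbs as the delete grows.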
[snip]
> This needs to be improved (if we can't get rid of the lookup completely,
> maybe use a hash table instead of a sequential scan?) but it's much too
> late to consider fixing it for 7.1 :-(.
>
> Actually, though, it might be even stupider than that: it looks like
> the queue should only be searched if the tuple being deleted was
> inserted/modified earlier in the same transaction. Assuming that that
> doesn't apply to Bill's case, the only thing I can see that could be
> causing O(N^2) behavior is the lappend() in deferredTriggerAddEvent.
> That's simple enough that we *could* fix it for 7.1 ...

This would be a welcome improvement. I have for a long time noticed very slow delete performance when deleting a large number of records in one command. I can give more detail if so desired.
Just for everyone's information: I removed the foreign key constraint that was on the table and was able to delete about 190K records in just over 5 seconds. That is much more like it. B^)

This does appear to be an interesting *feature* of the way constraints are handled.

-- Bill

On Mon, Mar 12, 2001 at 12:23:41PM -0600, Matthew wrote:
> This would be a welcome improvement. I have for a long time noticed
> very slow delete performance when deleting a large number of records
> in one command.
> [snip]
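Bill's workaround generalizes to a common maintenance pattern: drop the constraint for a one-off bulk delete, then re-create it. The names below are hypothetical, the syntax assumes a version that supports ALTER TABLE ... DROP CONSTRAINT (the 7.0-era trigger-based constraints had to be removed differently), and it should only be done when nothing else is writing to the tables:

```sql
-- Hypothetical names; run during a maintenance window with no concurrent writers.
BEGIN;
ALTER TABLE child_table DROP CONSTRAINT child_parent_fk;
DELETE FROM parent_table WHERE created < '2001-01-01';
ALTER TABLE child_table ADD CONSTRAINT child_parent_fk
    FOREIGN KEY (parent_id) REFERENCES parent_table (id);
COMMIT;
```

Re-adding the constraint re-validates the surviving rows, but that is a single pass over the data rather than a per-row trip through the trigger event queue.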
Matthew <matt@ctlno.com> writes:
> This would be a welcome improvement. I have for a long time noticed
> very slow delete performance when deleting a large number of records
> in one command. I can give more detail if so desired.

Is this in a table that contains foreign key references? (Or, more generally, has any AFTER UPDATE or AFTER DELETE triggers?)

			regards, tom lane
I've applied a patch for this problem to current sources. The improvement was more than I expected --- a test case involving deleting 80000 tuples from a foreign-key-referencing table dropped from ~15 min to ~8 sec. Insertion of a large number of tuples in one transaction sped up quite a bit too.

If you are in a hurry for a fix, you could probably adapt the diff to 7.0.3 sources:
http://www.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/commands/trigger.c.diff?r1=1.86&r2=1.87&f=c

			regards, tom lane