Thread: Benchmark shows very slow bulk delete
Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk delete was very slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html
Is this normal?
Thom
On 01/27/10 14:28, Thom Brown wrote:
> Had a quick look at a benchmark someone put together of MySQL vs
> PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk
> delete was very slow:
> http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html

I wish that, when people get the idea to run a simplistic benchmark like this, they would at least have the common sense to put the database on a RAM drive to avoid problems with different cylinder speeds of rotational media and fragmentation from multiple runs.

Here are some typical results from a desktop SATA drive:

ada0
        512             # sectorsize
        500107862016    # mediasize in bytes (466G)
        976773168       # mediasize in sectors
        969021          # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        6QG3Z026        # Disk ident.

Seek times:
        Full stroke:      250 iter in   5.676993 sec =   22.708 msec
        Half stroke:      250 iter in   4.284583 sec =   17.138 msec
        Quarter stroke:   500 iter in   6.805539 sec =   13.611 msec
        Short forward:    400 iter in   2.678447 sec =    6.696 msec
        Short backward:   400 iter in   2.318637 sec =    5.797 msec
        Seq outer:       2048 iter in   0.214292 sec =    0.105 msec
        Seq inner:       2048 iter in   0.203929 sec =    0.100 msec

Transfer rates:
        outside:   102400 kbytes in  1.229694 sec = 83273 kbytes/sec
        middle:    102400 kbytes in  1.446570 sec = 70788 kbytes/sec
        inside:    102400 kbytes in  2.446670 sec = 41853 kbytes/sec

This doesn't explain the 4-orders-of-magnitude difference between MySQL and PostgreSQL in bulk_delete() (0.02 vs 577), but it does suggest that some of the other results, where the performance is close, might be bogus.

It's tough to benchmark anything involving rotational drives :)
On Wed, 27 Jan 2010, Thom Brown wrote:
> Had a quick look at a benchmark someone put together of MySQL vs
> PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk
> delete was very slow:
> http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html
>
> Is this normal?

On the contrary, TRUNCATE TABLE is really rather fast.

Seriously, the Postgres developers, when designing the system, decided on a database layout that was optimised for the most common cases. Bulk deletion of data is not really that common an operation, unless you are deleting whole categories of data, where setting up partitioning and deleting whole partitions would be sensible (see the sketch just below).

Other complications are that the server has to maintain concurrent integrity - that is, another transaction must be able to see either none of the changes or all of them. As a consequence of this, Postgres needs to do a sequential scan through the table and mark the rows for deletion in the transaction, before flipping the transaction committed status and cleaning up afterwards.

I'd be interested in how mysql manages to delete a whole load of rows in 0.02 seconds. How many rows is that?

(Reading the comments, I saw this: "The slow times for Postgresql Bulk Modify/Bulk Delete can be explained by foreign key references to the updates table." I'm not sure that fully explains it though, unless there are basically zero rows being deleted - it's hardly bulk then, is it?)

Matthew

--
People who love sausages, respect the law, and work with IT standards shouldn't watch any of them being made.  -- Peter Gutmann
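To make the partition-drop idea concrete, here is a minimal sketch using PostgreSQL's inheritance-based partitioning (the table and column names are invented for the example):

    CREATE TABLE events (id integer, logged timestamp);

    -- One child table per month; the CHECK constraint lets the planner
    -- skip partitions that cannot match a query (constraint exclusion).
    CREATE TABLE events_2010_01 (
        CHECK (logged >= '2010-01-01' AND logged < '2010-02-01')
    ) INHERITS (events);

    -- Discarding a whole month is then a catalog operation rather than a
    -- row-by-row DELETE with all the MVCC bookkeeping described above:
    DROP TABLE events_2010_01;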
Thom Brown <thombrown@gmail.com> wrote:
> Had a quick look at a benchmark someone put together of MySQL vs
> PostgreSQL, and while PostgreSQL is generally faster, I noticed
> the bulk delete was very slow:
> http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html
>
> Is this normal?

It is if you don't have an index on the table which has a foreign key defined which references the table in which you're doing deletes. The author of the benchmark apparently didn't realize that MySQL automatically adds such an index to the dependent table, while PostgreSQL leaves it to you to decide whether to add such an index. For "insert-only" tables, it isn't always worth the cost of maintaining it.

Also, I see that the database was small enough to be fully cached, yet the costs weren't adjusted to the recommended values for such an environment, so PostgreSQL should *really* have beaten MySQL by more than it did.

-Kevin
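A minimal sketch of the missing index in question (the table and column names here are hypothetical, not taken from the benchmark's schema):

    CREATE TABLE updates (
        id integer PRIMARY KEY
    );

    CREATE TABLE history (
        id         serial PRIMARY KEY,
        updates_id integer REFERENCES updates (id)
    );

    -- PostgreSQL indexes the referenced side (updates.id) via its primary
    -- key, but creates nothing on the referencing column.  Without the
    -- index below, every DELETE on "updates" scans "history" to verify
    -- that no row still references the deleted key:
    CREATE INDEX history_updates_id_idx ON history (updates_id);

The cost adjustment mentioned above refers to planner settings such as random_page_cost, which the usual advice says to lower toward 1 for a fully cached data set.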
On Wed, Jan 27, 2010 at 9:54 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> It is if you don't have an index on the table which has a foreign
> key defined which references the table in which you're doing
> deletes. The author of the benchmark apparently didn't realize that
> MySQL automatically adds such an index to the dependent table, while
> PostgreSQL leaves it to you to decide whether to add such an index.
> For "insert-only" tables, it isn't always worth the cost of
> maintaining it.
It really gets to me that I have to skip some foreign keys in MySQL because I can't afford to maintain the index. Instead I have to write super fun "check constraints" that look like this:
DELIMITER //
CREATE TRIGGER Location_Pre_Delete BEFORE DELETE ON Locations FOR EACH ROW
BEGIN
    DECLARE _id INT;
    -- Does any history row still reference the location being deleted?
    SELECT id INTO _id FROM BigHistoryTable WHERE locationId = OLD.id LIMIT 1;
    IF _id IS NOT NULL THEN
        -- No SIGNAL in MySQL before 5.5, so force an error (and thereby
        -- abort the DELETE) by referencing a column that doesn't exist:
        INSERT INTO BigHistoryTable (column_that_does_not_exist_but_says_that_you_violated_my_hacked_foreign_key) VALUES ('fail');
    END IF;
END//
DELIMITER ;
Sometimes I can't sleep at night for having written that code.
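The way the hack surfaces at runtime is that any DELETE hitting a still-referenced row dies with MySQL's unknown-column error, roughly like this (output abridged, error text from memory):

    DELETE FROM Locations WHERE id = 42;
    -- ERROR 1054 (42S22): Unknown column
    -- 'column_that_does_not_exist_but_says_that_you_violated_my_hacked_foreign_key'
    -- in 'field list'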
On Wednesday 27 January 2010 15:49:06 Matthew Wakeling wrote:
> On Wed, 27 Jan 2010, Thom Brown wrote:
> > Had a quick look at a benchmark someone put together of MySQL vs
> > PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk
> > delete was very slow:
> > http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html
> >
> > Is this normal?
>
> [...]
>
> I'd be interested in how mysql manages to delete a whole load of rows in
> 0.02 seconds. How many rows is that?

AFAIR mysql detects that case and converts it into some truncate equivalent.

Andres
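If that is the case, the detection would apply to an unqualified delete - that is, something like the first statement below being treated as the second (a guess at the mechanism, not verified against the benchmark's actual queries):

    DELETE FROM updates;      -- no WHERE clause
    TRUNCATE TABLE updates;   -- what it reportedly gets converted into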
Ivan Voras wrote:
> I wish that, when people get the idea to run a simplistic benchmark
> like this, they would at least have the common sense to put the
> database on a RAM drive to avoid problems with different cylinder
> speeds of rotational media and fragmentation from multiple runs.

Huh?

> It's tough to benchmark anything involving rotational drives :)

But - how the database organises its IO to maximise the available bandwidth, limit avoidable seeks, and limit avoidable flushes is absolutely key to realistic performance, especially on modest everyday hardware. Not everyone has a usage that justifies 'enterprise' kit - but plenty of people can benefit from something a step up from SQLite.

If you just want to benchmark query processor efficiency then that's one scenario where taking physical IO out of the picture might be justified, but I don't see a good reason to suggest that it is 'common sense' to do so for all testing, and while the hardware involved is pretty low end, it's still a valid data point.
Kevin Grittner wrote:
> It is if you don't have an index on the table which has a foreign
> key defined which references the table in which you're doing
> deletes. The author of the benchmark apparently didn't realize that
> MySQL automatically adds such an index to the dependent table, while
> PostgreSQL leaves it to you to decide whether to add such an index.

The author there didn't write the PostgreSQL schema; he's just using the osdb test kit: http://osdb.sourceforge.net/

Given that both Peter and Neil Conway have thrown work their way, I know there's been some PG-specific work done on that project by people who know what's going on, but I'm not sure if that included a performance check. A quick glance at http://osdb.cvs.sourceforge.net/viewvc/osdb/osdb/src/callable-sql/postgres-ui/osdb-pg-ui.m4?revision=1.4&view=markup finds this:

    createIndexForeign(char* tName, char* keyName, char* keyCol,
                       char* fTable, char* fFields) {
        snprintf(cmd, CMDBUFLEN,
            "alter table %s add constraint %s foreign key (%s) references %s (%s)",
            tName, keyName, keyCol, fTable, fFields);

But I don't see any obvious spot where the matching index that should go along with that is created. The code is just convoluted enough (due to how they abstract away support for multiple databases) that I'm not sure yet--maybe they call their createIndexBtree function and fix this in a later step. But the way the function is named--"createIndexForeign"--seems to suggest they believe that this operation will create the index too, which as you point out is just not true.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.com
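If that reading is right, the missing piece is simply the companion index on the referencing column. A sketch of the extra statement the kit would need to generate, with placeholders matching the snprintf arguments above (illustrative only; this is not something osdb actually emits):

    -- Substituting keyName, tName, keyCol; e.g. for keyName=history_fk,
    -- tName=history, keyCol=updates_id this would produce:
    --   CREATE INDEX history_fk_idx ON history (updates_id)
    CREATE INDEX %s_idx ON %s (%s);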
James Mansion wrote:
> Ivan Voras wrote:
>> I wish that, when people get the idea to run a simplistic benchmark
>> like this, they would at least have the common sense to put the
>> database on a RAM drive to avoid problems with different cylinder
>> speeds of rotational media and fragmentation from multiple runs.
>
> Huh?
>
>> It's tough to benchmark anything involving rotational drives :)
>
> But - how the database organises its IO to maximise the available
> bandwidth, limit avoidable seeks, and limit avoidable flushes is
> absolutely key to realistic performance, especially on modest everyday
> hardware. Not everyone has a usage that justifies 'enterprise' kit - but
> plenty of people can benefit from something a step up from SQLite.
>
> If you just want to benchmark query processor efficiency then that's one
> scenario where taking physical IO out of the picture might be justified,
> but I don't see a good reason to suggest that it is 'common sense' to do
> so for all testing, and while the hardware involved is pretty low end,
> it's still a valid data point.

You are right, of course, for common benchmarking to see what performance can be expected from some setup in some circumstances, but not where the intention is to compare products. You don't have to go the memory drive / SSD route - just make sure the databases always use the same (small) area of the disk drive.