Thread: Benchmark shows very slow bulk delete
Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk delete was very slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html
Is this normal?
Thom
On 01/27/10 14:28, Thom Brown wrote:
> Had a quick look at a benchmark someone put together of MySQL vs
> PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk
> delete was very slow:
> http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html

I wish that, when people get the idea to run a simplistic benchmark like this, they would at least have the common sense to put the database on a RAM drive to avoid problems with different cylinder speeds of rotational media and fragmentation from multiple runs.

Here are some typical results from a desktop SATA drive:

ada0
        512             # sectorsize
        500107862016    # mediasize in bytes (466G)
        976773168       # mediasize in sectors
        969021          # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        6QG3Z026        # Disk ident.

Seek times:
        Full stroke:      250 iter in   5.676993 sec =   22.708 msec
        Half stroke:      250 iter in   4.284583 sec =   17.138 msec
        Quarter stroke:   500 iter in   6.805539 sec =   13.611 msec
        Short forward:    400 iter in   2.678447 sec =    6.696 msec
        Short backward:   400 iter in   2.318637 sec =    5.797 msec
        Seq outer:       2048 iter in   0.214292 sec =    0.105 msec
        Seq inner:       2048 iter in   0.203929 sec =    0.100 msec

Transfer rates:
        outside:   102400 kbytes in  1.229694 sec = 83273 kbytes/sec
        middle:    102400 kbytes in  1.446570 sec = 70788 kbytes/sec
        inside:    102400 kbytes in  2.446670 sec = 41853 kbytes/sec

This doesn't explain the 4-orders-of-magnitude difference between MySQL and PostgreSQL in bulk_delete() (0.02 vs 577), but it does suggest that some of the other results, where the performance is close, might be bogus.

It's tough to benchmark anything involving rotational drives :)
On Wed, 27 Jan 2010, Thom Brown wrote:
> Had a quick look at a benchmark someone put together of MySQL vs
> PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk
> delete was very slow:
> http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html
>
> Is this normal?

On the contrary, TRUNCATE TABLE is really rather fast.

Seriously, the Postgres developers, when designing the system, decided on a database layout that was optimised for the most common cases. Bulk deletion of data is not really that common an operation, unless you are deleting whole categories of data, where setting up partitioning and deleting whole partitions would be sensible (see the sketch just below).

Other complications are that the server has to maintain concurrent integrity - that is, another transaction must be able to see either none of the changes or all of them. As a consequence of this, Postgres needs to do a sequential scan through the table and mark the rows for deletion in the transaction, before flipping the transaction committed status and cleaning up afterwards.

I'd be interested in how mysql manages to delete a whole load of rows in 0.02 seconds. How many rows is that?

(Reading the comments, I saw this: "The slow times for Postgresql Bulk Modify/Bulk Delete can be explained by foreign key references to the updates table." I'm not sure that fully explains it though, unless there are basically zero rows being deleted - it's hardly bulk then, is it?)

Matthew

--
People who love sausages, respect the law, and work with IT standards shouldn't watch any of them being made.  -- Peter Gutmann
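To make the partition-drop idea concrete, here is a minimal sketch using PostgreSQL's inheritance-based partitioning (the table and column names are invented for the example):

    CREATE TABLE events (id integer, logged timestamp);

    -- One child table per month; the CHECK constraint lets the planner
    -- skip partitions that cannot match a query (constraint exclusion).
    CREATE TABLE events_2010_01 (
        CHECK (logged >= '2010-01-01' AND logged < '2010-02-01')
    ) INHERITS (events);

    -- Discarding a whole month is then a catalog operation rather than a
    -- row-by-row DELETE with all the MVCC bookkeeping described above:
    DROP TABLE events_2010_01;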
Thom Brown <thombrown@gmail.com> wrote:
> Had a quick look at a benchmark someone put together of MySQL vs
> PostgreSQL, and while PostgreSQL is generally faster, I noticed
> the bulk delete was very slow:
> http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html
>
> Is this normal?

It is if you don't have an index on the table which has a foreign key defined which references the table in which you're doing deletes. The author of the benchmark apparently didn't realize that MySQL automatically adds such an index to the dependent table, while PostgreSQL leaves it to you to decide whether to add such an index. For "insert-only" tables, it isn't always worth the cost of maintaining it.

Also, I see that the database was small enough to be fully cached, yet the costs weren't adjusted to the recommended values for such an environment, so PostgreSQL should *really* have beaten MySQL by more than it did.

-Kevin
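A minimal sketch of the missing index in question (the table and column names here are hypothetical, not taken from the benchmark's schema):

    CREATE TABLE updates (
        id integer PRIMARY KEY
    );

    CREATE TABLE history (
        id         serial PRIMARY KEY,
        updates_id integer REFERENCES updates (id)
    );

    -- PostgreSQL indexes the referenced side (updates.id) via its primary
    -- key, but creates nothing on the referencing column.  Without the
    -- index below, every DELETE on "updates" scans "history" to verify
    -- that no row still references the deleted key:
    CREATE INDEX history_updates_id_idx ON history (updates_id);

The cost adjustment mentioned above refers to planner settings such as random_page_cost, which the usual advice says to lower toward 1 for a fully cached data set.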
On Wed, Jan 27, 2010 at 9:54 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> It is if you don't have an index on the table which has a foreign
> key defined which references the table in which you're doing
> deletes. The author of the benchmark apparently didn't realize that
> MySQL automatically adds such an index to the dependent table, while
> PostgreSQL leaves it to you to decide whether to add such an index.
> For "insert-only" tables, it isn't always worth the cost of
> maintaining it.
It really gets to me that I have to skip some foreign keys in MySQL because I can't afford to maintain the index. Instead I have to write super fun "check constraints" that look like this:
DELIMITER //
CREATE TRIGGER Location_Pre_Delete BEFORE DELETE ON Locations FOR EACH ROW
BEGIN
    DECLARE _id INT;
    -- Does any history row still reference the location being deleted?
    SELECT id INTO _id FROM BigHistoryTable WHERE locationId = OLD.id LIMIT 1;
    IF _id IS NOT NULL THEN
        -- No SIGNAL in MySQL before 5.5, so force an error (and thereby
        -- abort the DELETE) by referencing a column that doesn't exist:
        INSERT INTO BigHistoryTable (column_that_does_not_exist_but_says_that_you_violated_my_hacked_foreign_key) VALUES ('fail');
    END IF;
END//
DELIMITER ;
Sometimes I can't sleep at night for having written that code.
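The way the hack surfaces at runtime is that any DELETE hitting a still-referenced row dies with MySQL's unknown-column error, roughly like this (output abridged, error text from memory):

    DELETE FROM Locations WHERE id = 42;
    -- ERROR 1054 (42S22): Unknown column
    -- 'column_that_does_not_exist_but_says_that_you_violated_my_hacked_foreign_key'
    -- in 'field list'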
On Wednesday 27 January 2010 15:49:06 Matthew Wakeling wrote:
> On Wed, 27 Jan 2010, Thom Brown wrote:
> > Had a quick look at a benchmark someone put together of MySQL vs
> > PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk
> > delete was very slow:
> > http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html
> >
> > Is this normal?
>
> [...]
>
> I'd be interested in how mysql manages to delete a whole load of rows in
> 0.02 seconds. How many rows is that?

AFAIR mysql detects that case and converts it into some truncate equivalent.

Andres
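If that is the case, the detection would apply to an unqualified delete - that is, something like the first statement below being treated as the second (a guess at the mechanism, not verified against the benchmark's actual queries):

    DELETE FROM updates;      -- no WHERE clause
    TRUNCATE TABLE updates;   -- what it reportedly gets converted into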
Ivan Voras wrote:
> I wish that, when people get the idea to run a simplistic benchmark
> like this, they would at least have the common sense to put the
> database on a RAM drive to avoid problems with different cylinder
> speeds of rotational media and fragmentation from multiple runs.

Huh?

> It's tough to benchmark anything involving rotational drives :)

But - how the database organises its IO to maximise the available bandwidth, limit avoidable seeks, and limit avoidable flushes is absolutely key to realistic performance, especially on modest everyday hardware. Not everyone has a usage that justifies 'enterprise' kit - but plenty of people can benefit from something a step up from SQLite.

If you just want to benchmark query processor efficiency then that's one scenario where taking physical IO out of the picture might be justified, but I don't see a good reason to suggest that it is 'common sense' to do so for all testing, and while the hardware involved is pretty low end, it's still a valid data point.
Kevin Grittner wrote:
> It is if you don't have an index on the table which has a foreign
> key defined which references the table in which you're doing
> deletes. The author of the benchmark apparently didn't realize that
> MySQL automatically adds such an index to the dependent table, while
> PostgreSQL leaves it to you to decide whether to add such an index.

The author there didn't write the PostgreSQL schema; he's just using the osdb test kit: http://osdb.sourceforge.net/

Given that both Peter and Neil Conway have thrown work their way, I know there's been some PG-specific work done on that project by people who know what's going on, but I'm not sure if that included a performance check. A quick glance at http://osdb.cvs.sourceforge.net/viewvc/osdb/osdb/src/callable-sql/postgres-ui/osdb-pg-ui.m4?revision=1.4&view=markup finds this:

    createIndexForeign(char* tName, char* keyName, char* keyCol,
                       char* fTable, char* fFields) {
        snprintf(cmd, CMDBUFLEN,
            "alter table %s add constraint %s foreign key (%s) references %s (%s)",
            tName, keyName, keyCol, fTable, fFields);

But I don't see any obvious spot where the matching index that should go along with that is created. The code is just convoluted enough (due to how they abstract away support for multiple databases) that I'm not sure yet--maybe they call their createIndexBtree function and fix this in a later step. But the way the function is named--"createIndexForeign"--seems to suggest they believe that this operation will create the index too, which as you point out is just not true.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.com
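If that reading is right, the missing piece is simply the companion index on the referencing column. A sketch of the extra statement the kit would need to generate, with placeholders matching the snprintf arguments above (illustrative only; this is not something osdb actually emits):

    -- Substituting keyName, tName, keyCol; e.g. for keyName=history_fk,
    -- tName=history, keyCol=updates_id this would produce:
    --   CREATE INDEX history_fk_idx ON history (updates_id)
    CREATE INDEX %s_idx ON %s (%s);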
James Mansion wrote:
> Ivan Voras wrote:
>> I wish that, when people get the idea to run a simplistic benchmark
>> like this, they would at least have the common sense to put the
>> database on a RAM drive to avoid problems with different cylinder
>> speeds of rotational media and fragmentation from multiple runs.
>
> Huh?
>
>> It's tough to benchmark anything involving rotational drives :)
>
> But - how the database organises its IO to maximise the available
> bandwidth, limit avoidable seeks, and limit avoidable flushes is
> absolutely key to realistic performance, especially on modest everyday
> hardware. Not everyone has a usage that justifies 'enterprise' kit - but
> plenty of people can benefit from something a step up from SQLite.
>
> If you just want to benchmark query processor efficiency then that's one
> scenario where taking physical IO out of the picture might be justified,
> but I don't see a good reason to suggest that it is 'common sense' to do
> so for all testing, and while the hardware involved is pretty low end,
> it's still a valid data point.

You are right, of course, for common benchmarking to see what performance can be expected from some setup in some circumstances, but not where the intention is to compare products. You don't have to go the memory drive / SSD route - just make sure the databases always use the same (small) area of the disk drive.