Thread: Probable FAQ: need some benchmarks of pgsql vs. mysql

Probable FAQ: need some benchmarks of pgsql vs. mysql

From: Brian Hurt
My company is currently hitting a problem with mysql/innodb having really
slow insert performance (we're seeing ~1K rows/second).  My boss wants to
go and spend a bunch of money on the Tokutek backend.  I'd rather we save
the money and go to postgres instead.  We're not heavily invested in mysql
at this point (fixing our queries to switch from mysql to postgres would
take about five minutes).  But my boss wants to see some benchmarks.

I've googled around for a while, but all the benchmarks I've found commit
one or more "fatal flaws", which render the benchmark pointless at best:

1) Comparing Postgres to MyISAM.  Transactions are not optional for us,
so it doesn't matter if MyISAM is a hundred times faster.  I want to
compare Postgres to InnoDB (bonus points for Postgres vs. Tokutek).

2) Using the default configurations.  Be serious: is there anyone who
cares at all about performance and still uses the default configuration?

3) Using old versions of Postgres.  I'd like the benchmarks to at least
use the 8.x series (bonus points for 9.x).

4) Not using COPY for inserts.  We would, of course, be using the COPY
command for bulk inserts (roughly as sketched below).
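
To be concrete, the loading path I have in mind looks roughly like this
(rough, untested psycopg2 sketch; the connection string, table name, and
columns are placeholders, not our real schema):

    import io
    import psycopg2

    rows = [(i, "payload-%d" % i) for i in range(100000)]  # fake data

    conn = psycopg2.connect("dbname=test")  # placeholder DSN
    cur = conn.cursor()

    # Slow path: one INSERT statement per row.
    # for r in rows:
    #     cur.execute("INSERT INTO wide_table (id, payload) VALUES (%s, %s)", r)

    # Fast path: stream every row through a single COPY.
    buf = io.StringIO()
    for r in rows:
        buf.write("%d\t%s\n" % r)
    buf.seek(0)
    cur.copy_from(buf, "wide_table", columns=("id", "payload"))

    conn.commit()
    cur.close()
    conn.close()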

Here's the thing.  I have personally seen Postgres 8.1 insert 30K
rows/second into a real table on crappy hardware (an old box with a
single slow IDE drive).  I would be shocked if I can't improve on the
InnoDB numbers by at least an order of magnitude.  I'm whipping together
a personal benchmark to show this, but I need a "professional-looking"
benchmark, with pretty charts and graphs, to back me up.

Help?

Brian


Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: Stefan Kaltenbrunner
On 10/29/2010 02:58 PM, Brian Hurt wrote:
>
> My company is currently hitting a problem with mysql/innodb having
> really slow insert performance (we're seeing ~1K rows/second). My boss
> wants to go and spend a bunch of money on the Tokutek backend. I'd
> rather we save the money and go to postgres instead. We're not heavily
> invested in mysql at this point (fixing our queries to switch from mysql
> to postgres would take about five minutes). But my boss wants to see
> some benchmarks.

Hmm, 1k/s sounds very slow for MySQL/InnoDB if you are batching your
inserts (either multi-value inserts or larger transactions). Is there
anything special about that data (very wide, enormous number of indexes,
etc.)?


>
> I've googled around for a while, but all the benchmarks I've found
> commit one or more "fatal flaws", which render the benchmark pointless
> at best:
>
> 1) Comparing Postgres to MyISAM. Transactions are not an option for us,
> so it doesn't matter if MyISAM is a hundred times faster. I want to
> compare Postgres to InnoDB (bonus points for Postgres vr.s Tokutek).

I've never seen a comparison against Tokutek, so you might have to
benchmark it yourself.

>
> 2) Using the default configurations. Be serious- is there any one who
> cares the least about performance who uses the default configuration?
>
> 3) Using old versions of Postgres. I'd like the survey to at least use
> the 8.x series, bonus points for it being 9.x.
>
> 4) Not using COPY for inserts. We would, of course, be using the copy
> command for inserts.
>
> Here's the thing. I have personally seen postgres 8.1 insert 30K
> rows/second, in to a real table, on crappy hardware (single slow IDE
> drive, old crappy hardware). I would be shocked if I can't improve on
> the InnoDB numbers by at least an order of magnitude. I'm whipping
> together a personal benchmark to show this. But I need a "professional
> looking" benchmark, with pretty charts and graphs and etc., to back me up.

Not sure what you would consider "professional", but I did some testing
back in the 8.4 days here:
http://www.kaltenbrunner.cc/blog/index.php?/archives/27-Benchmarking-8.4-Chapter-2bulk-loading.html
Not sure if that actually matches your workload (but I guess you could
easily test it yourself if converting your app is really that simple).


>
> Help?

I don't think -advocacy is actually the right list to ask; you might
get a wider audience on -performance or -general.


Stefan

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: Rob Wultsch
On Fri, Oct 29, 2010 at 5:58 AM, Brian Hurt <bhurt@spnz.org> wrote:
>
> My company is currently hitting a problem with mysql/innodb having really
> slow insert performance (we're seeing ~1K rows/second).  My boss wants to go
> and spend a bunch of money on the Tokutek backend.  I'd rather we save the
> money and go to postgres instead.  We're not heavily invested in mysql at
> this point (fixing our queries to switch from mysql to postgres would take
> about five minutes).  But my boss wants to see some benchmarks.
>
> I've googled around for a while, but all the benchmarks I've found commit
> one or more "fatal flaws", which render the benchmark pointless at best:
>
> 1) Comparing Postgres to MyISAM.  Transactions are not an option for us, so
> it doesn't matter if MyISAM is a hundred times faster.  I want to compare
> Postgres to InnoDB (bonus points for Postgres vr.s Tokutek).
>
> 2) Using the default configurations.  Be serious- is there any one who cares
> the least about performance who uses the default configuration?
>
> 3) Using old versions of Postgres.  I'd like the survey to at least use the
> 8.x series, bonus points for it being 9.x.
>
> 4) Not using COPY for inserts.  We would, of course, be using the copy
> command for inserts.
>
> Here's the thing.  I have personally seen postgres 8.1 insert 30K
> rows/second, in to a real table, on crappy hardware (single slow IDE drive,
> old crappy hardware).  I would be shocked if I can't improve on the InnoDB
> numbers by at least an order of magnitude.  I'm whipping together a personal
> benchmark to show this.  But I need a "professional looking" benchmark, with
> pretty charts and graphs and etc., to back me up.
>
> Help?
>
> Brian


If you post your conf I can give you pointers on InnoDB performance.
Please also post a hardware profile (including RAM and IO hardware).
One way or another, this is the right first step for a comparison with
PG.

As for Tokutek, I think they are probably the wave of the future. I
have suggested that they create a PG product.


--
Rob Wultsch
wultsch@gmail.com

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: Josh Berkus
Brian,

> My company is currently hitting a problem with mysql/innodb having
> really slow insert performance (we're seeing ~1K rows/second).

As others have said, there's clearly something wrong with your setup
and/or application which would likely affect PostgreSQL as well. I can
name several ways in which PostgreSQL is better than MySQL/InnoDB, but
insert performance isn't one of them.

> My boss
> wants to go and spend a bunch of money on the Tokutek backend.

I'm sure that Tokutek would be thrilled to have a customer (they're very
new).  Again, though, insert performance isn't Tokutek's specialty; the
arguments for using them are that (a) they're not owned by Oracle, and
(b) their "fractal indexes", which are cool.

  > I've googled around for a while, but all the benchmarks I've found
> commit one or more "fatal flaws", which render the benchmark pointless
> at best:

Doing real benchmarks is an involved, expensive process, and both DBMSes
are fast-moving targets which need to be benchmarked every year.  I've
talked with Percona about doing some MySQL vs. PostgreSQL head-to-heads,
but in the absence of specific funding that's unlikely to be completed.
  Besides, we both really want to do head-to-heads with MSSQL & Oracle,
not each other.

> Here's the thing. I have personally seen postgres 8.1 insert 30K
> rows/second, in to a real table, on crappy hardware (single slow IDE
> drive, old crappy hardware). I would be shocked if I can't improve on
> the InnoDB numbers by at least an order of magnitude. I'm whipping
> together a personal benchmark to show this. But I need a "professional
> looking" benchmark, with pretty charts and graphs and etc., to back me up.

I don't have anything like that, I'm afraid.  The last real benchmark was
SPECjAppServer from 2007 (which at the time showed 90% of comparable
Oracle performance), and it has long since been superseded.

However, given that what you want to know about is insert performance,
it seems like it would be easy enough to mock up your own comparison.
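
Something along these lines would probably do (untested DB-API sketch;
the driver, connection arguments, table, and batch size are placeholders
to swap for the real setup):

    import time

    def bench_inserts(conn, n_rows=100000, batch=1000):
        """Insert n_rows in batches of `batch` rows per commit and
        return the achieved rows/second."""
        cur = conn.cursor()
        rows = [(i, "payload-%d" % i) for i in range(n_rows)]
        start = time.time()
        for off in range(0, n_rows, batch):
            cur.executemany(
                "INSERT INTO bench_table (id, payload) VALUES (%s, %s)",
                rows[off:off + batch])
            conn.commit()
        cur.close()
        return n_rows / (time.time() - start)

    # Run the same function against both servers, e.g.:
    # import psycopg2; print(bench_inserts(psycopg2.connect("dbname=bench")))
    # import MySQLdb;  print(bench_inserts(MySQLdb.connect(db="bench")))

Run it against the same table definition (and the same indexes) on both
sides and you have a number to put in front of your boss.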

--
                                   -- Josh Berkus
                                      PostgreSQL Experts Inc.
                                      http://www.pgexperts.com

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: Josh Berkus
Brian,

Rob pointed me to a benchmark of MySQL vs. Tokutek which is relevant
to your case:

http://www.mysqlperformanceblog.com/2009/04/28/detailed-review-of-tokutek-storage-engine/

It's also interesting to me because apparently InnoDB *does* have an
issue with large numbers of inserts into an already-large table, an
issue which we don't have (I don't think we do, anyway).

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: "Joshua D. Drake"
On Fri, 2010-10-29 at 13:53 -0700, Josh Berkus wrote:
> Brian,
>
> Rob pointed me to a benchmark of MySQL vs. Tokutek which is interesting
> to your case:
>
> http://www.mysqlperformanceblog.com/2009/04/28/detailed-review-of-tokutek-storage-engine/
>
> It's also interesting to me because apparently InnoDB *does* have an
> issue with large numbers of inserts to an already-large table, which we
> don't have (I don't think we do, anyway).

I thought InnoDB still had PK insert/update issues. I could be wrong
(I would have to dig up the article).

JD

>
> --
>                                   -- Josh Berkus
>                                      PostgreSQL Experts Inc.
>                                      http://www.pgexperts.com
>

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: Brian Hurt
For the record, the table we're having trouble inserting into is ~100 rows
with ~50 indexes on it.  E. F. Codd is spinning in his grave.  The reason
they went with this design (instead of one that has two tables, each with
3-6 columns, and about that many indexes) is that "joins are slow".
Which they may be on MySQL, I don't know.  But this is (unfortunately) a
different battle.

Brian

On Fri, 29 Oct 2010, Josh Berkus wrote:

> Brian,
>
> Rob pointed me to a benchmark of MySQL vs. Tokutek which is interesting
> to your case:
>
> http://www.mysqlperformanceblog.com/2009/04/28/detailed-review-of-tokutek-storage-engine/
>
> It's also interesting to me because apparently InnoDB *does* have an
> issue with large numbers of inserts to an already-large table, which we
> don't have (I don't think we do, anyway).
>
> --
>                                  -- Josh Berkus
>                                     PostgreSQL Experts Inc.
>                                     http://www.pgexperts.com
>

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: Stefan Kaltenbrunner
On 10/29/2010 11:37 PM, Brian Hurt wrote:
>
> For the record, the table we're having trouble inserting into is ~100
> rows with ~50 indexes on it. E.F Codd is spinning in his grave. The
> reason they went with this design (instead of one that has two tables,
> each with 3-6 columns, and about that many indexes) is that "joins are
> slow". Which they may be on Mysql, I don't know. But this is
> (unfortunately) a different battle.

Is that really only 100 rows, or are you actually talking about columns?
If the latter, you will have a very hard time getting reasonable
bulk/mass loading performance in most databases (including PG): a table
that wide, with that ridiculous a number of indexes, is just bound to be
slow. Now I actually think that the figures you are getting from InnoDB
are fairly reasonable...


Stefan

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: "Joshua D. Drake"
On Sat, 30 Oct 2010 15:57:39 +0200, Stefan Kaltenbrunner
<stefan@kaltenbrunner.cc> wrote:
> On 10/29/2010 11:37 PM, Brian Hurt wrote:
>>
>> For the record, the table we're having trouble inserting into is ~100
>> rows with ~50 indexes on it. E.F Codd is spinning in his grave. The
>> reason they went with this design (instead of one that has two tables,
>> each with 3-6 columns, and about that many indexes) is that "joins are
>> slow". Which they may be on Mysql, I don't know. But this is
>> (unfortunately) a different battle.

As opposed to the slowness you are having now?

/me smacks your developers for you

JD



--
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: Josh Berkus
On 10/29/10 2:37 PM, Brian Hurt wrote:
>
> For the record, the table we're having trouble inserting into is ~100
> rows with ~50 indexes on it.  E.F Codd is spinning in his grave.  The
> reason they went with this design (instead of one that has two tables,
> each with 3-6 columns, and about that many indexes) is that "joins are
> slow". Which they may be on Mysql, I don't know.  But this is
> (unfortunately) a different battle.

Not sure that will be any better on Postgres, then.  Few DBMSes
optimize for "stupid design".  Maybe they should go with a "NoSQL"
database.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: Robert Treat


On Fri, Oct 29, 2010 at 1:25 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Brian,
>
>> My company is currently hitting a problem with mysql/innodb having
>> really slow insert performance (we're seeing ~1K rows/second).
>
> As others have said, there's clearly something wrong with your setup
> and/or application which would likely affect PostgreSQL as well. I can
> name several ways in which PostgreSQL is better than MySQL/InnoDB, but
> insert performance isn't one of them.
>
>> My boss
>> wants to go and spend a bunch of money on the Tokutek backend.
>
> I'm sure that Tokutek would be thrilled to have a customer (they're very
> new).  Again, though, insert performance isn't Tokutek's specialty; the
> arguments for using them is that (a) they're not owned by oracle,
<snip>

That seems like a really weak argument, unless Tokutek has come out with
a stand-alone product I'm not aware of. At best you'll still need to deal
with Oracle to get the MySQL bits; at worst Tokutek's technology is as
good as it sounds and they are now an acquisition target for Oracle.
(Assuming you don't want to deal with Oracle, which your reasoning above
implies; many people won't care about that, though.)

>> I've googled around for a while, but all the benchmarks I've found
>> commit one or more "fatal flaws", which render the benchmark pointless
>> at best:
>
> Doing real benchmarks is an involved, expensive process, and both DBMSes
> are fast-moving targets which need to be benchmarked every year.  I've
> talked with Percona about doing some MySQL vs. PostgreSQL head-to-heads,
> but in the absence of specific funding that's unlikely to be completed.
> Besides, we both really want to do head-to-heads with MSSQL & Oracle,
> not each other.

Actually in this case, "real" benchmarks should be pretty easy. You
already have a running application that you have numbers on. I'd say swap
in a copy of XtraDB, a copy of Tokutek's tech, and a copy of Postgres
(after all, it's only "5 minutes" to change the queries around), and
benchmark all three. Realistically it would probably take a dedicated
week to do it, but you're probably making a 3-year decision, so this bit
of time up front seems like a good way to go. (And don't forget to
publish your results afterwards.)

Robert Treat

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: Brian Hurt

On Sat, 30 Oct 2010, Stefan Kaltenbrunner wrote:

> On 10/29/2010 11:37 PM, Brian Hurt wrote:
>>
>> For the record, the table we're having trouble inserting into is ~100
>> rows with ~50 indexes on it. E.F Codd is spinning in his grave. The
>> reason they went with this design (instead of one that has two tables,
>> each with 3-6 columns, and about that many indexes) is that "joins are
>> slow". Which they may be on Mysql, I don't know. But this is
>> (unfortunately) a different battle.
>
> is that really only 100 rows or are you actually talking about columns?

Bleh, I meant columns.

100 rows is nothing.

> if
> the later you will have a very hard time getting reasonable bulk/mass loading
> performance in most databases (and also pg) - a table that wide and with a
> that ridiculous number of indexes is just bound to be slow. Now I actually
> think that the figures you are getting from innodb are fairly reasonable...
>
>
> Stefan
>

Brian


Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: MARK CALLAGHAN
The "insert buffer" in InnoDB accelerates this workload. It buffers
changes in a special b-tree to avoid disk IO during secondary index
maintenance. For my workloads the special b-tree is able to capture
multiple changes to blocks and is likely reduce the IO requirements
for the application. Even without that benefit it allows the server to
absorb workload spikes as the disk reads for secondary index
maintenance are deferred.

This is done for inserts in MySQL 5.1 and for inserts, updates and
deletes in MySQL 5.5. This won't allow InnoDB to match TokuDB in
performance, but it should provide much better throughput than you
would expect from an engine that does update in place.

http://www.google.com/search?hl=en&q=insert+buffer+innodb
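
A toy model of the effect (illustrative Python, not InnoDB code; the
index, page, and row counts are made up):

    import random
    from collections import defaultdict

    N_INDEXES = 5           # secondary indexes on the table
    PAGES_PER_INDEX = 10000
    N_ROWS = 100000

    # Without buffering, every insert touches one leaf page per secondary
    # index, and a cold leaf means a disk read.
    unbuffered_touches = N_ROWS * N_INDEXES

    # With buffering, pending changes accumulate per (index, page) and are
    # only applied when that page is eventually read.
    pending = defaultdict(int)
    for _ in range(N_ROWS):
        for idx in range(N_INDEXES):
            page = random.randrange(PAGES_PER_INDEX)  # random secondary key
            pending[(idx, page)] += 1

    buffered_touches = len(pending)  # one merge per dirty page
    print("unbuffered page touches:", unbuffered_touches)
    print("buffered page touches:  ", buffered_touches)
    print("changes merged per touch: %.1f"
          % (unbuffered_touches / float(buffered_touches)))

With these made-up numbers each dirty page collects several changes,
which is the "capture multiple changes to blocks" effect described above.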

On Sat, Oct 30, 2010 at 4:44 PM, Brian Hurt <bhurt@spnz.org> wrote:
>
>
> On Sat, 30 Oct 2010, Stefan Kaltenbrunner wrote:
>
>> On 10/29/2010 11:37 PM, Brian Hurt wrote:
>>>
>>> For the record, the table we're having trouble inserting into is ~100
>>> rows with ~50 indexes on it. E.F Codd is spinning in his grave. The
>>> reason they went with this design (instead of one that has two tables,
>>> each with 3-6 columns, and about that many indexes) is that "joins are
>>> slow". Which they may be on Mysql, I don't know. But this is
>>> (unfortunately) a different battle.
>>
>> is that really only 100 rows or are you actually talking about columns?
>
> Bleh, I meant columns.
>
> 100 rows is nothing.
>
>> if the later you will have a very hard time getting reasonable bulk/mass
>> loading performance in most databases (and also pg) - a table that wide and
>> with a that ridiculous number of indexes is just bound to be slow. Now I
>> actually think that the figures you are getting from innodb are fairly
>> reasonable...
>>
>>
>> Stefan
>>
>
> Brian
>
>



--
Mark Callaghan
mdcallag@gmail.com

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: Martin Farach-Colton
I'd like to make one clarification. Insertion performance *is* Tokutek's
specialty.  Customers have found that TokuDB outperforms InnoDB by up to
80x on insertion performance
(http://tokutek.com/customers/a-social-networking-case-study/). They use
this insertion performance to keep a rich set of indexes, which in turn
speeds up queries.

As for insertion rates into InnoDB vs. PostgreSQL and the reasonableness
of 1k/s insertions, both InnoDB and PostgreSQL use B-trees for indexing.
(PG also has some other data structures for indexing, but they are used
in special cases.) On insertion, a B-tree puts a row into a leaf *which
must be in memory*.  Disk seeks to retrieve leaves make B-trees slow for
many types of data.

When are B-trees fast?  When the data fits in memory or when the key is
sequential/pre-sorted (which avoids the need to retrieve leaves).  Out of
the sweet spot, you need to move the disk head on every insertion, and
1k/sec insertions is not uncommon.  A report of 30k/sec insertions for a
B-tree suggests to me an in-memory database, an auto-increment primary
key with no secondary indexes, or some other special case that avoids
disk seeks.  The B-tree bottleneck is inherent in the data structure, not
a function of InnoDB vs. PostgreSQL vs. ...
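
To put rough numbers on that (toy arithmetic; the IO rate and miss
ratios are assumptions, not measurements):

    # A seek-bound insert path is limited by how many random leaf reads
    # the disks can perform.
    random_ios_per_sec = 100.0  # ~10 ms per seek, one slow spindle
    for cold_leaf_reads_per_row in (0.1, 1.0, 10.0):
        print("%4.1f cache-missing leaf reads per row -> ~%6.0f rows/sec"
              % (cold_leaf_reads_per_row,
                 random_ios_per_sec / cold_leaf_reads_per_row))

Fully cached leaves or sequential keys take the seeks out of the picture
entirely, which is how figures like 30k rows/sec become possible.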

Fractal trees do not perform disk seeks for each insertion, and they are
therefore very good at insertions.

Martin Farach-Colton
Tokutek, Inc

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: MARK CALLAGHAN
On Tue, Nov 2, 2010 at 7:17 AM, Martin Farach-Colton <martin@tokutek.com> wrote:

> The B-tree bottleneck is inherent in the data structure, not a function of
> InnoDB vs postgresql vs ....
> Fractal trees do not perform disk seeks for each insertion, and they are
> therefore very good at insertions.
> Martin Farach-Colton
> Tokutek, Inc

It isn't that simple, as I described earlier in this thread. InnoDB has
the insert buffer:
http://www.google.com/search?hl=en&q=insert+buffer+innodb

InnoDB can do a disk seek for each PK/unique index on a table during
an insert. It does not do disk seeks for each secondary index.

--
Mark Callaghan
mdcallag@gmail.com

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: Josh Berkus
> It isn't that simple as I described earlier in this thread. InnoDB has
> the insert buffer:
> http://www.google.com/search?hl=en&q=insert+buffer+innodb
>
> InnoDB can do a disk seek for each PK/unique index on a table during
> an insert. It does not do disk seeks for each secondary index.

How does InnoDB make sure that sessions see the buffered inserts if they
should be transactionally visible?

Postgres doesn't have a concept of "primary" vs. "secondary" indexes,
since, unlike InnoDB, table rows are not btree-ordered on disk.  This
means that, for example, it takes a much larger table before we hit the
"won't fit in memory" limit where the indexes become expensive to
update.  It also makes insert buffering much less valuable, since when a
table gets busy we can just start tacking stuff onto the end.

Not that I wouldn't love to have someone working on "fractal indexes" in
Postgres.  Maybe I need to find a Postgres-based business model for
Tokutek.  ;-)

--
                                   -- Josh Berkus
                                      PostgreSQL Experts Inc.
                                      http://www.pgexperts.com

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: Martin Farach-Colton
Good point, Mark.

There's an inherent performance cliff associated with B-trees when you
drop out of memory.  The InnoDB insertion buffer is a way to soften that
cliff, but it doesn't eliminate the underlying cause.  An insertion
buffer will help -- sometimes a lot -- because you can bunch up your
insertions and hit the same B-tree leaf with more than one insertion.
When won't it help?

- When you have a lot of secondary indexes, because you dilute the effect
  of insertion bunching across the indexes (see the toy numbers below).

- When your insertion pattern looks random, because a random key will see
  a lot less leaf bunching.

But even without these "buffer killers", you still drift down towards the
limiting situation of one seek per insertion as the database grows.  So
the insertion buffer pushes out the pain point but doesn't address the
underlying problem.
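
Toy numbers for the dilution point (made-up sizes, just to show the
shape of the effect):

    BUFFER_CHANGES = 500000    # pending changes the buffer can hold
    LEAVES_PER_INDEX = 100000  # leaves a random key can land on

    for n_secondary_indexes in (1, 5, 50):
        changes_per_index = BUFFER_CHANGES / float(n_secondary_indexes)
        # Random keys scatter the changes over min(changes, leaves) leaves.
        leaves_hit = min(changes_per_index, LEAVES_PER_INDEX)
        print("%2d indexes -> ~%.1f buffered changes merged per leaf"
              % (n_secondary_indexes, changes_per_index / leaves_hit))

Once you're down to about one buffered change per leaf, the buffer is no
longer saving any leaf reads and you're back to seek-bound behavior.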


On Mon, Nov 8, 2010 at 10:52 AM, MARK CALLAGHAN <mdcallag@gmail.com> wrote:
> On Tue, Nov 2, 2010 at 7:17 AM, Martin Farach-Colton <martin@tokutek.com> wrote:
>
>> The B-tree bottleneck is inherent in the data structure, not a function of
>> InnoDB vs postgresql vs ....
>> Fractal trees do not perform disk seeks for each insertion, and they are
>> therefore very good at insertions.
>> Martin Farach-Colton
>> Tokutek, Inc
>
> It isn't that simple as I described earlier in this thread. InnoDB has
> the insert buffer:
> http://www.google.com/search?hl=en&q=insert+buffer+innodb
>
> InnoDB can do a disk seek for each PK/unique index on a table during
> an insert. It does not do disk seeks for each secondary index.
>
> --
> Mark Callaghan
> mdcallag@gmail.com

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: MARK CALLAGHAN
On Mon, Nov 8, 2010 at 10:02 AM, Josh Berkus <josh@agliodbs.com> wrote:
>
>
> How does InnoDB make sure that sessions see the buffered inserts if they
> should transactionally be visible?

There is an IO completion callback that is called when the read into
the buffer pool is finished. AFAIK, the callback will merge the
pending changes from the insert buffer. Oh, and the name of the insert
buffer has been changed to "change buffer" as it can buffer inserts
and deletes in MySQL 5.5. Also note that there are no updates for
secondary indexes. There is only insert and delete marking.

--
Mark Callaghan
mdcallag@gmail.com

Re: Probable FAQ: need some benchmarks of pgsql vs. mysql

From: Josh Berkus
> There is an IO completion callback that is called when the read into
> the buffer pool is finished. AFAIK, the callback will merge the
> pending changes from the insert buffer. Oh, and the name of the insert
> buffer has been changed to "change buffer" as it can buffer inserts
> and deletes in MySQL 5.5. Also note that there are no updates for
> secondary indexes. There is only insert and delete marking.

Seems like it still needs work.  I'm not entirely convinced that an
index buffer would be a net win for PostgreSQL, given the different
structure of our storage.  I'd look at those "fractal indexes" first.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com