Thread: Differentiating different Open Source databases

Differentiating different Open Source databases

From
"Nasby, Jim"
Date:
An opinion I often run across when talking to database people who haven't dealt with Postgres is "open source databases
aren't very good". In all cases I've seen, this opinion was formed because they looked at or used a certain open source
database and indeed found things that could certainly cause serious issues. These opinions were formed from legitimate
deficiencies, and because they were in an open source database these folks had made the unreasonable assumption that all
OSS databases just weren't very good.

Fortunately, it's never been hard for me to enlighten these folks, first by admitting that "Yes, that is a problem with
that database" and second by pointing out that not all OSS databases are the same. So I'm wondering if there's some way
that we can *proactively* get that message out on a larger scale.

Some might point out that these folks are just reaching unreasonable conclusions, and I agree. I also would agree that
these people *should* understand that just like commercial databases aren't all the same, neither are all OSS databases.
However, that *is* the conclusion they're reaching, and not just because they're closed-minded or anti-OSS. Based on how
easy it's been for me to enlighten them, I don't think it would be hard to change this unfair bias; it would probably
only take a single prominent article to do it.
--
Jim "Decibel!" Nasby jnasby@EnovaFinancial.com
Primary: 512-579-9024     Backup: 512-569-9461


Re: Differentiating different Open Source databases

From
"Nasby, Jim"
Date:
On May 16, 2011, at 5:09 AM, Dimitri Fontaine wrote:
> "Nasby, Jim" <JNasby@enovafinancial.com> writes:
>> An opinion I often run across when talking to database people who haven't
>> dealt with Postgres is "open source databases aren't very good". In all
>
> Well I'm not sure how closely related/relevant it is, but I find more
> and more people thinking they should take the NoSQL pill because frankly
> you only get so far with Oracle and MySQL.
>
> They should also hear the message that PostgreSQL is quite another
> beast, and its role in your software architecture can be very
> different from those first two.  Because of technical facts and also
> licencing policies, of course.

Do we have any contacts at magazines or other publications? Is there a way we could get some kind of article published?

Also, does anyone know any recent hard-core converts (I suppose we can ask on -general). Perhaps we could do a series
of interviews of people that used to think "OSS databases are toys" and have changed their minds...
--
Jim "Decibel!" Nasby jnasby@EnovaFinancial.com
Primary: 512-579-9024     Backup: 512-569-9461


Re: Differentiating different Open Source databases

From
Dimitri Fontaine
Date:
"Nasby, Jim" <JNasby@enovafinancial.com> writes:
> An opinion I often run across when talking to database people who haven't
> dealt with Postgres is "open source databases aren't very good". In all

Well I'm not sure how closely related/relevant it is, but I find more
and more people thinking they should take the NoSQL pill because frankly
you only get so far with Oracle and MySQL.

They should also hear the message that PostgreSQL is quite another
beast, and its role in your software architecture can be very
different from those first two.  Because of technical facts and also
licencing policies, of course.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Differentiating different Open Source databases

From
Rob Wultsch
Date:
On Mon, May 16, 2011 at 3:09 AM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
> you only get so far with Oracle and MySQL.

Disclaimer: I do not speak for any of my employers, past, present or future.

"only get so far with... MySQL", so, yeah, about that... You are wrong.

Last I heard, it was the primary persistent data store for the world's
largest social network. A system that several hundred million
people interact with on a daily basis.

Last I heard, it was also the primary persistent data store for the
world's largest web hosting provider, domain registrar and SSL
registrar.

Last I heard, it was also the primary persistent data store for the
largest ad network (with billions in revenue) on the web. From
personal experience this is not the only place MySQL is used with
financial data.

A question I have asked numerous times in the last year is "Does
anyone run a farm with more than 1,000 Postgres servers?".  The answer
I have received again and again is no. That is not the case with
MySQL. Given the cost of large farms of servers, have no doubt that if
PG was a better solution* it would be used. At this point MySQL has
and PG does not have covering indexes, index change buffering, a cheap
optimizer which can be made almost free with hints, on disk
compression, query caching (as a stopgap for Memcache integration),
etc... And for many workloads PG is a better option than MySQL.

And let us not ignore the advantages of on disk checksums. Can anyone
really say that PG cares about your data and MySQL does not while PG
still allows silent corruption? Given the unreliability of SATA drives
this is a real problem.

Whatever your feelings about MySQL, "only get so far" is
intellectually dishonest.

As for NoSQL, that is several families of problems and solutions:
Ease of use - Couch
Caching - Memcache, Redis
Optimizing writes over reads - HBase

The only one where I think PG might beat MySQL is caching with
unlogged tables (an as-yet-unreleased feature), and that is assuming
that you don't need to read from them. With a read/write workload I am
not sure who would win in terms of performance, but I doubt that the
difference would be enough to set aside the advantages of
institutional momentum.

Of course, this is all open for debate but is another attempt to say
"PG rocks, MySQL sucks" really worth attempting? It has been tried
more than a few times before and been a losing effort. Are there not
other ways that PG people can spend their time and be more highly
leveraged? Are there not more than enough expensive, difficult to use,
proprietary solutions to pick off and devour before an elephant goes
to sea to try to kill a dolphin? I suggest finding some other db
(particularly a proprietary one) to be your tusk'ing bag, the dolphin
can give as good as it gets. Or better yet, do something really useful
and write tutorials.

*I think that finding people with experience is what is holding PG
back far more than any technical deficiencies. However awesome a piece
of software is, it is worthless if it is impossible to find anyone to
run it.

Best,

Rob "the MySQL guy that really likes PG, except when PG people say
silly things" Wultsch
wultsch@gmail.com

Re: Differentiating different Open Source databases

From
Alastair Turner
Date:
Excerpts from Rob Wultsch On Sat, May 21, 2011 at 10:01 PM:
> ... primary persistent data store for the world's largest social
>  network
>
> ... primary persistent data store for the world's largest web
> hosting provider, domain registrar and SSL registrar.
>
> ... primary persistent data store for the largest ad network
>
> ... "Does anyone run a farm with more than 1,000 Postgres
> servers?".
>
> ... if PG was a better solution* it would be used

In some ways what you're saying proves Jim's point. A pragmatic definition
of "better" would be "more appropriate" or "a better fit" - a better
fit for the workload or possibly the organisation's existing skills
and, along with the skills, habits and expectations.

The examples you're quoting above are foreign to decision makers with
a background in commercial RDBMSs like DB/2, MSSQL, etc. Insurance
brokerages with 200 staff members don't care about 1000 server farms -
they want expression indexes, partial indexes, CTEs and a bunch of
other things which they've come to expect from relational databases.

The mistake which these not entirely hypothetical managers (I have met
a few too) are making in assuming equality between all open
source databases is much the same as your mistake in claiming that
the features which matter to myfacedoubleclickspacebook are the only
ones that matter.

> *I think that finding people with experience is what is holding PG
> back far more than any technical deficiencies. However awesome a piece
> of software is, it is worthless if it is impossible to find anyone to
> run it.
>
+1

Regards

Bell.

Re: Differentiating different Open Source databases

From
Jim Nasby
Date:
On May 22, 2011, at 1:21 PM, Alastair Turner wrote:
> Excerpts from Rob Wultsch On Sat, May 21, 2011 at 10:01 PM:
>> ... primary persistent data store for the world's largest social
>> network
>>
>> ... primary persistent data store for the world's largest web
>> hosting provider, domain registrar and SSL registrar.
>>
>> ... primary persistent data store for the largest ad network
>>
>> ... "Does anyone run a farm with more than 1,000 Postgres
>> servers?".
>>
>> ... if PG was a better solution* it would be used
>
> In some ways what you're saying proves Jim's point. A pragmatic definition
> of "better" would be "more appropriate" or "a better fit" - a better
> fit for the workload or possibly the organisation's existing skills
> and, along with the skills, habits and expectations.
>
> The examples you're quoting above are foreign to decision makers with
> a background in commercial RDBMSs like DB/2, MSSQL, etc. Insurance
> brokerages with 200 staff members don't care about 1000 server farms -
> they want expression indexes, partial indexes, CTEs and a bunch of
> other things which they've come to expect from relational databases.
>
> The mistake which these not entirely hypothetical managers (I have met
> a few too) are making in assuming equality between all open
> source databases is much the same as your mistake in claiming that
> the features which matter to myfacedoubleclickspacebook are the only
> ones that matter.

Right. It is especially easy for experienced database people to dismiss OSS databases because of missing features;
things like materialized views, replication in Postgres, or stored procedures and triggers in MySQL (just to name a
few). Throw in all the NoSQL OSS databases and things get even worse.

Postgres has advanced to a point where there aren't many features that we don't have that large databases do;
materialized views and parallel query execution are the only two that come to mind. We even have features that other
major databases don't have (we'll see if we beat MSSQL to the punch with KNN indexes). I'm hoping there's some way we
can enlighten database professionals that not all OSS databases are the same, and that Postgres actually has most (if not
all) of what they expect out of a large commercial database.
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: Differentiating different Open Source databases

From
Rob Wultsch
Date:
On Sun, May 22, 2011 at 11:21 AM, Alastair Turner <bell@ctrlf5.co.za> wrote:
> In some ways what you're saying proves Jim's point. A pragmatic definition
> of "better" would be "more appropriate" or "a better fit" - a better
> fit for the workload or possibly the organisation's existing skills
> and, along with the skills, habits and expectations.

Sure. I did not disagree with the meat of what Jim said. I am not sure
it is possible to effectively execute what he wants to do, but I don't
think he is wrong at all.

> The examples you're quoting above are foreign to decision makers with
> a background in commercial RDBMSs like DB/2, MSSQL, etc. Insurance
> brokerages with 200 staff members don't care about 1000 server farms -
> they want expression indexes, partial indexes, CTEs and a bunch of
> other things which they've come to expect from relational databases.

The point of what I said was to make it very clear that Dimitri is
wrong. Saying MySQL sucks is not productive at all.

> The mistake which these not entirely hypothetical managers (I have met
> a few too) are making in assuming equality between all open
> source databases is much the same as your mistake in claiming that
> the features which matter to myfacedoubleclickspacebook are the only
> ones that matter.

Different strokes for different folks. I never said that those
features are the only ones that matter. Do note that most of the
features I mentioned don't help one run a big farm. Covering indexes
are assumed to be everywhere at this point. Checksumming data on disk
is just a good practice. I can continue at length.

My point is that beating up on MySQL ("only get so far") is often
wrong, intellectually dishonest and a discredit to the community.

--
Rob Wultsch
wultsch@gmail.com

Re: Differentiating different Open Source databases

From
Dimitri Fontaine
Date:
Hi,

Rob Wultsch <wultsch@gmail.com> writes:
> "only get so far with... MySQL", so, yeah, about that... You are wrong.

I mainly agree with most of what you said there, but let me refocus
the context of my comment a little, because it somehow came out all
wrong.

My angle is that of application architecture.  What part of the business
logic do you want your database to handle for you, ensuring your
constraints and a good concurrency pattern?  How is that choice going to
limit your ability to scale (up, out) ?

With Oracle you are pretty quickly limited by the licensing model, with
MySQL by the features available (example: you may have either
transaction safety *or* full text search).  In both cases if you have a
demanding environment you will fix the problem in the application code
rather than using features the "database product" of your choice is
providing.

PostgreSQL offers a breakthrough here because you get a huge set of
business logic development oriented features and scaling is not limited
by the licence costs.  That was my point, only very poorly made.  I hope
it comes across better this time.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Differentiating different Open Source databases

From
Greg Smith
Date:
Rob Wultsch wrote:
> The point of what I said was to make it very clear that Dimitri is
> wrong. Saying MySQL sucks is not productive at all.
>

I'm not sure where this escalation of hostility came from, but it's not
really helping.  Attributing comments to Dimitri that he didn't say,
along with kicking around a strawman you built of them, is
intellectually dishonest too you know.

There is a large enough list of things PostgreSQL is really good for,
where neither MySQL nor Oracle are effective competitors, to justify the
"only get so far" comment you read way too much into; they're just not
your use cases.  Some of the GIS workloads we're seeing nowadays are
good examples.  And comparisons using the NoSQL problem space will
always be absent of any cases where the ability to execute complicated
queries is the main challenge.  I spend an order of magnitude more time
fighting >5 table join issues than I do any of the things you mentioned
optimizing for.

There are of course some challenges to PostgreSQL deployments in the
areas you specialize in too, where there are significant advantages
advocating for MySQL instead.  I'm not sure why you're so hung up on
covering indexes as one of the key parts of that; those are nice but far
from essential.  The scale of Heroku's PostgreSQL deployments seems to be
accelerating toward the sort of size you're suggesting hasn't been
achieved yet.  From the information they've shared about that, I'm
seeing a pretty different set of issues than the ones you were highlighting
as key limiters.  I'd rather talk about what successful deployments are
using and fighting rather than bashing PostgreSQL use cases in the more
abstract way.  For a while now, large farms of PostgreSQL have been
only a theorized problem, because a popular enough app compatible with it
wasn't available yet.  Heroku seems to have that with their Rails
hosting, and scaling up the database instance set has just taken the
normal sort of database operations work to accomplish.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



Re: Differentiating different Open Source databases

From
Rob Wultsch
Date:
On Sun, May 22, 2011 at 9:54 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Rob Wultsch wrote:
>>
>> The point of what I said was to make it very clear that Dimitri is
>> wrong. Saying MySQL sucks is not productive at all.
>>
>
> I'm not sure where this escalation of hostility came from, but it's not
> really helping.  Attributing comments to Dimitri that he didn't say, along
> with kicking around a strawman you built of them, is intellectually
> dishonest too you know.

There was escalation? He made a statement that should have been fleshed out
before sending and I called him on it. Given the context of his
statement (trying to publicize) I think being blunt was required.
What he said was a truncated version of what he wanted to get across.
In the full version it is valid, but the truncated version is rubbish.
Jim had made a comment about people thinking that open source
databases were not very good. Dimitri agreed with a comment about
MySQL. There is not much reading between the lines required here.

I have had to deal with having to answer questions about PG people
belittling MySQL much of the time I push to use PG. It gets really old
and has hurt my attempts to use PG.

> There is a large enough list of things PostgreSQL is really good for, where
> neither MySQL nor Oracle are effective competitors, to justify the "only get
> so far" comment you read way too much into; they're just not your use cases.
>  Some of the GIS workloads we're seeing nowadays are good examples.  And
> comparisons using the NoSQL problem space will always be absent of any cases
> where the ability to execute complicated queries is the main challenge.  I
> spend an order of magnitude more time fighting >5 table join issues than I
> do any of the things you mentioned optimizing for.

No argument with PG being a very good platform (as I have already said
in this thread). My comments were a defense of MySQL also being a very
good platform and saying that anyone who says it is not is wrong, and pg
advocacy should not be pushing that viewpoint.

> There are of course some challenges to PostgreSQL deployments in the areas
> you specialize in too, where there are significant advantages advocating for
> MySQL instead.  I'm not sure why you're so hung up on covering indexes as
> one of the key parts of that; those are nice but far from essential.

Covering indexes are not needed for a big deployment, but they are
massively useful for many of the workloads that I dealt with.

> The scale of Heroku's PostgreSQL deployments seems to be accelerating toward the sort
> of size you're suggesting hasn't been achieved yet.  From the information
> they've shared about that, I'm seeing a pretty different set of issues than the
> ones you were highlighting as key limiters.  I'd rather talk about what
> successful deployments are using and fighting rather than bashing PostgreSQL
> use cases in the more abstract way.  For a while now, large farms of
> PostgreSQL have been only a theorized problem, because a popular enough
> app compatible with it wasn't available yet.  Heroku seems to have that with
> their Rails hosting, and scaling up the database instance set has just taken
> the normal sort of database operations work to accomplish.

I have not tried to highlight anything in particular as a key limiter
other than competent humans. I suggested writing tutorials.

I am trying to steer the advocacy group to not hurt my advocacy.

--
Rob Wultsch
wultsch@gmail.com

Re: Differentiating different Open Source databases

From
Josh Berkus
Date:
Guys,

>> I'm not sure where this escalation of hostility came from, but it's not
>> really helping.  Attributing comments to Dimitri that he didn't say, along

> There was escalation? He made a statement that should have been fleshed out
> before sending and I called him on it. Given the context of his

This particular discussion seems really pointless, and not helping
advocate either PostgreSQL or any other OSDB.  Give it a rest, thanks?

The useful thing here would be to come up with a document which explains
how PostgreSQL is different from other DBMSes and the use-cases we're
particularly good for.  This is a question I get all the time, and it
would be lovely to have a doc for it.  It doesn't have to say any bad
things about anyone else's database -- in fact, it's better if it
doesn't -- just specifically what we really shine at in general terms.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com