Thread: Differentiating different Open Source databases
An opinion I often run across when talking to database people who haven't dealt with Postgres is "open source databases aren't very good". In all cases I've seen, this opinion was formed because they looked at or used a certain open source database and indeed found things that could certainly cause serious issues. These opinions were formed from legitimate deficiencies, and because they were in an open source database these folks had made the unreasonable assumption that all OSS databases just weren't very good.

Fortunately, it's never been hard for me to enlighten these folks, first by admitting that "Yes, that is a problem with that database" and second by pointing out that not all OSS databases are the same. So I'm wondering if there's some way that we can *proactively* get that message out on a larger scale.

Some might point out that these folks are just reaching unreasonable conclusions, and I agree. I also would agree that these people *should* understand that just like commercial databases aren't all the same, neither are all OSS databases. However, that *is* the conclusion they're reaching, and not just because they're closed-minded or anti-OSS. Based on how easy it's been for me to enlighten them, I don't think it would be hard to change this unfair bias; it would probably only take a single prominent article to do it.

--
Jim "Decibel!" Nasby jnasby@EnovaFinancial.com
Primary: 512-579-9024 Backup: 512-569-9461
On May 16, 2011, at 5:09 AM, Dimitri Fontaine wrote:
> "Nasby, Jim" <JNasby@enovafinancial.com> writes:
>> An opinion I often run across when talking to database people who haven't
>> dealt with Postgres is "open source databases aren't very good". In all
>
> Well I'm not sure how closely related/relevant it is, but I find more
> and more people thinking they should take the NoSQL pill because frankly
> you only get so far with Oracle and MySQL.
>
> They should also hear the message that PostgreSQL is quite another
> beast, and its role in your software architecture can be very
> different from those first two. Because of technical facts and also
> licencing policies, of course.

Do we have any contacts at magazines or other publications? Is there a way we could get some kind of article published?

Also, does anyone know any recent hard-core converts (I suppose we can ask on -general)? Perhaps we could do a series of interviews of people that used to think "OSS databases are toys" and have changed their minds...

--
Jim "Decibel!" Nasby jnasby@EnovaFinancial.com
Primary: 512-579-9024 Backup: 512-569-9461

"Nasby, Jim" <JNasby@enovafinancial.com> writes:
> An opinion I often run across when talking to database people who haven't
> dealt with Postgres is "open source databases aren't very good". In all

Well I'm not sure how closely related/relevant it is, but I find more and more people thinking they should take the NoSQL pill because frankly you only get so far with Oracle and MySQL.

They should also hear the message that PostgreSQL is quite another beast, and its role in your software architecture can be very different from those first two. Because of technical facts and also licencing policies, of course.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Mon, May 16, 2011 at 3:09 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> you only get so far with Oracle and MySQL.

Disclaimer: I do not speak for any of my employers, past, present or future.

"only get so far with... MySQL", so, yeah, about that... You are wrong.

Last I heard, it was the primary persistent data store for the world's largest social network. A system that several hundred million people interact with on a daily basis.

Last I heard, it was also the primary persistent data store for the world's largest web hosting provider, domain registrar and SSL registrar.

Last I heard, it was also the primary persistent data store for the largest ad network (with billions in revenue) on the web. From personal experience this is not the only place MySQL is used with financial data.

A question I have asked numerous times in the last year is "Does anyone run a farm with more than 1,000 Postgres servers?". The answer I have received again and again is no. That is not the case with MySQL. Given the cost of large farms of servers, have no doubt that if PG was a better solution* it would be used.

At this point MySQL has and PG does not have covering indexes, index change buffering, a cheap optimizer which can be made almost free with hints, on disk compression, query caching (as a stopgap for Memcache integration), etc...

And for many workloads PG is a better option than MySQL. And let us not ignore the advantages of on disk checksums. Can anyone really say that PG cares about your data and MySQL does not while PG still allows silent corruption? Given the unreliability of SATA drives this is a real problem.

Whatever your feelings about MySQL, "only get so far" is intellectually dishonest.

As for NoSQL, that is several families of problems and solutions:
Ease of use - Couch
Caching - Memcache, Redis
Optimizing writes over reads - HBase

The only one where I think PG might beat MySQL is caching with unlogged tables (an as-yet unreleased feature), and that is assuming that you don't need to read from them. With a read/write workload I am not sure who would win in terms of performance, but I doubt that the difference would be enough to set aside the advantages of institutional momentum.

Of course, this is all open for debate but is another attempt to say "PG rocks, MySQL sucks" really worth attempting? It has been tried more than a few times before and been a losing effort. Are there not other ways that PG people can spend their time and be more highly leveraged? Are there not more than enough expensive, difficult to use, proprietary solutions to pick off and devour before an elephant goes to sea to try to kill a dolphin?

I suggest finding some other db (particularly a proprietary one) to be your tusk'ing bag, the dolphin can give as good as it gets. Or better yet, do something really useful and write tutorials.

*I think that finding people with experience is what is holding PG back far more than any technical deficiencies. However awesome a piece of software is, it is worthless if it is impossible to find anyone to run it.

Best,

Rob "the MySQL guy that really likes PG, except when PG people say silly things" Wultsch
wultsch@gmail.com
Excerpts from Rob Wultsch On Sat, May 21, 2011 at 10:01 PM:
> ... primary persistent data store for the world's largest social
> network
>
> ... primary persistent data store for the world's largest web
> hosting provider, domain registrar and SSL registrar.
>
> ... primary persistent data store for the largest ad network
>
> ... "Does anyone run a farm with more than 1,000 Postgres
> servers?".
>
> ... if PG was a better solution* it would be used

In some ways what you're saying proves Jim's point. A pragmatic definition of "better" would be "more appropriate" or "a better fit" - a better fit for the workload or possibly the organisation's existing skills, along with its habits and expectations.

The examples you're quoting above are foreign to decision makers with a background in commercial RDBMSs like DB/2, MSSQL, etc. Insurance brokerages with 200 staff members don't care about 1000 server farms - they want expression indexes, partial indexes, CTEs and a bunch of other things which they've come to expect from relational databases.

The mistake which these not entirely hypothetical managers (I have met a few too) are making in assuming equality between all open source databases is much the same as your mistake in claiming that the features which matter to myfacedoubleclickspacebook are the only ones that matter.

> *I think that finding people with experience is what is holding PG
> back far more than any technical deficiencies. However awesome a piece
> of software is, it is worthless if it is impossible to find anyone to
> run it.
>

+1

Regards
Bell.
On May 22, 2011, at 1:21 PM, Alastair Turner wrote:
> Excerpts from Rob Wultsch On Sat, May 21, 2011 at 10:01 PM:
>> ... primary persistent data store for the world's largest social
>> network
>>
>> ... primary persistent data store for the world's largest web
>> hosting provider, domain registrar and SSL registrar.
>>
>> ... primary persistent data store for the largest ad network
>>
>> ... "Does anyone run a farm with more than 1,000 Postgres
>> servers?".
>>
>> ... if PG was a better solution* it would be used
>
> In some ways what you're saying proves Jim's point. A pragmatic definition
> of "better" would be "more appropriate" or "a better fit" - a better
> fit for the workload or possibly the organisation's existing skills,
> along with its habits and expectations.
>
> The examples you're quoting above are foreign to decision makers with
> a background in commercial RDBMSs like DB/2, MSSQL, etc. Insurance
> brokerages with 200 staff members don't care about 1000 server farms -
> they want expression indexes, partial indexes, CTEs and a bunch of
> other things which they've come to expect from relational databases.
>
> The mistake which these not entirely hypothetical managers (I have met
> a few too) are making in assuming equality between all open
> source databases is much the same as your mistake in claiming that
> the features which matter to myfacedoubleclickspacebook are the only
> ones that matter.

Right. It is especially easy for experienced database people to dismiss OSS databases because of missing features; things like materialized views, replication in Postgres, or stored procedures and triggers in MySQL (just to name a few). Throw in all the NoSQL OSS databases and things get even worse.

Postgres has advanced to a point where there aren't many features that we don't have that large databases do; materialized views and parallel query execution are the only two that come to mind. We even have features that other major databases don't have (we'll see if we beat MSSQL to the punch with KNN indexes).

I'm hoping there's some way we can enlighten database professionals that not all OSS databases are the same, and that Postgres actually has most (if not all) of what they expect out of a large commercial database.

--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
On Sun, May 22, 2011 at 11:21 AM, Alastair Turner <bell@ctrlf5.co.za> wrote:
> In some ways what you're saying proves Jim's point. A pragmatic definition
> of "better" would be "more appropriate" or "a better fit" - a better
> fit for the workload or possibly the organisation's existing skills,
> along with its habits and expectations.

Sure. I did not disagree with the meat of what Jim said. I am not sure it is possible to effectively execute what he wants to do, but I don't think he is wrong at all.

> The examples you're quoting above are foreign to decision makers with
> a background in commercial RDBMSs like DB/2, MSSQL, etc. Insurance
> brokerages with 200 staff members don't care about 1000 server farms -
> they want expression indexes, partial indexes, CTEs and a bunch of
> other things which they've come to expect from relational databases.

The point of what I said was to make it very clear that Dimitri is wrong. Saying MySQL sucks is not productive at all.

> The mistake which these not entirely hypothetical managers (I have met
> a few too) are making in assuming equality between all open
> source databases is much the same as your mistake in claiming that
> the features which matter to myfacedoubleclickspacebook are the only
> ones that matter.

Different strokes for different folks.

I never said that those features are the only ones that matter. Do note that most of the features I mentioned don't help one run a big farm. Covering indexes are assumed to be everywhere at this point. Checksumming data on disk is just a good practice. I can continue at length.

My point is that beating up on MySQL ("only get so far") is often wrong, intellectually dishonest and a discredit to the community.

--
Rob Wultsch
wultsch@gmail.com
Hi,

Rob Wultsch <wultsch@gmail.com> writes:
> "only get so far with... MySQL", so, yeah, about that... You are wrong.

I mainly agree with most of what you said there, but let me refocus the context of my comment a little, because it somehow came out all the wrong way.

My angle is that of application architecture. What part of the business logic do you want your database to handle for you, ensuring your constraints and a good concurrency pattern? How is that choice going to limit your ability to scale (up, out)?

With Oracle you are pretty quickly limited by the licensing model, with MySQL by the features available (example: you may have either transaction safety *or* full text search). In both cases if you have a demanding environment you will fix the problem in the application code rather than using features the "database product" of your choice is providing.

PostgreSQL offers a breakthrough here because you get a huge set of business logic development oriented features and scaling is not limited by the licence costs.

That was my point, only very poorly made. I hope it comes out better this time.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Rob Wultsch wrote:
> The point of what I said was to make it very clear that Dimitri is
> wrong. Saying MySQL sucks is not productive at all.
>

I'm not sure where this escalation of hostility came from, but it's not really helping. Attributing comments to Dimitri that he didn't say, along with kicking around a strawman you built of them, is intellectually dishonest too you know.

There is a large enough list of things PostgreSQL is really good for, where neither MySQL nor Oracle are effective competitors, to justify the "only get so far" comment you read way too much into; they're just not your use cases. Some of the GIS workloads we're seeing nowadays are good examples. And comparisons using the NoSQL problem space will always be absent of any cases where the ability to execute complicated queries is the main challenge. I spend an order of magnitude more time fighting >5 table join issues than I do any of the things you mentioned optimizing for.

There are of course some challenges to PostgreSQL deployments in the areas you specialize in too, where there are significant advantages advocating for MySQL instead. I'm not sure why you're so hung up on covering indexes as one of the key parts of that; those are nice but far from essential.

The scale of Heroku's PostgreSQL deployments seems to be accelerating toward the sort of size you're suggesting hasn't been achieved yet. From the information they've shared about that, I'm seeing a pretty different set of issues than the ones you were highlighting as key limiters. I'd rather talk about what successful deployments are using and fighting rather than bashing PostgreSQL use cases in the more abstract way. For a while now, large farms of PostgreSQL have been only a theorized problem because a popular enough app compatible with it wasn't available yet. Heroku seems to have that with their Rails hosting, and scaling up the database instance set has just taken the normal sort of database operations work to accomplish.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
On Sun, May 22, 2011 at 9:54 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Rob Wultsch wrote:
>>
>> The point of what I said was to make it very clear that Dimitri is
>> wrong. Saying MySQL sucks is not productive at all.
>>
>
> I'm not sure where this escalation of hostility came from, but it's not
> really helping. Attributing comments to Dimitri that he didn't say, along
> with kicking around a strawman you built of them, is intellectually
> dishonest too you know.

There was escalation? He made a statement that should have been fleshed out before sending and I called him on it. Given the context of his statement (trying to publicize) I think being blunt was required.

What he said was a truncated version of what he wanted to get across. In the full version it is valid, but the truncated version is rubbish. Jim had made a comment about people thinking that open source databases were not very good. Dimitri agreed with a comment about MySQL. There is not much reading between the lines required here.

I have had to deal with having to answer questions about PG people belittling MySQL much of the time I push to use PG. It gets really old and has hurt my attempts to use PG.

> There is a large enough list of things PostgreSQL is really good for, where
> neither MySQL nor Oracle are effective competitors, to justify the "only get
> so far" comment you read way too much into; they're just not your use cases.
> Some of the GIS workloads we're seeing nowadays are good examples. And
> comparisons using the NoSQL problem space will always be absent of any cases
> where the ability to execute complicated queries is the main challenge. I
> spend an order of magnitude more time fighting >5 table join issues than I
> do any of the things you mentioned optimizing for.

No argument with PG being a very good platform (as I have already said in this thread). My comments were a defense of MySQL also being a very good platform and saying that anyone saying it is not is wrong and pg advocacy should not be pushing that viewpoint.

> There are of course some challenges to PostgreSQL deployments in the areas
> you specialize in too, where there are significant advantages advocating for
> MySQL instead. I'm not sure why you're so hung up on covering indexes as
> one of the key parts of that; those are nice but far from essential.

Covering indexes are not needed for a big deployment, but they are massively useful for many of the workloads that I dealt with.

> The scale of Heroku's PostgreSQL deployments seems to be accelerating toward the sort
> of size you're suggesting hasn't been achieved yet. From the information
> they've shared about that, I'm seeing a pretty different set of issues than the
> ones you were highlighting as key limiters. I'd rather talk about what
> successful deployments are using and fighting rather than bashing PostgreSQL
> use cases in the more abstract way. For a while now, large farms of
> PostgreSQL have been only a theorized problem because a popular enough
> app compatible with it wasn't available yet. Heroku seems to have that with
> their Rails hosting, and scaling up the database instance set has just taken
> the normal sort of database operations work to accomplish.

I have not tried to highlight anything in particular as a key limiter other than competent humans. I suggested writing tutorials.

I am trying to steer the advocacy group to not hurt my advocacy.

--
Rob Wultsch
wultsch@gmail.com
Guys,

>> I'm not sure where this escalation of hostility came from, but it's not
>> really helping. Attributing comments to Dimitri that he didn't say, along

> There was escalation? He made a statement that should have been fleshed out
> before sending and I called him on it. Given the context of his

This particular discussion seems really pointless, and not helping advocate either PostgreSQL or any other OSDB. Give it a rest, thanks?

The useful thing here would be to come up with a document which explains how PostgreSQL is different from other DBMSes and the use-cases we're particularly good for. This is a question I get all the time, and it would be lovely to have a doc for it. It doesn't have to say any bad things about anyone else's database -- in fact, it's better if it doesn't -- just specifically what we really shine at in general terms.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com