Thread: Justifying a PG over MySQL approach to a project
Hi Everyone:
Tomorrow, I will need to present to a group of managers (who know nothing about DBs) why I chose to use PG over MySQL in a project, MySQL being the more popular DB choice with other engineers, and managers fearing things that are “different” (risk). I have a few hard technical reasons (check constraints, deferred constraint checking, the array data type), but I’m looking for “it’s more reliable” reasons. Again, the audience is managers. Is there an impartial, 3rd-party evaluation of the two DBs out there that identifies PG as being more reliable? It might mention things like fewer incidences of corrupt tables/indexes, fewer daemon crashes, better recovery after system crashes, etc.?
Thanks !
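One of the technical reasons mentioned above, deferred constraint checking, can be illustrated with a minimal sketch (hypothetical table and column names; PostgreSQL syntax):

```sql
-- A self-referencing FK whose check is postponed until COMMIT.
CREATE TABLE emp (
    id      integer PRIMARY KEY,
    boss_id integer REFERENCES emp(id) DEFERRABLE INITIALLY DEFERRED
);

BEGIN;
-- Insert the employee before the boss; the FK is only checked at COMMIT,
-- so the temporary inconsistency inside the transaction is allowed.
INSERT INTO emp VALUES (2, 1);
INSERT INTO emp VALUES (1, 1);
COMMIT;  -- both rows exist by now, so the constraint is satisfied
```

MySQL's InnoDB checks foreign keys immediately and has no DEFERRABLE option, so the first INSERT above would fail there.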
Managers want support; they can't live without it. Every piece of software has its flaws and needs patches. PostgreSQL versions are supported for 5 years; the latest version (8.4) will be supported at least until 2014. In total there are 6 supported versions as we speak, 7.4 through 8.4. MySQL has active support for 5.0 and 5.1, but 5.0 will only be supported for the next two weeks, and 5.1 until December next year, unless you pay for an extended support contract. After 5.1 there is no other stable version at this moment; nobody knows what comes next.
Good luck!
On 16 Dec 2009, at 22:02, Gauthier, Dave wrote:
Gauthier, Dave wrote on 16.12.2009 22:02:
> Hi Everyone:
>
> Tomorrow, I will need to present to a group of managers (who know
> nothing about DBs) why I chose to use PG over MySQL in a project,

What kind of project is that? If you are developing something that you are selling to other people, MySQL's GPL license will force you to buy a commercial license in order to distribute your application unless it is GPL as well. You don't have such constraints with PostgreSQL.

There are some features that you might want to mention as well:

- ANSI standard windowing functions
- ANSI standard common table expressions
- XML support (not necessarily important, but can potentially be nice)

Something that drives me nuts with MySQL: it behaves differently depending on the configuration settings, with different defaults on different OSes (regarding case sensitivity, for example) or the default storage engine selected (thinking about ANSI mode, strict tables, the ability to store invalid dates, inserting 0 instead of null, and all those little things...). That makes the QA for a project much more complicated, especially if you don't have control over the installation at the customer's site.

PostgreSQL behaves the same ("syntactically"), regardless of where or how it was installed.

Thomas
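Thomas's point about MySQL silently accepting invalid data can be sketched in a short session (behavior depends on the MySQL version and sql_mode; shown here with the historical non-strict default):

```sql
CREATE TABLE t (d date NOT NULL);

-- MySQL, non-strict sql_mode: the statement "succeeds" and the value is
-- silently replaced with the pseudo-date '0000-00-00' (a warning, not an
-- error).
INSERT INTO t VALUES ('2009-02-31');

-- PostgreSQL: the same statement is rejected outright:
-- ERROR:  date/time field value out of range: "2009-02-31"
INSERT INTO t VALUES ('2009-02-31');
```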
You've probably already found http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007 which was my long treatment of this topic (and overdue for an update).

The main thing I intended to put into such an update when I get to it is the really deplorable bug-handling situation for MySQL, which is part of how all the data corruption issues show up. There's a good overview of its general weirdness at http://www.xaprb.com/blog/2007/08/12/what-would-make-me-buy-mysql-enterprise/ and the following series of pages leads you through my favorite set of bugs:

http://www.mysqlperformanceblog.com/2007/10/04/mysql-quality-of-old-and-new-features/
http://bugs.mysql.com/bug.php?id=28591
http://bugs.mysql.com/bug.php?id=31001
http://bugs.mysql.com/bug.php?id=37830

Basically, they made a performance optimization *in the stable release* and fundamentally broke very basic behavior, which didn't get caught by their internal QA at all. That's a disaster that opens up serious questions about both their project planning/structure and their QA, as far as I'm concerned.

They do have a regression test suite: http://dev.mysql.com/doc/refman/5.0/en/mysql-test-suite.html But it's not really clear that they run it on every platform; see http://ourdelta.org/hidden-tests-of-the-mysql-testsuite

This supports the rumors I've heard that the database's developers regularly cheat by just disabling tests that don't work right in some situations, just so they can ship saying "there are no known issues!" Obviously that's hearsay, but it sure seems to fit the facts we do know.
Meanwhile, PostgreSQL never does anything but bug fixes in its stable version updates: http://www.postgresql.org/support/versioning And the PostgreSQL regression-testing build farm is completely public, with no tolerance for failed tests in the community: http://buildfarm.postgresql.org/cgi-bin/show_status.pl

The main other reason why PostgreSQL has fewer corruption issues, IMHO, is that there's exactly one "storage engine" and everybody works on it. What the MySQL community calls a choice of storage engines I call split QA, and the source of new types of failures not possible if you only have one underlying storage codebase to worry about.

--
Greg Smith  2ndQuadrant  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
Greg Smith wrote on 16.12.2009 22:44:
> You've probably already found
> http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007
> which was my long treatment of this topic (and overdue for an update).

There is an update:

http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL_2009
Thomas Kellerer wrote:
> There is an update:
>
> http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL_2009

You just found where my work in progress on producing an update lives. There are minimal changes there so far.

--
Greg Smith
Greg Smith <greg@2ndquadrant.com> writes:
> They do have a regression test suite:
> http://dev.mysql.com/doc/refman/5.0/en/mysql-test-suite.html
> But it's not really clear that they run it on every platform, i.e.
> http://ourdelta.org/hidden-tests-of-the-mysql-testsuite

They definitely don't run it on every combination of allegedly-supported options. I had to turn on --with-big-tables in the Red Hat build a while ago, which is probably a good thing anyway (though if so, why isn't it the default?); the reason I had to do it was that the regression tests started showing obvious failures without it, proving that they don't bother to run any internal tests without it.

I'm not sure how thorough our buildfarm coverage is for different option combinations, but the fact that their test suite takes circa four hours to run is *not* an advantage for them in the comparison. They clearly haven't got the resources to run all the cases they ought to. (BTW, that's 4 hours for a standard "make check", not any of the optional tests referred to in the above-cited blog entry.)

> This supports the rumors I've heard that the database's developers
> regularly cheat by just disabling tests that don't work right in some
> situations, just so they can ship saying "there are no known issues!"

Oh, absolutely. They actually have a standard mechanism built into the test harness for disabling tests that are currently failing, and the set that are so disabled changes with every update. Compare the contents of mysql-test/t/disabled.def in various releases sometime.

regards, tom lane
On Wed, Dec 16, 2009 at 2:44 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Basically, they made a performance optimization *in the stable release* and
> fundamentally broke very basic behavior which didn't get caught by their
> internal QA at all. That's a disaster that opens up serious questions about
> both their project planning/structure and their QA too, far as I'm
> concerned.

The important point here is that the bug was introduced to a stable branch, fixed halfway, then detected again, then fixed yet again. That does not instill confidence in their QA or code review.

As an example of who runs PostgreSQL and who runs MySQL, we have slashdot and the .info and .org TLDs. When you go to slashdot.org and it's not working right, that's MySQL acting up. When you can't get to any .info or .org domains, that's PostgreSQL.

I've had slashdot have a non-functioning database underneath it quite a few times (note that the site stays up, but you can't edit anything because it's all static). I've never once had the .org or .info TLDs go down on me.
On Dec 16, 2009, at 3:05 PM, Scott Marlowe wrote:
> As a for instance of who runs PostgreSQL and who runs MySQL, we have
> slashdot and the .info and .org TLDs. When you go to slashdot.org and
> it's not working right, that's MySQL acting up. When you can't get to
> any .info or .org domains, that's PostgreSQL.

My information is quite dated, but as I understand it that's not actually true. PostgreSQL is used for domain registration management at those domains (amongst others). It's not used for anything related to real-time resolution of those domains that I'm aware of. If you were unable to register or transfer a .org domain, that would be a PostgreSQL failure.

> I've had slashdot have a non-functioning database underneath it quite
> a few times (note that the site stays up, but you can't edit anything
> because it's all static). I've never once had the .org or .info TLDs
> go down on me.

Let's not draw too much attention to the database that's responsible for that stability. :)

Cheers,
Steve
EnterpriseDB wrote a white paper called "PostgreSQL vs. MySQL: A Comparison of Enterprise Suitability", which is fairly accessible: http://downloads.enterprisedb.com/whitepapers/White_Paper_PostgreSQL_MySQL.pdf Regards, Peter Geoghegan
On 17/12/2009 5:02 AM, Gauthier, Dave wrote:
> Tomorrow, I will need to present to a group of managers (who know
> nothing about DBs) why I chose to use PG over MySQL in a project [...]
> Is there an impartial, 3rd-party evaluation of the two DBs out there
> that identifies PG as being more reliable?

In all honesty, I don't know if there's much out there in terms of impartial analysis. Most of it is done by someone with some sort of a preference that tends to make itself known.

It also depends a _lot_ on what you are doing with the database. What sorts of data are you storing? How important to you is that data? What sorts of client workloads do you expect - huge numbers of clients running frequent simple queries, or fewer clients with big complex queries? How much data do you expect to store? All these have a real bearing on database choice, and it's hard to give good answers without some knowledge of those details.

One thing I'd like to highlight now: when people say "MySQL is faster" or "Pg is slow" they tend to (a) be referring to very old versions of Pg, and (b) be using the very fast but very unsafe MyISAM table type in MySQL, which is great until it eats your data. So beware of speed claims not backed by very solid configuration details.

Anyway, just to be different, let's try to look at why you might choose MySQL over PostgreSQL, instead of getting all of us Pg folks listing why you should pick Pg.
To me, Pg is the default safe and sane choice, and I need to seek out reasons why I might use MySQL instead for a particular task. So: *scratches head*

- MySQL is horizontally scalable via clustering and multi-master replication (though you must beware of numerous gotchas). PostgreSQL can be used with read-only slaves via Slony/Bucardo/etc. replication, but is limited to a single authoritative master. (There's work ongoing to enable read-only hot-standby slaves with failover, but no multi-master is on the horizon.)

- If you don't care about your data, MySQL used with MyISAM is *crazy* fast for lots of small, simple queries. Big enough apps will still need something like memcached on top of that, though. If using MySQL+MyISAM this way, you must be prepared to deal with table corruption on crashes/outages/power loss, lack of any transactional behaviour, etc. There's also some bizarre error "handling" they use to avoid aborting a non-transactional operation on a MyISAM table halfway through, so you must be very careful to make sure your updates are valid before attempting them. But... why not just use memcached over something somewhat slower but a lot safer? I guess this one isn't a plus.

- It's a cool tool when you want to query and integrate data from all sorts of disparate sources, thanks to its support for pluggable storage engines. If you want something for data analysis and integration rather than safe storage, it's well worth looking at.

--
Craig Ringer
Quick question about the following statement:
"but no multi-master is on the horizon"
From what I understand, there are several multi-master solutions, such as Bucardo, rubyrep, PgPool and PgPool II, PgCluster and Sequoia. Also Postgres-R, which is still in development. Perhaps you just meant there's nothing available out of the box? Thanks!
Mike
Dave,

please also check out the licence and cost terms in detail. Especially: is it a given that the planned usage will continue to be within the allowed bounds for MySQL-GPL? Are otherwise the costs for a commercial MySQL licence budgeted, or a reserve funded?

PostgreSQL has a GIANT advantage here, with a very, very clear licence which allows basically anything relevant, without the need to buy commercial licences.

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
- %s is too gigantic of an industry to bend to the whims of reality
On Dec 16, 2009, at 10:30 PM, Craig Ringer wrote:
> - If you don't care about your data, MySQL used with MyISAM is *crazy* fast for lots of small simple queries.

This one causes me no end of grief, as too often it's simply touted as "MyISAM is fast(er)" while leaving off the bit about "for lots of small, simple queries". Developers then pick MySQL with MyISAM storage and then scratch their heads, saying, "But! I heard it was faster...," when I tell them the reason their app is crawling is that they have even moderately complex reads or writes starving out the rest of their app, thanks to the table locks required by MyISAM. As you mentioned, for the type of active workloads that MyISAM is good for, you might as well just use memcached over something more reliable and/or concurrent, or even a simple key-value or document store if you really don't need transactions.

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k
> - MySQL is horizontally scalable via clustering and multi-master
> replication (though you must beware of numerous gotchas). PostgreSQL can
> be used with read-only slaves via Slony/Bucardo/etc replication, but is
> limited to a single authoritative master.
>
> (There's work ongoing to enable read-only hot standby slaves with
> failover, but no multi-master is on the horizon).

Well that's refreshing: usually Bucardo is mistaken for a system that only does master-master and not master-slave, rather than vice versa. :)

You can have two authoritative masters with Bucardo, in addition to any number of slaves radiating from one or both of those (as well as just simple master->slaves).

> - It's a cool tool when you want to query and integrate data from all
> sorts of disparate sources, thanks to its support for pluggable storage
> engines. If you want something for data analysis and integration rather
> than safe storage it's well worth looking at.

What sort of sources? I'm curious here to find areas we can improve upon.

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8
On Wed, Dec 16, 2009 at 4:02 PM, Gauthier, Dave <dave.gauthier@intel.com> wrote:
> Tomorrow, I will need to present to a group of managers (who know nothing
> about DBs) why I chose to use PG over MySQL in a project [...]

The #1 useful/practical/business-sense feature that PostgreSQL has over MySQL, and AFAIK over most commercial databases even, is transactional DDL. You can update live systems, and if anything goes wrong, your changes roll back.

merlin
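Merlin's point about transactional DDL can be sketched like this (hypothetical table and column names):

```sql
-- In PostgreSQL, DDL runs inside transactions like any other statement,
-- so a failed migration can be rolled back as a unit.
BEGIN;
ALTER TABLE accounts ADD COLUMN balance_cents bigint;
UPDATE accounts SET balance_cents = 0;
-- ...a later step in the migration fails...
ROLLBACK;  -- the new column is gone; the schema is exactly as before

-- In MySQL (5.x), ALTER TABLE causes an implicit commit, so an aborted
-- multi-step migration can leave the schema half-changed.
```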
They just called the meeting, or at least that part of it. There seems to be a battle brewing; some MySQL advocates are angry, concerned, fearful... I don't know why for sure. My managers, who advocate my position and PG, are preparing, but the decision will be made by higher-ups who really don't know anything about DBs. They just talk in terms of risk and cost and schedules and, yes, licenses. So I'll let them articulate the defense of PG on those terms. I'm just an engineer. I've been feeding them the valuable input I've been getting from this forum, and thanks to all who have contributed. Really!
On Thu, Dec 17, 2009 at 3:55 PM, Gauthier, Dave <dave.gauthier@intel.com> wrote:
> They just called the meeting, or at least that part of it. There seems to be a battle brewing; some MySQL advocates are angry, concerned, fearful... I don't know why for sure.

In places like that it is inevitable. There's always going to be a crowd that fears change. They don't generate reasonable opinions; it is the fear of change. It might be hard to fight that, since managers will make the decision based on reports that they can trust. Scared folks often generate a lot of feedback. Just like in politics :)

> My managers, who advocate my position and PG, are preparing, but the decision will be made by higher-ups who really don't know anything about DBs. [...]

Well, give them the best report ever. Also, skip the stuff they won't understand. Write first the things they will understand, then give them reasons why they would want to consider it - in their own language. Skip the engineering stuff. Managers often have a very short attention span. As soon as it smells like something they don't understand, they will stop reading.

--
GJ
Actually, the DB I'm working on is rather small but has a somewhat complex system of constraints and triggers that maintain the data. Queries will outnumber writes (20x at least). And the DB has to be mirrored at a sister site a couple thousand miles away, so I'm looking for a robust DB replication system for that.

These are the key points they will be worried about...

- DB up time (most important), including recovery time after disasters (e.g. power outages)
- Data integrity. I'm addressing this with constraints and using triggers to populate columns with derived data.
- Data quality. NO CORRUPT TABLES / INDEXES.
- Retrofitting existing apps to work with PG. Perl/DBI is a subtle change in the DBD designation. Some Tcl-MySQL code is tougher. I'm proposing changing everything to go through ODBC as a standard now, and for the future.
- Cost of maintenance. Do I have to babysit this DB 4 hours every day, or does it run by itself? Is this like Oracle, where we have to hire professional 24x7 DBAs, or is this hands-off? That kind of question.

I have a DB up and working. Runs great, no problems, but very lightly loaded and/or used at this time. Having worked with PG in the past, I'm not worried about this piece.

I am more concerned with getting a robust DB replication system up and running. Bucardo looks pretty good, but I've just started looking at the options. Any suggestions?

Thanks!
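The trigger-maintained derived columns Dave mentions above can be sketched as follows (table, column, and function names are hypothetical; PL/pgSQL, 8.x-era syntax):

```sql
CREATE TABLE parts (
    name       text PRIMARY KEY,
    name_lower text   -- derived column, kept in sync by the trigger below
);

CREATE FUNCTION parts_derive() RETURNS trigger AS $$
BEGIN
    NEW.name_lower := lower(NEW.name);  -- recompute on every write
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER parts_derive_trg
    BEFORE INSERT OR UPDATE ON parts
    FOR EACH ROW EXECUTE PROCEDURE parts_derive();
```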
How difficult is it to switch the master's hat from one DB instance to another? Let's say the master in a master-slave scenario goes down but the slave is fine. Can I designate the slave as being the new master, use it for read/write, and then just call the broken master the new slave once it comes back to life (something like that)?
On Thu, Dec 17, 2009 at 8:23 AM, Gauthier, Dave <dave.gauthier@intel.com> wrote:
> How difficult is it to switch the master's hat from one DB instance to another? Let's say the master in a master-slave scenario goes down but the slave is fine. Can I designate the slave as being the new master, use it for read/write, and then just call the broken master the new slave once it comes back to life (something like that)?

I know someone who uses a revolving Slony master-slave setup between China and the US. During the US working hours the US server is the master; during the working hours of China, the Chinese server becomes the master.

Of course, the person who constructed this system mentioned it was woefully challenging. In his case, not only was the Slony configuration difficult, but so was finding and keeping stable communication pathways between China and the US.

--
Regards,
Richard Broersma Jr.
Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug
> How difficult is it to switch the master's hat from one DB instance
> to another? Let's say the master in a master-slave scenario goes
> down but the slave is fine. Can I designate the slave as being the
> new master, use it for read/write, and then just call the broken
> master the new slave once it comes back to life (something like that)?

Sure. Bucardo slaves are not changed at all, so they are already read/write and don't need anything special done to "unslave" them.

One possible way to handle the scenario is:

Assuming three servers:
* A (master) sends changes to B, receives read/write queries
* B (slave) has transaction_read_only set to true, receives read queries
* C has the Bucardo database and daemon

Box A goes down suddenly.

* Stop Bucardo on box C
* Flip the boxes around in the bucardo.db table
* Do a 'bucardo_ctl validate sync all' (this will create the needed triggers on B)
* Set B's transaction_read_only to false
* Point your apps at B instead of A for read/write queries

When A comes back up:

* DROP SCHEMA bucardo CASCADE; (drops all triggers)
* Set transaction_read_only to true
* Start Bucardo on C
* Once caught up, point read-only queries to A

If you are in a rush, you can point things to B immediately after A fails, but you'll have to recopy the entire table data to the slave, as the triggers won't be in place yet.

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 200912171153
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
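[Editor's sketch] The read/write flip in the steps above might look roughly like this as commands. The host and database names (boxB, appdb) are placeholders, not from the original post, and the bucardo.db update is deliberately left out rather than guessing at its columns. Note that transaction_read_only is a per-session setting; the persistent knob is default_transaction_read_only.

```shell
# Hypothetical sketch of the failover steps above; boxB/appdb are placeholders.

# 1. Stop the Bucardo daemon on box C
bucardo_ctl stop

# 2. (Flip A and B in the bucardo.db table here -- the exact UPDATE depends
#    on your Bucardo configuration, so it is not reproduced.)

# 3. Recreate the needed triggers on B
bucardo_ctl validate sync all

# 4. Make B writable: transaction_read_only is session-level, so change the
#    persistent default instead; new connections pick it up.
psql -h boxB -d appdb -c \
  "ALTER DATABASE appdb SET default_transaction_read_only = false;"
```

When A returns, the mirror-image steps apply: `DROP SCHEMA bucardo CASCADE;` on A, then set its default_transaction_read_only back to true.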
"Gauthier, Dave" <dave.gauthier@intel.com> writes: > I am more concerned with getting a robust DB replication system up and > running. Bucardo looks pretty good, but I've just started looking at > the options. Any suggestions? Master Slave replication? Meaning no writes on the "sister site". If yes, consider Londiste from Skytools. Easy to setup and maintain, and robust. http://wiki.postgresql.org/wiki/Skytools http://wiki.postgresql.org/wiki/Londiste_Tutorial Regards, -- dim
On Thursday 17 December 2009 09:51:19 Richard Broersma wrote:
> On Thu, Dec 17, 2009 at 8:23 AM, Gauthier, Dave <dave.gauthier@intel.com> wrote:
> > How difficult is it to switch the master's hat from one DB instance to
> > another? Let's say the master in a master-slave scenario goes down but
> > the slave is fine. Can I designate the slave as being the new master,
> > use it for read/write, and then just call the broken master the new slave
> > once it comes back to life (something like that)?

This is very easy with Slony. If the master is truly 'dead' you can run a Slony 'failover' command like this (note these are slonik commands, where node 1 is the 'dead' master and node 2 is the current slave):

echo 'Preparing to failover (set id = 1, backup node = 2)';
failover (id = 1, backup node = 2);
echo 'Done';
echo 'Preparing to drop node (set id = 1, event node = 2)';
drop node (id = 1, event node = 2);
echo 'Done';
echo 'Failover complete';

At this point the dead master node is no longer part of the replication cluster and the slave is the new master. Once the dead node is rebuilt, you simply add it to the replication cluster as a new slave node.

Also, if you just wanted to "move" the master, changing the existing master into a slave in the process, you can do this:

lock set (id = 1, origin = 1);
wait for event (origin = 1, confirmed = 2);
echo 'set locked';
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2);
echo 'switchover complete';

> I know someone who uses a revolving Slony master-slave setup between
> China and the US. During the US working hours the US server is the
> master; during the working hours of China it becomes the master.
>
> Of course, the person who constructed this system mentioned it was
> woefully challenging. In his case, not only was the Slony
> configuration difficult, but so was finding and keeping stable
> communication pathways between China and the US.
Gauthier, Dave wrote:
>
> Hi Everyone:
>
> Tomorrow, I will need to present to a group of managers (who know
> nothing about DBs) why I chose to use PG over MySQL in a project,
> MySQL being the more popular DB choice with other engineers, and
> managers fearing things that are "different" (risk). I have a few hard
> technical reasons (check constraints, deferred constraint checking,
> array data type), but I'm looking for "it's more reliable" reasons.
> Again, the audience is managers. Is there an impartial, 3rd party
> evaluation of the 2 DBs out there that identifies PG as being more
> reliable? It might mention things like fewer incidences of corrupt
> tables/indexes, fewer daemon crashes, better recovery after system
> crashes, etc...?

Let me just say this one word about that:

ORACLE

I think it's a pretty safe assumption that Oracle will not be good for MySQL.
Gauthier, Dave wrote:
> Hi Everyone:
>
> Tomorrow, I will need to present to a group of managers (who know
> nothing about DBs) why I chose to use PG over MySQL in a project, MySQL
> being the more popular DB choice with other engineers, and managers
> fearing things that are "different" (risk). I have a few hard technical
> reasons (check constraints, deferred constraint checking, array data
> type), but I'm looking for "it's more reliable" reasons. Again, the
> audience is managers. Is there an impartial, 3rd party evaluation of
> the 2 DBs out there that identifies PG as being more reliable? It might
> mention things like fewer incidences of corrupt tables/indexes, fewer
> daemon crashes, better recovery after system crashes, etc...?
>
> Thanks !

There is a current question about the survivability of MySQL right now with the potential sale of MySQL. I would not bank on MySQL for any long-term project. I am sure that MySQL will live on in the long run, but there may well be turbulent times ahead if whoever comes to own MySQL decides to neglect or kill it and the source gets forked.

Madi
On Thu, Dec 17, 2009 at 12:35 PM, Madison Kelly <linux@alteeve.com> wrote:
> Gauthier, Dave wrote:
>>
>> Hi Everyone:
>>
>> Tomorrow, I will need to present to a group of managers (who know nothing
>> about DBs) why I chose to use PG over MySQL in a project, MySQL being the
>> more popular DB choice with other engineers, and managers fearing things
>> that are "different" (risk). I have a few hard technical reasons (check
>> constraints, deferred constraint checking, array data type), but I'm looking
>> for "it's more reliable" reasons. Again, the audience is managers. Is
>> there an impartial, 3rd party evaluation of the 2 DBs out there that
>> identifies PG as being more reliable? It might mention things like fewer
>> incidences of corrupt tables/indexes, fewer daemon crashes, better recovery
>> after system crashes, etc...?
>>
>> Thanks !
>
> There is a current question about the survivability of MySQL right now with
> the potential sale of MySQL. I would not bank on MySQL for any long-term
> project. I am sure that MySQL will live on in the long run, but there may well
> be turbulent times ahead if whoever comes to own MySQL decides to neglect
> or kill it and the source gets forked.

It's important to separate out MySQL AB the company, owned by Sun, soon to be owned by Oracle, from MySQL the GPL-licensed database, which may or may not allow you to distribute your own commercial code without buying a license. Given the OSS license loophole, and the fact that many of those OSS licenses do NOT require the release of source code, there's every possibility you could release your commercial code under a BSD license to yourself, and then give only the compiled code to customers, and you'd technically be safe distributing MySQL with it.

There are several companies in the MySQL biosphere that are releasing their own forks of MySQL with lots of bug fixes, and I have no doubt that MySQL the GPL database will continue to be available as a GPL product for quite some time.
However, the availability of commercially licensed MySQL may or may not continue based on the business needs of Oracle. For in house use only, this is a non-issue, as the GPL only affects distribution of MySQL, not internal usage. The biggest argument I'd use against MySQL in general is that PostgreSQL is a better database for any time your data and its integrity are important.
One concern I have about these trigger-based replication systems is that I fear they may ping the slave separately for each and every DML statement in a transaction. My slave will literally be 1400 miles away, and all replication communications will be over the net. If I have a transaction which has 1000 DML statements in it, is this thing going to update the slave 1000 times separately over the net? (I may not live long enough to see it finish.) Or will it be smart enough to wait until I "commit" and then send over a single bundle of 1000 DML statements? The time diff will be more than significant.
Thanks for all the great input on this!
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Greg Sabino Mullane
Sent: Thursday, December 17, 2009 11:58 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Justifying a PG over MySQL approach to a project
> How difficult is it to switch the master's hat from one DB instance
> to another? Let's say the master in a master-slave scenario goes
> down but the slave is fine. Can I designate the slave as being the
> new master, use it for read/write, and then just call the broken
> master the new slave once it comes back to life (something like that)?
Sure. Bucardo slaves are not changed at all, so they are already
read/write and don't need anything special done to "unslave" them.
One possible way to handle the scenario is:
Assuming three servers:
* A (master) sends changes to B, receives read/write queries
* B (slave) has transaction_read_only set to true, receives read queries
* C has the Bucardo database and daemon
Box A goes down suddenly.
* Stop Bucardo on box C
* Flip the boxes around in the bucardo.db table
* Do a 'bucardo_ctl validate sync all'
(This will create the needed triggers on B)
* Set B's transaction_read_only to false
* Point your apps at B instead of A for read/write queries
When A comes back up:
* DROP SCHEMA bucardo CASCADE; (drops all triggers)
* Set transaction_read_only to true
* Start Bucardo on C
* Once caught up, point read-only queries to A
If you are in a rush, you point things to B immediately after A fails,
but you'll have to recopy the entire table data to the slave, as the
triggers won't be in place yet.
--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 200912171153
I would recommend using a traffic shaping router (like the one built into the Linux kernel and controlled by tc / iptables) to simulate a long distance connection, and testing this yourself to see which replication engine will work best for you.

On Thu, Dec 17, 2009 at 7:35 PM, Gauthier, Dave <dave.gauthier@intel.com> wrote:
> One concern I have about these trigger based replication systems is that I
> fear it may ping the slave for each and every DML statement separately in
> time and in a transaction. My slave will literally be 1400 miles away and
> all replication communications will be over the net. If I have a
> transaction which has 1000 DML statements in it, is this thing going to
> update the slave 1000 times separately over the net? (I may not live long
> enough to see it finish) Or will it be smart enough to wait until I
> "commit" then send over a single bundle of 1000 DML? The time diff will be
> more than significant.
>
> Thanks for all the great input on this!

--
When fascism comes to America, it will be intolerance sold as diversity.
> One concern I have about these trigger based replication systems is
> that I fear it may ping the slave for each and every DML statement
> separately in time and in a transaction. My slave will literally be
> 1400 miles away and all replication communications will be over the
> net. If I have a transaction which has 1000 DML statements in it, is
> this thing going to update the slave 1000 times separately over the
> net? (I may not live long enough to see it finish) Or will it be
> smart enough to wait until I "commit" then send over a single bundle
> of 1000 DML? The time diff will be more than significant.

Both Slony and Bucardo are transaction-based, meaning all 1000 changes will happen at once, and only when the transaction has committed. In practice, you'll find that both are surprisingly quick.

With Bucardo, the number of changes may be much less than 1000, as it is data-based, not statement-based. Thus, as an extreme example, if the 1000 statements are all updating a single row, only a single update (actually a delete/insert) is done on the slave.

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 200912172144
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Scott Marlowe wrote: > I would recommend using a traffic shaping router (like the one built > into the linux kernel and controlled by tc / iptables) to simulate a > long distance connection and testing this yourself to see which > replication engine will work best for you. > Netem : http://www.linuxfoundation.org/collaborate/workgroups/networking/netem We used this to make a test rig for Directory Server replication, to verify a re-design that added pipelining to the replication protocol. It's already in the modern Linuxes--just needs to be configured.
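[Editor's sketch] A minimal netem invocation along the lines described might look like this; the interface name (eth0) and the delay/loss figures are assumptions to tune for your link, and the commands need root.

```shell
# Emulate a long-distance link on traffic leaving eth0:
# 40ms delay with 5ms jitter, plus 0.1% packet loss.
tc qdisc add dev eth0 root netem delay 40ms 5ms loss 0.1%

# Inspect the active queueing discipline
tc qdisc show dev eth0

# Remove the emulation when the test is done
tc qdisc del dev eth0 root
```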
On Thu, Dec 17, 2009 at 7:51 PM, David Boreham <david_list@boreham.org> wrote:
> Scott Marlowe wrote:
>>
>> I would recommend using a traffic shaping router (like the one built
>> into the linux kernel and controlled by tc / iptables) to simulate a
>> long distance connection and testing this yourself to see which
>> replication engine will work best for you.
>>
>
> Netem:
> http://www.linuxfoundation.org/collaborate/workgroups/networking/netem
> We used this to make a test rig for Directory Server replication, to verify
> a re-design that added pipelining to the replication protocol.
> It's already in the modern Linuxes--just needs to be configured.

Wow, every time I turn around someone's built something cool from a set of small sharp tools. Thanks!
Lincoln Yeoh wrote:
> It seems you currently can only control outbound traffic from an
> interface, so you'd have to set stuff on both interfaces to "shape"
> upstream and downstream - this is not so convenient in some network
> topologies.

This is more a property of the universe than the software ;)

However, there are tricks that can be used with a virtual NIC driver to give the effect of 'inbound' shaping in the case that you don't have control over the sending interface. In our project we deployed a dedicated shaping machine with a bunch of NICs connected to each test host, then wrote scripts to set up the shaping and the test host routing to emulate the desired network characteristics.
At 11:28 AM 12/18/2009, Scott Marlowe wrote:
>On Thu, Dec 17, 2009 at 7:51 PM, David Boreham <david_list@boreham.org> wrote:
> > Scott Marlowe wrote:
> >>
> >> I would recommend using a traffic shaping router (like the one built
> >> into the linux kernel and controlled by tc / iptables) to simulate a
> >> long distance connection and testing this yourself to see which
> >> replication engine will work best for you.
> >>
> >
> > Netem :
> > http://www.linuxfoundation.org/collaborate/workgroups/networking/netem
> > We used this to make a test rig for Directory Server replication, to verify
> > a re-design that added pipelining to the replication protocol.
> > It's already in the modern Linuxes--just needs to be configured.
>
>Wow, every time I turn around someone's built something cool from a set
>of small sharp tools. Thanks!

There's also a live CD with a web UI to emulate WANs. I think it's basically a wrapper around tc/netem, but I find it convenient for quick and dirty tests.

http://wanem.sourceforge.net/

It seems you currently can only control outbound traffic from an interface, so you'd have to set stuff on both interfaces to "shape" upstream and downstream - this is not so convenient in some network topologies.

Regards,
Link.
At 03:19 AM 12/19/2009, David Boreham wrote:
>Lincoln Yeoh wrote:
>>It seems you currently can only control outbound traffic from an
>>interface, so you'd have to set stuff on both interfaces to "shape"
>>upstream and downstream - this is not so convenient in some network topologies.
>This is more a property of the universe than the software ;)

Not really; there is no technical reason that makes it impossible to have an inbound queue, instead of jumping through hoops and creating virtual interfaces (or even an entire host).

Link.
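[Editor's sketch] For what it's worth, the virtual-NIC trick can be done on a single Linux host with an ifb device: inbound traffic on the real interface is redirected through the ifb, and netem is applied on the ifb's egress side. Interface names and the delay figure below are assumptions; the commands need root and the ifb kernel module.

```shell
# Shape *inbound* traffic on eth0 by bouncing it through a virtual ifb device.
modprobe ifb numifbs=1
ip link set dev ifb0 up

# Redirect all ingress IP traffic on eth0 to ifb0
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
    action mirred egress redirect dev ifb0

# netem on ifb0's egress now delays what arrived on eth0
tc qdisc add dev ifb0 root netem delay 40ms
```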
At 05:44 AM 12/17/2009, Greg Smith wrote:
>You've probably already found
>http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007
>which was my long treatment of this topic (and overdue for an update).
>
>The main thing I intended to put into such an update when I get to
>it is talking about the really deplorable bug handling situation for
>MySQL, which is part of how all the data corruption issues show
>up. There's a good overview of its general weirdness at
>http://www.xaprb.com/blog/2007/08/12/what-would-make-me-buy-mysql-enterprise/
>and the following series of pages lead you through my favorite set of bugs:

More so when Monty himself grumbles about the bug handling situation:

http://monty-says.blogspot.com/2008/11/oops-we-did-it-again-mysql-51-released.html

If people still insist on MySQL, you might want to get it in writing that it's someone else's decision to use MySQL and not yours ;).

Ten or so years ago MySQL was better than Postgres95, and it would have been easy to justify using MySQL over Postgres95 (which was really slow and had a fair number of bugs). But PostgreSQL is much better than MySQL now. That's just my opinion of course.

Link
On Sun, Dec 20, 2009 at 10:04 AM, Lincoln Yeoh <lyeoh@pop.jaring.my> wrote:
> At 05:44 AM 12/17/2009, Greg Smith wrote:
>>
>> You've probably already found
>> http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007
>> which was my long treatment of this topic (and overdue for an update).
>>
>> The main thing I intended to put into such an update when I get to it is
>> talking about the really deplorable bug handling situation for MySQL, which
>> is part of how all the data corruption issues show up. There's a good
>> overview of its general weirdness at
>> http://www.xaprb.com/blog/2007/08/12/what-would-make-me-buy-mysql-enterprise/
>> and the following series of pages lead you through my favorite set of bugs:
>
> More so when Monty himself grumbles about the bug handling situation:
>
> http://monty-says.blogspot.com/2008/11/oops-we-did-it-again-mysql-51-released.html
>
> If people still insist on MySQL, you might want to get it in writing that
> it's someone else's decision to use MySQL and not yours ;).
>
> Ten or so years ago MySQL was better than Postgres95, and it would have been
> easy to justify using MySQL over Postgres95 (which was really slow and had a
> fair number of bugs). But PostgreSQL is much better than MySQL now. That's
> just my opinion of course.

I don't think anybody is going to dispute that here. IMO, Postgres is just completely in another league in technical terms. From a business point of view, the BSD license is great, but I can understand being nervous about the availability and price of PostgreSQL talent. In the long run, though, you are much better off with one of us!

merlin
Lincoln Yeoh wrote: > Ten or so years ago MySQL was better than Postgres95, and it would have > been easy to justify using MySQL over Postgres95 (which was really slow > and had a fair number of bugs). But Postgresql is much better than MySQL > now. That's just my opinion of course. Really?!? MySQL development started in '94; and their first internal release was May 95.[1] At that time Postgres's SQL language support was new, but didn't the underlying database already have a half decade of history that surely was more mature than MySQL at the time? I thought the main justification for MySQL back then is that they had better Win95 support (and a quality control philosophy that more matched the old pre-NT windows that favored time-to-market over correctness). [1] http://en.wikipedia.org/wiki/MySQL#cite_note-21
On Sun, Dec 20, 2009 at 3:19 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:
> Lincoln Yeoh wrote:
>> Ten or so years ago MySQL was better than Postgres95, and it would have
>> been easy to justify using MySQL over Postgres95 (which was really slow
>> and had a fair number of bugs). But Postgresql is much better than MySQL
>> now. That's just my opinion of course.
>
> Really?!?
>
> MySQL development started in '94; and their first internal release was May 95.[1]
>
> At that time Postgres's SQL language support was new, but didn't the underlying
> database already have a half decade of history that surely was more mature
> than MySQL at the time?

For a long time, Postgres had a lot of issues that made it less suitable for high-traffic web environments, especially 24x7 high-load ones: vacuum was a nightmare, transaction overhead was very high, and the complex disk format made upgrades a real pain (only this last issue remains). The PostgreSQL optimizer has always had an edge, but it wasn't so cut and dried back then. It was only with the 8.x versions that Postgres really started pulling away.

merlin
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > Lincoln Yeoh wrote: >> Ten or so years ago MySQL was better than Postgres95, and it would have >> been easy to justify using MySQL over Postgres95 (which was really slow >> and had a fair number of bugs). But Postgresql is much better than MySQL >> now. That's just my opinion of course. > Really?!? > MySQL development started in '94; and their first internal release was May 95.[1] > At that time Postgres's SQL language support was new, but didn't the underlying > database already have a half decade of history that surely was more mature > than MySQL at the time? What the underlying database had was a decade or so of development and use for strictly academic purposes. This anecdote might help: http://archives.postgresql.org/pgsql-hackers/2002-06/msg00085.php It was not until the current community started working on it, circa 1997, that there was any real emphasis on making it stable enough for production use. And I would say that we didn't get to the point of being really production-worthy until 2001 or so, by which time the "Postgres sucks" meme was already pretty widely established. And so was MySQL. We've been playing catchup in the public-perception department ever since. regards, tom lane
The arguments against PG are not technical. The biggest advocate for MySQL is actually a very sharp engineer who admits that PG is a superior DB. But MySQL is more popular in the corp and has more formal recognition. So he's saying that the differences aren't "big enough" to justify using PG.

A statement from an uninterested third party stating that data in a PG DB is more secure, has more integrity, and has better uptime than a MySQL implementation, now THAT would carry weight with the decision makers (who know nothing about DBs, but want that data secure, intact and available).

Not sure when the decision will be made. But I expect to get physically ill if I have to dismantle the PG implementation and replace it with MySQL.

Dear Santa, All I want for Christmas is to be able to keep my DB.

-----Original Message-----
From: Lincoln Yeoh [mailto:lyeoh@pop.jaring.my]
Sent: Sunday, December 20, 2009 10:05 AM
To: Greg Smith; Gauthier, Dave
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Justifying a PG over MySQL approach to a project

At 05:44 AM 12/17/2009, Greg Smith wrote:
>You've probably already found
>http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007
>which was my long treatment of this topic (and overdue for an update).
>
>The main thing I intended to put into such an update when I get to
>it is talking about the really deplorable bug handling situation for
>MySQL, which is part of how all the data corruption issues show
>up. There's a good overview of its general weirdness at
>http://www.xaprb.com/blog/2007/08/12/what-would-make-me-buy-mysql-enterprise/
>and the following series of pages lead you through my favorite set of bugs:

More so when Monty himself grumbles about the bug handling situation:

http://monty-says.blogspot.com/2008/11/oops-we-did-it-again-mysql-51-released.html

If people still insist on MySQL, you might want to get it in writing that it's someone else's decision to use MySQL and not yours ;).
Ten or so years ago MySQL was better than Postgres95, and it would have been easy to justify using MySQL over Postgres95 (which was really slow and had a fair number of bugs). But Postgresql is much better than MySQL now. That's just my opinion of course. Link
Gauthier, Dave wrote:
> The arguments against PG are not technical.

A few more points that I didn't see in this thread yet that might help answer the non-technical questions:

* There seem to be more commercial vendors providing support for Postgres than for MySQL - because most MySQL support came from that one company.
  http://www.postgresql.org/support/professional_support

* There are bigger companies supporting Postgres than MySQL. And yes, that'll still be true even if Oracle supports MySQL.
  http://postgresql.fastware.com/

* There are a number of extremely scalable commercial solutions based on Postgres forks (Greenplum, EnterpriseDB, Aster, whatever Yahoo uses, etc.). These run many of the largest databases in the world. If you expect your app to grow to that scale, this might make your migration easier.

* There are specialty commercial companies that support specific Postgres features very well - such as Refractions (http://www.refractions.net/), which provides great PostGIS support.

* There are enough large companies that depend entirely on each of the databases to make either one a safe choice from that point of view (Skype, for example). And the way Apple and Cisco use it for a number of their programs (google "cisco postgresql" or "apple final cut postgresql" for links) provides other nice data points of companies most managers will have heard of.

> Dear Santa, All I want for Christmas is to be able to keep my DB.
Ron Mayer wrote:
> * There are enough large companies that depend entirely
> on each of the databases to make either one a safe
> choice from that point of view (Skype). And the way
> Apple and Cisco use it for a number of their programs

Yeah, these are all good examples. Cisco uses PostgreSQL in a number of products:

Carrier-Sensitive Routing:
http://www.cisco.com/en/US/products/sw/voicesw/ps4371/products_user_guide_chapter09186a00800c252c.html

Fabric Manager:
http://www.cisco.com/en/.../product_data_sheet09186a00800c4656.pdf

Both have non-trivial uptime requirements. "Call routing" is not a field particularly tolerant of the "my database got corrupted and went down" kind of errors.

You'll similarly find PostgreSQL used inside Japan's Nippon Telegraph and Telephone Corporation (NTT) too, enough so that they're doing major development to improve it (they're sponsoring the "Streaming Replication" feature targeted for 8.5).

When telcos and providers of telco services and equipment like Skype, Cisco, and NTT are all using PostgreSQL, it certainly makes it easy to support the idea that the database is reliable in the real world.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com
Merlin Moncure wrote:
> It was only with the 8.x versions that postgres
> really started pulling away.

Today I was re-reading a great reminder of just how true this is: http://suckit.blog.hu/2009/09/29/postgresql_history

From the baseline provided by 8.0, PostgreSQL increased in speed by about 8X on both the read and write sides between 8.1 and 8.3. Since 8.1 came out in late 2005, it's no wonder the "PostgreSQL is slow" meme got so deep into people's memories--until only four years ago, it was still actually true.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
> "Gauthier, Dave" <dave.gauthier@intel.com> :
> Again, the audience is managers. Is there an
> impartial, 3rd party evaluation of the 2 DBs out there that
> identifies PG as being more reliable? It might mention things like
> fewer incidences of corrupt tables/indexes, fewer daemon crashes,
> better recovery after system crashes, etc...?

I don't know if it's a plus or a minus, but: http://www.google.com/search?&q=save+mysql+oracle

-- Architecte Informatique chez Blueline/Gulfsat: Administration Systeme, Recherche & Developpement +261 34 29 155 34 / +261 33 11 207 36
> I dont know if it's a plus or a minus, but: Well, the fact that Monty secretly tried to persuade the EC toward forcing Oracle to release MySQL under a license other than the GPL, while lying & denying that in public, really shouldn't be considered a plus for MySQL, I would think ;-) Seriously, founder & current owner engaged in political intrigue over licensing? Try running that by a risk-averse manager! -- Scott Ribe scott_ribe@killerbytes.com http://www.killerbytes.com/ (303) 722-0567 voice
On Dec 21, 2009, at 8:44 AM, Scott Ribe wrote:
> Well, the fact that Monty secretly tried to persuade the EC toward forcing
> Oracle to release MySQL under a license other than the GPL, while lying &
> denying that in public, really shouldn't be considered a plus for MySQL, I
> would think ;-)
>
> Seriously, founder & current owner engaged in political intrigue over
> licensing? Try running that by a risk-averse manager!

I was literally just discussing this situation with our other DBA at work. Monty, who *chose* the GPL for the open source end of their inane dual-licensing scheme, and Stallman, who *wrote* the damn thing, are trying to get the EU to force Oracle to change the open source end of the license to something more permissive, arguing that the viral nature of the GPL will force companies that release proprietary products that use MySQL to buy commercial licenses from Oracle - which is exactly why they (MySQL AB) set things up in that manner when Monty owned the company. It seems to me that Monty & co. wanted to have their cake and eat it too and are now crying because Oracle wants to buy the recipe.

If your company releases any kind of proprietary software, then that situation alone sounds like a good business reason to me not to be looking at MySQL right now.

Erik Jones, Database Administrator Engine Yard Support, Scalability, Reliability 866.518.9273 x 260 Location: US/Pacific IRC: mage2k
On Mon, Dec 21, 2009 at 10:55 AM, Erik Jones <ejones@engineyard.com> wrote:
>
> On Dec 21, 2009, at 8:44 AM, Scott Ribe wrote:
>
>> Well, the fact that Monty secretly tried to persuade the EC toward forcing
>> Oracle to release MySQL under a license other than the GPL, while lying &
>> denying that in public, really shouldn't be considered a plus for MySQL, I
>> would think ;-)
>>
>> Seriously, founder & current owner engaged in political intrigue over
>> licensing? Try running that by a risk-averse manager!
>
> I was literally just discussing this situation with our other DBA at work. Monty, who *chose* the GPL for the open source end of their inane dual-licensing scheme, and Stallman, who *wrote* the damn thing, are trying to get the EU to force Oracle to change the open source end of the license to something more permissive, arguing that the viral nature of the GPL will force companies that release proprietary products that use MySQL to buy commercial licenses from Oracle - which is exactly why they (MySQL AB) set things up in that manner when Monty owned the company. It seems to me that Monty & co. wanted to have their cake and eat it too and are now crying because Oracle wants to buy the recipe.
>
> If your company releases any kind of proprietary software, then that situation alone sounds like a good business reason to me not to be looking at MySQL right now.

Even more impressive is that if you go to Monty's blog and post that statement, he will moderate it away and no one will see it there. He refuses to answer those questions or discuss the fact that he has secretly requested that Oracle be forced to change the license on MySQL if they buy it. He puts forth his points like he wants a public dialog, but he most certainly does not.
I was wondering...

In head-to-head comparisons, do DBs get stress tested, not only in terms of performance, but in terms of corruption, downtime, recovery time, lost data, etc.? I've heard it said that PG is superior to MySQL in this regard. But if this were stated in an article from a 3rd party or something, THAT would make an impact on mgmt.

On a different note... Other projects in the group use MySQL. In the past, the risky decision was made to put the highly volatile tables of these other users in the same MySQL database as the project-critical data tables (the ones I just rewrote into PG). Of course, I moved the critical tables out to a separate DB and let them play in a sandbox (MySQL, PG, SQLite... whatever they want). But one criticism I expect to have to deal with is that their apps will no longer be able to simply join into the project-critical tables for queries (those tables will no longer be in the same DB, and in fact will be on PG as opposed to MySQL).

They have ways to cope with this. Since they all code in perl/DBI, they could simply open handles to the 2 DBs. Or there's an op-sys-level app out there that they could use to get the same data. But I was thinking of something that would retain their ability to "join" into that data.

At first, I was thinking that I'd just write a stored procedure in MySQL that runs a perl script that opens the PG DB, gets metadata for the view that has the data they want, plus the data in the view, then creates a temp table in MySQL that mirrors the view and loads it up with data. But I just found out yesterday that MySQL does not support stored procedures written in other languages (the way PG supports perl, tcl, python...). So my fallback is to write the perl script in a perl module that accepts as input a DBI handle to the MySQL DB. With the MySQL DB handle in hand, the perl script would then attach to the PG DB, get the metadata for the view, create the view in MySQL as a temp table, then populate it.
This view has <5,000 records in it, so it's realistic to do in real time. The one thing I lose with this is scripting-language independence (it's a sub in a perl module, so you can only use it from perl).

If anyone can brainstorm a better solution to this, I'm all ears (eyes, whatever...).
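The fallback plan above (read a view's metadata and rows from one database handle, then create and fill a matching temp table through the other handle) can be sketched compactly. This is a minimal illustration in Python with two SQLite in-memory databases standing in for the PG source and MySQL target; the table, view, and function names are made up for the example, and the real version would use perl/DBI handles instead:

```python
import sqlite3

# Two SQLite stand-ins for the PG (source) and MySQL (target) databases.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")

# Source side: a view over the project-critical data (hypothetical schema).
src.executescript("""
    CREATE TABLE results (tool TEXT, score INTEGER);
    INSERT INTO results VALUES ('a', 1), ('b', 2);
    CREATE VIEW v_results AS SELECT tool, score FROM results;
""")

def mirror_view(src, dst, view, temp_name):
    """Read a view's column metadata and rows from the source handle,
    then create and populate a temp table in the target handle."""
    cur = src.execute(f"SELECT * FROM {view}")
    cols = [d[0] for d in cur.description]          # column names from metadata
    dst.execute(f"CREATE TEMP TABLE {temp_name} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    dst.executemany(f"INSERT INTO {temp_name} VALUES ({placeholders})",
                    cur.fetchall())

mirror_view(src, dst, "v_results", "tmp_results")
rows = dst.execute("SELECT tool, score FROM tmp_results ORDER BY tool").fetchall()
print(rows)  # -> [('a', 1), ('b', 2)]
```

With <5,000 rows, a naive fetchall/executemany copy like this should be comfortably fast; for larger views you'd want batched fetches.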
On Mon, Dec 21, 2009 at 12:23 PM, Gauthier, Dave <dave.gauthier@intel.com> wrote:
> They have ways to cope with this. Since they all code in perl/DBI, they could simply open handles to the 2 DBs. Or there's an op-sys-level app out there that they could use to get the same data. But I was thinking of something that would retain their ability to "join" into that data.

You can accomplish some stuff like that with dbilink. Not sure if it's what you want or not.
On 12/21/09 11:23 AM PST, "Gauthier, Dave" <dave.gauthier@intel.com> wrote:
> I was wondering...
>
> In head-to-head comparisons, do DBs get stress tested, not only in terms of
> performance, but in terms of corruption, downtime, recovery time, lost data,
> etc.?
>
> I've heard it said that PG is superior to MySQL in this regard. But if
> this were stated in an article from a 3rd party or something, THAT would make
> an impact on mgmt.

You are not going to find this kind of comparison (or at least a valid one), for the same reason you would not find an off-the-shelf answer on performance. It really depends on what your application does and how it interacts with the database. Like performance testing, there's really no substitute for trying it out yourself. Some of the things you might try include crashing the database and seeing if all the data come back, plus how easily you can run backups, how easily you can upgrade, and how hard it is to set up HA (MySQL master/master is handy for upgrades and for use as a read-only slave; PostgreSQL warm standby is far better for full server replicas).

Hint: Putting a write load on MySQL with the MyISAM table type and then crashing the database is a really quick way to make MySQL look bad, as you'll soon corrupt the tables and will typically get app failures until you run myisamchk to repair them. Personally, I like MySQL for a lot of purposes, but this one really drives me nuts. It's so easy to demonstrate it feels like cheating.

Cheers, Robert Hodges
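The shape of the "crash the database and see if all the data come back" test can be shown in miniature. This is only a toy harness, assuming Python on a Unix-like system and using an embedded SQLite file as a stand-in for the server under test; a real test would put a write load on the actual mysqld or postgres process and kill -9 it mid-write:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "crash_test.db")

# Step 1: commit one row durably, then close cleanly.
conn = sqlite3.connect(db)
conn.execute("CREATE TABLE t (v TEXT)")
conn.execute("INSERT INTO t VALUES ('committed')")
conn.commit()
conn.close()

# Step 2: in a child process, start a write and die abruptly.
# os._exit skips every cleanup/rollback path, like a daemon crash.
pid = os.fork()
if pid == 0:
    c = sqlite3.connect(db)
    c.execute("INSERT INTO t VALUES ('uncommitted')")
    os._exit(1)  # simulated crash: no commit, no close
os.waitpid(pid, 0)

# Step 3: recovery check. A crash-safe engine rolls back the torn
# write and keeps everything that was committed.
rows = [r[0] for r in sqlite3.connect(db).execute("SELECT v FROM t")]
print(rows)  # -> ['committed']
```

The MyISAM behavior described above is exactly a failure of step 3: instead of a clean rollback, the table itself ends up corrupt and needs myisamchk before the app works again.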