Thread: Help needed with PostgreSQL clustering/switching from MySQL
I'm new to PostgreSQL, coming at it from a MySQL background. I'm currently looking at switching one of our applications (which currently uses MySQL) over to PostgreSQL and had some questions.
We're considering the switch because of issues we have faced when using MySQL in a clustered scenario and we're hoping that switching to PostgreSQL will help us resolve these issues. Our three biggest pain points with MySQL are:
1. MySQL's NDB engine (used for clustering) cannot index textual data stored in a BLOB field
2. When configuring a MySQL cluster, there is a memory limit on the number of objects (tables and fields). We often have problems when importing new tables, wherein we need to increase the memory limit for the server's NDB engine before it allows us to import. This can be a problem for dynamically-generated tables, as we cannot accurately forecast the number of database objects in advance in these cases.
3. MySQL's NDB engine doesn't support or enforce foreign keys.
So my first question is, I'd like to know if PostgreSQL has similar issues when running in a clustered scenario.
Second, on reviewing the manual and some sites, it seems that there are a number of different OSS solutions for implementing failover and clustering with PostgreSQL, but no "official" version. Is this understanding correct? If yes, which solution is best suited for running PostgreSQL in a private cloud, with clustering/failover support?
Thank you,
Vikram
On Tue, Jun 21, 2011 at 05:07:10AM +0000, Vikram Vaswani wrote: > > Hello > > I'm new to PostgreSQL, coming at it from a MySQL background. I'm > currently looking at switching one of our applications (which > currently uses MySQL) over to PostgreSQL and had some questions. > > We're considering the switch because of issues we have faced when > using MySQL in a clustered scenario and we're hoping that switching > to PostgreSQL will help us resolve these issues. Our three biggest > pain points with MySQL are: > > 1. MySQL's NDB engine (used for clustering) cannot index textual > data stored in a BLOB field PostgreSQL's full text capability is quite good. There are better specialty (non-relational) engines out there, and some proprietary engines that do more (or at least different) things, so you'll need to assess carefully what type of text searching you want to do, and what you'll trade that capability for. > 2. When configuring a MySQL cluster, there is a memory limit on the > number of objects (tables and fields). We often have problems when > importing new tables, wherein we need to increase the memory limit > for the server's NDB engine before it allows us to import. This can > be a problem for dynamically-generated tables, as we cannot > accurately forecast the number of database objects in advance in > these cases. Dynamically generated tables are generally a problem at the design level. Neither PostgreSQL nor any other engine will solve that. > 3. MySQL's NDB engine doesn't support or enforce foreign keys. This, PostgreSQL does extremely well. > So my first question is, I'd like to know if PostgreSQL has similar > issues when running in a clustered scenario. I'm not entirely sure what you mean by a "clustered scenario," but I'd like to digress into the matter of multi-master replication. It can be fast, so long as it doesn't have to be correct, or it can be correct, so long as your users are willing to wait, but it can't be both fast and correct at once. In 90%+ of cases, it's neither fast nor correct. I guess what I'm saying here is that you should not design systems that depend on unicorn steak, skyhooks, magic pixie dust, or fast, accurate multi-master replication. > Second, on reviewing the manual and some sites, it seems that there > are a number of different OSS solutions for implementing failover > and clustering with PostgreSQL, but no "official" version. As of 9.0, there is built-in asynchronous replication, which can be streamed (lower lag times) if you like. You might also want to consider some of the other solutions. http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling > Is this understanding correct? If yes, which solution is best suited > for running PostgreSQL in a private cloud, with clustering/failover > support? Please to understand that you need to set priorities for these things and decide which you might sacrifice in order to get the others. > Thank you, > > Vikram > THIS EMAIL & ANY ATTACHED FILES ARE PRIVATE & CONFIDENTIAL If you are not the addressee, any disclosure, reproduction,copying, distribution, or any other dissemination or use of this communication is strictly prohibited. If youhave received this transmission in error please notify the sender immediately and then delete this email. Email transmissioncannot be guaranteed to be secure or error free as information could be intercepted, corrupted, lost, destroyed,arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors oromissions in the contents of this message which arise as a result of email transmission. If verification is required pleaserequest a hard copy version. In future, please to elide these "disclaimers." The serve no legal or practical purpose, but they do give people they annoy a convenient excuse not to reply. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Jun 21, 2011 at 1:07 AM, Vikram Vaswani <vikram.vaswani@loudcloudsystems.com> wrote: > So my first question is, I'd like to know if PostgreSQL has similar issues > when running in a clustered scenario. > Postgres has nothing quite like the MySQL cluster mode with NDB. You will have to re-think your solution if you want to use postgres to distribute your queries and data across multiple servers. To decide what is the best solution for replication in postgres you need to define exactly what you mean by "clustering" and "failover", and what your acceptable downtime is, etc.
On 06/21/2011 01:25 PM, David Fetter wrote: > Dynamically generated tables are generally a problem at the design > level. Neither PostgreSQL nor any other engine will solve that. It depends a bit on what the OP means by dynamically generated tables. > I'm not entirely sure what you mean by a "clustered scenario," ... which is really important. You need to define your needs better. Why are you clustering? What problems is your cluster intended to solve? Is it for performance? Redundancy? Geographic distribution of replicas? Can you afford to lose a certain limited number/time of transactions if the master fails, or must absolutely every transaction be replicated to standby nodes before a COMMIT returns success to the application? Do query results from standby nodes have to be perfectly consistent with the master, or can they lag behind? Do you need to be able to write changes to multiple nodes, or is it OK to have a single master node for writes and multiple nodes for read-only queries? BTW, I cannot agree enough with David Fetter's comments on multiple master replication - so long as it's qualified with "in relational databases". Multiple master works extremely well when used with systems designed for it like eventually-consistent distributed document databases. It works relatively poorly for SQL RDBMSs because there's so much inter-node synchronization and co-ordination required to keep everything consistent and correct according to the strict rules of the SQL standards. -- Craig Ringer
On 06/21/2011 10:00 AM, Vick Khera wrote: > Postgres has nothing quite like the MySQL cluster mode with NDB. You > will have to re-think your solution if you want to use postgres to > distribute your queries and data across multiple servers. > The closest thing to a NDB cluster in PostgreSQL is using PL/Proxy to split data across multiple nodes. Both use similar hash-key methods to distribute things across more than one system, and you can run queries that return a combined set of results bigger than any single node could have handled. But even that's extremely different from NDB as far as what the interface for executing queries is like. Vick is absolutely right here: asking about whether PostgreSQL solves the very specific problems that MySQL NDB has isn't even the right question. The two don't compare directly at all; different replication approach, different node distribution approach, different query approach. You need to return to basics instead: what is the actual business and/or technology need that has to be solved? From that there may be a PostgreSQL solution that makes sense, using its replication and query distribution mechanisms. But it's extremely unlikely that will look like a NDB cluster at all, and therefore very unlikely to have the same problems at all. You'll get a whole new mystery set instead! One of the most common mistakes I see people make when architecting database systems is assuming they have to use one of these really complicated sharded approaches to make their server perform well. Unless you have a massive database or extremely high write volume, it's way more trouble than it's worth to go through distributing writes onto multiple nodes. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On 22/06/11 10:00, Greg Smith wrote: > On 06/21/2011 10:00 AM, Vick Khera wrote: >> Postgres has nothing quite like the MySQL cluster mode with NDB. You >> will have to re-think your solution if you want to use postgres to >> distribute your queries and data across multiple servers. >> > > The closest thing to a NDB cluster in PostgreSQL is using PL/Proxy to > split data across multiple nodes. Both use similar hash-key methods to > distribute things across more than one system, and you can run queries > that return a combined set of results bigger than any single node could > have handled. Doesn't PL/Proxy, like NDB, offer no referential integrity enforcement between nodes? Enforcing referential integrity in a multi-master cluster is one of the things that adds synchronization overhead and makes multi-master clustered systems slower. In extremely rough terms, before a node can do certain operations it has to ask one or more other nodes "can I do this?" and wait for them to reply (possibly waiting for their disk I/O or for other tasks to finish). Each node then has to say "I've done this" when it's ready to commit the change and make it visible across the cluster, and other nodes have to handle that change. For example, if you want to delete a tuple that's in a table targeted by a referential integrity constraint you have to ask all the other nodes whether they have any tuples that reference yours before you can go ahead and delete it. As you can imagine, this gets slow. There are various caching and optimistic execution tricks used to help with these issues in some clustered RDBMSes, but they're all limited to various degrees. The more effective tricks come with application-visible downsides: for example, they can allow transactions to abort at commit time because a concurrent modification occurred on another cluster node. -- Craig Ringer