Thread: Help needed with PostgreSQL clustering/switching from MySQL

Help needed with PostgreSQL clustering/switching from MySQL

From

Vikram Vaswani

Date:

21 June 2011, 02:13:51

Hello

I'm new to PostgreSQL, coming at it from a MySQL background. I'm currently looking at switching one of our applications (which currently uses MySQL) over to PostgreSQL and had some questions.

We're considering the switch because of issues we have faced when using MySQL in a clustered scenario and we're hoping that switching to PostgreSQL will help us resolve these issues. Our three biggest pain points with MySQL are:

1. MySQL's NDB engine (used for clustering) cannot index textual data stored in a BLOB field
2. When configuring a MySQL cluster, there is a memory limit on the number of objects (tables and fields). We often have problems when importing new tables, wherein we need to increase the memory limit for the server's NDB engine before it allows us to import. This can be a problem for dynamically-generated tables, as we cannot accurately forecast the number of database objects in advance in these cases.
3. MySQL's NDB engine doesn't support or enforce foreign keys.

So my first question is, I'd like to know if PostgreSQL has similar issues when running in a clustered scenario.

Second, on reviewing the manual and some sites, it seems that there are a number of different OSS solutions for implementing failover and clustering with PostgreSQL, but no "official" version. Is this understanding correct? If yes, which solution is best suited for running PostgreSQL in a private cloud, with clustering/failover support?

Thank you,

Vikram

THIS EMAIL & ANY ATTACHED FILES ARE PRIVATE & CONFIDENTIAL If you are not the addressee, any disclosure, reproduction, copying, distribution, or any other dissemination or use of this communication is strictly prohibited. If you have received this transmission in error please notify the sender immediately and then delete this email. Email transmission cannot be guaranteed to be secure or error free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of email transmission. If verification is required please request a hard copy version.

Re: Help needed with PostgreSQL clustering/switching from MySQL

From

David Fetter

Date:

21 June 2011, 02:25:20

On Tue, Jun 21, 2011 at 05:07:10AM +0000, Vikram Vaswani wrote:
>
> Hello
>
> I'm new to PostgreSQL, coming at it from a MySQL background. I'm
> currently looking at switching one of our applications (which
> currently uses MySQL) over to PostgreSQL and had some questions.
>
> We're considering the switch because of issues we have faced when
> using MySQL in a clustered scenario and we're hoping that switching
> to PostgreSQL will help us resolve these issues. Our three biggest
> pain points with MySQL are:
>
> 1. MySQL's NDB engine (used for clustering) cannot index textual
> data stored in a BLOB field

PostgreSQL's full text capability is quite good.  There are better
specialty (non-relational) engines out there, and some proprietary
engines that do more (or at least different) things, so you'll need to
assess carefully what type of text searching you want to do, and what
you'll trade that capability for.

> 2. When configuring a MySQL cluster, there is a memory limit on the
> number of objects (tables and fields). We often have problems when
> importing new tables, wherein we need to increase the memory limit
> for the server's NDB engine before it allows us to import. This can
> be a problem for dynamically-generated tables, as we cannot
> accurately forecast the number of database objects in advance in
> these cases.

Dynamically generated tables are generally a problem at the design
level.  Neither PostgreSQL nor any other engine will solve that.

> 3. MySQL's NDB engine doesn't support or enforce foreign keys.

This, PostgreSQL does extremely well.

> So my first question is, I'd like to know if PostgreSQL has similar
> issues when running in a clustered scenario.

I'm not entirely sure what you mean by a "clustered scenario," but I'd
like to digress into the matter of multi-master replication.  It can
be fast, so long as it doesn't have to be correct, or it can be
correct, so long as your users are willing to wait, but it can't be
both fast and correct at once.

In 90%+ of cases, it's neither fast nor correct.

I guess what I'm saying here is that you should not design systems
that depend on unicorn steak, skyhooks, magic pixie dust, or fast,
accurate multi-master replication.

> Second, on reviewing the manual and some sites, it seems that there
> are a number of different OSS solutions for implementing failover
> and clustering with PostgreSQL, but no "official" version.

As of 9.0, there is built-in asynchronous replication, which can be
streamed (lower lag times) if you like.  You might also want to
consider some of the other solutions.

http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling

> Is this understanding correct? If yes, which solution is best suited
> for running PostgreSQL in a private cloud, with clustering/failover
> support?

Please to understand that you need to set priorities for these things
and decide which you might sacrifice in order to get the others.

> Thank you,
>
> Vikram
> THIS EMAIL & ANY ATTACHED FILES ARE PRIVATE & CONFIDENTIAL If you are not the addressee, any disclosure,
reproduction,copying, distribution, or any other dissemination or use of this communication is strictly prohibited. If
youhave received this transmission in error please notify the sender immediately and then delete this email. Email
transmissioncannot be guaranteed to be secure or error free as information could be intercepted, corrupted, lost,
destroyed,arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors
oromissions in the contents of this message which arise as a result of email transmission. If verification is required
pleaserequest a hard copy version. 

In future, please to elide these "disclaimers."  The serve no legal or
practical purpose, but they do give people they annoy a convenient
excuse not to reply.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Help needed with PostgreSQL clustering/switching from MySQL

From

Vick Khera

Date:

21 June 2011, 11:01:07

On Tue, Jun 21, 2011 at 1:07 AM, Vikram Vaswani
<vikram.vaswani@loudcloudsystems.com> wrote:
> So my first question is, I'd like to know if PostgreSQL has similar issues
> when running in a clustered scenario.
>

Postgres has nothing quite like the MySQL cluster mode with NDB.  You
will have to re-think your solution if you want to use postgres to
distribute your queries and data across multiple servers.

To decide what is the best solution for replication in postgres you
need to define exactly what you mean by "clustering" and "failover",
and what your acceptable downtime is, etc.

Re: Help needed with PostgreSQL clustering/switching from MySQL

From

Craig Ringer

Date:

21 June 2011, 21:59:12

On 06/21/2011 01:25 PM, David Fetter wrote:

> Dynamically generated tables are generally a problem at the design
> level.  Neither PostgreSQL nor any other engine will solve that.

It depends a bit on what the OP means by dynamically generated tables.

> I'm not entirely sure what you mean by a "clustered scenario,"

... which is really important.

You need to define your needs better. Why are you clustering? What
problems is your cluster intended to solve? Is it for performance?
Redundancy? Geographic distribution of replicas? Can you afford to lose
a certain limited number/time of transactions if the master fails, or
must absolutely every transaction be replicated to standby nodes before
a COMMIT returns success to the application? Do query results from
standby nodes have to be perfectly consistent with the master, or can
they lag behind? Do you need to be able to write changes to multiple
nodes, or is it OK to have a single master node for writes and multiple
nodes for read-only queries?

BTW, I cannot agree enough with David Fetter's comments on multiple
master replication - so long as it's qualified with "in relational
databases". Multiple master works extremely well when used with systems
designed for it like eventually-consistent distributed document
databases. It works relatively poorly for SQL RDBMSs because there's so
much inter-node synchronization and co-ordination required to keep
everything consistent and correct according to the strict rules of the
SQL standards.

--
Craig Ringer

Re: Help needed with PostgreSQL clustering/switching from MySQL

From

Greg Smith

Date:

21 June 2011, 23:01:00

On 06/21/2011 10:00 AM, Vick Khera wrote:
> Postgres has nothing quite like the MySQL cluster mode with NDB. You
> will have to re-think your solution if you want to use postgres to
> distribute your queries and data across multiple servers.
>

The closest thing to a NDB cluster in PostgreSQL is using PL/Proxy to
split data across multiple nodes.  Both use similar hash-key methods to
distribute things across more than one system, and you can run queries
that return a combined set of results bigger than any single node could
have handled.  But even that's extremely different from NDB as far as
what the interface for executing queries is like.

Vick is absolutely right here:  asking about whether PostgreSQL solves
the very specific problems that MySQL NDB has isn't even the right
question.  The two don't compare directly at all; different replication
approach, different node distribution approach, different query
approach.  You need to return to basics instead:  what is the actual
business and/or technology need that has to be solved?  From that there
may be a PostgreSQL solution that makes sense, using its replication and
query distribution mechanisms.  But it's extremely unlikely that will
look like a NDB cluster at all, and therefore very unlikely to have the
same problems at all.  You'll get a whole new mystery set instead!

One of the most common mistakes I see people make when architecting
database systems is assuming they have to use one of these really
complicated sharded approaches to make their server perform well.
Unless you have a massive database or extremely high write volume, it's
way more trouble than it's worth to go through distributing writes onto
multiple nodes.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Re: Help needed with PostgreSQL clustering/switching from MySQL

From

Craig Ringer

Date:

22 June 2011, 00:22:59

On 22/06/11 10:00, Greg Smith wrote:
> On 06/21/2011 10:00 AM, Vick Khera wrote:
>> Postgres has nothing quite like the MySQL cluster mode with NDB. You
>> will have to re-think your solution if you want to use postgres to
>> distribute your queries and data across multiple servers.
>>
>
> The closest thing to a NDB cluster in PostgreSQL is using PL/Proxy to
> split data across multiple nodes.  Both use similar hash-key methods to
> distribute things across more than one system, and you can run queries
> that return a combined set of results bigger than any single node could
> have handled.

Doesn't PL/Proxy, like NDB, offer no referential integrity enforcement
between nodes?

Enforcing referential integrity in a multi-master cluster is one of the
things that adds synchronization overhead and makes multi-master
clustered systems slower. In extremely rough terms, before a node can do
certain operations it has to ask one or more other nodes "can I do
this?" and wait for them to reply (possibly waiting for their disk I/O
or for other tasks to finish). Each node then has to say "I've done
this" when it's ready to commit the change and make it visible across
the cluster, and other nodes have to handle that change. For example, if
you want to delete a tuple that's in a table targeted by a referential
integrity constraint you have to ask all the other nodes whether they
have any tuples that reference yours before you can go ahead and delete
it. As you can imagine, this gets slow.

There are various caching and optimistic execution tricks used to help
with these issues in some clustered RDBMSes, but they're all limited to
various degrees. The more effective tricks come with application-visible
downsides: for example, they can allow transactions to abort at commit
time because a concurrent modification occurred on another cluster node.

--
Craig Ringer