Thread: Real application clustering in postgres.
Hi team,
Is there any possibility/options to setup a real application clustering in Postgres as in Oracle we have a RAC feature.
What about multi-master replication in Postgres. would you please suggest how it is useful and how can setup it.
Thanks.
On Thu, 2020-03-05 at 07:45 +0000, Daulat Ram wrote: > Is there any possibility/options to setup a real application clustering in Postgres as in Oracle we have a RAC feature. No, and as far as I know nobody feels interested in providing it. RAC is a complicated architecture that doesn't do much good, so most people feel that it would be a waste of time and effort. RAC ist not really a scaling solution: because of the shared storage, you can only scale for more CPUs; I/O remains the bottleneck. RAC is not really a high availability solution: because of the shared storage, it has a sibgle point of failure. Today, people use shared-nothing architectures for high avaliability, like Patroni. > What about multi-master replication in Postgres. would you please suggest how it is useful and how can setup it. There is no support for that in core PostgreSQL. There is a closed-source implementation that you can buy: https://www.2ndquadrant.com/en/resources/postgres-bdr-2ndquadrant/ But multi-master replication is complicated to get right, and an applicatoin that uses it has to be specifically designed for that. Very often a single-master failover solution is a better solution. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
Thanks for your inputs Laurenz Albe. Would you please explain single-master failover solution. Suppose we have promoted standby (replica) as master after the h/w issue at Master. If after few hours we recovered the h/w then how we can switchback on the old primary. . As in Oracle we have switchover method for Dataguard. How we can do in Postgres. Thanks, -----Original Message----- From: Laurenz Albe <laurenz.albe@cybertec.at> Sent: Thursday, March 5, 2020 5:37 PM To: Daulat Ram <Daulat.Ram@exponential.com>; pgsql-general@lists.postgresql.org Subject: Re: Real application clustering in postgres. On Thu, 2020-03-05 at 07:45 +0000, Daulat Ram wrote: > Is there any possibility/options to setup a real application clustering in Postgres as in Oracle we have a RAC feature. No, and as far as I know nobody feels interested in providing it. RAC is a complicated architecture that doesn't do much good, so most people feel that it would be a waste of time and effort. RAC ist not really a scaling solution: because of the shared storage, you can only scale for more CPUs; I/O remains the bottleneck. RAC is not really a high availability solution: because of the shared storage, it has a sibgle point of failure. Today, people use shared-nothing architectures for high avaliability, like Patroni. > What about multi-master replication in Postgres. would you please suggest how it is useful and how can setup it. There is no support for that in core PostgreSQL. There is a closed-source implementation that you can buy: https://www.2ndquadrant.com/en/resources/postgres-bdr-2ndquadrant/ But multi-master replication is complicated to get right, and an applicatoin that uses it has to be specifically designedfor that. Very often a single-master failover solution is a better solution. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
Failover is easy but failback is little bit tricky.
I have implemented failback by doing following steps:
1. Start original primary which will be doing crash recovery. It should be designed in such a way that once it is up application should not start connecting to it otherwise there will be split brain and data-mistach between two instances. I implemented it by using a virtual IP mounting on server which is actual primary using keepalived.
2. Shutdown original primary and do a pg_rewind to make that as slave for new primary.
3. Once slave (original primary) is caught up with primary do failback
4. Repeat steps #1-#3 to make failed over instance slave again.
Regards,
Virendra
On Thursday, March 5, 2020, 8:48:54 AM PST, Daulat Ram <daulat.ram@exponential.com> wrote:
Thanks for your inputs Laurenz Albe.
Would you please explain single-master failover solution.
Suppose we have promoted standby (replica) as master after the h/w issue at Master.
If after few hours we recovered the h/w then how we can switchback on the old primary. .
As in Oracle we have switchover method for Dataguard. How we can do in Postgres.
Thanks,
-----Original Message-----
From: Laurenz Albe <laurenz.albe@cybertec.at>
Sent: Thursday, March 5, 2020 5:37 PM
To: Daulat Ram <Daulat.Ram@exponential.com>; pgsql-general@lists.postgresql.org
Subject: Re: Real application clustering in postgres.
On Thu, 2020-03-05 at 07:45 +0000, Daulat Ram wrote:
> Is there any possibility/options to setup a real application clustering in Postgres as in Oracle we have a RAC feature.
No, and as far as I know nobody feels interested in providing it.
RAC is a complicated architecture that doesn't do much good, so most people feel that it would be a waste of time and effort.
RAC ist not really a scaling solution: because of the shared storage, you can only scale for more CPUs; I/O remains the bottleneck.
RAC is not really a high availability solution: because of the shared storage, it has a sibgle point of failure.
Today, people use shared-nothing architectures for high avaliability, like Patroni.
> What about multi-master replication in Postgres. would you please suggest how it is useful and how can setup it.
There is no support for that in core PostgreSQL.
There is a closed-source implementation that you can buy:
https://www.2ndquadrant.com/en/resources/postgres-bdr-2ndquadrant/
But multi-master replication is complicated to get right, and an applicatoin that uses it has to be specifically designed for that.
Very often a single-master failover solution is a better solution.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Would you please explain single-master failover solution.
Suppose we have promoted standby (replica) as master after the h/w issue at Master.
If after few hours we recovered the h/w then how we can switchback on the old primary. .
As in Oracle we have switchover method for Dataguard. How we can do in Postgres.
Thanks,
-----Original Message-----
From: Laurenz Albe <laurenz.albe@cybertec.at>
Sent: Thursday, March 5, 2020 5:37 PM
To: Daulat Ram <Daulat.Ram@exponential.com>; pgsql-general@lists.postgresql.org
Subject: Re: Real application clustering in postgres.
On Thu, 2020-03-05 at 07:45 +0000, Daulat Ram wrote:
> Is there any possibility/options to setup a real application clustering in Postgres as in Oracle we have a RAC feature.
No, and as far as I know nobody feels interested in providing it.
RAC is a complicated architecture that doesn't do much good, so most people feel that it would be a waste of time and effort.
RAC ist not really a scaling solution: because of the shared storage, you can only scale for more CPUs; I/O remains the bottleneck.
RAC is not really a high availability solution: because of the shared storage, it has a sibgle point of failure.
Today, people use shared-nothing architectures for high avaliability, like Patroni.
> What about multi-master replication in Postgres. would you please suggest how it is useful and how can setup it.
There is no support for that in core PostgreSQL.
There is a closed-source implementation that you can buy:
https://www.2ndquadrant.com/en/resources/postgres-bdr-2ndquadrant/
But multi-master replication is complicated to get right, and an applicatoin that uses it has to be specifically designed for that.
Very often a single-master failover solution is a better solution.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
On Thu, 2020-03-05 at 17:06 +0000, Virendra Kumar wrote: > Failover is easy but failback is little bit tricky. > I have implemented failback by doing following steps: > > 1. Start original primary which will be doing crash recovery. It should be designed in such a way that once it is up applicationshould not start connecting to it otherwise there will be split brain > and data-mistach between two instances. I implemented it by using a virtual IP mounting on server which is actual primaryusing keepalived. > 2. Shutdown original primary and do a pg_rewind to make that as slave for new primary. > 3. Once slave (original primary) is caught up with primary do failback > 4. Repeat steps #1-#3 to make failed over instance slave again. Or you use Patroni and simply say patronictl.py switchover --candidate db2 --scheduled '2020-03-06 12:00:00' mycluster Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
On 3/5/20 6:07 AM, Laurenz Albe wrote:
This is utter nonsense. Dual redundant storage controllers connected to disks in RAID-10 configurations have been around for at least 25 years.
Oracle got it's clustering technology from DEC, and I know that works. Cluster members, storage controllers and disks have all gone down, while the database and application keep on humming along.
On Thu, 2020-03-05 at 07:45 +0000, Daulat Ram wrote:Is there any possibility/options to setup a real application clustering in Postgres as in Oracle we have a RAC feature.No, and as far as I know nobody feels interested in providing it. RAC is a complicated architecture that doesn't do much good, so most people feel that it would be a waste of time and effort. RAC ist not really a scaling solution: because of the shared storage, you can only scale for more CPUs; I/O remains the bottleneck.
RAC is not really a high availability solution: because of the shared storage, it has a sibgle point of failure.
This is utter nonsense. Dual redundant storage controllers connected to disks in RAID-10 configurations have been around for at least 25 years.
Oracle got it's clustering technology from DEC, and I know that works. Cluster members, storage controllers and disks have all gone down, while the database and application keep on humming along.
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.
On Fri, 2020-03-06 at 03:25 -0600, Ron wrote: > > RAC is not really a high availability solution: because of the shared > > storage, it has a sibgle point of failure. > > This is utter nonsense. Dual redundant storage controllers > connected to disks in RAID-10 configurations have been around for at > least 25 years. > > Oracle got it's clustering technology from DEC, and I know > that works. Cluster members, storage controllers and disks have all > gone down, while the database and application keep on humming along. I am not saying that it is buggy, it is limited by design. If you have mirrored disks, and you write junk (e.g, because of a flaw in a fibre channel cable, something I have witnessed), then you have two perfectly fine copies of the junk. I am not saying the (physical) disk is the single point of failure, the (logical) file system is (Oracle calls it ASM / tablespace, but it is still a file system). Yours, Laurenz Albe
> > If you have mirrored disks, and you write junk (e.g, because of > a flaw in a fibre channel cable, something I have witnessed), > then you have two perfectly fine copies of the junk. > Few years ago didn't this happen to Salesforce where a firmware bug corrupted the Disk, resulting in corruption of Oracletablespace blocks and all RAC nodes were equally useless since all of them read from the same disk. Salesforce lost 4 hours worth of data.
On 3/6/20 8:55 AM, Laurenz Albe wrote: > On Fri, 2020-03-06 at 03:25 -0600, Ron wrote: >>> RAC is not really a high availability solution: because of the shared >>> storage, it has a sibgle point of failure. >> This is utter nonsense. Dual redundant storage controllers >> connected to disks in RAID-10 configurations have been around for at >> least 25 years. >> >> Oracle got it's clustering technology from DEC, and I know >> that works. Cluster members, storage controllers and disks have all >> gone down, while the database and application keep on humming along. > I am not saying that it is buggy, it is limited by design. > > If you have mirrored disks, and you write junk (e.g, because of > a flaw in a fibre channel cable, something I have witnessed), > then you have two perfectly fine copies of the junk. Why do you have just one FC path? > I am not saying the (physical) disk is the single point of failure, the > (logical) file system is (Oracle calls it ASM / tablespace, but it is > still a file system). Why isn't the filesystem (or RDBMS) throwing checksum errors? This was standard stuff in legacy Enterprise RDBMSs 20 years ago. -- Angular momentum makes the world go 'round.
Yup, if you need true shared storage, Oracle RAC is still the only solution out there, All the other multi-master solutions work by capturing the sql statements themselves. And properly configured it RAC is indeed part of an HA solution. Any time you have everything in a single data center, as you do with basic RAC, you do have a potential single point of failure however, a tornado for example can take out the entire data center. There are solutions like stretch RAC that can account for that issue, as well as data guard. Oracle also has a nice DR feature called Dataguard, which creates an exact duplicate of your running database..The nice point of oracle dataguard is that it is a block by block copy, while all of the Postgres Multi-Master and master-slave replication solutions work by SQL capture.
Potentially DRDB could be used to set up a standby database for postgres, though it is considerably more involved to create than Oracle data guard.
With the advent of virtualization and the concomitant ease of adding CPU and memory to a running database instance, the use cases for true Oracle RAC are much less than in the past, scalability is no longer nearly the issue it used to be, and with the speed of failover at the virtual guest level, the HA requirements are much less substantial than in the past. There are probably still some use cases where the down time required for patching is not acceptable, in which case RAC is still required. The same is true for Postgres or any other Multi-Master instances of course, unless the intent of clustering/multi-master is availability instead of scalability, there isnt much point considering the ease of adding resources to virtual servers.
On Fri, Mar 6, 2020 at 10:56 AM Ron <ronljohnsonjr@gmail.com> wrote:
On 3/6/20 8:55 AM, Laurenz Albe wrote:
> On Fri, 2020-03-06 at 03:25 -0600, Ron wrote:
>>> RAC is not really a high availability solution: because of the shared
>>> storage, it has a sibgle point of failure.
>> This is utter nonsense. Dual redundant storage controllers
>> connected to disks in RAID-10 configurations have been around for at
>> least 25 years.
>>
>> Oracle got it's clustering technology from DEC, and I know
>> that works. Cluster members, storage controllers and disks have all
>> gone down, while the database and application keep on humming along.
> I am not saying that it is buggy, it is limited by design.
>
> If you have mirrored disks, and you write junk (e.g, because of
> a flaw in a fibre channel cable, something I have witnessed),
> then you have two perfectly fine copies of the junk.
Why do you have just one FC path?
> I am not saying the (physical) disk is the single point of failure, the
> (logical) file system is (Oracle calls it ASM / tablespace, but it is
> still a file system).
Why isn't the filesystem (or RDBMS) throwing checksum errors? This was
standard stuff in legacy Enterprise RDBMSs 20 years ago.
--
Angular momentum makes the world go 'round.
--
Andrew W. Kerber
'If at first you dont succeed, dont take up skydiving.'
'If at first you dont succeed, dont take up skydiving.'
On 3/6/20 01:25, Ron wrote: > On 3/5/20 6:07 AM, Laurenz Albe wrote: >> On Thu, 2020-03-05 at 07:45 +0000, Daulat Ram wrote: >>> Is there any possibility/options to setup a real application clustering in Postgres as in Oracle we have a RAC feature. >> No, and as far as I know nobody feels interested in providing it. >> >> RAC is a complicated architecture that doesn't do much good, so most >> people feel that it would be a waste of time and effort. >> >> RAC ist not really a scaling solution: because of the shared storage, >> you can only scale for more CPUs; I/O remains the bottleneck. >> RAC is not really a high availability solution: because of the shared >> storage, it has a sibgle point of failure. > > This is utter nonsense. Dual redundant storage controllers connected to > disks in RAID-10 configurations have been around for *at least* 25 years. > > Oracle got it's clustering technology from DEC, and I *know* that > works. Cluster members, storage controllers and disks have all gone > down, while the database and application keep on humming along. Just want to point out that while the merits of RAC are considered as universal truth in the Oracle marketing community, they are certainly still debated and nuanced within the Oracle technical community. Some great reading would be Mogens Norgaard's article "You Probably Don't Need RAC" from 2003 (which is surprisingly hard to find on the internet) and much of the ensuing discussion between then and now, for example this recent message over on the oracle-l list: https://www.freelists.org/post/oracle-l/Chuckleworthy-issue-of-the-NoCOUG-Journal,1 For my part, I spent many years running RAC clusters and solving the usual handful of problems you bump into, and I definitely have a strong bias now toward as simple of architectures as possible. Regardless of which parties participate in your operations, and regardless of who owns the data center where your stack is running. Note that I apply this to all the new databases as well. I enjoy and appreciate the opportunities I've had to work on some really interesting new technology - but I also still see merit in simple, boring, mature, well-understood architectures if they are doing the job. Complexity will find us all soon enough driven by true business needs without being helped by the pursuit of shiny things! It seemed to me there was a major and well-funded effort to market and sell cluster databases for many years, and as a result I suspect that while there are certainly some good use cases, there are probably also some people using RAC today who would do fine (or better) without it. Seems I even touched on this all the way back in 2007 in an article on my own blog... https://ardentperf.com/2007/05/10/fidelity-case-study/ -Jeremy -- http://about.me/jeremy_schneider
On 2020-03-06 15:55:27 +0100, Laurenz Albe wrote: > On Fri, 2020-03-06 at 03:25 -0600, Ron wrote: > > > RAC is not really a high availability solution: because of the shared > > > storage, it has a sibgle point of failure. > > > > This is utter nonsense. Dual redundant storage controllers > > connected to disks in RAID-10 configurations have been around for at > > least 25 years. > > > > Oracle got it's clustering technology from DEC, and I know > > that works. Cluster members, storage controllers and disks have all > > gone down, while the database and application keep on humming along. > > I am not saying that it is buggy, it is limited by design. > > If you have mirrored disks, and you write junk (e.g, because of > a flaw in a fibre channel cable, something I have witnessed), > then you have two perfectly fine copies of the junk. I have certainly seen enterprise SAN boxes go down (or deliver corrupted data) because of controller or firmware problems or just because a second disk in a RAID-5 failed before the spare could be brought online. But to be fair, a master/slave setup a la patroni isn't immune against "writing junk" either: Not on the hardware level (either of the nodes may have faulty hardware, and you may not notice it until too late), and more importantly, not on the software level. An erroneus DML statement (because of a bug in the application, or because the user/admin made a mistake) will cause the same wrong data to be distributed to all nodes (of course this also applies to RAC). hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
## Andrew Kerber (andrew.kerber@gmail.com): > The nice point of oracle > dataguard is that it is a block by block copy, while all of the Postgres > Multi-Master and master-slave replication solutions work by SQL capture. https://www.postgresql.org/docs/12/warm-standby.html#STREAMING-REPLICATION I wouldn't exactly call our streaming replication "SQL capture". Regards, Christoph -- Spare Space
On Fri, 2020-03-06 at 10:56 -0600, Ron wrote: > > > > RAC is not really a high availability solution: because of the shared > > > > storage, it has a sibgle point of failure. > > > This is utter nonsense. Dual redundant storage controllers > > > connected to disks in RAID-10 configurations have been around for at > > > least 25 years. > > > > > > Oracle got it's clustering technology from DEC, and I know > > > that works. Cluster members, storage controllers and disks have all > > > gone down, while the database and application keep on humming along. > > > > I am not saying that it is buggy, it is limited by design. > > > > If you have mirrored disks, and you write junk (e.g, because of > > a flaw in a fibre channel cable, something I have witnessed), > > then you have two perfectly fine copies of the junk. > > Why do you have just one FC path? We didn't. It just happened that the cable that the data were sent over was buggy. > > I am not saying the (physical) disk is the single point of failure, the > > (logical) file system is (Oracle calls it ASM / tablespace, but it is > > still a file system). > > Why isn't the filesystem (or RDBMS) throwing checksum errors? This was > standard stuff in legacy Enterprise RDBMSs 20 years ago. Checksums are nice for telling you that your storage is screwed. They don't fix the problem. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
On Sun, 2020-03-08 at 21:13 +0100, Peter J. Holzer wrote: > But to be fair, a master/slave setup a la patroni isn't immune against > "writing junk" either: Not on the hardware level (either of the nodes > may have faulty hardware, and you may not notice it until too late), and > more importantly, not on the software level. An erroneus DML statement > (because of a bug in the application, or because the user/admin made a > mistake) will cause the same wrong data to be distributed to all nodes > (of course this also applies to RAC). Of course, nobody debates that. A high-availability solution only protects you from certain, well-defined kinds of problems, usually related to hardware. There is no way to protect yourself from software bugs or user errors. If there is a hardware problem that causes one of the databases in the Patroni cluster to become corrupted, the others are not immediately affected. That's the point of a shared-nothing architecture. Of course, if the corrupted database is the primary, corruption can eventually spread to the others. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
Am 05.03.20 um 13:07 schrieb Laurenz Albe: > There is a closed-source implementation that you can buy: > https://www.2ndquadrant.com/en/resources/postgres-bdr-2ndquadrant/ > > But multi-master replication is complicated to get right, and > an applicatoin that uses it has to be specifically designed for that. Depends. We have an automation tool for setup (TPAexec), and as long you are not using special feature like CAMO (Commit At Most Once) you can use applications without changes. (you have to set some parameters, for instance for sequences) It works well, we have a lot of happy customers. Some of them using it for globally distributed databases, other for high availability with 5 nines. Here you can find more information: https://www.2ndquadrant.com/en/resources/postgres-bdr-2ndquadrant/ Regards, Andreas -- 2ndQuadrant - The PostgreSQL Support Company. www.2ndQuadrant.com
On 2020-03-09 09:57:37 +0100, Laurenz Albe wrote: > On Sun, 2020-03-08 at 21:13 +0100, Peter J. Holzer wrote: > > But to be fair, a master/slave setup a la patroni isn't immune against > > "writing junk" either: Not on the hardware level (either of the nodes > > may have faulty hardware, and you may not notice it until too late), and > > more importantly, not on the software level. An erroneus DML statement > > (because of a bug in the application, or because the user/admin made a > > mistake) will cause the same wrong data to be distributed to all nodes > > (of course this also applies to RAC). > > Of course, nobody debates that. > > A high-availability solution only protects you from certain, well-defined > kinds of problems, usually related to hardware. Right. And enterprise class SAN storage does this: It protects you from failure of a single disk, a single cable, a single controller. Very often you can physically spread out the components so that loss of a whole rack (or server room) wouldn't affect availability. There are of course limits: When a message sent over a single cable is corrupted in a way that the checksum doesn't catch, corrupted data may be stored. But then if data in RAM is corrupted that ECC doesn't catch, the same will happen. A Patroni-based cluster isn't free of single points of failure either. So I don't buy the argument "X isn't a high availability solution because it uses shared storage". While I have seen expensive SAN boxes fail, I've also managed to wreck Patroni clusters. I'm not at all convinced that the availability of a Patroni cluster is higher than that of a failover cluster using shared storage. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"