Re: Built-in Raft replication - Mailing list pgsql-hackers
From | Andrey Borodin
---|---
Subject | Re: Built-in Raft replication
Date |
Msg-id | 212D5973-FDD0-4CF5-BCD0-2760EC319DF3@yandex-team.ru
In response to | Re: Built-in Raft replication (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses | Re: Built-in Raft replication
List | pgsql-hackers
> On 16 Apr 2025, at 09:33, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
> In my experience, the load of managing hundreds of replicas which all
> participate in RAFT protocol becomes more than regular transaction
> load. So making every replica a RAFT participant will affect the
> ability to deploy hundreds of replica.

No need to make all standbys voting, and no need for a flat topology. pg_consul uses 2/3 or 3/5 HA groups and cascades all other standbys from the HA group. Existing tools already solve the original problem; Konstantin is just proposing to solve it in some standard “official” way.

> We may build an extension which
> has a similar role in PostgreSQL world as zookeeper in Hadoop.

Patroni, pg_consul and others already use ZooKeeper, etcd and similar systems for consensus (a sketch of this pattern is in the P.S. below). Is it any better as an extension than as etcd?

> It can
> be then used for other distributed systems as well - like shared
> nothing clusters based on FDW.

I didn’t get the FDW analogy. Why should other distributed systems choose a Postgres extension over ZooKeeper?

> There's already a proposal to bring
> CREATE SERVER to the world of logical replication - so I see these two
> worlds uniting in future.

Again, I’m lost here. Which two worlds?

> The way I imagine it is some PostgreSQL
> instances, which have this extension installed, will act as a RAFT
> cluster (similar to Zookeeper ensemble or etcd cluster).

That’s exactly what is proposed here.

> The
> distributed system based on logical replication or FDW or both will
> use this ensemble to manage its shared state. The same ensemble can be
> shared across multiple distributed clusters if it has scaling
> capabilities.

Yes, shared DCS are common these days. AFAIK, we use one ZooKeeper instance per hundred Postgres clusters to coordinate pg_consuls.

Actually, scalability is the opposite of the topic of this thread. Let me explain. Currently, Postgres automatic failover tools rely on external databases that have built-in automatic failover. Konstantin is proposing to shorten this loop and make Postgres use its own built-in automatic failover.

So, existing tooling allows you to have 3 hosts for the DCS, with a majority of 2 hosts able to elect a new leader in case of failover. And you can have only 2 hosts for Postgres - primary and standby. You can have 2 big Postgres machines with 64 CPUs, and 3 one-CPU hosts for ZooKeeper/etcd.

If you use built-in failover, you have to resort to 3 big Postgres machines because you need a 2/3 majority (the quorum arithmetic is sketched in the second P.S. below). Of course, you can install a MySQL-style arbiter - a host that has no real PGDATA and only participates in voting. But this is a solution to a problem induced by built-in autofailover.

Best regards, Andrey Borodin.
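
P.S. For the archives, a minimal sketch of how an external-DCS tool coordinates leadership through ZooKeeper, using the kazoo client library. The hostnames, znode path and node identifier here are made up for illustration; pg_consul’s real layout differs.

```python
# Minimal leader-election sketch against a ZooKeeper ensemble, using the
# kazoo client library. Hostnames, the znode path and the identifier are
# hypothetical, purely for illustration.
from kazoo.client import KazooClient

def become_primary():
    # Runs only while this contender holds leadership. A real failover
    # tool would promote the local standby here and watch the session.
    print("elected leader: promoting local PostgreSQL instance")

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

# kazoo's Election recipe blocks until this node wins the election, then
# invokes the callback; leadership is tied to the ZooKeeper session.
election = zk.Election("/postgres/cluster1/leader", identifier="node-a")
election.run(become_primary)
```

Note that only the three small ZooKeeper hosts participate in consensus here; the Postgres hosts themselves are just clients of it.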
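
P.P.S. And the quorum arithmetic from the argument above, as a toy sketch (plain Python, no dependencies):

```python
# Toy illustration of the quorum point above: with an external DCS only
# the tiny DCS hosts must form a majority; with built-in consensus the
# Postgres hosts themselves are the voters.

def majority(n_voters: int) -> int:
    """Smallest strict majority of n_voters."""
    return n_voters // 2 + 1

# External DCS: 3 one-CPU ZooKeeper/etcd hosts; 2 of 3 can still elect a
# leader after one failure, while Postgres runs on just 2 big machines.
assert majority(3) == 2

# Built-in failover: 2 big Postgres machines alone cannot tolerate the
# loss of either one...
assert majority(2) == 2
# ...so you need a third full Postgres host, or a diskless arbiter that
# merely votes, to keep a reachable majority after a single failure.
assert majority(3) == 2
```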