Thread: [HACKERS] new high availability feature for the system with both asynchronous and synchronous replication
[HACKERS] new high availability feature for the system with both asynchronous and synchronous replication
From
"Higuchi, Daisuke"
Date:
Hi all,

I propose a new feature for high availability.

This feature is effective for the following configuration:
1. The primary and the synchronous standby are in the same center, called the main center.
2. An asynchronous standby is in another center, called the backup center.
   (The backup center is located far away from the main center. If the replication
   mode were synchronous, performance would deteriorate, so this replication
   must be asynchronous.)
3. Asynchronous replication is performed within the backup center too.
4. When the primary in the main center stops abnormally, the standby in the main center is
   promoted, and the standby in the backup center connects to the new primary.

This configuration is also shown in the figure below.

                [Main center]
|--------------------------------------------|
| |----------|   synchronous    |----------| |
| |          |   replication    |          | |
| | primary  | <--------------> | standby1 | |
| |----------|                  |----------| |
|----||--------------------------------------|
     ||
     ||  asynchronous
     ||  replication
     ||
     ||         [Backup center]
|----||--------------------------------------|
| |----------|   asynchronous   |----------| |
| |          |   replication    |          | |
| | standby2 | <--------------> | standby3 | |
| |----------|                  |----------| |
|--------------------------------------------|

When the load in the main center becomes high, WAL may reach the standby in the
backup center before it reaches the synchronous standby in the main center, for
various reasons. In other words, the standby in the backup center may advance beyond
the synchronous standby in the main center.

When the primary stops abnormally and the standby in the main center is promoted, the two
standbys in the backup center must be recovered by pg_rewind. However, it is
necessary to stop the new primary for pg_rewind. If pg_basebackup is used instead,
recovery of the backup center takes a long time. This is not high availability.

[Proposal Concept]
With this feature, the standby in the backup center just switches its connection
destination and restarts. So it is not necessary to stop the new primary. There is no
need to recover with pg_rewind or pg_basebackup, because the standby in the backup
center will not advance beyond the standby in the main center.

In my idea, this feature is enabled when a new GUC parameter is set.
When a synchronous standby and an asynchronous standby are connected to the primary,
the walsender checks whether WAL has been sent to the synchronous standby before sending
it to the asynchronous standby. After the walsender confirms that the WAL has been sent to
the synchronous standby, it also sends the WAL to the asynchronous standby.

I would appreciate any comments on this feature.

Regards,
Daisuke Higuchi
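The gating rule in the proposal can be illustrated with a small model. This is only a sketch, not the POC patch: LSNs are treated as plain integers, and the function name `async_send_limit` and its parameters are hypothetical. It assumes the primary tracks how far the synchronous standby has confirmed WAL.

```python
def async_send_limit(primary_flush_lsn: int,
                     sync_confirmed_lsn: int,
                     delay_enabled: bool) -> int:
    """Return the highest LSN the async walsender may send (hypothetical model).

    primary_flush_lsn:  WAL position flushed locally on the primary.
    sync_confirmed_lsn: WAL position confirmed by the synchronous standby.
    delay_enabled:      models the proposed GUC being switched on.
    """
    if not delay_enabled:
        # Current behavior: the async standby may receive everything the
        # primary has flushed, so it can get ahead of the sync standby.
        return primary_flush_lsn
    # Proposed behavior: never send WAL the sync standby has not confirmed.
    return min(primary_flush_lsn, sync_confirmed_lsn)


if __name__ == "__main__":
    # The sync standby lags at 80 while the primary has flushed up to 100.
    print(async_send_limit(100, 80, delay_enabled=False))
    print(async_send_limit(100, 80, delay_enabled=True))
```

With the gate enabled, the asynchronous standby in this model can never hold WAL that the synchronous standby lacks, which is the property that lets the backup center reconnect to the promoted standby without a rewind.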
[HACKERS] Re: new high availability feature for the system with both asynchronous and synchronous replication
From
"Higuchi, Daisuke"
Date:
Hi all,

I have created a POC patch for my proposal of a new feature for high availability.
I would like to discuss this feature, but it might have to be for PG11
because there has not been enough discussion.

This patch makes the walsender for the asynchronous standby wait until the walsender
for the synchronous standby confirms that WAL has been flushed to disk. This feature is
activated when the GUC parameter "async_walsender_delay" is set to on.

Here is the case in which this feature is useful (this is the same as I wrote before):
1. The primary and the synchronous standby are in the same center, called the main center.
2. An asynchronous standby is in another center, called the backup center.
   (The backup center is located far away from the main center. If the replication
   mode were synchronous, performance would deteriorate, so this replication
   must be asynchronous.)
3. Asynchronous replication is performed within the backup center too.
4. When the primary in the main center stops abnormally, the standby in the main center is
   promoted, and the standby in the backup center connects to the new primary.

                [Main center]
|--------------------------------------------|
| |----------|   synchronous    |----------| |
| |          |   replication    |          | |
| | primary  | <--------------> | standby1 | |
| |----------|                  |----------| |
|----||--------------------------------------|
     ||
     ||  asynchronous
     ||  replication
     ||
     ||         [Backup center]
|----||--------------------------------------|
| |----------|   asynchronous   |----------| |
| |          |   replication    |          | |
| | standby2 | <--------------> | standby3 | |
| |----------|                  |----------| |
|--------------------------------------------|

When the load in the main center becomes high, WAL may reach the standby in the
backup center before it reaches the synchronous standby in the main center, for
various reasons. In other words, the standby in the backup center may advance beyond
the synchronous standby in the main center.

When the primary stops abnormally and the standby in the main center is promoted, the two
standbys in the backup center must be recovered by pg_rewind. However, it is
necessary to stop the new primary for pg_rewind. If pg_basebackup is used instead,
recovery of the backup center takes a long time. This is not high availability.

Regards,
Daisuke Higuchi

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] Re: new high availability feature for the system with both asynchronous and synchronous replication
From
Masahiko Sawada
Date:
On Tue, Feb 28, 2017 at 1:56 PM, Higuchi, Daisuke <higuchi.daisuke@jp.fujitsu.com> wrote:
> Hi all,
>
> I create POC patch for my proposal of new feature for high availability.
> I want to discuss about this feature. But this feature might be PG11
> because discussion is not enough.
>
> This patch enables walsender for async to wait until walsender for sync confirm
> WAL is flushed to disk. This feature is activated when GUC parameter
> "async_walsender_delay" is set on.
>
> I write the case when this feature is useful (this is the same as I wrote before):
> 1. Primary and synchronous standby are in the same center; called main center.
> 2. Asynchronous standby is in the another center; called backup center.
>    (The backup center is located far away from the main center. If replication
>    mode is synchronous, performance will be deteriorated. So, this replication
>    must be Asynchronous. )
> 3. Asynchronous replication is performed in the backup center too.
> 4. When primary in main center abnormally stops, standby in main center is
>    promoted, and the standby in backup center connects to the new primary.
>
>                 [Main center]
> |--------------------------------------------|
> | |----------|   synchronous    |----------| |
> | |          |   replication    |          | |
> | | primary  | <--------------> | standby1 | |
> | |----------|                  |----------| |
> |----||--------------------------------------|
>      ||
>      ||  asynchronous
>      ||  replication
>      ||
>      ||         [Backup center]
> |----||--------------------------------------|
> | |----------|   asynchronous   |----------| |
> | |          |   replication    |          | |
> | | standby2 | <--------------> | standby3 | |
> | |----------|                  |----------| |
> |--------------------------------------------|
>
> When the load in the main center becomes high, although WAL reaches standby in
> backup center, WAL may not reach synchronous standby in main center for various
> reasons. In other words, standby in the backup center may advance beyond
> synchronous standby in main center.
>
> When the primary abnormally stops and standby in main center promotes, two
> standbys in backup center must be recovered by pg_rewind. However, it is
> necessary to stop new primary for pg_rewind. If pg_basebackup is used,
> recovery of backup center takes some times. This is not high availability.
>

If the standby server in the main center is promoted to the new primary
server, why do we need to stop it in order to run pg_rewind on the
standbys in the backup center? I guess you don't need to stop the new
primary server if you use the --source-server option.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
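As a rough illustration of the point about --source-server, the sketch below builds a pg_rewind command line that pulls from a running source server. The helper name `build_pg_rewind_command`, the data directory, and the connection string are hypothetical placeholders; only the pg_rewind options themselves (`--target-pgdata`, `--source-server`, `--dry-run`) are real. Note that pg_rewind still requires the target (here, a backup-center standby) to be shut down cleanly, while the source can stay up.

```python
import shlex
from typing import List


def build_pg_rewind_command(target_pgdata: str,
                            source_conninfo: str,
                            dry_run: bool = False) -> List[str]:
    """Build an argv list for pg_rewind using a live source server.

    target_pgdata:   data directory of the stopped standby to be rewound.
    source_conninfo: libpq connection string for the new primary,
                     which keeps running during the rewind.
    """
    cmd = [
        "pg_rewind",
        f"--target-pgdata={target_pgdata}",
        f"--source-server={source_conninfo}",
    ]
    if dry_run:
        # Report what would be done without modifying the target.
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    # Hypothetical path and host names, for illustration only.
    cmd = build_pg_rewind_command(
        "/var/lib/pgsql/data",
        "host=standby1 port=5432 user=postgres dbname=postgres",
        dry_run=True,
    )
    print(shlex.join(cmd))
```

The list could be passed to `subprocess.run(cmd, check=True)` on the standby host once the standby has been stopped.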
Re: [HACKERS] Re: new high availability feature for the system with both asynchronous and synchronous replication
From
Robert Haas
Date:
On Tue, Feb 28, 2017 at 10:26 AM, Higuchi, Daisuke <higuchi.daisuke@jp.fujitsu.com> wrote:
> I create POC patch for my proposal of new feature for high availability.
> I want to discuss about this feature. But this feature might be PG11
> because discussion is not enough.
>
> This patch enables walsender for async to wait until walsender for sync confirm
> WAL is flushed to disk. This feature is activated when GUC parameter
> "async_walsender_delay" is set on.

So this new option makes asynchronous replication synchronous?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Re: new high availability feature for the system with both asynchronous and synchronous replication
From
"Higuchi, Daisuke"
Date:
From: Robert Haas [mailto:robertmhaas@gmail.com]
>> This patch enables walsender for async to wait until walsender for sync confirm
>> WAL is flushed to disk. This feature is activated when GUC parameter
>> "async_walsender_delay" is set on.
> So this new option makes asynchronous replication synchronous?

No, this feature only delays the start of WAL transfer for asynchronous replication.
With this feature, asynchronous replication still does not wait for a response from the standby.
(This behavior is not changed, so it is the same as before.)

Regards,
Daisuke Higuchi
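The distinction in this reply can be made concrete with a toy model. This is a sketch under assumptions, not the patch: the class `PrimaryModel` and all its fields are hypothetical, and LSNs are plain integers. The point is that the asynchronous standby's acknowledgement is never consulted when deciding whether a commit may return; only the start of the asynchronous send is held back.

```python
from dataclasses import dataclass


@dataclass
class PrimaryModel:
    """Toy model of a primary with one sync and one async standby."""
    flush_lsn: int = 0        # WAL flushed locally on the primary
    sync_flush_lsn: int = 0   # flush position confirmed by the sync standby
    async_sent_lsn: int = 0   # WAL already sent to the async standby
    async_flush_lsn: int = 0  # position reported by the async standby

    def commit_can_return(self, commit_lsn: int) -> bool:
        # Synchronous replication: the commit waits for the sync standby only.
        # async_flush_lsn is deliberately not used here.
        return self.sync_flush_lsn >= commit_lsn

    def advance_async_send(self) -> int:
        # Proposed delay: send to the async standby only up to the position
        # the sync standby has confirmed. Nothing waits for the async reply.
        limit = min(self.flush_lsn, self.sync_flush_lsn)
        self.async_sent_lsn = max(self.async_sent_lsn, limit)
        return self.async_sent_lsn


if __name__ == "__main__":
    p = PrimaryModel(flush_lsn=100)
    print(p.advance_async_send(), p.commit_can_return(100))
    p.sync_flush_lsn = 100  # the sync standby confirms the flush
    print(p.advance_async_send(), p.commit_can_return(100))
```

In this model the commit at LSN 100 becomes returnable as soon as the synchronous standby confirms it, even though `async_flush_lsn` is still 0, so the asynchronous link stays asynchronous.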