Fujii Masao wrote:
> What makes the sender process a bottleneck?
The keyword here is "might". There are many possibilities, like:
- Slow network.
- Ridiculously fast disk, like a RAM disk. If you have a synchronous
slave you can fail over to, putting WAL on a RAM disk isn't that crazy.
- Slower WAL disk on the slave.
etc.
>> Backends then wait
>> * not at all for asynch commit
>> * just for Write for local synch commit
>> * for both Write and Send for remote synch commit
>> (various additional options for what happens to confirm Send)
>
> I'd like to introduce a new parameter "synchronous_replication" which specifies
> whether backends wait for the response from the WAL sender process. By
> combining synchronous_commit and synchronous_replication, users can
> choose various options.
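One possible reading of how synchronous_commit and the proposed synchronous_replication would combine, mapped onto the three wait modes quoted above. This is a hypothetical Python sketch: the function name, and the assumption that synchronous_commit = off ignores synchronous_replication, are illustrative only and not part of the proposal.

```python
from itertools import product


def events_to_wait_for(synchronous_commit: bool, synchronous_replication: bool) -> frozenset:
    """Hypothetical: which events a committing backend waits for."""
    if not synchronous_commit:
        # Asynchronous commit: return immediately.
        # Assumption: synchronous_replication is ignored in this case.
        return frozenset()
    if not synchronous_replication:
        # Local synchronous commit: wait only for the local WAL Write.
        return frozenset({"write"})
    # Remote synchronous commit: wait for both the local Write and the
    # WAL sender's Send (what exactly confirms Send is left open here).
    return frozenset({"write", "send"})


if __name__ == "__main__":
    # Print all four combinations of the two settings.
    for sc, sr in product([False, True], repeat=2):
        waits = sorted(events_to_wait_for(sc, sr)) or "nothing"
        print(f"synchronous_commit={sc!s:5} synchronous_replication={sr!s:5} -> {waits}")
```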
There's one thing I haven't figured out in this discussion. Does the
write to the disk happen before or after the write to the slave? Can you
guarantee that if a transaction is committed in the master, it's also
committed in the slave, or vice versa?
>> Another thought occurs that we might measure the time a Send takes and
>> specify a limit on how long we are prepared to wait for confirmation.
>> Limit=0 => asynchronous. Limit > 0 implies synchronous-up-to-the-limit.
>> This would give better user behaviour across a highly variable network
>> connection.
>
> From the viewpoint of detecting network failures, this feature is necessary.
> When the network goes down, the WAL sender can be blocked until it detects
> the network failure, i.e. the WAL sender keeps waiting for a response which
> never comes. A timeout notification is necessary in order to detect a
> network failure promptly.
Agreed. But what happens if you hit that timeout? Should we enforce that
timeout within the server, or should we leave that to the external
heartbeat system?
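A minimal sketch of the wait-with-a-limit idea quoted above (Limit=0 => asynchronous, Limit > 0 => synchronous up to the limit), in Python for illustration only. It assumes, hypothetically, that the standby's acknowledgement shows up as a threading.Event; the function name is made up. It deliberately only reports a timeout and does not decide what to do about it:

```python
import threading


def wait_for_confirmation(ack: threading.Event, limit_s: float) -> str:
    """Hypothetical: wait up to limit_s seconds for the standby's ack.

    Returns "async" when limit_s == 0 (don't wait at all),
    "confirmed" if the ack arrived within the limit, "timed_out" otherwise.
    What to do after "timed_out" is left to the caller.
    """
    if limit_s < 0:
        raise ValueError("limit must be >= 0")
    if limit_s == 0:
        return "async"
    # Event.wait() returns True if the flag was set, False on timeout.
    return "confirmed" if ack.wait(timeout=limit_s) else "timed_out"


if __name__ == "__main__":
    ack = threading.Event()
    # Simulate a standby that acknowledges after 50 ms.
    timer = threading.Timer(0.05, ack.set)
    timer.start()
    print(wait_for_confirmation(ack, 1.0))  # ack arrives well within the limit
    timer.join()
    # Simulate a dead network: no ack ever arrives.
    print(wait_for_confirmation(threading.Event(), 0.1))
```

Returning a status instead of acting on it keeps the open question (enforce the timeout within the server, or leave it to an external heartbeat system) outside the waiting code.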
-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com