Re: Streaming replication and a disk full in primary - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Streaming replication and a disk full in primary |
Date | |
Msg-id | g2n3f0b79eb1004072333yde79121v902a7e0a8bb02eac@mail.gmail.com |
In response to | Re: Streaming replication and a disk full in primary (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses | Re: Streaming replication and a disk full in primary (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
List | pgsql-hackers |
Thanks for the great patch! I apologize for leaving the issue half-finished for a long time :(

On Wed, Apr 7, 2010 at 7:02 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> In your version of this patch, the default was still the current
> behavior where the primary retains WAL files that are still needed by
> connected standby servers indefinitely. I think that's a dangerous
> default, so I changed it so that if you don't set standby_keep_segments,
> the primary doesn't retain any extra segments; the number of WAL
> segments available for standby servers is determined only by the
> location of the previous checkpoint, and the status of WAL archiving.
> That makes the code a bit simpler too, as we never care how far the
> walsenders are. In fact, the GetOldestWALSenderPointer() function is now
> dead code.

It's OK for me to change the default behavior. We can remove the GetOldestWALSenderPointer() function.

doc/src/sgml/config.sgml
- archival or to recover from a checkpoint. If standby_keep_segments
+ archival or to recover from a checkpoint. If <varname>standby_keep_segments</>

The word "standby_keep_segments" always needs the <varname> tag, I think.

Should we remove the documentation section "25.2.5.2. Monitoring"?

Why is standby_keep_segments used even if max_wal_senders is zero? In that case, ISTM we don't need to keep any WAL files in pg_xlog for the standby.

When XLogRead() reads from two WAL files and only the older of them is recycled while it is being read, the check of whether the read data is valid might give the wrong answer. This is because the variable "recptr" can advance to the newer WAL file before the check.

When walreceiver has gotten stuck for some reason, walsender would be unable to get past the send() system call, and would also get stuck. In the patch, such a walsender can never exit, because it cannot call XLogRead(). So I think that the bgwriter needs to send the exit signal to a walsender that lags too far behind. Thoughts?
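To make the XLogRead() concern above concrete, here is a minimal stand-alone sketch, not the patch's code. It assumes a WAL position can be modelled as a plain byte offset and that the "latest recycled" value is a segment number; `WalPos`, `SEG_SIZE`, `seg_of()`, `check_with_end()` and `check_with_start()` are hypothetical names invented for this illustration. It only shows why a validity check that uses the advanced pointer can miss a recycled older segment, while a check against the starting position does not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplification: a WAL position is a plain byte offset. */
typedef uint64_t WalPos;

/* Segment size assumed for this sketch only (16 MB). */
#define SEG_SIZE ((WalPos) 16 * 1024 * 1024)

/* Segment number that contains the given position. */
static int64_t
seg_of(WalPos pos)
{
    return (int64_t) (pos / SEG_SIZE);
}

/*
 * Problematic variant: the check runs after the read loop, when the
 * pointer has already advanced to the end of the requested range.  If the
 * range crossed into a newer segment, the older one is never examined.
 */
static bool
check_with_end(WalPos start, size_t nbytes, int64_t last_recycled_seg)
{
    WalPos recptr = start + nbytes;   /* pointer after the read loop */

    return seg_of(recptr) > last_recycled_seg;
}

/*
 * Safer variant: remember where the read started and test that segment,
 * since it is the oldest one the read touched.
 */
static bool
check_with_start(WalPos start, size_t nbytes, int64_t last_recycled_seg)
{
    (void) nbytes;                    /* only the start position matters */

    return seg_of(start) > last_recycled_seg;
}
```

With a read that starts 100 bytes before the end of segment 5 and spans 200 bytes, and with segment 5 recycled in the meantime, `check_with_end()` reports the data as valid whereas `check_with_start()` rejects it.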
The shmem value for the latest recycled WAL file is updated before checking whether that file has already been archived. If archiving is not working for some reason, the WAL file that the shmem value indicates might not actually have been recycled yet. In this case, the standby cannot obtain the WAL file from the primary, because it's been marked as "latest recycled", nor from the archive, because it's not been archived yet. This seems to be a big problem. How about moving the update of the shmem to after calling XLogArchiveCheckDone() in RemoveOldXlogFiles()?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
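The reordering suggested above for RemoveOldXlogFiles() can be sketched as follows. This is a simplified model under stated assumptions, not PostgreSQL source: `Segment`, `last_recycled` and `remove_old_segments()` are hypothetical names, the `archive_done` flag stands in for the result of XLogArchiveCheckDone(), and a plain global stands in for the shared-memory value. The point is only the order of operations: the "latest recycled" marker is advanced after the archive check has passed, so a segment that is still waiting for the archiver is never advertised as recycled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one WAL segment file in pg_xlog. */
typedef struct
{
    int64_t segno;          /* segment number */
    bool    archive_done;   /* stands in for XLogArchiveCheckDone() */
    bool    present;        /* file still exists in pg_xlog */
} Segment;

/* Stands in for the shared-memory "latest recycled" value; -1 = none. */
static int64_t last_recycled = -1;

/*
 * Recycle every segment at or before 'cutoff' that the archiver is done
 * with.  The marker is updated only after the archive check succeeds.
 */
static void
remove_old_segments(Segment *segs, int nsegs, int64_t cutoff)
{
    for (int i = 0; i < nsegs; i++)
    {
        if (segs[i].segno > cutoff)
            continue;               /* still needed for recovery */

        if (!segs[i].archive_done)
            continue;               /* keep the file, leave the marker alone */

        /* Archive check passed: now it is safe to advertise and recycle. */
        if (segs[i].segno > last_recycled)
            last_recycled = segs[i].segno;
        segs[i].present = false;
    }
}
```

In this model, if segment 1 is archived and segment 2 is not, a call with cutoff 2 recycles only segment 1 and leaves the marker at 1, so a standby asking for segment 2 can still be served from the primary.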