Re: Streaming replication and a disk full in primary - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Streaming replication and a disk full in primary
Date
Msg-id g2n3f0b79eb1004072333yde79121v902a7e0a8bb02eac@mail.gmail.com
Whole thread Raw
In response to Re: Streaming replication and a disk full in primary  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Streaming replication and a disk full in primary  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
Thanks for the great patch! I apologize for leaving the issue
half-finished for long time :(

On Wed, Apr 7, 2010 at 7:02 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> In your version of this patch, the default was still the current
> behavior where the primary retains WAL files that are still needed by
> connected stadby servers indefinitely. I think that's a dangerous
> default, so I changed it so that if you don't set standby_keep_segments,
> the primary doesn't retain any extra segments; the number of WAL
> segments available for standby servers is determined only by the
> location of the previous checkpoint, and the status of WAL archiving.
> That makes the code a bit simpler too, as we never care how far the
> walsenders are. In fact, the GetOldestWALSenderPointer() function is now
> dead code.

It's OK for me to change the default behavior. We can remove
the GetOldestWALSenderPointer() function.

doc/src/sgml/config.sgml
-        archival or to recover from a checkpoint. If standby_keep_segments
+        archival or to recover from a checkpoint. If
<varname>standby_keep_segments</>

The word "standby_keep_segments" always needs the <varname> tag, I think.

We should remove the document "25.2.5.2. Monitoring"?

Why is standby_keep_segments used even if max_wal_senders is zero?
In that case, ISTM we don't need to keep any WAL files in pg_xlog
for the standby.

When XLogRead() reads two WAL files and only the older of them is recycled
during being read, it might fail in checking whether the read data is valid.
This is because the variable "recptr" can advance to the newer WAL file
before the check.

When walreceiver has gotten stuck for some reason, walsender would be
unable to pass through the send() system call, and also get stuck.
In the patch, such a walsender cannot exit forever because it cannot
call XLogRead(). So I think that the bgwriter needs to send the
exit-signal to such a too lagged walsender. Thought?

The shmem of latest recycled WAL file is updated before checking whether
it's already been archived. If archiving is not working for some reason,
the WAL file which that shmem indicates might not actually have been
recycled yet. In this case, the standby cannot obtain the WAL file from
the primary because it's been marked as "latest recycled", and from the
archive because it's not been archived yet. This seems to be a big problem.
How about moving the update of the shmem to after calling XLogArchiveCheckDone()
in RemoveOldXlogFiles()?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgsql-hackers by date:

Previous
From: Takahiro Itagaki
Date:
Subject: Oddly indented raw_expression_tree_walker
Next
From: Fujii Masao
Date:
Subject: Re: Remaining Streaming Replication Open Items