Re: Streaming replication and a disk full in primary - Mailing list pgsql-hackers

From	Fujii Masao
Subject	Re: Streaming replication and a disk full in primary
Date	April 8, 2010 06:34:04
Msg-id	g2n3f0b79eb1004072333yde79121v902a7e0a8bb02eac@mail.gmail.com
In response to	Re: Streaming replication and a disk full in primary (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses	Re: Streaming replication and a disk full in primary (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List	pgsql-hackers

Thanks for the great patch! I apologize for leaving the issue
half-finished for such a long time :(

On Wed, Apr 7, 2010 at 7:02 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> In your version of this patch, the default was still the current
> behavior where the primary retains WAL files that are still needed by
> connected standby servers indefinitely. I think that's a dangerous
> default, so I changed it so that if you don't set standby_keep_segments,
> the primary doesn't retain any extra segments; the number of WAL
> segments available for standby servers is determined only by the
> location of the previous checkpoint, and the status of WAL archiving.
> That makes the code a bit simpler too, as we never care how far the
> walsenders are. In fact, the GetOldestWALSenderPointer() function is now
> dead code.

Changing the default behavior is fine with me. We can remove
the GetOldestWALSenderPointer() function.

doc/src/sgml/config.sgml
-        archival or to recover from a checkpoint. If standby_keep_segments
+        archival or to recover from a checkpoint. If <varname>standby_keep_segments</>

The word "standby_keep_segments" always needs the <varname> tag, I think.

Should we remove the documentation section "25.2.5.2. Monitoring"?

Why is standby_keep_segments used even if max_wal_senders is zero?
In that case, ISTM we don't need to keep any WAL files in pg_xlog
for the standby.
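
To illustrate the point, a minimal sketch of the check I have in mind.
effective_keep_segments() is a hypothetical helper, not a function in the
patch; it only shows that the setting could be ignored when no walsender
can ever be started.

```c
#include <assert.h>

/*
 * Hypothetical helper (not in the patch): number of extra WAL segments
 * the primary should retain in pg_xlog for standby servers.
 */
static int
effective_keep_segments(int max_wal_senders, int standby_keep_segments)
{
    assert(max_wal_senders >= 0 && standby_keep_segments >= 0);

    /* No walsender can connect, so no standby reads WAL from pg_xlog. */
    if (max_wal_senders == 0)
        return 0;

    return standby_keep_segments;
}
```

With max_wal_senders = 0 this returns 0 regardless of standby_keep_segments;
otherwise the configured value is used unchanged.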

When XLogRead() reads from two WAL files and only the older of them is recycled
while it is being read, the check for whether the read data is valid might miss it.
This is because the variable "recptr" can advance to the newer WAL file
before the check.
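
A simplified model of the problem, as a sketch only. The function names,
the flat 64-bit WAL position, and the fixed 16 MB segment size are
assumptions for illustration and do not match the real XLogRead() code.
"last_removed_seg" stands for the newest segment number known to have been
recycled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed segment size for the sketch (the default WAL segment is 16 MB). */
#define SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/*
 * Problematic variant: the check uses the pointer after it has advanced
 * past the data that was read, so it may look at the newer segment only.
 */
static bool
check_after_advance(uint64_t startptr, uint64_t nbytes,
                    uint64_t last_removed_seg)
{
    assert(nbytes > 0);

    uint64_t recptr = startptr + nbytes;    /* position after the read */

    return recptr / SEG_SIZE > last_removed_seg;
}

/*
 * Safer variant: the check uses the position where the read started, so
 * a recycled older segment is always noticed.
 */
static bool
check_from_start(uint64_t startptr, uint64_t last_removed_seg)
{
    return startptr / SEG_SIZE > last_removed_seg;
}
```

For a read that starts 100 bytes before the end of segment 3 and covers 200
bytes, with segment 3 recycled in the meantime, the first variant reports
the data as valid while the second one does not.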

When walreceiver has gotten stuck for some reason, walsender would be
unable to return from the send() system call, and would get stuck as well.
With the patch, such a walsender can never exit, because it never reaches
XLogRead(). So I think that the bgwriter needs to send an
exit signal to a walsender that has fallen too far behind. Thoughts?

The shared-memory record of the latest recycled WAL file is updated before checking whether
the file has already been archived. If archiving is not working for some reason,
the WAL file which that record indicates might not actually have been
recycled yet. In this case, the standby can obtain the WAL file neither from
the primary, because it's been marked as "latest recycled", nor from the
archive, because it's not been archived yet. This seems to be a big problem.
How about moving the update of the shmem value to after the call to XLogArchiveCheckDone()
in RemoveOldXlogFiles()?
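
Roughly, the ordering I am suggesting looks like the sketch below. Everything
here is a stand-in: Segment, last_removed_segno, archive_check_done() and
remove_old_segments() are hypothetical names that only model
XLogArchiveCheckDone() and the loop in RemoveOldXlogFiles(); the real code
works on file names in pg_xlog.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one WAL segment file. */
typedef struct
{
    int  segno;     /* segment number */
    bool archived;  /* has the archiver finished with it? */
} Segment;

/* Stand-in for the shared-memory "latest recycled segment" value. */
static int last_removed_segno = -1;

/* Stand-in for XLogArchiveCheckDone(). */
static bool
archive_check_done(const Segment *seg)
{
    return seg->archived;
}

/*
 * Recycle every segment up to cutoff_segno that the archiver is done with.
 * The shared value is updated only after the archive check has passed, so
 * a segment that must be kept is never advertised as recycled.
 */
static int
remove_old_segments(const Segment *segs, int nsegs, int cutoff_segno)
{
    int removed = 0;

    assert(nsegs >= 0);

    for (int i = 0; i < nsegs; i++)
    {
        if (segs[i].segno > cutoff_segno)
            continue;           /* still needed for recovery */

        if (!archive_check_done(&segs[i]))
            continue;           /* not archived yet: keep it */

        if (segs[i].segno > last_removed_segno)
            last_removed_segno = segs[i].segno;

        removed++;              /* the real code would recycle the file here */
    }

    return removed;
}
```

With segments 1 and 2 archived and segment 3 not yet archived, only 1 and 2
are recycled and the shared value stays at 2, so a standby asking for
segment 3 can still get it from the primary.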

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
