Hi,
Thanks for your response.
On 07.10.2011 22:05, Derrick Rice wrote:
> On Thu, Oct 6, 2011 at 3:47 AM, Frank Lanitz <frank@frank.uvena.de
> <mailto:frank@frank.uvena.de>> wrote:
>
> Hi folks,
>
> I want to refer to a question Rob asked back in 2008 at
> http://archives.postgresql.org/pgsql-general/2008-07/msg01167.php as we
> are currently running into a similar issue:
> We are using a warm standby via PITR, with a shared drive between the
> master and slave nodes.
>
> Our setup currently uses archive_timeout = 60s and
> checkpoint_timeout = 600s.
>
> We expected a WAL file to be written to the share every minute, but we
> must have misunderstood some part of the documentation: in periods of
> low database traffic, the interval between WAL files is more than
> 1 minute, up to ten minutes.
>
>
> The 8.4 docs lack this detail, but the 9.0 docs explain this. I don't
> believe it's a behavior change; I think it's just more clarification in
> the documents (
> http://www.postgresql.org/docs/9.0/interactive/runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT
> )
>
> " When this parameter is greater than zero, the server will switch to a
> new segment file whenever this many seconds have elapsed since the last
> segment file switch, ***and there has been any database activity,
> including a single checkpoint.***" (emphasis mine)
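To see why the interval stretches to checkpoint_timeout on an idle system, here is a minimal Python sketch of the rule as the docs phrase it. It is a simplified model, not PostgreSQL's code: time advances in whole seconds, `segment_switches` is a hypothetical helper, and it assumes every checkpoint writes a WAL record (which the observed ten-minute interval suggests).

```python
# Simplified model of the archive_timeout rule quoted above. This is an
# illustrative assumption, not PostgreSQL's actual implementation: a forced
# segment switch happens only once archive_timeout seconds have passed since
# the last switch AND some WAL has been written since that switch.

def segment_switches(wal_write_times, archive_timeout, horizon):
    """Return the seconds at which a timeout-driven segment switch occurs.

    wal_write_times: seconds at which any WAL record is written
                     (a data change or a checkpoint record)
    archive_timeout: the archive_timeout setting, in seconds
    horizon:         number of seconds to simulate
    """
    writes = set(wal_write_times)
    switches = []
    last_switch = 0
    dirty = False  # has any WAL been written since the last switch?
    for t in range(horizon + 1):
        if t in writes:
            dirty = True
        if dirty and t - last_switch >= archive_timeout:
            switches.append(t)
            last_switch = t
            dirty = False
    return switches


# Idle system: the only WAL comes from a checkpoint record every
# checkpoint_timeout = 600 s, while archive_timeout = 60 s.
print(segment_switches(range(600, 1801, 600), archive_timeout=60, horizon=1800))
# -> [600, 1200, 1800]

# Busy system: some WAL is written every 10 s.
print(segment_switches(range(10, 181, 10), archive_timeout=60, horizon=180))
# -> [60, 120, 180]
```

In this model the 60 s timeout is a lower bound on the gap between switches, not a schedule: with nothing but checkpoint records in the log, switches follow the checkpoints.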
>
> Tom said something similar in the thread you referenced:
>
> http://archives.postgresql.org/pgsql-general/2008-07/msg01166.php
>
> "One possible connection is that an xlog file switch will not actually
> happen unless some xlog output has been generated since the last switch.
> If you were watching an otherwise-idle system then maybe the checkpoint
> records are needed to make it look like a switch is needed. OTOH if
> it's *that* idle then the checkpoints should be no-ops too."
We are seeing import failures on the slave since we lowered
archive_timeout below checkpoint_timeout. Do I understand correctly
that these errors might be caused by this?
> However, the goal was to have a WAL file every minute so that disaster
> recovery can be done quickly with a minimum of data loss.
>
>
>
> If there were any new data, its existence in the transaction log would
> trigger the archive_timeout behavior. With no database activity, you
> aren't missing anything.
>
>
> The question is: what did we miss? Do we also need to set
> checkpoint_timeout to 60s, and does that make sense at all?
>
>
> You are getting what you need (maximum 60s between data and the
> corresponding data being sent through archive_command), just not exactly
> what you thought you asked for.
>
> If you absolutely must have a file every minute in order to sleep well,
> you can lower checkpoint_timeout. Keep in mind the cost of checkpoints.
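Both points in the reply can be checked against a simplified model. This is a hypothetical sketch, not PostgreSQL's code: `segment_switches` and `max_archive_delay` are illustrative helpers, time is in whole seconds, and it assumes each checkpoint writes a WAL record. It shows that with checkpoint_timeout lowered to 60 s an idle system switches every minute, and that on any workload a write waits at most archive_timeout seconds for the switch that closes its segment (archive_command time comes on top of that).

```python
def segment_switches(wal_write_times, archive_timeout, horizon):
    """Seconds at which a forced switch occurs in this simplified model:
    archive_timeout has elapsed since the last switch AND WAL was written since."""
    writes = set(wal_write_times)
    switches, last_switch, dirty = [], 0, False
    for t in range(horizon + 1):
        if t in writes:
            dirty = True
        if dirty and t - last_switch >= archive_timeout:
            switches.append(t)
            last_switch, dirty = t, False
    return switches


def max_archive_delay(wal_write_times, archive_timeout, horizon):
    """Longest wait (s) between a WAL write and the switch that closes its
    segment. Assumes horizon is long enough for every write to be switched out."""
    switches = segment_switches(wal_write_times, archive_timeout, horizon)
    return max(min(s for s in switches if s >= w) - w for w in wal_write_times)


# Idle system with checkpoint_timeout lowered to 60 s: one segment per minute.
print(segment_switches(range(60, 601, 60), archive_timeout=60, horizon=600))
# -> [60, 120, 180, 240, 300, 360, 420, 480, 540, 600]

# Sporadic writes: no write waits longer than archive_timeout.
print(max_archive_delay([5, 130, 131, 700], archive_timeout=60, horizon=800))
# -> 59
```

The model leaves out the cost side: at checkpoint_timeout = 60 s there are up to 86400 / 60 = 1440 timed checkpoints a day instead of 144.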
We will have to think about this.
Cheers,
Frank