Re: warm standby server stops doing checkpoints after awhile - Mailing list pgsql-general

From	Frank Wittig
Subject	Re: warm standby server stops doing checkpoints after awhile
Date	June 1, 2007 08:33:51
Msg-id	46600411.4030207@weisshuhn.de
In response to	Re: warm standby server stops doing checkpoints after awhile ("Simon Riggs" <simon@2ndquadrant.com>)
Responses	Re: warm standby server stops doing checkpoints after awhile
List	pgsql-general

Simon Riggs wrote:

> This is repeatable, yes?
Yes, it occurs every time I start from a new base backup. And it seems
to happen while tsearch2 vectors are being recreated for a large number
of data sets.

> Has anything crashed on your server?
No. No crashes occurred during that time.

> Are you using GIN or GIST indexes?
I'm using GIN indexes on tsearch2 vectors for a very large number of data
sets. (About 3.8 million data sets, of which about 30-50 thousand are
recreated and indexed when the described behavior occurs.)

> I'll look at putting some debug information in there that logs whether
> multi-WAL actions remain unresolved for any length of time.
Extra debug info would be great.
I tested this myself by adding some debug output to the function Tom Lane
mentioned. Once the server has stopped checkpointing, every call to the
function exits at this point:

  /*
   * Is it safe to checkpoint?  We must ask each of the resource managers
   * whether they have any partial state information that might prevent a
   * correct restart from this point.  If so, we skip this opportunity, but
   * return at the next checkpoint record for another try.
   */
  for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
  {
    if (RmgrTable[rmid].rm_safe_restartpoint != NULL)
      if (!(RmgrTable[rmid].rm_safe_restartpoint()))
        return;
  }

It exits every time with the same value for rmid.
The logs look like this (the quoted lines repeat):

<2007-06-01 13:10:28.936 CEST:%> DEBUG:  00000: executing restore command "/var/lib/pgsql/restore.pl /mnt/wal_archive/00000001000000C9000000C2 pg_xlog/RECOVERYXLOG"
<2007-06-01 13:10:28.936 CEST:%> LOCATION:  RestoreArchivedFile, xlog.c:2474
<2007-06-01 13:11:29.055 CEST:%> LOG:  00000: restored log file "00000001000000C9000000C2" from archive
<2007-06-01 13:11:29.055 CEST:%> LOCATION:  RestoreArchivedFile, xlog.c:2504
<2007-06-01 13:11:29.364 CEST:%> DEBUG:  00000: found Checkpoint in XLOG
<2007-06-01 13:11:29.364 CEST:%> CONTEXT:  xlog redo checkpoint: redo C9/C20DE050; undo 0/0; tli 1; xid 0/36130541; oid 241990328; multi 8; offset 15; online
<2007-06-01 13:11:29.364 CEST:%> LOCATION:  RecoveryRestartPoint, xlog.c:5739
<2007-06-01 13:11:29.365 CEST:%> DEBUG:  00000: Ressource manager (13) has partial state information
<2007-06-01 13:11:29.365 CEST:%> CONTEXT:  xlog redo checkpoint: redo C9/C20DE050; undo 0/0; tli 1; xid 0/36130541; oid 241990328; multi 8; offset 15; online
<2007-06-01 13:11:29.365 CEST:%> LOCATION:  RecoveryRestartPoint, xlog.c:5769
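
For readers who want to try the same kind of instrumentation, here is a minimal standalone sketch of the loop quoted earlier with a debug line at the early exit. It is not the actual patch used for the logs above. The struct layout, the value of RM_MAX_ID, the mock table entries and the helper names (always_safe, never_safe, first_unsafe_rmgr) are all assumptions made for illustration, and fprintf stands in for the server's own logging.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the server's definitions, NOT the real headers. */
#define RM_MAX_ID 15

typedef struct RmgrData
{
    const char *rm_name;
    bool      (*rm_safe_restartpoint) (void);
} RmgrData;

/* Mock callbacks: one manager that is always safe, one that never is. */
bool always_safe(void) { return true; }
bool never_safe(void)  { return false; }

/* Mock table; slots not listed stay zeroed, so their callback is NULL. */
RmgrData RmgrTable[RM_MAX_ID + 1] = {
    [0]  = {"mock-a", NULL},
    [11] = {"mock-b", always_safe},
    [13] = {"mock-c", never_safe},
};

/*
 * Same test as the quoted loop, but instead of returning silently it
 * reports which resource manager blocked the restart point.
 * Returns the blocking rmid, or -1 if every manager reports safe.
 */
int first_unsafe_rmgr(void)
{
    int rmid;

    for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
    {
        if (RmgrTable[rmid].rm_safe_restartpoint != NULL &&
            !RmgrTable[rmid].rm_safe_restartpoint())
        {
            fprintf(stderr,
                    "DEBUG: resource manager (%d) has partial state information\n",
                    rmid);
            return rmid;
        }
    }
    return -1;
}
```

With this mock table, first_unsafe_rmgr() reports slot 13 on every call until that slot's callback starts returning true, which mirrors the repeating log lines above.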

best regards,
  Frank Wittig

