Re: warm standby server stops doing checkpoints after awhile - Mailing list pgsql-general

From Frank Wittig
Subject Re: warm standby server stops doing checkpoints after awhile
Date
Msg-id 46600411.4030207@weisshuhn.de
Whole thread Raw
In response to Re: warm standby server stops doing checkpoints after awhile  ("Simon Riggs" <simon@2ndquadrant.com>)
Responses Re: warm standby server stops doing checkpoints afterawhile
List pgsql-general
Simon Riggs schrieb:

> This is repeatable, yes?
Yes, it occures every time I begin with a new base backup. And it seem
to happen during recreation of tsearch2 vectors of large amounts of data
sets.

> Has anything crashed on your server?
No. Crashes didn't occur duriung that times.

> Are you using GIN or GIST indexes?
I'm using GIN indesex on tsearch2 vectors of very large ammount of data
sets. (About 3,8 million data sets of which about 30-50 thousands are
recreated and indexed when the descibed behavior occures.)

> I'll look at putting some debug information in there that logs whether
> multi-WAL actions remain unresolved for any length of time.
Extra debug info would be great.
I tested myself adding some debug output into the function Tom Lane
mentioned and found that after the server stopped checkpointing every
time the function is called it exits at this point:

  /*
   * Is it safe to checkpoint?  We must ask each of the resource managers
   * whether they have any partial state information that might prevent a
   * correct restart from this point.  If so, we skip this opportunity, but
   * return at the next checkpoint record for another try.
   */
  for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
  {
    if (RmgrTable[rmid].rm_safe_restartpoint != NULL)
      if (!(RmgrTable[rmid].rm_safe_restartpoint()))
        return;
  }

It exits every time with the same value for rmid.
Logs look like this (The quoted lines repeat):

<2007-06-01 13:10:28.936 CEST:%> DEBUG:  00000: executing restore
command "/var/lib/pgsql/restore.pl
/mnt/wal_archive/00000001000000C9000000C2 pg_xlog/RECOVERYXLOG"
<2007-06-01 13:10:28.936 CEST:%> LOCATION:  RestoreArchivedFile, xlog.c:2474
<2007-06-01 13:11:29.055 CEST:%> LOG:  00000: restored log file
"00000001000000C9000000C2" from archive
<2007-06-01 13:11:29.055 CEST:%> LOCATION:  RestoreArchivedFile, xlog.c:2504
<2007-06-01 13:11:29.364 CEST:%> DEBUG:  00000: found Checkpoint in XLOG
<2007-06-01 13:11:29.364 CEST:%> CONTEXT:  xlog redo checkpoint: redo
C9/C20DE050; undo 0/0; tli 1; xid 0/36130541; oid 241990328; multi 8;
offset 15; online
<2007-06-01 13:11:29.364 CEST:%> LOCATION:  RecoveryRestartPoint,
xlog.c:5739
<2007-06-01 13:11:29.365 CEST:%> DEBUG:  00000: Ressource manager (13)
has partial state information
<2007-06-01 13:11:29.365 CEST:%> CONTEXT:  xlog redo checkpoint: redo
C9/C20DE050; undo 0/0; tli 1; xid 0/36130541; oid 241990328; multi 8;
offset 15; online
<2007-06-01 13:11:29.365 CEST:%> LOCATION:  RecoveryRestartPoint,
xlog.c:5769

best regards,
  Frank Wittig




Attachment

pgsql-general by date:

Previous
From: Dudás József
Date:
Subject: Re: invalid memory alloc after insert with c trigger function
Next
From: "Pavel Stehule"
Date:
Subject: Re: how to use array with "holes" ?