Re: warm standby server stops doingcheckpointsafterawhile - Mailing list pgsql-general
From | Simon Riggs |
---|---|
Subject | Re: warm standby server stops doingcheckpointsafterawhile |
Date | |
Msg-id | 1180779074.26297.323.camel@silverbirch.site Whole thread Raw |
In response to | Re: warm standby server stops doingcheckpointsafterawhile (Frank Wittig <fw@weisshuhn.de>) |
Responses |
Re: warm standby server stops doingcheckpointsafterawhile
|
List | pgsql-general |
On Fri, 2007-06-01 at 23:14 +0200, Frank Wittig wrote: > Teodor Sigaev schrieb: > > Hope, attached patch fix that. Pls, test it. > > It still happens. > The log is full of incomplete split dumps: > > <2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete splits=8 > <2007-06-01 23:00:00.001 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/28020F48; undo 0/0; tli 1; xid 0/36761651; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:696 > l:93996 r:111778 at redo CF/4B669DF0 > <2007-06-01 23:00:00.001 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/28020F48; undo 0/0; tli 1; xid 0/36761651; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:562 > l:4491 r:111780 at redo CF/9FEF39F0 > <2007-06-01 23:00:00.001 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/28020F48; undo 0/0; tli 1; xid 0/36761651; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:200 > l:63015 r:111781 at redo CF/B45AC9B0 > <2007-06-01 23:00:00.001 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/28020F48; undo 0/0; tli 1; xid 0/36761651; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:318 > l:322 r:111782 at redo CF/B9939B58 > <2007-06-01 23:00:00.001 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/28020F48; undo 0/0; tli 1; xid 0/36761651; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:374 > l:1912 r:111783 at redo CF/B9B6CC00 > <2007-06-01 23:00:00.001 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/28020F48; undo 0/0; tli 1; xid 0/36761651; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:1552 > l:1555 r:111784 at redo CF/F1C6D770 > <2007-06-01 23:00:00.001 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/28020F48; undo 0/0; tli 1; xid 0/36761651; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:221 > l:222 r:111785 at redo D0/ED2F6F0 > <2007-06-01 23:00:00.001 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/28020F48; undo 0/0; tli 1; xid 0/36761651; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete split root:1147 > l:111771 r:111786 at redo D0/1DDE64C8 > <2007-06-01 23:00:00.001 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/28020F48; undo 0/0; tli 1; xid 0/36761651; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:00:00.814 CEST:%> LOG: restored log file > "00000001000000D000000029" from archive > <2007-06-01 23:00:03.325 CEST:%> LOG: restored log file > "00000001000000D00000002A" from archive > <2007-06-01 23:00:06.145 CEST:%> LOG: restored log file > "00000001000000D00000002B" from archive > <2007-06-01 23:00:10.996 CEST:%> LOG: restored log file > "00000001000000D00000002C" from archive > <2007-06-01 23:00:14.283 CEST:%> LOG: restored log file > "00000001000000D00000002D" from archive > <2007-06-01 23:00:17.086 CEST:%> LOG: restored log file > "00000001000000D00000002E" from archive > <2007-06-01 23:01:19.587 CEST:%> LOG: restored log file > "00000001000000D00000002F" from archive > <2007-06-01 23:01:22.700 CEST:%> LOG: restored log file > "00000001000000D000000030" from archive > <2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete splits=9 > <2007-06-01 23:01:24.038 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/30018F68; undo 0/0; tli 1; xid 0/36763650; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:696 > l:93996 r:111778 at redo CF/4B669DF0 > <2007-06-01 23:01:24.038 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/30018F68; undo 0/0; tli 1; xid 0/36763650; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:562 > l:4491 r:111780 at redo CF/9FEF39F0 > <2007-06-01 23:01:24.038 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/30018F68; undo 0/0; tli 1; xid 0/36763650; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:200 > l:63015 r:111781 at redo CF/B45AC9B0 > <2007-06-01 23:01:24.038 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/30018F68; undo 0/0; tli 1; xid 0/36763650; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:318 > l:322 r:111782 at redo CF/B9939B58 > <2007-06-01 23:01:24.038 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/30018F68; undo 0/0; tli 1; xid 0/36763650; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:374 > l:1912 r:111783 at redo CF/B9B6CC00 > <2007-06-01 23:01:24.038 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/30018F68; undo 0/0; tli 1; xid 0/36763650; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:1552 > l:1555 r:111784 at redo CF/F1C6D770 > <2007-06-01 23:01:24.038 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/30018F68; undo 0/0; tli 1; xid 0/36763650; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:221 > l:222 r:111785 at redo D0/ED2F6F0 > <2007-06-01 23:01:24.038 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/30018F68; undo 0/0; tli 1; xid 0/36763650; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:01:24.038 CEST:%> LOG: GIN incomplete split root:1147 > l:111771 r:111786 at redo D0/1DDE64C8 > <2007-06-01 23:01:24.039 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/30018F68; undo 0/0; tli 1; xid 0/36763650; oid 243915448; multi 8; > offset 15; online > <2007-06-01 23:01:24.039 CEST:%> LOG: GIN incomplete split root:34 > l:1704 r:111787 at redo D0/289FC958 > <2007-06-01 23:01:24.039 CEST:%> CONTEXT: xlog redo checkpoint: redo > D0/30018F68; undo 0/0; tli 1; xid 0/36763650; oid 243915448; multi 8; > offset 15; online OK, here's what I see: 1. After a certain point, consecutive GIN index splits cause a problem. The new RHS block numbers are consecutive from 111780+ 2. The incomplete splits stay around indefinitely after creation and we aren't trying to remove the wrong split at any point. We're either never creating an xlog record, or we are ignoring it in recovery, or we are somehow making multiple entries then not removing all of them. 3. The root seems to move, which isn't what I personally was expecting to see. It seems root refers to the highest parent involved in the split. 4. We're writing lots of redo in between failed page splits. So *almost* everything is working correctly. 5. This starts to happen when we have very large indexes. This may be coincidental but the first relation file is fairly full (900+ MB). I'll do a patch to identify whether these are entry or data parts of the GIN index, and whether we are ignoring records somehow when they hit gin_redo. Perhaps we can have identified the exact issue by the time Teodor returns. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
pgsql-general by date: