Thread: Miscalculation in IsCheckpointOnSchedule()
I run long DBT-2 with 8.3beta2 and saw checkpoint spikes periodically. The progress against WAL segments consumption *jumped up* in such cases. It seems to be a miscalculation in IsCheckpointOnSchedule(). xrecoff is uint32, therefore the difference of two xrecoffs could be between -4G and +4G. We should not cast it to int32, that domain is [-2G, +2G). Here is a patch to fix it, casting xrecoff to double directly. Index: src/backend/postmaster/bgwriter.c =================================================================== --- src/backend/postmaster/bgwriter.c (HEAD) +++ src/backend/postmaster/bgwriter.c (working copy) @@ -718,7 +718,7 @@ recptr = GetInsertRecPtr(); elapsed_xlogs = (((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile + - ((double) (int32) (recptr.xrecoff - ckpt_start_recptr.xrecoff)) / XLogSegSize) / + ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) / CheckPointSegments; if (progress < elapsed_xlogs) Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes: > - ((double) (int32) (recptr.xrecoff - ckpt_start_recptr.xrecoff)) / XLogSegSize) / > + ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) / Surely this makes matters worse, not better. What happens near a segment boundary crossing? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes: > > - ((double) (int32) (recptr.xrecoff - ckpt_start_recptr.xrecoff)) / XLogSegSize) / > > + ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) / > > Surely this makes matters worse, not better. What happens near a segment > boundary crossing? Here is the dumped progres information by the attached patch (only for debug purpose). cur prog. xlog prog. time prog. [-400ms] 0.030503 0.019247 0.031158 (diff xlogid=0, xrecoff=82665472) [-200ms] 0.031176 0.019957 0.031839 (diff xlogid=0, xrecoff=85712896) [*] 0.031860 1.020706 0.032521 (diff xlogid=1, xrecoff=105709568) > recptr.xrecoff - ckpt_start_recptr.xrecoff recptr.xrecoff is reset to 0 or so when xlogid is bumped up. At that time, if ckpt_start_recptr.xrecoff is greater than 2G, we cannot represent the difference with int32 because the value is less than -2G. Casting double after int32 does not help us. We use ( xlogid * 255 * 16MB + xrecoff ) as a base value for the calculation. If we interprets xrecoff=105709568 at [*] as "minus uint32", we can calcurate it correctly as below: without fix with fix [-400ms] 82665472 82665472 (same) [-200ms] 85712896 85712896 (same) [*] 4383899648 88932353 (= 1*255*16MB - (0xFFFFFFFF - 105709568) ) Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
Tom Lane wrote: > ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes: >> - ((double) (int32) (recptr.xrecoff - ckpt_start_recptr.xrecoff)) / XLogSegSize) / >> + ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) / > > Surely this makes matters worse, not better. What happens near a segment > boundary crossing? Hmm. There seems to be another little bug in there. XLogSegsPerFile is defined as 0xffffffff/XLogSegSize, which is 255 with default settings. It should be 256. That leads to negative elapsed_xlogs estimates at xlog file boundaries. XLogCheckpointNeeded suffers from it too; the number of segments consumed since last checkpoint is off by one per each xlogid, so we trigger the checkpoint a little bit too late. Otherwise, the patch looks good to me. I also tested the calculation with a little C program (attached), and it seems to work on segment and xlog file boundaries just fine. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com #include <stdio.h> #include <stdlib.h> typedef unsigned char uint8; /* == 8 bits */ typedef unsigned short uint16; /* == 16 bits */ typedef unsigned int uint32; /* == 32 bits */ typedef signed char int8; /* == 8 bits */ typedef signed short int16; /* == 16 bits */ typedef signed int int32; /* == 32 bits */ #define XLOG_SEG_SIZE (16*1024*1024) #define XLogSegSize ((uint32) XLOG_SEG_SIZE) #define XLogSegsPerFile (((uint32) 0xffffffff) / XLogSegSize) int main(int argc, char **argv) { double elapsed_xlogs; uint32 cur_xrecoff, cur_xlogid; uint32 ckpt_xrecoff, ckpt_xlogid; double a, b; cur_xlogid = strtoul(argv[1], NULL, 10); cur_xrecoff = strtoul(argv[2], NULL, 10); ckpt_xlogid = strtoul(argv[3], NULL, 10); ckpt_xrecoff = strtoul(argv[4], NULL, 10); //a = (cur_xlogid - ckpt_xlogid) * XLogSegsPerFile; //b = (((double) cur_xrecoff) - ((double) ckpt_xrecoff)) / ((double )XLogSegSize); //a = ((double) (int32) (cur_xlogid - ckpt_xlogid)) * XLogSegsPerFile; //b = ((double) (int32) (cur_xrecoff - ckpt_xrecoff)) / XLogSegSize; a = (cur_xlogid - ckpt_xlogid) * XLogSegsPerFile; b = ((double) cur_xrecoff - (double) ckpt_xrecoff) / XLogSegSize; elapsed_xlogs = a + b; printf("XLogSegsPerFile: %d\n", XLogSegsPerFile); printf("xrecoff: %u %f\n", ckpt_xrecoff, (double) ckpt_xrecoff); printf("a: %f\n", a); printf("b: %f\n", b); printf("elapsed_xlogs = %f\n", elapsed_xlogs); }
Heikki Linnakangas wrote: > Tom Lane wrote: >> ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes: >>> - ((double) (int32) (recptr.xrecoff - >>> ckpt_start_recptr.xrecoff)) / XLogSegSize) / >>> + ((double) recptr.xrecoff - (double) >>> ckpt_start_recptr.xrecoff) / XLogSegSize) / >> >> Surely this makes matters worse, not better. What happens near a segment >> boundary crossing? > > Hmm. There seems to be another little bug in there. XLogSegsPerFile is > defined as 0xffffffff/XLogSegSize, which is 255 with default settings. > It should be 256. That leads to negative elapsed_xlogs estimates at xlog > file boundaries. XLogCheckpointNeeded suffers from it too; the number of > segments consumed since last checkpoint is off by one per each xlogid, > so we trigger the checkpoint a little bit too late. I'll take that back. We intentionally don't use the last possible segment of each xlog file, to avoid overflows. Sorry for the noise. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Surely this makes matters worse, not better. What happens near a segment >> boundary crossing? > Here is the dumped progres information by the attached patch > (only for debug purpose). Oh, I take that back. I was thinking that conversion of unsigned to double usually does the Wrong Thing, but in this context it seems to be the right thing. regards, tom lane