Thread: probable cause (and fix) for floating-point assist faults on itanium
Hi folks,

I'm running PG 8.3.15 on an itanium box and was seeing lots of floating-point assist faults by the kernel. Searched around, found a couple references/discussions here and there:

    http://archives.postgresql.org/pgsql-general/2008-08/msg00244.php
    http://archives.postgresql.org/pgsql-performance/2011-06/msg00093.php
    http://archives.postgresql.org/pgsql-performance/2011-06/msg00102.php

I took up Tom's challenge and found that the buffer allocation prediction code in BgBufferSync() is the likely culprit:

    if (smoothed_alloc <= (float) recent_alloc)
        smoothed_alloc = recent_alloc;
    else
        smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
            smoothing_samples;

smoothed_alloc (float) is moving towards 0 during any extended period of time when recent_alloc (uint32) remains 0. In my case it takes just a minute or two before it becomes small enough to start triggering the fault.

Given how smoothed_alloc is used just after this place in the code it seems overkill to allow it to continue to shrink so small, so I made a little mod:

    if (smoothed_alloc <= (float) recent_alloc)
        smoothed_alloc = recent_alloc;
    else if (smoothed_alloc >= 0.00001)
        smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
            smoothing_samples;

This seems to have done the trick. From what I can tell this section of code is unchanged in 9.1.1 - perhaps in a future version a similar mod could be made?

FWIW, I don't think it's really much of a performance impact for the database, because if recent_alloc remains 0 for a long while it probably means the DB isn't doing much anyway. However it is annoying when system logs fill up, and the extra floating point handling may affect some other process(es).

-Greg
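To illustrate the decay described above, here is a minimal standalone sketch, not PostgreSQL code. It repeats the same update with recent_alloc held at 0 and counts how many cycles pass before the float stops being a normal number. The function name, the starting value, and smoothing_samples = 16 are assumptions made for the illustration.

```c
#include <math.h>

/*
 * Hypothetical helper (not part of PostgreSQL): apply the smoothing
 * update with recent_alloc stuck at 0 and return the number of cycles
 * until smoothed_alloc is no longer a normal float, i.e. it has become
 * subnormal or zero.
 */
static int
cycles_until_not_normal(float start, float smoothing_samples)
{
    float        smoothed_alloc = start;
    unsigned int recent_alloc = 0;      /* idle system: no allocations */
    int          cycles = 0;

    while (fpclassify(smoothed_alloc) == FP_NORMAL)
    {
        /* Same expression as the else-branch quoted above. */
        smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
            smoothing_samples;
        cycles++;
    }
    return cycles;
}
```

Calling cycles_until_not_normal(100.0f, 16.0f) should return a count on the order of a thousand cycles, since each cycle shrinks the value by roughly 1/16. How long that takes in wall-clock time depends on the bgwriter delay and on the starting value.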
On Thu, Nov 17, 2011 at 10:07 PM, Greg Matthews <gregory.a.matthews@nasa.gov> wrote:
>    if (smoothed_alloc <= (float) recent_alloc)
>        smoothed_alloc = recent_alloc;
>    else if (smoothed_alloc >= 0.00001)
>        smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
>            smoothing_samples;
>

I don't think that logic is sound.

Rather,

    if (smoothed_alloc <= (float) recent_alloc) {
        smoothed_alloc = recent_alloc;
    } else {
        if (smoothed_alloc < 0.000001)
            smoothed_alloc = 0;
        smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
            smoothing_samples;
    }
Claudio Freire <klaussfreire@gmail.com> writes:
> On Thu, Nov 17, 2011 at 10:07 PM, Greg Matthews
> <gregory.a.matthews@nasa.gov> wrote:
>>    if (smoothed_alloc <= (float) recent_alloc)
>>        smoothed_alloc = recent_alloc;
>>    else if (smoothed_alloc >= 0.00001)
>>        smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
>>            smoothing_samples;
>>

> I don't think that logic is sound.

> Rather,

>    if (smoothed_alloc <= (float) recent_alloc) {
>        smoothed_alloc = recent_alloc;
>    } else {
>        if (smoothed_alloc < 0.000001)
>            smoothed_alloc = 0;
>        smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
>            smoothing_samples;
>    }

The real problem with either of these is the cutoff number is totally arbitrary. I'm thinking of something like this:

    /*
     * Track a moving average of recent buffer allocations.  Here, rather than
     * a true average we want a fast-attack, slow-decline behavior: we
     * immediately follow any increase.
     */
    if (smoothed_alloc <= (float) recent_alloc)
        smoothed_alloc = recent_alloc;
    else
        smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
            smoothing_samples;

    /* Scale the estimate by a GUC to allow more aggressive tuning. */
    upcoming_alloc_est = smoothed_alloc * bgwriter_lru_multiplier;

+   /*
+    * If recent_alloc remains at zero for many cycles,
+    * smoothed_alloc will eventually underflow to zero, and the
+    * underflows produce annoying kernel warnings on some platforms.
+    * Once upcoming_alloc_est has gone to zero, there's no point in
+    * tracking smaller and smaller values of smoothed_alloc, so just
+    * reset it to exactly zero to avoid this syndrome.
+    */
+   if (upcoming_alloc_est == 0)
+       smoothed_alloc = 0;

    /*
     * Even in cases where there's been little or no buffer allocation
     * activity, we want to make a small amount of progress through the buffer

regards, tom lane
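The effect of the proposed reset can be checked outside the server with a small simulation. This is a sketch under stated assumptions, not the committed code: the function name is hypothetical, upcoming_alloc_est is assumed to be an int (so the scaled estimate truncates to zero once it drops below 1), and smoothing_samples = 16, a multiplier of 2.0, and a starting value of 100 are illustrative.

```c
#include <math.h>
#include <stdbool.h>

/*
 * Hypothetical simulation (not PostgreSQL code): run `cycles` idle
 * iterations of the smoothing logic and count how many of them end
 * with smoothed_alloc in the subnormal range.  With apply_reset set,
 * the zero-reset from the patch above is applied on each iteration.
 */
static int
idle_cycles_spent_subnormal(bool apply_reset, int cycles)
{
    const float  smoothing_samples = 16;    /* assumed value */
    const double lru_multiplier = 2.0;      /* assumed GUC setting */
    const unsigned int recent_alloc = 0;    /* no allocations at all */
    float        smoothed_alloc = 100.0f;   /* left over from earlier load */
    int          subnormal_cycles = 0;

    for (int i = 0; i < cycles; i++)
    {
        if (smoothed_alloc <= (float) recent_alloc)
            smoothed_alloc = recent_alloc;
        else
            smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
                smoothing_samples;

        /* The integer estimate truncates to zero once the product < 1. */
        int upcoming_alloc_est = (int) (smoothed_alloc * lru_multiplier);

        if (apply_reset && upcoming_alloc_est == 0)
            smoothed_alloc = 0;

        if (fpclassify(smoothed_alloc) == FP_SUBNORMAL)
            subnormal_cycles++;
    }
    return subnormal_cycles;
}
```

With apply_reset true, smoothed_alloc is set to exactly zero as soon as the integer estimate reaches zero, long before the float gets anywhere near the subnormal range, so the function should return 0. The threshold is derived from how the value is actually used rather than from a hand-picked constant.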
Greg Matthews <gregory.a.matthews@nasa.gov> writes:
> Looks good to me. I built PG with this change, no kernel warnings after
> ~10 minutes of running. I'll continue to monitor but I think this fixes
> the syndrome. Thanks Tom.

Patch committed -- thanks for checking it.

regards, tom lane
Looks good to me. I built PG with this change, no kernel warnings after ~10 minutes of running. I'll continue to monitor but I think this fixes the syndrome. Thanks Tom.

-Greg

On Fri, 18 Nov 2011, Tom Lane wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> On Thu, Nov 17, 2011 at 10:07 PM, Greg Matthews
>> <gregory.a.matthews@nasa.gov> wrote:
>>>    if (smoothed_alloc <= (float) recent_alloc)
>>>        smoothed_alloc = recent_alloc;
>>>    else if (smoothed_alloc >= 0.00001)
>>>        smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
>>>            smoothing_samples;
>>>
>
>> I don't think that logic is sound.
>
>> Rather,
>
>>    if (smoothed_alloc <= (float) recent_alloc) {
>>        smoothed_alloc = recent_alloc;
>>    } else {
>>        if (smoothed_alloc < 0.000001)
>>            smoothed_alloc = 0;
>>        smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
>>            smoothing_samples;
>>    }
>
> The real problem with either of these is the cutoff number is totally
> arbitrary.  I'm thinking of something like this:
>
>     /*
>      * Track a moving average of recent buffer allocations.  Here, rather than
>      * a true average we want a fast-attack, slow-decline behavior: we
>      * immediately follow any increase.
>      */
>     if (smoothed_alloc <= (float) recent_alloc)
>         smoothed_alloc = recent_alloc;
>     else
>         smoothed_alloc += ((float) recent_alloc - smoothed_alloc) /
>             smoothing_samples;
>
>     /* Scale the estimate by a GUC to allow more aggressive tuning. */
>     upcoming_alloc_est = smoothed_alloc * bgwriter_lru_multiplier;
>
> +   /*
> +    * If recent_alloc remains at zero for many cycles,
> +    * smoothed_alloc will eventually underflow to zero, and the
> +    * underflows produce annoying kernel warnings on some platforms.
> +    * Once upcoming_alloc_est has gone to zero, there's no point in
> +    * tracking smaller and smaller values of smoothed_alloc, so just
> +    * reset it to exactly zero to avoid this syndrome.
> +    */
> +   if (upcoming_alloc_est == 0)
> +       smoothed_alloc = 0;
>
>     /*
>      * Even in cases where there's been little or no buffer allocation
>      * activity, we want to make a small amount of progress through the buffer
>
>
> regards, tom lane
>