Hi,
On 2021-10-26 07:57:36 -0700, Arjan van de Ven wrote:
> src/port/snprintf.c: Optimize the common base=10 case in fmtint
>
> fmtint() turns an integer into a string for a given base, and to do this
> it does a divide/modulo operation iteratively.
>
> On just about any CPU, divides are a pretty expensive operation, generally
> 10x to 20x or more expensive than adds or multiplies.
This has been bothering me too, thanks for doing something about it.
> By special casing the super common case of base==10, the (gcc) compiler can (and will)
> replace the divide by a multiply with 0xcccccccccccccccd, yielding a lot faster code.
> (fmtint dropped drastically in the perf profiles after this change)
>
> Even though this only shows up in the database creation phase of pgbench and not so much
> during the normal run time, the optimization is simple and high value enough that
> in my opinion it's worth doing
It does even show up during normal running for me, in readonly pgbench.
> diff --git a/src/port/snprintf.c b/src/port/snprintf.c
> index 7c21429369..5957e6f2aa 100644
> --- a/src/port/snprintf.c
> +++ b/src/port/snprintf.c
> @@ -1076,11 +1076,24 @@ fmtint(long long value, char type, int forcesign, int leftjust,
> else
> {
> /* make integer string */
> - do
> - {
> - convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
> - uvalue = uvalue / base;
> - } while (uvalue);
> +
> + /*
> + * Special case a base of 10 because it is super common and by special casing the compiler can
> + * avoid an expensive divide operation (the compiler will use a multiply for this)
> + */
> + if (likely(base == 10)) {
> + do
> + {
> + convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 10];
> + uvalue = uvalue / 10;
> + } while (uvalue);
> + } else {
> + do
> + {
> + convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
> + uvalue = uvalue / base;
> + } while (uvalue);
> + }
> }
>
> zeropad = Max(0, precision - vallen);
Since all the bases are known / set earlier in the function, it seems better
to just split the function into two, with the new helper doing the conversion.
It's harder than it should be, because that code is a bit, uh, tangled, but I
think I can see a way through...
Greetings,
Andres Freund