Thread: src/port/snprintf.c: Optimize the common base=10 case in fmtint

src/port/snprintf.c: Optimize the common base=10 case in fmtint

From
Arjan van de Ven
Date:
src/port/snprintf.c: Optimize the common base=10 case in fmtint

fmtint() turns an integer into a string for a given base, and to do this
it does a divide/modulo operation iteratively.

On just about any CPU, divides are a pretty expensive operation, generally
10x to 20x or more expensive than adds or multiplies.

By special casing the super common case of base==10, the (gcc) compiler can (and will)
replace the divide by a multiply with 0xcccccccccccccccd, yielding a lot faster code.
(fmtint dropped drastically in the perf profiles after this change)

Even though this only shows up in the database creation phase of pgbench and not so much
during the normal run time, the optimization is simple and high value enough that
in my opinion it's worth doing




diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 7c21429369..5957e6f2aa 100644
--- a/src/port/snprintf.c
+++ b/src/port/snprintf.c
@@ -1076,11 +1076,24 @@ fmtint(long long value, char type, int forcesign, int leftjust,
      else
      {
          /* make integer string */
-        do
-        {
-            convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
-            uvalue = uvalue / base;
-        } while (uvalue);
+
+        /*
+         * Special case a base of 10 because it is super common and by special casing the compiler can
+         * avoid an expensive divide operation (the compiler will use a multiply for this)
+         */
+        if (likely(base == 10)) {
+            do
+            {
+                convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 10];
+                uvalue = uvalue / 10;
+            } while (uvalue);
+        } else {
+            do
+            {
+                convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
+                uvalue = uvalue / base;
+            } while (uvalue);
+        }
      }

      zeropad = Max(0, precision - vallen);



Re: src/port/snprintf.c: Optimize the common base=10 case in fmtint

From
Mark Dilger
Date:

> On Oct 26, 2021, at 7:57 AM, Arjan van de Ven <arjan@linux.intel.com> wrote:
>
> By special casing the super common case of base==10, the (gcc) compiler can (and will)
> replace the divide by a multiply with 0xcccccccccccccccd, yielding a lot faster code.
> (fmtint dropped drastically in the perf profiles after this change)

It appears fmtint only has three options for base, being 10, 16, and 8.  Have you profiled with either of the others
specialcased as well?  I don't see much use in optimizing for octal, but hexadecimal is used quite a bit in wal with
patternslike "%08X%08X%08X". 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: src/port/snprintf.c: Optimize the common base=10 case in fmtint

From
Andres Freund
Date:
Hi,

On 2021-10-26 07:57:36 -0700, Arjan van de Ven wrote:
> src/port/snprintf.c: Optimize the common base=10 case in fmtint
> 
> fmtint() turns an integer into a string for a given base, and to do this
> it does a divide/modulo operation iteratively.
> 
> On just about any CPU, divides are a pretty expensive operation, generally
> 10x to 20x or more expensive than adds or multiplies.

This has been bothering me too, thanks for doing something about it.


> By special casing the super common case of base==10, the (gcc) compiler can (and will)
> replace the divide by a multiply with 0xcccccccccccccccd, yielding a lot faster code.
> (fmtint dropped drastically in the perf profiles after this change)
> 
> Even though this only shows up in the database creation phase of pgbench and not so much
> during the normal run time, the optimization is simple and high value enough that
> in my opinion it's worth doing

It does even show up during normal running for me, in readonly pgbench.


> diff --git a/src/port/snprintf.c b/src/port/snprintf.c
> index 7c21429369..5957e6f2aa 100644
> --- a/src/port/snprintf.c
> +++ b/src/port/snprintf.c
> @@ -1076,11 +1076,24 @@ fmtint(long long value, char type, int forcesign, int leftjust,
>      else
>      {
>          /* make integer string */
> -        do
> -        {
> -            convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
> -            uvalue = uvalue / base;
> -        } while (uvalue);
> +
> +        /*
> +         * Special case a base of 10 because it is super common and by special casing the compiler can
> +         * avoid an expensive divide operation (the compiler will use a multiply for this)
> +         */
> +        if (likely(base == 10)) {
> +            do
> +            {
> +                convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 10];
> +                uvalue = uvalue / 10;
> +            } while (uvalue);
> +        } else {
> +            do
> +            {
> +                convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
> +                uvalue = uvalue / base;
> +            } while (uvalue);
> +        }
>      }
> 
>      zeropad = Max(0, precision - vallen);

Since all the bases are known / set earlier in the function, it seems better
to just split the function into two, with the new helper doing the conversion.

It's harder than it should be, because that code is a bit, uh, tangled, but I
think I can see a way through...

Greetings,

Andres Freund



Re: src/port/snprintf.c: Optimize the common base=10 case in fmtint

From
Tom Lane
Date:
Mark Dilger <mark.dilger@enterprisedb.com> writes:
> It appears fmtint only has three options for base, being 10, 16, and 8.  Have you profiled with either of the others
specialcased as well?  I don't see much use in optimizing for octal, but hexadecimal is used quite a bit in wal with
patternslike "%08X%08X%08X". 

I'd be inclined to just hard-wire the three allowed cases, and not have
an arbitrary-divisor code path at all.

            regards, tom lane



Re: src/port/snprintf.c: Optimize the common base=10 case in fmtint

From
Arjan van de Ven
Date:
On 10/26/2021 10:51 AM, Tom Lane wrote:
> Mark Dilger <mark.dilger@enterprisedb.com> writes:
>> It appears fmtint only has three options for base, being 10, 16, and 8.  Have you profiled with either of the others
specialcased as well?  I don't see much use in optimizing for octal, but hexadecimal is used quite a bit in wal with
patternslike "%08X%08X%08X".
 
> 
> I'd be inclined to just hard-wire the three allowed cases, and not have
> an arbitrary-divisor code path at all.
> 

ok so feedback is "Yes please but we want more of it" :)

I'll go poke at making an updated patch that does 8/10/16 and nothing else.





Re: src/port/snprintf.c: Optimize the common base=10 case in fmtint

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2021-10-26 13:51:55 -0400, Tom Lane wrote:
>> I'd be inclined to just hard-wire the three allowed cases, and not have
>> an arbitrary-divisor code path at all.

> Yea, I came to the same conclusion. But I'd implement it by moving the
> division into a separate inline function called from the switch. I tested that
> locally and it works, but I got sidetracked by [1].

Uh, why not just a "switch (base)" around three copies of the loop?
Don't overthink this.

            regards, tom lane