Thread: [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint

[PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint

From

Arjan van de Ven

Date:

26 October 2021, 23:58:17

[PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint

fmtint() turns an integer into a string for a given base, and to do this
it does a divide/modulo operation iteratively.
The only possible base values are 8, 10 and 16

On just about any CPU, divides are a pretty expensive operation, generally
10x to 20x or more expensive than adds or multiplies.

By special casing the base values, the compiler (gcc or other) can (and will)
replace the divide by a multiply with 0xcccccccccccccccd (for base 10) or bitops
for base 8 and 16, yielding a lot faster code.

I considered a switch statement, but since base 10 is the most common by far,
I implemented it as a series of if/else statements with a likely() marking the 10 case.

Even though this only shows up in the database creation phase of pgbench and not so much
during the normal run time, the optimization is simple and high value enough that
in my opinion it's worth doing




diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 7c21429369..547a59d4a0 100644
--- a/src/port/snprintf.c
+++ b/src/port/snprintf.c
@@ -1076,11 +1076,31 @@ fmtint(long long value, char type, int forcesign, int leftjust,
      else
      {
          /* make integer string */
-        do
-        {
-            convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
-            uvalue = uvalue / base;
-        } while (uvalue);
+
+        /*
+         * Special case each of the possible base values  (8, 10, 16) to avoid an
+         * expensive divide operation
+         * (the compiler will use a multiply, shift or boolean ops for this)
+         */
+        if (likely(base == 10)) {
+            do
+            {
+                convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 10];
+                uvalue = uvalue / 10;
+            } while (uvalue);
+        } else if (base == 16) {
+            do
+            {
+                convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 16];
+                uvalue = uvalue / 16;
+            } while (uvalue);
+        } else if (base == 8) {
+            do
+            {
+                convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 8];
+                uvalue = uvalue / 8;
+            } while (uvalue);
+        }
      }

      zeropad = Max(0, precision - vallen);

Re: [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint

From

Japin Li

Date:

27 October 2021, 04:36:35

On Wed, 27 Oct 2021 at 04:58, Arjan van de Ven <arjan@linux.intel.com> wrote:
> [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint
>
> fmtint() turns an integer into a string for a given base, and to do this
> it does a divide/modulo operation iteratively.
> The only possible base values are 8, 10 and 16
>
> On just about any CPU, divides are a pretty expensive operation, generally
> 10x to 20x or more expensive than adds or multiplies.
>
> By special casing the base values, the compiler (gcc or other) can (and will)
> replace the divide by a multiply with 0xcccccccccccccccd (for base 10) or bitops
> for base 8 and 16, yielding a lot faster code.
>
> I considered a switch statement, but since base 10 is the most common by far,
> I implemented it as a series of if/else statements with a likely() marking the 10 case.
>
> Even though this only shows up in the database creation phase of pgbench and not so much
> during the normal run time, the optimization is simple and high value enough that
> in my opinion it's worth doing
>
>


+        if (likely(base == 10)) {
+            do
+            {
+                convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 10];
+                uvalue = uvalue / 10;
+            } while (uvalue);
+        } else if (base == 16) {

Why do we need likely() for base=10, however, base=16 and base=8 don't need?

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

Re: [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint

From

Tom Lane

Date:

27 October 2021, 04:39:54

Japin Li <japinli@hotmail.com> writes:
> Why do we need likely() for base=10, however, base=16 and base=8 don't need?

Yeah, I was a little unconvinced about that too.  I concur with writing
it as an if/else chain instead of a switch, but I'm not sure that likely()
adds anything to that.

            regards, tom lane

Re: [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint

From

Arjan van de Ven

Date:

28 October 2021, 01:18:13

On 10/26/2021 6:39 PM, Tom Lane wrote:
> Japin Li <japinli@hotmail.com> writes:
>> Why do we need likely() for base=10, however, base=16 and base=8 don't need?
> 
> Yeah, I was a little unconvinced about that too.  I concur with writing
> it as an if/else chain instead of a switch, but I'm not sure that likely()
> adds anything to that.

fair enough:

[PATCH v3] src/port/snprintf.c: Optimize the division away in fmtint

fmtint() turns an integer into a string for a given base, and to do this
it does a divide/modulo operation iteratively.
The only possible base values are 8, 10 and 16

On just about any CPU, generic divides are a pretty expensive operation, generally
10x to 20x or more expensive than adds or multiplies.

By special casing the base values, the compiler (gcc or other) can (and will)
replace the divide by a multiply with 0xcccccccccccccccd (for base 10) or bitops
for base 8 and 16, yielding a lot faster code.

I considered a switch statement, but since base 10 is the most common by far,
I implemented it as a series of if/else statements for simplicity.

Even though this only shows up in the database creation phase of pgbench and not so much
during the normal run time, the optimization is simple and high value enough that
in my opinion it's worth doing




diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 7c21429369..547a59d4a0 100644
--- a/src/port/snprintf.c
+++ b/src/port/snprintf.c
@@ -1076,11 +1076,31 @@ fmtint(long long value, char type, int forcesign, int leftjust,
      else
      {
          /* make integer string */
-        do
-        {
-            convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base];
-            uvalue = uvalue / base;
-        } while (uvalue);
+
+        /*
+         * Special case each of the possible base values  (8, 10, 16) to avoid an
+         * expensive divide operation
+         * (the compiler will use a multiply, shift or boolean ops for this)
+         */
+        if (base == 10) {
+            do
+            {
+                convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 10];
+                uvalue = uvalue / 10;
+            } while (uvalue);
+        } else if (base == 16) {
+            do
+            {
+                convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 16];
+                uvalue = uvalue / 16;
+            } while (uvalue);
+        } else if (base == 8) {
+            do
+            {
+                convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 8];
+                uvalue = uvalue / 8;
+            } while (uvalue);
+        }
      }

      zeropad = Max(0, precision - vallen);

Re: [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint

From

Chapman Flack

Date:

28 October 2021, 02:50:45

On 10/27/21 18:18, Arjan van de Ven wrote:
> +        /*
> +         * Special case each of the possible base values  (8, 10, 16) to
> avoid an
> +         * expensive divide operation
> +         * (the compiler will use a multiply, shift or boolean ops for this)
> +         */


Was 'boolean' the intended word there? To me it is distinct from 'bitwise'.

Regards,
-Chap

Re: [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint

From

Tom Lane

Date:

28 October 2021, 20:46:49

Chapman Flack <chap@anastigmatix.net> writes:
> On 10/27/21 18:18, Arjan van de Ven wrote:
>> +        /*
>> +         * Special case each of the possible base values  (8, 10, 16) to
>> avoid an
>> +         * expensive divide operation
>> +         * (the compiler will use a multiply, shift or boolean ops for this)
>> +         */

> Was 'boolean' the intended word there? To me it is distinct from 'bitwise'.

I think the comment is overly specific anyway.  We should just say
"division by a constant is faster than general-purpose division".
Only compiler geeks will care about the details, and they probably
know them already.

Personally, I failed to measure any speedup at all on pgbench, either
in the init phase or regular transactions; whatever difference there
may be is below the noise level.  However, I wrote a simple C function
with a tight loop around snprintf(), and that showed about a 2X
improvement, so there is some win here.

I went ahead and pushed it with a rewritten comment.

            regards, tom lane

Re: [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint

From

Andres Freund

Date:

28 October 2021, 23:27:49

Hi,

On 2021-10-28 13:46:49 -0400, Tom Lane wrote:
> Personally, I failed to measure any speedup at all on pgbench, either
> in the init phase or regular transactions; whatever difference there
> may be is below the noise level.  However, I wrote a simple C function
> with a tight loop around snprintf(), and that showed about a 2X
> improvement, so there is some win here.

Odd - at least with an earlier patch I saw optimized pgbench initialization go
down by ~25%.

> I went ahead and pushed it with a rewritten comment.

Imo the code now is a bit odd, because we first switch (type) setting base,
and then separately have branches for the different bases.

Greetings,

Andres Freund

Re: [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint

From

Tom Lane

Date:

28 October 2021, 23:34:06

Andres Freund <andres@anarazel.de> writes:
> Imo the code now is a bit odd, because we first switch (type) setting base,
> and then separately have branches for the different bases.

It'd be hard to merge, I think, given that the cases in the switch
don't line up one-for-one with the different bases.  You could
probably do something involving falling through between different
cases, but I think that that would be a lot harder to read;
and I'm still of the opinion that micro-optimizing this code
is probably a waste of effort for our usage.

            regards, tom lane