Thread: [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint
[PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint
From
Arjan van de Ven
Date:
[PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint fmtint() turns an integer into a string for a given base, and to do this it does a divide/modulo operation iteratively. The only possible base values are 8, 10 and 16 On just about any CPU, divides are a pretty expensive operation, generally 10x to 20x or more expensive than adds or multiplies. By special casing the base values, the compiler (gcc or other) can (and will) replace the divide by a multiply with 0xcccccccccccccccd (for base 10) or bitops for base 8 and 16, yielding a lot faster code. I considered a switch statement, but since base 10 is the most common by far, I implemented it as a series of if/else statements with a likely() marking the 10 case. Even though this only shows up in the database creation phase of pgbench and not so much during the normal run time, the optimization is simple and high value enough that in my opinion it's worth doing diff --git a/src/port/snprintf.c b/src/port/snprintf.c index 7c21429369..547a59d4a0 100644 --- a/src/port/snprintf.c +++ b/src/port/snprintf.c @@ -1076,11 +1076,31 @@ fmtint(long long value, char type, int forcesign, int leftjust, else { /* make integer string */ - do - { - convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base]; - uvalue = uvalue / base; - } while (uvalue); + + /* + * Special case each of the possible base values (8, 10, 16) to avoid an + * expensive divide operation + * (the compiler will use a multiply, shift or boolean ops for this) + */ + if (likely(base == 10)) { + do + { + convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 10]; + uvalue = uvalue / 10; + } while (uvalue); + } else if (base == 16) { + do + { + convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 16]; + uvalue = uvalue / 16; + } while (uvalue); + } else if (base == 8) { + do + { + convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 8]; + uvalue = uvalue / 8; + } while (uvalue); + } } zeropad = Max(0, precision - vallen);
On Wed, 27 Oct 2021 at 04:58, Arjan van de Ven <arjan@linux.intel.com> wrote: > [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint > > fmtint() turns an integer into a string for a given base, and to do this > it does a divide/modulo operation iteratively. > The only possible base values are 8, 10 and 16 > > On just about any CPU, divides are a pretty expensive operation, generally > 10x to 20x or more expensive than adds or multiplies. > > By special casing the base values, the compiler (gcc or other) can (and will) > replace the divide by a multiply with 0xcccccccccccccccd (for base 10) or bitops > for base 8 and 16, yielding a lot faster code. > > I considered a switch statement, but since base 10 is the most common by far, > I implemented it as a series of if/else statements with a likely() marking the 10 case. > > Even though this only shows up in the database creation phase of pgbench and not so much > during the normal run time, the optimization is simple and high value enough that > in my opinion it's worth doing > > + if (likely(base == 10)) { + do + { + convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 10]; + uvalue = uvalue / 10; + } while (uvalue); + } else if (base == 16) { Why do we need likely() for base=10, however, base=16 and base=8 don't need? -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Japin Li <japinli@hotmail.com> writes: > Why do we need likely() for base=10, however, base=16 and base=8 don't need? Yeah, I was a little unconvinced about that too. I concur with writing it as an if/else chain instead of a switch, but I'm not sure that likely() adds anything to that. regards, tom lane
Re: [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint
From
Arjan van de Ven
Date:
On 10/26/2021 6:39 PM, Tom Lane wrote: > Japin Li <japinli@hotmail.com> writes: >> Why do we need likely() for base=10, however, base=16 and base=8 don't need? > > Yeah, I was a little unconvinced about that too. I concur with writing > it as an if/else chain instead of a switch, but I'm not sure that likely() > adds anything to that. fair enough: [PATCH v3] src/port/snprintf.c: Optimize the division away in fmtint fmtint() turns an integer into a string for a given base, and to do this it does a divide/modulo operation iteratively. The only possible base values are 8, 10 and 16 On just about any CPU, generic divides are a pretty expensive operation, generally 10x to 20x or more expensive than adds or multiplies. By special casing the base values, the compiler (gcc or other) can (and will) replace the divide by a multiply with 0xcccccccccccccccd (for base 10) or bitops for base 8 and 16, yielding a lot faster code. I considered a switch statement, but since base 10 is the most common by far, I implemented it as a series of if/else statements for simplicity. Even though this only shows up in the database creation phase of pgbench and not so much during the normal run time, the optimization is simple and high value enough that in my opinion it's worth doing diff --git a/src/port/snprintf.c b/src/port/snprintf.c index 7c21429369..547a59d4a0 100644 --- a/src/port/snprintf.c +++ b/src/port/snprintf.c @@ -1076,11 +1076,31 @@ fmtint(long long value, char type, int forcesign, int leftjust, else { /* make integer string */ - do - { - convert[sizeof(convert) - (++vallen)] = cvt[uvalue % base]; - uvalue = uvalue / base; - } while (uvalue); + + /* + * Special case each of the possible base values (8, 10, 16) to avoid an + * expensive divide operation + * (the compiler will use a multiply, shift or boolean ops for this) + */ + if (base == 10) { + do + { + convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 10]; + uvalue = uvalue / 10; + } while (uvalue); + } else if (base == 16) { + do + { + convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 16]; + uvalue = uvalue / 16; + } while (uvalue); + } else if (base == 8) { + do + { + convert[sizeof(convert) - (++vallen)] = cvt[uvalue % 8]; + uvalue = uvalue / 8; + } while (uvalue); + } } zeropad = Max(0, precision - vallen);
Re: [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint
From
Chapman Flack
Date:
On 10/27/21 18:18, Arjan van de Ven wrote: > + /* > + * Special case each of the possible base values (8, 10, 16) to > avoid an > + * expensive divide operation > + * (the compiler will use a multiply, shift or boolean ops for this) > + */ Was 'boolean' the intended word there? To me it is distinct from 'bitwise'. Regards, -Chap
Chapman Flack <chap@anastigmatix.net> writes: > On 10/27/21 18:18, Arjan van de Ven wrote: >> + /* >> + * Special case each of the possible base values (8, 10, 16) to >> avoid an >> + * expensive divide operation >> + * (the compiler will use a multiply, shift or boolean ops for this) >> + */ > Was 'boolean' the intended word there? To me it is distinct from 'bitwise'. I think the comment is overly specific anyway. We should just say "division by a constant is faster than general-purpose division". Only compiler geeks will care about the details, and they probably know them already. Personally, I failed to measure any speedup at all on pgbench, either in the init phase or regular transactions; whatever difference there may be is below the noise level. However, I wrote a simple C function with a tight loop around snprintf(), and that showed about a 2X improvement, so there is some win here. I went ahead and pushed it with a rewritten comment. regards, tom lane
Re: [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint
From
Andres Freund
Date:
Hi, On 2021-10-28 13:46:49 -0400, Tom Lane wrote: > Personally, I failed to measure any speedup at all on pgbench, either > in the init phase or regular transactions; whatever difference there > may be is below the noise level. However, I wrote a simple C function > with a tight loop around snprintf(), and that showed about a 2X > improvement, so there is some win here. Odd - at least with an earlier patch I saw optimized pgbench initialization go down by ~25%. > I went ahead and pushed it with a rewritten comment. Imo the code now is a bit odd, because we first switch (type) setting base, and then separately have branches for the different bases. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > Imo the code now is a bit odd, because we first switch (type) setting base, > and then separately have branches for the different bases. It'd be hard to merge, I think, given that the cases in the switch don't line up one-for-one with the different bases. You could probably do something involving falling through between different cases, but I think that that would be a lot harder to read; and I'm still of the opinion that micro-optimizing this code is probably a waste of effort for our usage. regards, tom lane