Re: Improving and extending int128.h to more of numeric.c - Mailing list pgsql-hackers

From John Naylor
Subject Re: Improving and extending int128.h to more of numeric.c
Date
Msg-id CANWCAZYp1ZQTNT7pkuJpPSH0WnUi67i6ENboFnxXEb_9cCOqDA@mail.gmail.com
Whole thread Raw
In response to Re: Improving and extending int128.h to more of numeric.c  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: Improving and extending int128.h to more of numeric.c
List pgsql-hackers
On Thu, Jul 10, 2025 at 9:06 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Wed, 9 Jul 2025 at 22:31, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> >
> > On Wed, 9 Jul 2025 at 18:27, Andres Freund <andres@anarazel.de> wrote:
> > >
> > > I think we should wire this up to the buildsystem and our testsuite...  Having
> > > testcode that is not run automatically may be helpful while originally
> > > developing something, but it doesn't do anything to detect portability issues
> > > or regressions.
> >
> > Yes, perhaps we should convert src/tools/testint128.c into a new test
> > extension, src/test/modules/test_int128
>
> Here's an update doing that (in 0001). 0002-0005 are unchanged.

(Looking at v3) The new test module runs 10 million rather than a
billion iterations. That still takes 1.2s (after 0005), which seems
excessive for regular buildfarm testing. It seems like we could get by
with fewer than that, by using the time of day for the PRNG seed
(which would also need to be logged on error).

On Mon, Jun 23, 2025 at 3:01 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> 0002 is a bit of preparatory refactoring of int128.h -- instead of
> having all the native implementations at the top of the file, and the
> non-native implementations at the bottom, this brings them together
> (more like include/common/int.h).

+1

> 0003 optimises the non-native addition code. Specifically, the test
> for whether it needs to propagate a carry to the high part can be made
> much simpler by noting that the low-part addition is unsigned integer
> arithmetic, which is just modular arithmetic, so all it needs to do is
> check for modular wrap-around, which can be done with a single "new <
> old" test. In addition, it's possible to code this in a way that is
> typically branchless, and produces the same machine code as the native
> int128 code (e.g., an ADD and an ADC instruction). For me, this
> significantly reduces the runtime of testint128 (from 31s to 16s).

I see 1/3 less time with the new module, but still noticeably better.

> 0004 simplifies the non-native multiplication code a bit by using
> signed integer multiplication for the first three product terms, which
> simplifies the code needed to add the products to the result. Looking
> on godbolt.org, this typically leads to significantly smaller output,
> with less branching, though I found it only gave around a 3%
> improvement to the runtime of testint128. Nonetheless, I still think
> it's worth doing, to make the code simpler and more readable.

+1

> 0005 is the main patch. It adds a few more functions to int128.h and
> uses them in numeric.c to allow various functions (mainly aggregate
> functions) to use 128-bit integers unconditionally on all platforms.
> This applies to the following aggregates:
>
> - sum(int8)
> - avg(int8)
> - stddev_pop(int4)
> - stddev_samp(int4)
> - var_pop(int4)
> - var_samp(int4)
>
> Excluding the new test code, 0005 gives a slight net reduction in the
> total line count, and eliminates nearly all "#ifdef HAVE_INT128"
> conditional code from numeric.c, making it significantly simpler and
> easier to follow.

I haven't looked too closely, but wanted to point out:

+ /* check 128/32-bit division */
+ t3.hl.hi = x;
+ t3.hl.lo = y;
+ t1.i128 = t3.i128 / z32;
+ r1 = (int32) (t3.i128 % z32);
+ t2 = t3;
+ int128_div_mod_int32(&t2.I128, z32, &r2);
+
+ if (t1.hl.hi != t2.hl.hi || t1.hl.lo != t2.hl.lo)
+ {
+ printf("%016lX%016lX / signed %lX\n", t3.hl.hi, t3.hl.lo, z32);

On gcc 14.3 -Og this gives

warning: format ‘%lX’ expects argument of type ‘long unsigned int’,
but argument 4 has type ‘int32’ {aka ‘int’} [-Wformat=]

...and printing r1 and r2 has the same warnings.

+ if (r1 != r2)
+ {
+ printf("%016lX%016lX % signed %lX\n", t3.hl.hi, t3.hl.lo, z32);

And this gives the above plus

warning: ' ' flag used with ‘%s’ gnu_printf format [-Wformat=]
warning: format ‘%s’ expects argument of type ‘char *’, but argument 4
has type ‘int32’ {aka ‘int’} [-Wformat=]

> Testing on a 32-bit system without native int128 support, I see
> something like a 1.3-1.5x speedup in a couple of simple queries using
> those aggregates.

Nice!

--
John Naylor
Amazon Web Services



pgsql-hackers by date:

Previous
From: Ajin Cherian
Date:
Subject: Re: 024_add_drop_pub.pl might fail due to deadlock
Next
From: Japin Li
Date:
Subject: Re: [WIP]Vertical Clustered Index (columnar store extension) - take2