Re: [WIP Patch] Using 128-bit integers for sum, avg and statistics aggregates - Mailing list pgsql-hackers

From Arthur Silva
Subject Re: [WIP Patch] Using 128-bit integers for sum, avg and statistics aggregates
Date
Msg-id CAO_YK0X2FYVwfJ7Cv+xwk2CTH36dnyxu2VHbEEEuBg0UnxCwGA@mail.gmail.com
Whole thread
In response to [WIP Patch] Using 128-bit integers for sum, avg and statistics aggregates  (Andreas Karlsson <andreas@proxel.se>)
Responses Re: [WIP Patch] Using 128-bit integers for sum, avg and statistics aggregates
Re: [WIP Patch] Using 128-bit integers for sum, avg and statistics aggregates
List pgsql-hackers

On Sat, Oct 25, 2014 at 12:38 PM, Andreas Karlsson <andreas@proxel.se> wrote:
Hi,

There was recently talk about if we should start using 128-bit integers (where available) to speed up the aggregate functions over integers which uses numeric for their internal state. So I hacked together a patch for this to see what the performance gain would be.

Previous thread: http://www.postgresql.org/message-id/20141017182500.GF2075@alap3.anarazel.de

What the patch does is switching from using numerics in the aggregate state to int128 and then convert the type from the 128-bit integer in the final function.

The functions where we can make use of int128 states are:

- sum(int8)
- avg(int8)
- var_*(int2)
- var_*(int4)
- stdev_*(int2)
- stdev_*(int4)

The initial benchmark results look very promising. When summing 10 million int8 I get a speedup of ~2.5x and similarly for var_samp() on 10 million int4 I see a speed up of ~3.7x. To me this indicates that it is worth the extra code. What do you say? Is this worth implementing?

The current patch still requires work. I have not written the detection of int128 support yet, and the patch needs code cleanup (for example: I used an int16_ prefix on the added functions, suggestions for better names are welcome). I also need to decide on what estimate to use for the size of that state.

The patch should work and pass make check on platforms where __int128_t is supported.

The simple benchmarks:

CREATE TABLE test_int8 AS SELECT x::int8 FROM generate_series(1, 10000000) x;

Before:

# SELECT sum(x) FROM test_int8;
      sum
----------------
 50000005000000
(1 row)

Time: 2521.217 ms

After:

# SELECT sum(x) FROM test_int8;
      sum
----------------
 50000005000000
(1 row)

Time: 1022.811 ms

CREATE TABLE test_int4 AS SELECT x::int4 FROM generate_series(1, 10000000) x;

Before:

# SELECT var_samp(x) FROM test_int4;
      var_samp
--------------------
 8333334166666.6667
(1 row)

Time: 3808.546 ms

After:

# SELECT var_samp(x) FROM test_int4;
      var_samp
--------------------
 8333334166666.6667
(1 row)

Time: 1033.243 ms

Andreas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


These are some nice improvements.

As far as I'm aware int128 types are supported on every major compiler when compiling for 64bit platforms. Right?

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: alter user/role CURRENT_USER
Next
From: Robert Haas
Date:
Subject: Re: Directory/File Access Permissions for COPY and Generic File Access Functions