Re: Using 128-bit integers for sum, avg and statistics aggregates - Mailing list pgsql-hackers

From David Rowley
Subject Re: Using 128-bit integers for sum, avg and statistics aggregates
Date
Msg-id CAApHDvp9ZUC5fUur-_5nJ7VPUYe_G+pcUTb73pvw0k0rH6xgzg@mail.gmail.com
Whole thread Raw
In response to Re: Using 128-bit integers for sum, avg and statistics aggregates  (Andreas Karlsson <andreas@proxel.se>)
Responses Re: Using 128-bit integers for sum, avg and statistics aggregates  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On 24 December 2014 at 16:04, Andreas Karlsson <andreas@proxel.se> wrote:

On 12/16/2014 11:04 AM, David Rowley wrote:> These are some very promising performance increases.

I've done a quick pass of reading the patch. I currently don't have a
system with a 128bit int type, but I'm working on that.

Sorry for taking some time to get back. I have been busy before Christmas. A new version of the patch is attached.


Ok I've had another look at this, and I think the only things that I have to say have been mentioned already:

1. Do we need to keep the 128 byte aggregate state size for machines without 128 bit ints? This has been reduced to 48 bytes in the patch, which is in favour code being compiled with a compiler which has 128 bit ints.  I kind of think that we need to keep the 128 byte estimates for compilers that don't support int128, but I'd like to hear any counter arguments.

2. References to int16 meaning 16 bytes. I'm really in two minds about this, it's quite nice to keep the natural flow, int4, int8, int16, but I can't help think that this will confuse someone one day. I think it'll be a long time before it confused anyone if we called it int128 instead, but I'm not that excited about seeing it renamed either. I'd like to hear what others have to say... Is there a chance that some sql standard in the distant future will have HUGEINT and we might regret not getting the internal names nailed down?

I also checked the performance of some windowing function calls, since these call the final function for each row, I thought I'd better make sure there was no regression as the final function must perform a conversion from int128 to numeric for each row. 

It seems there's still an increase in performance:

 
Setup:
create table bi_win (i bigint primary key);
insert into bi_win select x.x from generate_series(1,10000) x(x);
vacuum analyze;

Query:
select sum(i) over () from bi_win;

** Master
./pgbench -f ~/int128_window.sql -n -T 60 postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 6567
latency average: 9.137 ms
tps = 109.445841 (including connections establishing)
tps = 109.456941 (excluding connections establishing)

** Patched
./pgbench -f ~/int128_window.sql -n -T 60 postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 7841
latency average: 7.652 ms
tps = 130.670253 (including connections establishing)
tps = 130.675743 (excluding connections establishing)

Setup:
create table i_win (i int primary key);
insert into i_win select x.x from generate_series(1,10000) x(x);
vacuum analyze;

Query:
select stddev(i) over () from i_win;

** Master
./pgbench -f ~/int128_window.sql -n -T 60 postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 5084
latency average: 11.802 ms
tps = 84.730362 (including connections establishing)
tps = 84.735693 (excluding connections establishing)

** Patched
./pgbench -f ~/int128_window.sql -n -T 60 postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 7557
latency average: 7.940 ms
tps = 125.934787 (including connections establishing)
tps = 125.943176 (excluding connections establishing)

Regards

David Rowley

pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Additional role attributes && superuser review
Next
From: Arne Scheffer
Date:
Subject: Re: [PATCH] explain sortorder