Re: Numeric multiplication overflow errors - Mailing list pgsql-hackers

From David Rowley
Subject Re: Numeric multiplication overflow errors
Date
Msg-id CAApHDvr5MFsg4gyZYNFVcdpJsQjRY8soJgJEk8SEw4OFEtgYmA@mail.gmail.com
Whole thread Raw
In response to Re: Numeric multiplication overflow errors  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Numeric multiplication overflow errors  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers
 patchOn Thu, 1 Jul 2021 at 13:00, David Rowley <dgrowleyml@gmail.com> wrote:
> I've only looked at the numeric-agg-sumX2-overflow-v2.patch one and it
> all looks mostly ok.

I forgot to mention earlier, after looking at the code a bit more I
wondered if we should just serialise the NumericSumAccum instead of
the NumericVar.  If we're changing this, then maybe it's worth just
doing it once and making it as optimal as possible.

Right now we translate the NumericSumAccum to a NumericVar in the
serial function, only to translate it back again in the deserial
function. This is a bit of a waste of CPU effort, although it might be
fewer bytes to copy over to the main process.  Since deserial can be
pretty hot if we've got a lot of workers throwing serialised aggregate
states at the main process as fast as they all can go. Reducing the
overheads in the serial part of the query could provide some big wins.

I played about with the following case:

create table num (a int not null, b numeric not null, c numeric not
null, d numeric not null, e numeric not null, f numeric not null, g
numeric not null, h numeric not null);
insert into num select x,y,y,y,y,y,y,y from generate_series(1,1000000)
x, generate_Series(1000000000,1000000099) y order by x;
explain analyze select
a,sum(b),sum(c),sum(d),sum(e),sum(f),sum(g),sum(h) from num group by
a;

To try and load up the main process as much as possible, I set 128
workers to run. The profile of the main process looked like:

Master:
  14.10%  postgres            [.] AllocSetAlloc
   7.03%  postgres            [.] accum_sum_carry.part.0
   4.87%  postgres            [.] ExecInterpExpr
   3.72%  postgres            [.] numeric_sum
   3.52%  postgres            [.] accum_sum_copy
   3.06%  postgres            [.] pq_getmsgint
   2.95%  postgres            [.] palloc
   2.82%  postgres            [.] ExecStoreMinimalTuple
   2.62%  postgres            [.] accum_sum_add
   2.60%  postgres            [.] make_result_opt_error
   2.58%  postgres            [.] numeric_avg_deserialize
   2.21%  [kernel]            [k] copy_user_generic_string
   2.08%  postgres            [.] tuplehash_insert_hash_internal

So it is possible to get this stuff to show up.

Your numeric-agg-sumX2-overflow-v2.patch patch speeds this up quite a
bit already, so it makes me think it might be worth doing the extra
work to further reduce the overhead.

Master @ 3788c6678

Execution Time: 8306.319 ms
Execution Time: 8407.785 ms
Execution Time: 8491.056 ms

Master + numeric-agg-sumX2-overflow-v2.patch
Execution Time: 6633.278 ms
Execution Time: 6657.350 ms
Execution Time: 6568.184 ms

A possible reason we might not want to do this is that we currently
don't have a NumericSumAccum for some functions when the compiler has
a working int128 type.  At the moment we translate the int128
accumulator into a NumericVar. We could just serialize the int128 type
in those cases. It would just mean the serialised format is not as
consistent between different builds. We currently have nothing that
depends on them matching across different machines.

Do you think it's worth doing this?

David



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Preventing abort() and exit() calls in libpq
Next
From: Amit Langote
Date:
Subject: Re: Record a Bitmapset of non-pruned partitions