Re: Using 128-bit integers for sum, avg and statistics aggregates - Mailing list pgsql-hackers

From David Rowley
Subject Re: Using 128-bit integers for sum, avg and statistics aggregates
Date
Msg-id CAApHDvoshk1L1r8fCJjf_=OuCwaycnZp4W3C2R-js-sJsFr+Uw@mail.gmail.com
Whole thread Raw
In response to Re: Using 128-bit integers for sum, avg and statistics aggregates  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Using 128-bit integers for sum, avg and statistics aggregates  (Andres Freund <andres@2ndquadrant.com>)
Re: Using 128-bit integers for sum, avg and statistics aggregates  (Robert Haas <robertmhaas@gmail.com>)
Re: Using 128-bit integers for sum, avg and statistics aggregates  (Andreas Karlsson <andreas@proxel.se>)
List pgsql-hackers
On 31 December 2014 at 18:20, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Dec 26, 2014 at 7:57 AM, David Rowley <dgrowleyml@gmail.com> wrote:
> 1. Do we need to keep the 128 byte aggregate state size for machines without
> 128 bit ints? This has been reduced to 48 bytes in the patch, which is in
> favour code being compiled with a compiler which has 128 bit ints.  I kind
> of think that we need to keep the 128 byte estimates for compilers that
> don't support int128, but I'd like to hear any counter arguments.

I think you're referring to the estimated state size in pg_aggregate
here, and I'd say it's probably not a big deal one way or the other.
Presumably, at some point, 128-bit arithmetic will become common
enough that we'll want to change that estimate, but I don't know
whether we've reached that point or not.


Yeah, that's what I was talking about.
I'm just looking at the code which uses this size estimate in choose_hashed_grouping(). I'd be a bit worried giving the difference between 48 and 128 that we'd under estimate the hash size too much and end up going to disk during hashagg.

I think the patch should be modified so that it uses 128 bytes for the size estimate on machines that don't support int128, but I'm not quite sure at this stage if that causes issues for replication, if 1 machine had support and one didn't, would it matter if the pg_aggregate row on the replica was 48 bytes instead of 128? Is this worth worrying about?

 
> 2. References to int16 meaning 16 bytes. I'm really in two minds about this,
> it's quite nice to keep the natural flow, int4, int8, int16, but I can't
> help think that this will confuse someone one day. I think it'll be a long
> time before it confused anyone if we called it int128 instead, but I'm not
> that excited about seeing it renamed either. I'd like to hear what others
> have to say... Is there a chance that some sql standard in the distant
> future will have HUGEINT and we might regret not getting the internal names
> nailed down?

Yeah, I think using int16 to mean 16-bytes will be confusing to
someone almost immediately.


hmm, I think it should be changed to int128 then.  Pitty int4 was selected as a name instead of int32 back in the day...

I'm going to mark the patch as waiting on author, pending those two changes.

My view with the size estimates change is that if a committer deems it overkill, then they can rip it out, but at least it's been thought of and considered rather than forgotten about.

Regards

David Rowley

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Performance improvement for joins where outer side is unique
Next
From: Stephen Frost
Date:
Subject: Re: Additional role attributes && superuser review