Re: Reducing the overhead of NUMERIC data - Mailing list pgsql-hackers

From: Simon Riggs
Subject: Re: Reducing the overhead of NUMERIC data
Msg-id: 1130960179.8300.1777.camel@localhost.localdomain
In response to: Re: Reducing the overhead of NUMERIC data (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

On Wed, 2005-11-02 at 13:46 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Tue, 2005-11-01 at 17:55 -0500, Tom Lane wrote:
> >> I don't think it'd be worth having 2 types.  Remember that the weight is
> >> measured in base-10k digits.  Suppose for instance
> >>     sign        1 bit
> >>     weight        7 bits (-64 .. +63)
> >>     dscale        8 bits (0..255)
>
> > I've coded a short patch to do this, which is the result of two
> > alternate patches and some thinking, but maybe not enough yet.
>
> What your patch does is

Thanks for checking this so quickly.

>
>     sign        2 bits

OK, that's just a mistake in my second patch; it's easily corrected.
Please ignore it for now.

>     weight        8 bits (-128..127)
>     dscale        6 bits (0..63)
>
> which is simply pretty lame: weight effectively has a factor of 8 more
> dynamic range than dscale in this representation.  What's the point of
> being able to represent 1 * 10000 ^ -128 (ie, 10^-512) if the dscale
> only lets you show 63 fractional digits?  You've got to allocate the
> bits in a saner fashion.  Yes, that takes a little more work.

I wasn't trying to claim the bit assignment made sense. My point was
that the work to mangle the two fields together so that it does make
sense looked like it would cost more CPU: since the standard
representation of signed integers differs for positive and negative
values, extracting a signed field from the middle of the word needs an
extra sign-extension step. It's that extra CPU I'm worried about, not
the wasted bits on the weight. Spending CPU cycles on *all* numerics
just so we can have numbers with more than +/-64 decimal places doesn't
seem a good trade. Hence I stuck the numeric sign back on the dscale,
which is why dscale and weight look out of balance.
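
To make the trade-off concrete, here is a minimal sketch in C of weight
extraction under the two layouts. The layout names and functions are
hypothetical, for illustration only -- this is not code from either
patch:

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch only: two hypothetical 16-bit header layouts.
 *
 *   Layout A (balanced):       sign 1 | weight 7 (signed) | dscale 8
 *   Layout B (sign on dscale): weight 8 (signed) | sign 1 | dscale 7
 */

/* Layout A: the signed weight sits inside the word, so extraction
 * needs a mask plus an explicit sign-extension step. */
static int
weight_layout_a(uint16_t header)
{
    int w = (header >> 8) & 0x7F;   /* pull out the 7-bit field */

    if (w & 0x40)                   /* sign-extend from bit 6 */
        w -= 0x80;
    return w;                       /* -64 .. 63, in base-10k digits */
}

/* Layout B: the weight occupies a whole byte, so a signed-byte
 * load suffices (assumes two's complement, as PostgreSQL does). */
static int
weight_layout_b(uint16_t header)
{
    return (int8_t) (header >> 8);  /* -128 .. 127 */
}

int
main(void)
{
    /* weight = -3 encoded under each layout; other fields zeroed */
    uint16_t a = (uint16_t) (((unsigned) -3 & 0x7F) << 8);
    uint16_t b = (uint16_t) (((unsigned) -3 & 0xFF) << 8);

    printf("layout A: %d, layout B: %d\n",
           weight_layout_a(a), weight_layout_b(b));
    return 0;
}

The difference per call is small, but it lands on every NUMERIC access,
which is the concern above.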

So, AFAICS, the options are:
0. (current CVS tip) Numeric range up to 1000, with an additional 2 bytes per column value
1. Numeric range up to 128, but with overhead to extract the last bit
2. Numeric range up to 64

I'm suggesting we choose (2); other views are welcome.

(I'll code it whichever way we decide.)

> Also, since the internal (unpacked) calculation representation has a
> much wider dynamic range than this, it'd probably be appropriate to add
> some range checks to the code that forms a packed value from unpacked.

Well, there already is a check that does exactly that; otherwise I
would have added one as you suggest. (The unpacked code uses int
values, whereas the previous packed format used int16/uint16 values.)
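
For reference, a range check of the kind being discussed might look
roughly like this sketch (hypothetical names and limits; PostgreSQL
proper would report the failure via ereport(ERROR, ...) rather than
exiting):

#include <stdio.h>
#include <stdlib.h>

/*
 * Sketch only: verify that unpacked fields (plain ints) fit the
 * hypothetical packed format -- 7-bit signed weight, 8-bit dscale --
 * before forming a packed value.
 */
#define PACKED_WEIGHT_MIN  (-64)
#define PACKED_WEIGHT_MAX  63
#define PACKED_DSCALE_MAX  255

static void
check_packable(int weight, int dscale)
{
    if (weight < PACKED_WEIGHT_MIN || weight > PACKED_WEIGHT_MAX)
    {
        /* PostgreSQL would use ereport(ERROR, ...) here */
        fprintf(stderr, "numeric weight %d exceeds packed range\n", weight);
        exit(EXIT_FAILURE);
    }
    if (dscale < 0 || dscale > PACKED_DSCALE_MAX)
    {
        fprintf(stderr, "numeric dscale %d exceeds packed range\n", dscale);
        exit(EXIT_FAILURE);
    }
}

int
main(void)
{
    check_packable(63, 255);    /* fits the packed format  */
    check_packable(64, 0);      /* fails: weight too large */
    return 0;
}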

Best Regards, Simon Riggs

