Re: Reducing the overhead of NUMERIC data - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Reducing the overhead of NUMERIC data
Date
Msg-id 1130885167.8300.1702.camel@localhost.localdomain
Whole thread Raw
In response to Re: Reducing the overhead of NUMERIC data  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, 2005-11-01 at 16:54 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > varlen is int32 to match the standard varlena header. However, the max
> > number of digits of the datatype is less than the threshold at which
> > values get toasted. So no NUMERIC values ever get toasted - in which
> > case, why worry about matching the size of varlena - lets reduce it to 2
> > bytes which still gives us up to 1000 digits as we have now.
> 
> Because that will require an extra case in the code that disassembles
> tuples, which will slow down *everything* even in databases that don't
> contain one single NUMERIC value.  I think you need more than "let's
> save 2 bytes in NUMERICs" to justify that.

OK

> > n_weight seems to exist because we do not store trailing zeroes. So
> > 1000000 is stored as 1 with a weight of 6. My experience is that large
> > numbers of trailing zeroes do not occur with any frequency in real
> > measurement or financial data and that this is an over-optimization.
> 
> That seems debatable.  Keep in mind that not storing extra zeroes cuts
> computation time as well as storage.

Check what % difference this makes. 2 bytes on everything makes more
difference than a 1 byte saving on a few percent of values.

> > n_sign_dscale shows us where the decimal point is. We could actually
> > store a marker representing the decimal point, which would cost us 0.5
> > byte rather than 2 bytes. Since we have 4 bits to represent a decimal
> > number, that leaves a few bits spare to represent either a decimal-
> > point-and-positive-sign and decimal-point-and-negative-sign. (We would
> > still need to store trailing zeroes even after the decimal point).
> 
> This is completely bogus.  How are you going to remember the sign except
> by always storing a marker?  ISTM this proposal just moves the
> sign/decimalpoint overhead from one place to another, ie, somewhere in
> the NumericDigit array instead of in a fixed field.

That is exactly my proposal. Thus an overhead of 0.5 bytes rather than 2
bytes, as I explained....but

> Also, you can't drop dscale without abandoning the efficient base-10000
> representation, at least not unless you want people complaining that the
> database shows NUMERIC(3) data with four decimal places.

... I take it I have misunderstood the storage format.

> It might be reasonable to restrict the range of NUMERIC to the point
> that we could fit the weight/sign/dscale into 2 bytes instead of 4,
> thereby saving 2 bytes per NUMERIC.  I'm not excited about the other
> aspects of this, though.

That seems easily doable - it seemed like something would stick.

Restricting total number of digits to 255 and maxscale of 254 would
allow that saving, yes?

We can then have a BIGNUMERIC which would allow up to 1000 digits for
anybody out there that ever got that high. I'm sure there's a few %, so
I won't dismiss you entirely, guys...

Best Regards, Simon Riggs




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Reducing the overhead of NUMERIC data
Next
From: "Jim C. Nasby"
Date:
Subject: Re: Reducing the overhead of NUMERIC data