Re: Reducing the overhead of NUMERIC data - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Reducing the overhead of NUMERIC data
Date
Msg-id 28566.1130882051@sss.pgh.pa.us
Whole thread Raw
In response to Reducing the overhead of NUMERIC data  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Reducing the overhead of NUMERIC data
Re: Reducing the overhead of NUMERIC data
List pgsql-hackers
Simon Riggs <simon@2ndquadrant.com> writes:
> varlen is int32 to match the standard varlena header. However, the max
> number of digits of the datatype is less than the threshold at which
> values get toasted. So no NUMERIC values ever get toasted - in which
> case, why worry about matching the size of varlena - lets reduce it to 2
> bytes which still gives us up to 1000 digits as we have now.

Because that will require an extra case in the code that disassembles
tuples, which will slow down *everything* even in databases that don't
contain one single NUMERIC value.  I think you need more than "let's
save 2 bytes in NUMERICs" to justify that.

> n_weight seems to exist because we do not store trailing zeroes. So
> 1000000 is stored as 1 with a weight of 6. My experience is that large
> numbers of trailing zeroes do not occur with any frequency in real
> measurement or financial data and that this is an over-optimization.

That seems debatable.  Keep in mind that not storing extra zeroes cuts
computation time as well as storage.

> n_sign_dscale shows us where the decimal point is. We could actually
> store a marker representing the decimal point, which would cost us 0.5
> byte rather than 2 bytes. Since we have 4 bits to represent a decimal
> number, that leaves a few bits spare to represent either a decimal-
> point-and-positive-sign and decimal-point-and-negative-sign. (We would
> still need to store trailing zeroes even after the decimal point).

This is completely bogus.  How are you going to remember the sign except
by always storing a marker?  ISTM this proposal just moves the
sign/decimalpoint overhead from one place to another, ie, somewhere in
the NumericDigit array instead of in a fixed field.

Also, you can't drop dscale without abandoning the efficient base-10000
representation, at least not unless you want people complaining that the
database shows NUMERIC(3) data with four decimal places.

It might be reasonable to restrict the range of NUMERIC to the point
that we could fit the weight/sign/dscale into 2 bytes instead of 4,
thereby saving 2 bytes per NUMERIC.  I'm not excited about the other
aspects of this, though.
        regards, tom lane


pgsql-hackers by date:

Previous
From: "Dave Page"
Date:
Subject: Attn: Richard Huxton
Next
From: Martijn van Oosterhout
Date:
Subject: Re: Reducing the overhead of NUMERIC data