Home > mailing lists

Re: Reducing the overhead of NUMERIC data - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Reducing the overhead of NUMERIC data
Date	November 1, 2005 17:54:13
Msg-id	28566.1130882051@sss.pgh.pa.us Whole thread Raw
In response to	Reducing the overhead of NUMERIC data (Simon Riggs <simon@2ndquadrant.com>)
Responses	Re: Reducing the overhead of NUMERIC data Re: Reducing the overhead of NUMERIC data
List	pgsql-hackers

Tree view

Simon Riggs <simon@2ndquadrant.com> writes:
> varlen is int32 to match the standard varlena header. However, the max
> number of digits of the datatype is less than the threshold at which
> values get toasted. So no NUMERIC values ever get toasted - in which
> case, why worry about matching the size of varlena - lets reduce it to 2
> bytes which still gives us up to 1000 digits as we have now.

Because that will require an extra case in the code that disassembles
tuples, which will slow down *everything* even in databases that don't
contain one single NUMERIC value.  I think you need more than "let's
save 2 bytes in NUMERICs" to justify that.

> n_weight seems to exist because we do not store trailing zeroes. So
> 1000000 is stored as 1 with a weight of 6. My experience is that large
> numbers of trailing zeroes do not occur with any frequency in real
> measurement or financial data and that this is an over-optimization.

That seems debatable.  Keep in mind that not storing extra zeroes cuts
computation time as well as storage.

> n_sign_dscale shows us where the decimal point is. We could actually
> store a marker representing the decimal point, which would cost us 0.5
> byte rather than 2 bytes. Since we have 4 bits to represent a decimal
> number, that leaves a few bits spare to represent either a decimal-
> point-and-positive-sign and decimal-point-and-negative-sign. (We would
> still need to store trailing zeroes even after the decimal point).

This is completely bogus.  How are you going to remember the sign except
by always storing a marker?  ISTM this proposal just moves the
sign/decimalpoint overhead from one place to another, ie, somewhere in
the NumericDigit array instead of in a fixed field.

Also, you can't drop dscale without abandoning the efficient base-10000
representation, at least not unless you want people complaining that the
database shows NUMERIC(3) data with four decimal places.

It might be reasonable to restrict the range of NUMERIC to the point
that we could fit the weight/sign/dscale into 2 bytes instead of 4,
thereby saving 2 bytes per NUMERIC.  I'm not excited about the other
aspects of this, though.
        regards, tom lane

pgsql-hackers by date:

From: "Dave Page"
Date: 01 November 2005, 17:33:18
Subject: Attn: Richard Huxton

From: Martijn van Oosterhout
Date: 01 November 2005, 18:17:06
Subject: Re: Reducing the overhead of NUMERIC data

Re: Reducing the overhead of NUMERIC data - Mailing list pgsql-hackers

Previous

Next