Reducing the overhead of NUMERIC data - Mailing list pgsql-hackers

From Simon Riggs
Subject Reducing the overhead of NUMERIC data
Date
Msg-id 1130880137.8300.1648.camel@localhost.localdomain
Responses Re: Reducing the overhead of NUMERIC data  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Reducing the overhead of NUMERIC data  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
Currently, the overhead of the NUMERIC datatype is 8 bytes (a 4-byte length word,
a 2-byte weight and a 2-byte sign/display-scale word). Each value is
stored on disk as

typedef struct NumericData
{
    int32    varlen;          /* Variable size (std varlena header) */
    int16    n_weight;        /* Weight of 1st digit */
    uint16   n_sign_dscale;   /* Sign + display scale */
    char     n_data[1];       /* Digits (really array of NumericDigit) */
} NumericData;

Let's see if we can reduce that:

varlen is int32 to match the standard varlena header. However, the maximum
number of digits the datatype allows is less than the threshold at which
values get toasted, so no NUMERIC value ever gets toasted - in which
case, why worry about matching the size of varlena? Let's reduce it to 2
bytes, which still gives us up to 1000 digits as we have now.
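
As a sanity check on the 2-byte length claim, here is a minimal sketch. It
assumes NumericDigits are 2-byte base-10000 digits (a build-time detail, so
treat that as an assumption); even a maximal 1000-digit value then needs only
a few hundred bytes of digit data, comfortably within a 16-bit length word.

#include <stdio.h>

int main(void)
{
    /* Assumed representation: each NumericDigit holds 4 decimal digits
     * (base 10000) and occupies 2 bytes. */
    int max_decimal_digits = 1000;
    int dec_digits_per_numericdigit = 4;
    int bytes_per_numericdigit = 2;

    int ndigits = (max_decimal_digits + dec_digits_per_numericdigit - 1)
                  / dec_digits_per_numericdigit;
    printf("max digit data = %d bytes\n",
           ndigits * bytes_per_numericdigit);    /* prints 500 */
    return 0;
}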

n_weight seems to exist because we do not store trailing zeroes. So
1000000 is stored as 1 with a weight of 6. My experience is that large
numbers of trailing zeroes do not occur with any frequency in real
measurement or financial data, and that this is an over-optimization.
This is probably a hold-over from the original algorithm, rather than a
conscious design goal for PostgreSQL?
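
To illustrate what n_weight buys - purely a sketch, treating each
NumericDigit as a single decimal digit and glossing over the real base-10000
storage - the stored digits are scaled by 10^weight, so 1000000 can be kept
as the single digit 1 with a weight of 6:

#include <stdio.h>
#include <math.h>

/* Decode a digit string whose first digit carries weight 10^weight. */
static double decode(const char *digits, int ndigits, int weight)
{
    double value = 0.0;
    for (int i = 0; i < ndigits; i++)
        value += (digits[i] - '0') * pow(10.0, weight - i);
    return value;
}

int main(void)
{
    printf("%.0f\n", decode("1", 1, 6));    /* prints 1000000 */
    return 0;
}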

n_sign_dscale shows us where the decimal point is. We could instead
store a marker nibble representing the decimal point, which would cost us 0.5
byte rather than 2 bytes. Since only 10 of the 16 values of a 4-bit nibble are
needed for decimal digits, that leaves a few code points spare to represent
either a decimal-point-and-positive-sign or a decimal-point-and-negative-sign
marker. (We would still need to store trailing zeroes after the decimal point.)
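
A minimal sketch of that nibble scheme, using hypothetical code points
(0x0-0x9 for digits, 0xA for decimal-point-and-positive-sign, 0xB for
decimal-point-and-negative-sign); integer values still carry the marker
nibble at the end so the sign is always recorded:

#include <stdio.h>
#include <stdint.h>

#define NIB_POINT_POS 0xA    /* decimal point, value is positive */
#define NIB_POINT_NEG 0xB    /* decimal point, value is negative */

/* Turn a plain decimal string into 4-bit codes; returns the nibble count.
 * Two nibbles would then be packed per byte on disk. */
static int encode_nibbles(const char *text, uint8_t *nibbles)
{
    int n = 0;
    int neg = (*text == '-');
    int seen_point = 0;

    if (neg || *text == '+')
        text++;
    for (; *text; text++)
    {
        if (*text == '.')
        {
            nibbles[n++] = neg ? NIB_POINT_NEG : NIB_POINT_POS;
            seen_point = 1;
        }
        else
            nibbles[n++] = (uint8_t) (*text - '0');
    }
    if (!seen_point)
        nibbles[n++] = neg ? NIB_POINT_NEG : NIB_POINT_POS;
    return n;
}

int main(void)
{
    uint8_t nib[64];
    int n = encode_nibbles("-123.45", nib);   /* 1 2 3 B 4 5 */
    printf("%d nibbles, %d bytes once packed\n", n, (n + 1) / 2);
    return 0;
}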

So, assuming I have this all correct, we could reduce the on-disk
storage for NUMERIC datatypes to the following struct. This gives an
overhead of just 2.5 bytes (2 bytes of length plus the 0.5-byte
sign/decimal-point nibble), plus the loss of the trailing-zeroes
optimization, which I assess as having almost no value anyway in
99.9999% of data values (literally...).

typedef struct NumericData
{
    int16    varlen;          /* Variable size (2-byte length header) */
    char     n_data[1];       /* Digits (really array of NumericDigit) */
} NumericData;

with the sign and decimal point stored as a 0.5-byte nibble within the
digit data.

A saving of 5.5 bytes per NUMERIC column per row could make an
interesting difference to row length in large tables - for instance, a
table of 100 million rows with four NUMERIC columns would carry roughly
2 GB less fixed overhead.

The above would require a reasonable amount of change to the existing
datatype code, so I'm not suggesting it's a simple change to an .h file.
But the changes as proposed could be made to the existing NUMERIC type,
rather than inventing another similar one.

Is my thinking straight on this?

Best Regards, Simon Riggs


