Reducing the overhead of NUMERIC data - Mailing list pgsql-hackers

From Simon Riggs
Subject Reducing the overhead of NUMERIC data
Date
Msg-id 1130880137.8300.1648.camel@localhost.localdomain
Responses Re: Reducing the overhead of NUMERIC data  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Reducing the overhead of NUMERIC data  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
Currently, the overhead of the NUMERIC datatype is 8 bytes (a 4-byte length word,
a 2-byte weight and a 2-byte sign/display-scale word). Each value is
stored on disk as

typedef struct NumericData
{
    int32    varlen;          /* Variable size (std varlena header) */
    int16    n_weight;        /* Weight of 1st digit */
    uint16   n_sign_dscale;   /* Sign + display scale */
    char     n_data[1];       /* Digits (really array of NumericDigit) */
} NumericData;

Let's see if we can reduce that:

varlen is int32 to match the standard varlena header. However, the maximum
number of digits the datatype allows is less than the threshold at which
values get toasted, so no NUMERIC value ever gets toasted - in which
case, why worry about matching the size of varlena? Let's reduce it to 2
bytes, which still gives us up to 1000 digits as we have now.
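
As a sanity check on the 2-byte length claim, here is a minimal sketch. It
assumes NumericDigits are 2-byte base-10000 digits (a build-time detail, so
treat that as an assumption); even a maximal 1000-digit value then needs only
a few hundred bytes of digit data, comfortably within a 16-bit length word.

#include <stdio.h>

int main(void)
{
    /* Assumed representation: each NumericDigit holds 4 decimal digits
     * (base 10000) and occupies 2 bytes. */
    int max_decimal_digits = 1000;
    int dec_digits_per_numericdigit = 4;
    int bytes_per_numericdigit = 2;

    int ndigits = (max_decimal_digits + dec_digits_per_numericdigit - 1)
                  / dec_digits_per_numericdigit;
    printf("max digit data = %d bytes\n",
           ndigits * bytes_per_numericdigit);    /* prints 500 */
    return 0;
}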

n_weight seems to exist because we do not store trailing zeroes. So
1000000 is stored as 1 with a weight of 6. My experience is that large
numbers of trailing zeroes do not occur with any frequency in real
measurement or financial data, and that this is an over-optimization.
This is probably a hold-over from the original algorithm, rather than a
conscious design goal for PostgreSQL?
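
To illustrate what n_weight buys - purely a sketch, treating each
NumericDigit as a single decimal digit and glossing over the real base-10000
storage - the stored digits are scaled by 10^weight, so 1000000 can be kept
as the single digit 1 with a weight of 6:

#include <stdio.h>
#include <math.h>

/* Decode a digit string whose first digit carries weight 10^weight. */
static double decode(const char *digits, int ndigits, int weight)
{
    double value = 0.0;
    for (int i = 0; i < ndigits; i++)
        value += (digits[i] - '0') * pow(10.0, weight - i);
    return value;
}

int main(void)
{
    printf("%.0f\n", decode("1", 1, 6));    /* prints 1000000 */
    return 0;
}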

n_sign_dscale shows us where the decimal point is. We could instead
store a marker nibble representing the decimal point, which would cost us 0.5
byte rather than 2 bytes. Since only 10 of the 16 values of a 4-bit nibble are
needed for decimal digits, that leaves a few code points spare to represent
either a decimal-point-and-positive-sign or a decimal-point-and-negative-sign
marker. (We would still need to store trailing zeroes after the decimal point.)
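
A minimal sketch of that nibble scheme, using hypothetical code points
(0x0-0x9 for digits, 0xA for decimal-point-and-positive-sign, 0xB for
decimal-point-and-negative-sign); integer values still carry the marker
nibble at the end so the sign is always recorded:

#include <stdio.h>
#include <stdint.h>

#define NIB_POINT_POS 0xA    /* decimal point, value is positive */
#define NIB_POINT_NEG 0xB    /* decimal point, value is negative */

/* Turn a plain decimal string into 4-bit codes; returns the nibble count.
 * Two nibbles would then be packed per byte on disk. */
static int encode_nibbles(const char *text, uint8_t *nibbles)
{
    int n = 0;
    int neg = (*text == '-');
    int seen_point = 0;

    if (neg || *text == '+')
        text++;
    for (; *text; text++)
    {
        if (*text == '.')
        {
            nibbles[n++] = neg ? NIB_POINT_NEG : NIB_POINT_POS;
            seen_point = 1;
        }
        else
            nibbles[n++] = (uint8_t) (*text - '0');
    }
    if (!seen_point)
        nibbles[n++] = neg ? NIB_POINT_NEG : NIB_POINT_POS;
    return n;
}

int main(void)
{
    uint8_t nib[64];
    int n = encode_nibbles("-123.45", nib);   /* 1 2 3 B 4 5 */
    printf("%d nibbles, %d bytes once packed\n", n, (n + 1) / 2);
    return 0;
}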

So, assuming I have this all correct, we could reduce the on-disk
storage for NUMERIC datatypes to the following struct. This gives an
overhead of just 2.5 bytes (2 bytes of length plus the 0.5-byte
sign/decimal-point nibble), plus the loss of the trailing-zeroes
optimization, which I assess as having almost no value anyway in
99.9999% of data values (literally...).

typedef struct NumericData
{
    int16    varlen;          /* Variable size (2-byte length header) */
    char     n_data[1];       /* Digits (really array of NumericDigit) */
} NumericData;

with the sign and decimal point stored as a 0.5-byte nibble within the
digit data.

A saving of 5.5 bytes per NUMERIC column per row could make an
interesting difference to row length in large tables - for instance, a
table of 100 million rows with four NUMERIC columns would carry roughly
2 GB less fixed overhead.

The above would require a reasonable amount of change to the existing
datatype code, so I'm not suggesting it's a simple change to an .h file.
But the changes as proposed could be made to the existing NUMERIC type,
rather than inventing another similar one.

Is my thinking straight on this?

Best Regards, Simon Riggs


