Reducing the overhead of NUMERIC data - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Reducing the overhead of NUMERIC data |
Date | |
Msg-id | 1130880137.8300.1648.camel@localhost.localdomain Whole thread Raw |
Responses |
Re: Reducing the overhead of NUMERIC data
Re: Reducing the overhead of NUMERIC data |
List | pgsql-hackers |
Currently, the overhead of NUMERIC datatype is 8 bytes. Each value is stored on disk as typedef struct NumericData {int32 varlen; /* Variable size (std varlena header) */int16 n_weight; /* Weight of 1st digit */uint16 n_sign_dscale; /* Sign + display scale */char n_data[1]; /* Digits (really array of NumericDigit) } NumericData; Let's see if we can reduce that: varlen is int32 to match the standard varlena header. However, the max number of digits of the datatype is less than the threshold at which values get toasted. So no NUMERIC values ever get toasted - in which case, why worry about matching the size of varlena - lets reduce it to 2 bytes which still gives us up to 1000 digits as we have now. n_weight seems to exist because we do not store trailing zeroes. So 1000000 is stored as 1 with a weight of 6. My experience is that large numbers of trailing zeroes do not occur with any frequency in real measurement or financial data and that this is an over-optimization. This is probably a hang over from the original algorithm, rather than a conscious design goal for PostgreSQL? n_sign_dscale shows us where the decimal point is. We could actually store a marker representing the decimal point, which would cost us 0.5 byte rather than 2 bytes. Since we have 4 bits to represent a decimal number, that leaves a few bits spare to represent either a decimal- point-and-positive-sign and decimal-point-and-negative-sign. (We would still need to store trailing zeroes even after the decimal point). So, assuming I have this all correct, means we could reduce the on disk storage for NUMERIC datatypes to the following struct. This gives an overhead of just 2.5 bytes, plus the loss of the optimization of trailing zeroes, which I assess as having almost no value anyway in 99.9999% of data values (literally...). typedef struct NumericData {int16 varlen; /* Variable size (std varlena header) */char n_data[1]; /* Digits (really array of NumericDigit) } NumericData; so that sign/decimal point stored as 0.5 byte in the record. A saving of 5.5 bytes per NUMERIC column per row could make an interesting difference to row length in large tables. The above would require reasonable amount of change to the existing datatype code, so I'm not suggesting its a simple change to an .h file. But the changes as proposed would seem to be able to be made to the existing NUMERIC type, rather than invent another similar one. Is my thinking straight on this? Best Regards, Simon Riggs
pgsql-hackers by date: