NUMERIC type efficiency problem - Mailing list pgsql-hackers

From Mark Butler
Subject NUMERIC type efficiency problem
Date
Msg-id 3AD68AEC.2E1613A9@middle.net
Whole thread Raw
Responses Re: NUMERIC type efficiency problem  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I noticed the storage format for the numeric type is rather inefficient: 

typedef struct NumericData
{   int32       varlen;         /* Variable size        */   int16       n_weight;       /* Weight of 1st digit  */
uint16     n_rscale;       /* Result scale         */   uint16      n_sign_dscale;  /* Sign + display scale */
unsignedchar n_data[1];    /* Digit data (2 decimal digits/byte) */
 
} NumericData;
typedef NumericData *Numeric;

Oracle uses a similar variable length format for all numeric types, and they
document its storage requirement as 2 + (sig digits/2) bytes.  One byte is
used for the column length, one byte for the exponent, a variable number of
bytes for the significant digits. 

A zero value uses two bytes total in Oracle, where in the current version of
PostgreSQL it uses ten bytes.  Given the pending demise of the money type, the
remaining alternative is rather wasteful for use in large financial
applications.

Is there a reason why varlen has to be an int32?  uint8 would be more than
enough.  The other three fields could be int8 as well.  I do not understand
why we need four header fields - a much more efficient decimal type could be
implemented as follows:

typedef struct DecimalData
{int8 varlen;                /* variable size */int8 d_sign_exponent;       /* 1 bit sign, 7 bit exponent */int8
d_mantissa[1];        /* variable precision binary integer mantissa */
 
};

Value represented is (-1 ^ sign)*(mantissa)*(10 ^ exponent).

This would be more space efficient than Oracle and would support precisions up
to DECIMAL(63).  Having a reasonable maximum precision would allow a fixed
length internal representation which make processing *much* faster* by using
binary arithmetic and eliminating the necessity to palloc() buffers for every
temporary result.  

(Aside: Doesn't the current numeric type use up memory in a hurry in a large
sum(numeric_column) query? - Or are all those digitbuf_free()'s actually being
cleaned up? And shouldn't the type operator calling convention be changed to
pass a result buffer so these palloc()'s could be mostly avoided? )

As an even faster, lower max precision alternative:

typedef struct FastDecimalData 
{int64 fd_mantissa;  int8  fd_sign;int8  fd_exponent;
};

Value represented is (-1 ^ sign)*(mantissa)*(10 ^ exponent).

This would support precisions up to DECIMAL(18).   Intermediate results could
be stored using a 128 bit format to avoid loss of precision.

Any comments?
 - Mark Butler


pgsql-hackers by date:

Previous
From: Mark Butler
Date:
Subject: Re: DOUBLE synonym for DOUBLE PRECISION
Next
From: Hiroshi Inoue
Date:
Subject: Re: ALTER TABLE MODIFY COLUMN