Home > mailing lists

NUMERIC type efficiency problem - Mailing list pgsql-hackers

From	Mark Butler
Subject	NUMERIC type efficiency problem
Date	April 13, 2001 04:38:07
Msg-id	3AD68AEC.2E1613A9@middle.net Whole thread Raw
Responses	Re: NUMERIC type efficiency problem (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

I noticed the storage format for the numeric type is rather inefficient: 

typedef struct NumericData
{   int32       varlen;         /* Variable size        */   int16       n_weight;       /* Weight of 1st digit  */
uint16     n_rscale;       /* Result scale         */   uint16      n_sign_dscale;  /* Sign + display scale */
unsignedchar n_data[1];    /* Digit data (2 decimal digits/byte) */
 
} NumericData;
typedef NumericData *Numeric;

Oracle uses a similar variable length format for all numeric types, and they
document its storage requirement as 2 + (sig digits/2) bytes.  One byte is
used for the column length, one byte for the exponent, a variable number of
bytes for the significant digits. 

A zero value uses two bytes total in Oracle, where in the current version of
PostgreSQL it uses ten bytes.  Given the pending demise of the money type, the
remaining alternative is rather wasteful for use in large financial
applications.

Is there a reason why varlen has to be an int32?  uint8 would be more than
enough.  The other three fields could be int8 as well.  I do not understand
why we need four header fields - a much more efficient decimal type could be
implemented as follows:

typedef struct DecimalData
{int8 varlen;                /* variable size */int8 d_sign_exponent;       /* 1 bit sign, 7 bit exponent */int8
d_mantissa[1];        /* variable precision binary integer mantissa */
 
};

Value represented is (-1 ^ sign)*(mantissa)*(10 ^ exponent).

This would be more space efficient than Oracle and would support precisions up
to DECIMAL(63).  Having a reasonable maximum precision would allow a fixed
length internal representation which make processing *much* faster* by using
binary arithmetic and eliminating the necessity to palloc() buffers for every
temporary result.  

(Aside: Doesn't the current numeric type use up memory in a hurry in a large
sum(numeric_column) query? - Or are all those digitbuf_free()'s actually being
cleaned up? And shouldn't the type operator calling convention be changed to
pass a result buffer so these palloc()'s could be mostly avoided? )

As an even faster, lower max precision alternative:

typedef struct FastDecimalData 
{int64 fd_mantissa;  int8  fd_sign;int8  fd_exponent;
};

Value represented is (-1 ^ sign)*(mantissa)*(10 ^ exponent).

This would support precisions up to DECIMAL(18).   Intermediate results could
be stored using a 128 bit format to avoid loss of precision.

Any comments?
 - Mark Butler

pgsql-hackers by date:

From: Mark Butler
Date: 13 April 2001, 04:25:17
Subject: Re: DOUBLE synonym for DOUBLE PRECISION

From: Hiroshi Inoue
Date: 13 April 2001, 04:50:42
Subject: Re: ALTER TABLE MODIFY COLUMN

NUMERIC type efficiency problem - Mailing list pgsql-hackers

Previous

Next