Re: Reducing the overhead of NUMERIC data - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Reducing the overhead of NUMERIC data |
Date | |
Msg-id | 1130885167.8300.1702.camel@localhost.localdomain Whole thread Raw |
In response to | Re: Reducing the overhead of NUMERIC data (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Tue, 2005-11-01 at 16:54 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > varlen is int32 to match the standard varlena header. However, the max > > number of digits of the datatype is less than the threshold at which > > values get toasted. So no NUMERIC values ever get toasted - in which > > case, why worry about matching the size of varlena - lets reduce it to 2 > > bytes which still gives us up to 1000 digits as we have now. > > Because that will require an extra case in the code that disassembles > tuples, which will slow down *everything* even in databases that don't > contain one single NUMERIC value. I think you need more than "let's > save 2 bytes in NUMERICs" to justify that. OK > > n_weight seems to exist because we do not store trailing zeroes. So > > 1000000 is stored as 1 with a weight of 6. My experience is that large > > numbers of trailing zeroes do not occur with any frequency in real > > measurement or financial data and that this is an over-optimization. > > That seems debatable. Keep in mind that not storing extra zeroes cuts > computation time as well as storage. Check what % difference this makes. 2 bytes on everything makes more difference than a 1 byte saving on a few percent of values. > > n_sign_dscale shows us where the decimal point is. We could actually > > store a marker representing the decimal point, which would cost us 0.5 > > byte rather than 2 bytes. Since we have 4 bits to represent a decimal > > number, that leaves a few bits spare to represent either a decimal- > > point-and-positive-sign and decimal-point-and-negative-sign. (We would > > still need to store trailing zeroes even after the decimal point). > > This is completely bogus. How are you going to remember the sign except > by always storing a marker? ISTM this proposal just moves the > sign/decimalpoint overhead from one place to another, ie, somewhere in > the NumericDigit array instead of in a fixed field. That is exactly my proposal. Thus an overhead of 0.5 bytes rather than 2 bytes, as I explained....but > Also, you can't drop dscale without abandoning the efficient base-10000 > representation, at least not unless you want people complaining that the > database shows NUMERIC(3) data with four decimal places. ... I take it I have misunderstood the storage format. > It might be reasonable to restrict the range of NUMERIC to the point > that we could fit the weight/sign/dscale into 2 bytes instead of 4, > thereby saving 2 bytes per NUMERIC. I'm not excited about the other > aspects of this, though. That seems easily doable - it seemed like something would stick. Restricting total number of digits to 255 and maxscale of 254 would allow that saving, yes? We can then have a BIGNUMERIC which would allow up to 1000 digits for anybody out there that ever got that high. I'm sure there's a few %, so I won't dismiss you entirely, guys... Best Regards, Simon Riggs
pgsql-hackers by date: