Re: reducing NUMERIC size for 9.1 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: reducing NUMERIC size for 9.1
Date
Msg-id AANLkTindFE6p5owv7ZARW222xgistX-QqVLEWk6UjY28@mail.gmail.com
Whole thread Raw
In response to Re: reducing NUMERIC size for 9.1  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: reducing NUMERIC size for 9.1
List pgsql-hackers
On Wed, Jul 28, 2010 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jul 16, 2010 at 2:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I don't like the way you did that either (specifically, not the kluge
>>> in NUMERIC_DIGITS()).  It would probably work better if you declared
>>> two different structs, or a union of same, to represent the two layout
>>> cases.
>>>
>>> n_sign_dscale is now pretty inappropriately named, probably better to
>>> change the field name.  This will also help to catch anything that's
>>> not using the macros.  (Renaming the n_weight field, or at least burying
>>> it in an extra level of struct, would be helpful for the same reason.)
>
>> I'm not sure what you have in mind here.  If we create a union of two
>> structs, we'll still have to pick one of them to use to check the high
>> bits of the first word, so I'm not sure we'll be adding all that much
>> in terms of clarity.
>
> No, you can do something like this:
>
> typedef struct numeric_short
> {
>        uint16  word1;
>        NumericDigit digits[1];
> } numeric_short;
>
> typedef struct numeric_long
> {
>        uint16  word1;
>        int16   weight;
>        NumericDigit digits[1];
> } numeric_long;
>
> typedef union numeric
> {
>        uint16  word1;
>        numeric_short   short;
>        numeric_long    long;
> } numeric;

That doesn't quite work because there's also a varlena header that has
to be accounted for, so the third member of the union can't be a
simple uint16.  I'm wondering if it makes sense to do something along
these lines:

typedef struct NumericData
{   int32           varlen;   int16           n_header;   union {       struct {           char n_data[1];       }
short;      struct {           uint16 n_weight;           char n_data[1];       } long;   }; 
} NumericData;

Why n_data as char[1] instead of NumericDigit, you ask?  It's that way
now, mostly I think so that the rest of the system isn't allowed to
know what underlying type is being used for NumericDigit; it looks
like previously it was signed char, but now it's int16.

>>> It seems like you've handled the NAN case a bit awkwardly.  Since the
>>> weight is uninteresting for a NAN, it's okay to not store the weight
>>> field, so I think what you should do is consider that the dscale field
>>> is still full-width, ie the format of the first word remains old-style
>>> not new-style.  I don't remember whether dscale is meaningful for a NAN,
>>> but if it is, your approach is constraining what is possible to store,
>>> and is also breaking compatibility with old databases.
>
>> There is only one NaN value.  Neither weight or dscale is meaningful.
>> I think if the high two bits of the first word are 11 we never examine
>> anything else - do you see somewhere that we're doing otherwise?
>
> I hadn't actually looked.  I think though that it's a mistake to break
> compatibility on both dscale and weight when you only need to break one.
> Also, weight is *certainly* uninteresting for NaNs since it's not even
> meaningful unless there are digits.  dscale could conceivably be worth
> something.

I don't think I'm breaking compatibility on anything.  Can you clarify
what part of the code you're referring to here?  I'm sort of lost.

>>> The sign extension code in the NUMERIC_WEIGHT() macro seems a bit
>>> awkward; I wonder if there's a better way.  One solution might be to
>>> offset the value (ie, add or subtract NUMERIC_SHORT_WEIGHT_MIN) rather
>>> than try to sign-extend per se.
>
>> Hmm... so, if the weight is X we store the value
>> X-NUMERIC_SHORT_WEIGHT_MIN as an unsigned integer?  That's kind of a
>> funny representation - I *think* it works out to sign extension with
>> the high bit flipped.  I guess we could do it that way, but it might
>> make it harder/more confusing to do bit arithmetic with the weight
>> sign bit later on.
>
> Yeah, it was just an idea.  It seems like there should be an easier way
> to extract the sign-extended value, though.

It seemed a bit awkward to me, too, but I'm not sure there's a better one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: string_to_array has to be stable?
Next
From: Robert Haas
Date:
Subject: Re: string_to_array has to be stable?