Re: Thoughts on NBASE=100000000 - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Thoughts on NBASE=100000000
Msg-id d4ba5a57-bee8-4a37-8c36-955d0b1a61ef@app.fastmail.com
In response to Re: Thoughts on NBASE=100000000  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
On Mon, Jul 8, 2024, at 12:45, Matthias van de Meent wrote:
> On Sun, 7 Jul 2024, 22:40 Joel Jacobson, <joel@compiler.org> wrote:
>> Today, since 64-bit architectures are dominant, NBASE=1e8 seems like it would
>> have been the best choice, since the square of that still fits in
>> a 64-bit signed int.
>
> Back then 64-bit was by far not as dominant (server and consumer chips
> with AMD64 ISA only got released that year after the commit), so I
> don't think 1e8 would have been the best choice at that point in time.
> Would be better now, yes, but not back then.

Oh, grammar mistake by me!
I meant to say it "would be the best choice", in line with what I wrote above:

>> Last time numeric's base was changed was back in 2003, when d72f6c75038 changed
>> it from 10 to 10000. Back then, 32-bit architectures were still dominant,
>> so base-10000 was clearly the best choice at this time.
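
To make the word-size argument concrete, here is a minimal sketch in C. The macro and function names are hypothetical and not taken from numeric.c. It only checks that the largest product of two base-NBASE digits, (NBASE - 1)^2, fits in a signed 32-bit int for NBASE=1e4 and in a signed 64-bit int for NBASE=1e8:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; not the numeric.c definitions. */
#define NBASE_1E4 INT64_C(10000)
#define NBASE_1E8 INT64_C(100000000)

/* Largest possible product of two digits in the given base: (base - 1)^2. */
static inline int64_t
max_digit_product(int64_t base)
{
	assert(base > 1);
	return (base - 1) * (base - 1);
}

/* (1e4 - 1)^2 = 99980001 fits in a signed 32-bit int. */
_Static_assert((NBASE_1E4 - 1) * (NBASE_1E4 - 1) <= INT32_MAX,
			   "base-1e4 digit product must fit in int32");

/* (1e8 - 1)^2 = 9999999800000001 overflows int32 but fits in int64. */
_Static_assert((NBASE_1E8 - 1) * (NBASE_1E8 - 1) > INT32_MAX,
			   "base-1e8 digit product does not fit in int32");
_Static_assert((NBASE_1E8 - 1) * (NBASE_1E8 - 1) <= INT64_MAX,
			   "base-1e8 digit product must fit in int64");
```

By the same arithmetic, roughly 922 such worst-case products can be summed in an int64 before overflow becomes possible, so any real multiplication loop would still need periodic carry propagation.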

>> Changing NBASE might seem impossible at first, due to the existing numeric data
>> on disk, and incompatibility issues when numeric data is transferred on the
>> wire.
>>
>> Here are some ideas on how to work around some of these:
>>
>> - Incrementally changing the data on disk, e.g. upon UPDATE/INSERT
>> and supporting both NBASE=1e4 (16-bit) and NBASE=1e8 (32-bit)
>> when reading data.
>
> I think that a dynamic decision would make more sense here. At low
> precision, the overhead of 4+1 bytes vs 2 bytes is quite significant.
> This sounds important for overall storage concerns, especially if the
> padding bytes (mentioned below) are added to indicate types.

Right, I agree.
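
A rough sketch of the storage trade-off, under simplifying assumptions: the helper names are hypothetical, only the digit payload is counted (no varlena or numeric header), the decimal digits are assumed to pack perfectly into base digits, and the NBASE=1e8 form is assumed to carry exactly one extra marker byte:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Payload bytes for ndec decimal digits with NBASE=1e4:
 * each 2-byte digit holds up to 4 decimal digits.
 */
static inline size_t
payload_bytes_1e4(size_t ndec)
{
	assert(ndec >= 1);
	return ((ndec + 3) / 4) * 2;
}

/*
 * Payload bytes for ndec decimal digits with NBASE=1e8:
 * each 4-byte digit holds up to 8 decimal digits, plus one
 * assumed marker/padding byte per value.
 */
static inline size_t
payload_bytes_1e8(size_t ndec)
{
	assert(ndec >= 1);
	return ((ndec + 7) / 8) * 4 + 1;
}
```

Under these assumptions a one-digit value costs 2 bytes versus 5, while a 64-digit value costs 32 bytes versus 33, which is why the relative overhead matters mostly at low precision.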

Another idea: It seems possible to reduce the disk space for numerics
that fit into one byte, i.e. 0 <= val <= 255, which could be communicated
via NUMERIC_NBYTES=1.
At least the value 0 should be quite common.

>> - Due to the lack of a version field in the NumericVar struct,
>> we need a way to detect if a Numeric value on disk uses
>> the existing NBASE=1e4, or NBASE=1e8.
>> One hack I've thought about is to exploit the fact that NUMERIC_NBYTES,
>> defined as:
>>     #define NUMERIC_NBYTES(num) (VARSIZE(num) - NUMERIC_HEADER_SIZE(num))
>> will always be divisible by two, since a NumericDigit is an int16 (2 bytes).
>> The idea is then to let "NUMERIC_NBYTES divisible by three"
>> indicate NBASE=1e8, at the cost of one to three extra padding bytes.
>
> Do you perhaps mean NUMERIC_NBYTES *not divisible by 2*, i.e. an
> uneven NUMERIC_NBYTES as indicator for NBASE=1e8, rather than only
> multiples of 3?

Oh, yes of course! Thinko.
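
A sketch of the corrected idea, with hypothetical helper names and assuming an NBASE=1e8 value stores 4-byte digits followed by exactly one padding byte, so its payload length is always odd. (Divisibility by three would not work as a marker, since a length such as 6 is divisible by both two and three.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: nbytes is the digit payload length, i.e. what
 * NUMERIC_NBYTES(num) yields. An odd length marks NBASE=1e8; an even
 * length is the existing NBASE=1e4 format of 2-byte digits.
 */
static inline bool
payload_is_nbase_1e8(size_t nbytes)
{
	return (nbytes % 2) != 0;
}

/* Number of base digits stored in a payload of nbytes bytes. */
static inline size_t
payload_ndigits(size_t nbytes)
{
	if (payload_is_nbase_1e8(nbytes))
	{
		/* Assumed layout: n 4-byte digits plus one padding byte. */
		assert(nbytes % 4 == 1);
		return (nbytes - 1) / 4;
	}
	return nbytes / 2;
}
```

With a single padding byte the new format's length is 4n + 1, so existing even-length values remain unambiguous without any header change.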

> While I don't think this is worth implementing for general usage, it
> could be worth exploring for the larger numeric values, where the
> relative overhead of the larger representation is lower.

Yes, I agree it definitely seems like a win for larger numeric values.
I'm not sure about smaller numeric values; maybe it's possible
to improve upon that.

Regards,
Joel


