Re: Thoughts on NBASE=100000000 - Mailing list pgsql-hackers
From | Joel Jacobson |
---|---|
Subject | Re: Thoughts on NBASE=100000000 |
Date | |
Msg-id | d4ba5a57-bee8-4a37-8c36-955d0b1a61ef@app.fastmail.com |
In response to | Re: Thoughts on NBASE=100000000 (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
List | pgsql-hackers |
On Mon, Jul 8, 2024, at 12:45, Matthias van de Meent wrote:
> On Sun, 7 Jul 2024, 22:40 Joel Jacobson, <joel@compiler.org> wrote:
>> Today, since 64-bit architectures are dominant, NBASE=1e8 seems like it would
>> have been the best choice, since the square of that still fits in
>> a 64-bit signed int.
>
> Back then 64-bit was by far not as dominant (server and consumer chips
> with AMD64 ISA only got released that year after the commit), so I
> don't think 1e8 would have been the best choice at that point in time.
> Would be better now, yes, but not back then.

Oh, grammar mistake by me! I meant to say it "would be the best choice",
in line with what I wrote above:

>> Last time numeric's base was changed was back in 2003, when d72f6c75038 changed
>> it from 10 to 10000. Back then, 32-bit architectures were still dominant,
>> so base-10000 was clearly the best choice at this time.
>>
>> Changing NBASE might seem impossible at first, due to the existing numeric data
>> on disk, and incompatibility issues when numeric data is transferred on the
>> wire.
>>
>> Here are some ideas on how to work around some of these:
>>
>> - Incrementally changing the data on disk, e.g. upon UPDATE/INSERT
>> and supporting both NBASE=1e4 (16-bit) and NBASE=1e8 (32-bit)
>> when reading data.
>
> I think that a dynamic decision would make more sense here. At low
> precision, the overhead of 4+1 bytes vs 2 bytes is quite significant.
> This sounds important for overall storage concerns, especially if the
> padding bytes (mentioned below) are added to indicate types.

Right, I agree.

Another idea: It seems possible to reduce the disk space for numerics
that fit into one byte, i.e. 0 <= val <= 255, which could be communicated
via NUMERIC_NBYTES=1. At least the value 0 should be quite common.

>> - Due to the lack of a version field in the NumericVar struct,
>> we need a way to detect if a Numeric value on disk uses
>> the existing NBASE=1e4, or NBASE=1e8.
>> One hack I've thought about is to exploit the fact that NUMERIC_NBYTES,
>> defined as:
>>
>> #define NUMERIC_NBYTES(num) (VARSIZE(num) - NUMERIC_HEADER_SIZE(num))
>>
>> will always be divisible by two, since a NumericDigit is an int16 (2 bytes).
>> The idea is then to let "NUMERIC_NBYTES divisible by three"
>> indicate NBASE=1e8, at the cost of one to three extra padding bytes.
>
> Do you perhaps mean NUMERIC_NBYTES *not divisible by 2*, i.e. an
> uneven NUMERIC_NBYTES as indicator for NBASE=1e8, rather than only
> multiples of 3?

Oh, yes of course! Thinko.

> While I don't think this is worth implementing for general usage, it
> could be worth exploring for the larger numeric values, where the
> relative overhead of the larger representation is lower.

Yes, I agree it definitely seems like a win for larger numeric values.
Not sure about smaller numeric values; maybe that's possible to improve upon.

Regards,
Joel