Re: A varint implementation for PG? - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: A varint implementation for PG? |
Date | |
Msg-id | 20191213054529.lqhbt63ufdnckyqu@alap3.anarazel.de |
In response to | Re: A varint implementation for PG? (Craig Ringer <craig@2ndquadrant.com>) |
Responses | Re: A varint implementation for PG? |
List | pgsql-hackers |
Hi,

On 2019-12-13 13:31:55 +0800, Craig Ringer wrote:
> Am I stabbing completely in the dark when wondering if this might be a step
> towards a way to lift the size limit on VARLENA Datums like bytea ?

It could be - but I think it'd be a pretty small piece of it. But yes, I
have mused about that.

> > Even with those caveats, I think that's a pretty good result. Other
> > encodings were more expensive. And I think there's definitely some room
> > for optimization left.
>
>
> I don't feel at all qualified to question your analysis of the appropriate
> representation. But your explanation certainly makes a lot of sense as
> someone approaching the topic mostly fresh - I've done a bit with BCD but
> not much else.
>
> I assume we'd be paying a price in padding and alignment in most cases, and
> probably more memory copying, but these representations would likely be
> appearing mostly in places where other costs are overwhelmingly greater
> like network or disk I/O.

I don't really see where padding/alignment costs come into play here?

> If data lengths longer than that are required for a use case
>
>
> If baking a new variant integer format now, I think limiting it to 64 bits
> is probably a mistake given how long-lived PostgreSQL is, and how hard it
> can be to change things in the protocol, on disk, etc.

I don't think it's ever going to be sensible to transport 64bit quanta
of data. Also, uh, it'd be larger than the data a postgres instance
could really contain, given LSNs are 64 bit.

> > it
> > probably is better to either a) use the max-representable 8 byte integer
> > as an indicator that the length is stored or b) sacrifice another bit to
> > represent whether the integer is the data itself or the length.
> I'd be inclined to suspect that (b) is likely worth doing. If nothing else
> because not being able to represent the full range of a 64-bit integer in
> the variant type is potentially going to be a seriously annoying hassle at
> points where we're interacting with places that could use the full width.
> We'd then have the potential for variant integers of > 2^64 but at least
> that's wholly under our control.

I'm very very staunchly against doing either of these for the varints
used widely. Throwing away even a bit is quite painful, as it
e.g. reduces the range representable in a single byte from 0 - 127/-64 -
63 to 0 - 63/-32 - 31. Without ever being useful, given what kind of
things varints are commonly going to describe. There's e.g. simply no
practical use of describing a single WAL record length that's bigger
than 63 bits can represent.

I *can* see a separate varint type, probably sharing some code, that
supports storing arbitrarily large numbers. But using that everywhere
would be pointless.

> I'd be quick to want to expose it to SQL too.

It'll be a bit problematic to deal with all the casting necessary, and
with the likely resulting overload resolution issues. I'm wondering
whether it'd be worthwhile to have an ALTER TABLE ... STORAGE ... option
that encodes int2/4/8 as varints when inside a tuple, but otherwise just
let it be a normal integer.

Greetings,

Andres Freund
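
The single-byte ranges cited in the message (0 - 127/-64 - 63 versus 0 - 63/-32 - 31) can be checked with a small sketch. It assumes, for illustration only, that one bit of the first byte is already reserved for length signalling and that option (b) would reserve a second bit; the function name `single_byte_ranges` and its `flag_bits` parameter are hypothetical and not part of any proposed patch.

```python
def single_byte_ranges(flag_bits):
    """Return ((umin, umax), (smin, smax)) for a one-byte varint.

    flag_bits is the number of bits in the byte that are reserved for
    metadata (hypothetical parameter, used only for this illustration).
    """
    payload_bits = 8 - flag_bits
    # Unsigned: every payload bit carries magnitude.
    unsigned = (0, (1 << payload_bits) - 1)
    # Signed: half of the representable values are negative.
    half = 1 << (payload_bits - 1)
    signed = (-half, half - 1)
    return unsigned, signed


# Assumed baseline: one bit reserved for length signalling.
print(single_byte_ranges(1))  # ((0, 127), (-64, 63))
# Option (b): a second bit reserved as a "data vs. length" marker.
print(single_byte_ranges(2))  # ((0, 63), (-32, 31))
```

Each reserved bit halves the set of values that fit in the shortest encoding, which is the cost the message objects to for widely used varints.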