Re: A varint implementation for PG? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: A varint implementation for PG?
Date
Msg-id 20191213054529.lqhbt63ufdnckyqu@alap3.anarazel.de
In response to Re: A varint implementation for PG?  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: A varint implementation for PG?
List pgsql-hackers
Hi,

On 2019-12-13 13:31:55 +0800, Craig Ringer wrote:
> Am I stabbing completely in the dark when wondering if this might be a step
> towards a way to lift the size limit on VARLENA Datums like bytea ?

It could be - but I think it'd be a pretty small piece of it. But yes, I
have mused about that.



> > Even with those caveats, I think that's a pretty good result. Other
> > encodings were more expensive. And I think there's definitely some room
> > for optimization left.
> 
> 
> I don't feel at all qualified to question your analysis of the appropriate
> representation. But your explanation certainly makes a lot of sense as
> someone approaching the topic mostly fresh - I've done a bit with BCD but
> not much else.
> 
> I assume we'd be paying a price in padding and alignment in most cases, and
> probably more memory copying, but these representations would likely be
> appearing mostly in places where other costs are overwhelmingly greater
> like network or disk I/O.

I don't really see where padding/alignment costs come into play here?



> If data lengths longer than that are required for a use case
> 
> 
> If baking a new variant integer format now, I think limiting it to 64 bits
> is probably a mistake given how long-lived PostgreSQL is, and how hard it
> can be to change things in the protocol, on disk, etc.

I don't think it's ever going to be sensible to transport quanta of
data too large for a 64-bit length. Also, uh, that'd be larger than the
data a postgres instance could really contain, given LSNs are 64 bit.



> > it
> > probably is better to either a) use the max-representable 8 byte integer
> > as an indicator that the length is stored or b) sacrifice another bit to
> > represent whether the integer is the data itself or the length.

> I'd be inclined to suspect that (b) is likely worth doing. If nothing else
> because not being able to represent the full range of a 64-bit integer in
> the variant type is potentially going to be a seriously annoying hassle at
> points where we're interacting with places that could use the full width.
> We'd then have the potential for variant integers of > 2^64 but at least
> that's wholly under our control.

I'm very very staunchly against doing either of these for the varints
used widely. Throwing away even a bit is quite painful, as it
e.g. reduces the range representable in a single byte from 0 to 127
(unsigned) or -64 to 63 (signed) down to 0 to 63 or -32 to 31. And that
without ever being useful, given what kind of things varints are
commonly going to describe. There's e.g. simply no practical use for
describing a single WAL record length that's bigger than 63 bits can
represent.
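
To make the cost of that lost bit concrete, here is a minimal sketch of
the range arithmetic. It assumes the one-byte form carries 7 payload
bits (the eighth bit being the length marker), which is what the 0 to
127 figure above implies; the function names are made up for
illustration and are not from any patch.

```c
#include <stdint.h>

/*
 * Range arithmetic for a varint's single-byte form.
 * payload_bits must be between 1 and 63.
 * All names here are hypothetical, for illustration only.
 */

/* Largest unsigned value that fits in payload_bits bits. */
uint64_t
varint_unsigned_max(unsigned payload_bits)
{
    return (UINT64_C(1) << payload_bits) - 1;
}

/* Smallest signed value when one payload bit carries the sign. */
int64_t
varint_signed_min(unsigned payload_bits)
{
    return -(INT64_C(1) << (payload_bits - 1));
}

/* Largest signed value when one payload bit carries the sign. */
int64_t
varint_signed_max(unsigned payload_bits)
{
    return (INT64_C(1) << (payload_bits - 1)) - 1;
}
```

With 7 payload bits these give 0 to 127 and -64 to 63; with 6 payload
bits, i.e. after sacrificing one more bit as a flag, they give 0 to 63
and -32 to 31.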

I *can* see a separate varint type, probably sharing some code, that
supports storing arbitrarily large numbers. But using that everywhere
would be pointless.


> I'd be quick to want to expose it to SQL too.

It'll be a bit problematic to deal with all the casting necessary, and
with the likely resulting overload resolution issues.  I'm wondering
whether it'd be worthwhile to have an ALTER TABLE ... STORAGE ... option
that encodes int2/4/8 as varints when inside a tuple, but otherwise just
leaves them as normal integers.
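
As a rough illustration of "varint inside the tuple, normal integer
everywhere else", here is a minimal sketch of such a round trip. Note
the assumptions: it uses a generic LEB128-style continuation-bit
encoding with zigzag mapping for the sign, purely as a stand-in and not
the encoding discussed in this thread, and every function name is
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Zigzag: map signed to unsigned so small magnitudes stay small. */
uint64_t
zigzag_encode(int64_t v)
{
    return ((uint64_t) v << 1) ^ (v < 0 ? UINT64_MAX : 0);
}

uint64_t
zigzag_low_bit(uint64_t u)
{
    return u & 1;
}

int64_t
zigzag_decode(uint64_t u)
{
    return (int64_t) (u >> 1) ^ -(int64_t) zigzag_low_bit(u);
}

/*
 * Encode value into buf (which must have room for 10 bytes).
 * Each byte holds 7 payload bits; the high bit means "more follows".
 * Returns the number of bytes written.
 */
size_t
varint_encode(int64_t value, uint8_t *buf)
{
    uint64_t    u = zigzag_encode(value);
    size_t      n = 0;

    while (u >= 0x80)
    {
        buf[n++] = (uint8_t) (u | 0x80);
        u >>= 7;
    }
    buf[n++] = (uint8_t) u;
    return n;
}

/*
 * Decode one varint from buf back into a plain int64.
 * Returns the number of bytes consumed, or 0 if the input is
 * longer than any 64-bit value can be.
 */
size_t
varint_decode(const uint8_t *buf, int64_t *value)
{
    uint64_t    u = 0;
    size_t      n = 0;
    unsigned    shift = 0;

    for (;;)
    {
        uint8_t     b = buf[n++];

        u |= (uint64_t) (b & 0x7F) << shift;
        if ((b & 0x80) == 0)
            break;
        shift += 7;
        if (shift > 63)
            return 0;           /* malformed: too many bytes */
    }
    *value = zigzag_decode(u);
    return n;
}
```

In this stand-in encoding, values from -64 to 63 take a single byte and
a full-range int8 takes up to 10 bytes. Code outside the tuple only
ever sees the decoded int64.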

Greetings,

Andres Freund


