Re: A varint implementation for PG? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: A varint implementation for PG?
Date 2021-08-03 19:32:29
Msg-id 20210803193229.kz6blqcn65a74bam@alap3.anarazel.de
In response to Re: A varint implementation for PG?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: A varint implementation for PG?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2021-08-03 14:26:16 -0400, Robert Haas wrote:
> [ resurrecting this 2-year-old thread ]
> On Fri, Dec 13, 2019 at 12:45 AM Andres Freund <andres@anarazel.de> wrote:
> > I don't think it's ever going to be sensible to transport 64bit quanta
> > of data. Also, uh, it'd be larger than the data a postgres instance
> > could really contain, given LSNs are 64 bit.
> 
> We already use 128-bit integers in some places in the source code, and
> it seems more likely than not that use of 128-bit integer data types
> will continue to expand over time. If we want to represent 64 bit
> integers in a variable number of bits today, it does seem pretty
> likely that some day we will want to do the same thing with 128 bit
> integers, and maybe eventually larger. And I think that all of that is
> true even though exhausting a 64-bit LSN space looks difficult today
> and will likely remain difficult for the foreseeable future.

Yea, my answer here was intended to be about areas where we "bake in", to
quote Craig, the varint format, like varlena tags or protocol messages. For
those a 64bit length tag seems entirely sufficient.

I think e.g. adding an SQL-level variable-width integer type would potentially
come to a different outcome. But even for things like that I suspect that more
often than not you'd be better off with a variable-length length value, rather
than encoding the whole "huge" value as a single variable-width integer.
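
Something like the below is what I mean - to be clear, an untested sketch,
with LEB128-style continuation bits just as a placeholder encoding, not the
exact format from my patch:

#include <stdint.h>
#include <string.h>

/* minimal LEB128-style unsigned varint encoder, used only for the length */
static int
encode_uvarint(uint8_t *buf, uint64_t val)
{
    int         len = 0;

    do
    {
        uint8_t     b = val & 0x7F;

        val >>= 7;
        if (val != 0)
            b |= 0x80;          /* continuation bit: more bytes follow */
        buf[len++] = b;
    } while (val != 0);

    return len;
}

/* "huge" value: varint byte count first, then the value's raw bytes */
static int
encode_huge(uint8_t *buf, const uint8_t *value, size_t nbytes)
{
    int         len = encode_uvarint(buf, (uint64_t) nbytes);

    memcpy(buf + len, value, nbytes);
    return len + (int) nbytes;
}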


> So to me, the right thing to do here probably depends a bit on the use
> case. If we are talking about using this for strictly internal
> purposes - e.g. cramming stuff into an internal data structure into
> which users have no direct visibility - then I think it's probably
> best to pick a representation that corresponds exactly to the C data
> type that we're trying to store. If we're starting out with an int64
> and we want to byte-squeeze it, the representation you've chosen here
> seems like just the thing. And it's easy to imagine why we might also
> want to have similar transformations for uint64, int32, and uint32,
> and maybe eventually int128 and uint128. And if the details of the
> format are a little different in each case that's fine; for these
> sorts of use cases we're always going to unpack into the same data
> type that we packed originally, so it's not important to have exact
> compatibility across differences in signedness or bit count.

Agreed.
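
FWIW, for the signed variants, the generic way to keep small-magnitude
negative values short is a zig-zag style mapping - again me sketching the
technique, not necessarily the bit layout the patch uses:

#include <stdint.h>

/*
 * Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ..., so either sign encodes
 * to few varint bytes when the magnitude is small. Assumes arithmetic
 * right shift of signed values, as essentially all relevant compilers do.
 */
static uint64_t
zigzag_encode64(int64_t v)
{
    return ((uint64_t) v << 1) ^ (uint64_t) (v >> 63);
}

static int64_t
zigzag_decode64(uint64_t v)
{
    return (int64_t) (v >> 1) ^ -((int64_t) (v & 1));
}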


> However, I suspect that the whole approach should be completely
> revised for a user-visible data type. On the one hand, there's no
> telling how large a value some user will want to represent, so
> limiting ourselves to 64 bits does seem shortsighted. And on the other
> hand, if we've got a varlena, we already know the length, so it seems
> like we shouldn't also encode the length in the value.

If we're talking varlenas, then I think using embedded varints only really
makes sense if they're "sub-components" of that varlena, not the entire
value...


> Now, we could think of introducing a new format for variable-length
> datums, essentially making this a new typlen rather than a new kind of
> varlena. That might be worth it, because if you are storing a lot of
> values that are small enough that this format could represent them in
> 3 bytes or less, which I think would be everything up to +/- 2^20, you
> could save a pretty significant amount of space even if your table was
> laid out to avoid alignment padding. However, there would be some
> distributed overhead to this, because all the places that have special
> handling for typlen = -1 and typlen = -2 would need to grow new cases
> for this. I'm not sure how much of a problem that is, really, but I
> don't think we can go nuts with adding new typlen values.

Yes - the branches for the different typlens are already quite visible in
profiles... It might be that we could compensate for that without too much
difficulty (e.g. by having a field in a tupdesc indicating what kind of types
are in use in a tuple to be formed/deformed, and dispatching to different
form/deform routines based on that), but it's not obviously a win.
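
Very roughly, with the classification and the specialized routine invented
here for illustration (and passed as a parameter rather than as an actual
new TupleDesc field, for brevity):

#include "postgres.h"
#include "access/htup_details.h"

typedef enum AttrMix
{
    ATTRMIX_FIXED_NOTNULL,      /* all columns fixed-width and NOT NULL */
    ATTRMIX_GENERIC             /* nulls, varlenas, varints, ... */
} AttrMix;

/* hypothetical specialized loop: no per-column typlen/null branching */
static void deform_fixed_notnull(HeapTuple tuple, TupleDesc desc,
                                 Datum *values, bool *isnull);

static void
deform_dispatch(HeapTuple tuple, TupleDesc desc, AttrMix mix,
                Datum *values, bool *isnull)
{
    if (mix == ATTRMIX_FIXED_NOTNULL)
        deform_fixed_notnull(tuple, desc, values, isnull);
    else
        heap_deform_tuple(tuple, desc, values, isnull); /* existing path */
}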



> > It'll be a bit problematic to deal with all the casting necessary, and
> > with the likely resulting overload resolution issues.  I'm wondering
> > whether it'd be worthwhile to have a ALTER TABLE ... STORAGE ... option
> > that encodes int2/4/8 as varints when inside a tuple, but otherwise just
> > let it be a normal integer.
> 
> I don't see how a STORAGE option could work. We tend to treat those as
> hints, rather than critical data, which wouldn't work here.

I wasn't thinking that this would be something that could be changed without a
table rewrite - so maybe STORAGE would be the wrong place to put it, given how
it's currently used. OTOH, I don't think it'd be that big a problem to have a
rewrite for some, but not all STORAGE options...


> I think there are a number of problems that crop up, but one of them is the
> same thing we hit with the custom TOAST compression stuff. If you need to
> make a value of some row type out of a HeapTuple, you're not going to know
> which settings were used to create that heap tuple, and you're certainly not
> going to know that when you go to deform that tuple. The only thing you're
> going to have is the tuple descriptor, which AFAICS means that the
> representation needs to be a property of the type, not where the value is
> stored. Maybe you have a clever idea I'm not seeing.

All the information needed for deforming fields is a property of pg_attribute,
not pg_type (because the type could have been dropped after the column was
dropped, but we still need to skip over the column). So if one accepts needing
rewrites for changing the data encoding, I don't think there'd be a huge
issue here.
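
That's already how the deforming code works today - stepping over a column,
including a dropped one, only needs the attlen/attalign copied into the
tuple descriptor, along these lines:

#include "postgres.h"
#include "access/tupmacs.h"

/*
 * Advance past one attribute using only pg_attribute-derived information;
 * no pg_type lookup is needed, so a dropped type doesn't matter.
 */
static long
skip_attribute(char *tupdata, long off, int16 attlen, char attalign)
{
    off = att_align_pointer(off, attalign, attlen, tupdata + off);
    off = att_addlength_pointer(off, attlen, tupdata + off);
    return off;
}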

I think the compression stuff is a bit different, because there you want to
prevent compressed values of one type from wandering into another table with a
different storage setting. That problem wouldn't really exist for SQL-level
varints as I was imagining them - they'd always be deformed into the "normal"
int2/4/8 Datum representation outside of a HeapTuple. We obviously don't want
to do that for varlenas, because they'd need to be fetched from out-of-line
storage, which is several orders of magnitude more expensive than decoding a
varint.


I am now wondering if what we're talking about here would best be thought of
not as a variable-width integer type, but as a variable-width encoding for all
pass-by-value types.

Leaving on-disk compatibility aside (:)), ISTM that by default we could use
the following heuristic to decide how to encode pass-by-value types: if a
column is in the leading run of fixed-width NOT NULL columns, store it in
fixed-width encoding; otherwise use a variable-length encoding.
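
I.e. roughly the following test, with the helper invented here (in reality
you'd classify the columns once when building the tupdesc, rather than
rescanning the prefix every time):

#include "postgres.h"
#include "access/tupdesc.h"

/*
 * Is column attno (0-based) within the leading run of fixed-width
 * NOT NULL columns, and thus stored in fixed-width encoding?
 */
static bool
use_fixed_encoding(TupleDesc desc, int attno)
{
    for (int i = 0; i <= attno; i++)
    {
        Form_pg_attribute att = TupleDescAttr(desc, i);

        if (att->attlen < 0 || !att->attnotnull)
            return false;       /* variable-width from here onwards */
    }
    return true;
}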

Greetings,

Andres Freund


