Re: A varint implementation for PG? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: A varint implementation for PG?
Date
Msg-id 20210804174141.laucav2vfo3yp6jx@alap3.anarazel.de
In response to Re: A varint implementation for PG?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2021-08-04 09:31:25 -0400, Robert Haas wrote:
> This is pretty integer-centric, though. If your pass-by-value type is
> storing timestamps, for example, they're not likely to be especially
> close to zero. Since a 64-bit address is pretty big, perhaps they're
> still close enough to zero that this will work out to a win, but I
> don't know, that seems a bit cheesy.

Yeah, that's fair. The really bad™ example is probably negative numbers, which
wouldn't be easy to handle in a datatype-agnostic way.
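To illustrate why negative numbers are the bad case, here is a minimal sketch. It assumes the encoder only sees the raw 64-bit pattern of a pass-by-value datum. The helper names are hypothetical, and the zigzag mapping is the well-known technique from other varint formats, not something taken from this thread; it only works when the type is known to be a signed integer, which is exactly the datatype-specific knowledge a generic heuristic lacks.

```c
#include <stdint.h>

/* Hypothetical helper: number of significant bits in v (0 for v == 0). */
static int
sketch_significant_bits(uint64_t v)
{
	int			bits = 0;

	while (v != 0)
	{
		bits++;
		v >>= 1;
	}
	return bits;
}

/*
 * Zigzag mapping: 0, -1, 1, -2, ... become 0, 1, 2, 3, ...
 * Assumption: the caller knows the datum is a signed integer.
 */
static uint64_t
sketch_zigzag(int64_t v)
{
	return ((uint64_t) v << 1) ^ (v < 0 ? UINT64_MAX : UINT64_C(0));
}
```

Reinterpreted as unsigned, -1 has all 64 bits set, so any length-prefixed encoding has to spend its maximum size on it. After the zigzag mapping the same value needs a single bit.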


> I grant that it could work out to a win -- pass-by-value data types whose
> distribution is very different from what's typical for integers, or for that
> matter columns full of integers that all happen to be toward the extreme
> values the data type can store, are probably not that common.

It'd work out as a wash for common timestamps:

./varint_test -u 681413261095983
processing unsigned
unsigned:    681413261095983
  input bytes:     00 02  6b bd  e3 5f  74 2f
8 output bytes:     01 02  6b bd  e3 5f  74 2f
decoded:    681413261095983
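For readers who want to reproduce the bytes above, here is a minimal sketch of a length-prefixed varint that is consistent with that output. It is not the code from the patch. The assumed layout is inferred from the output: the number of leading zero bits in the first byte gives the number of additional bytes, n bytes carry 7*n payload bits for n <= 8, and a first byte of 00 means a full 8 raw bytes follow. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: encoded length in bytes (1..9) under the assumed layout. */
static size_t
sketch_varint_len(uint64_t v)
{
	size_t		n;

	for (n = 1; n <= 8; n++)
	{
		if ((v >> (7 * n)) == 0)
			return n;			/* value fits in 7*n payload bits */
	}
	return 9;					/* needs a 00 prefix plus 8 raw bytes */
}

/* Hypothetical: encode v big-endian into out (at least 9 bytes). */
static size_t
sketch_varint_encode(uint64_t v, uint8_t *out)
{
	size_t		n = sketch_varint_len(v);
	size_t		i;

	assert(out != NULL);
	if (n == 9)
	{
		out[0] = 0x00;
		for (i = 0; i < 8; i++)
			out[1 + i] = (uint8_t) (v >> (8 * (7 - i)));
		return 9;
	}
	/* marker bit sits directly above the 7*n payload bits */
	v |= UINT64_C(1) << (7 * n);
	for (i = 0; i < n; i++)
		out[i] = (uint8_t) (v >> (8 * (n - 1 - i)));
	return n;
}

/* Hypothetical: decode from in, store the value in *v, return bytes read. */
static size_t
sketch_varint_decode(const uint8_t *in, uint64_t *v)
{
	uint8_t		first = in[0];
	uint64_t	r = 0;
	size_t		n = 1;
	size_t		i;

	if (first == 0)
	{
		for (i = 0; i < 8; i++)
			r = (r << 8) | in[1 + i];
		*v = r;
		return 9;
	}
	while ((first & 0x80) == 0)	/* count leading zero bits */
	{
		first = (uint8_t) (first << 1);
		n++;
	}
	for (i = 0; i < n; i++)
		r = (r << 8) | in[i];
	*v = r & ~(UINT64_C(1) << (7 * n));	/* clear the marker bit */
	return n;
}
```

Under these assumptions the timestamp-like value needs 50 significant bits, which rounds up to 8 bytes of 7 payload bits each. That is the same size as the plain 8-byte datum, hence a wash.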

I don't think there are many workloads where plain integers would skew extreme
enough for it to work out to a loss often enough to matter. But:

> I just don't really like making such assumptions on a system-wide basis (as
> opposed to a per-datatype basis where it's easier to reason about the
> consequences).

I'd not at all be opposed to datatypes having influence over the on-disk
encoding. I was just musing about a default heuristic that could make sense. I
do think you'd want something that chooses the encoding for one pg_attribute's
values based on preceding columns.

Greetings,

Andres Freund


