Re: Sequence Access Methods, round two - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Sequence Access Methods, round two
Date
Msg-id ZitH6eYaYXsAa05R@paquier.xyz
Whole thread Raw
In response to Re: Sequence Access Methods, round two  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Sequence Access Methods, round two
List pgsql-hackers
On Fri, Apr 19, 2024 at 04:00:28PM +0900, Michael Paquier wrote:
> I have plans to rework this patch set for the next commit fest,
> and this includes some investigation about custom data types that
> could be plugged into these AMs.

So, I have worked more on this patch set, and finished by reorganizing
it more, with more things:
- The in-core sequence access method is split into more files:
-- One for its callbacks, called seqlocalam.c.
-- The WAL replay routines are moved into their own file.
- As asked, Implementation of a contrib module that introduces a
sequence access method for snowflake IDs, to demonstrate what can be
done using the APIs of the patch.  The data of such sequences is
stored in an unlogged table, based on the assumption that the
timestamps and the machine IDs ensure the unicity of the IDs for the
sequences.  The advantage of what's presented here is that support for
lastval(), nextval() and currval() is straight-forward.  Identity
columns are able to feed on that.  cache is handled by sequence.c, not
the AM.  WAL-logging is needed for the init fork, it goes through the
generic WAL APIs like bloom to log full pages.  Some docs are
included.  This is still a rough WIP, though, and the buffer handling
is not optimal, and could be made transactional this time (assuming
autovacuum is able to process them at some point, or perhaps the
sequence AMs should offer a way for autovacuum to know if such
sequences should be cleaned up or not).

After having done that, I've also found about a module developed by
pgEdge, that copies a bunch of the code from sequence.c, though it is
not able to handle the sequence cache:
https://github.com/pgEdge/snowflake

The approach this module uses is quite similar to what I have here,
but it is smarter regarding clock ticking, where the internal sequence
counter is bumped only when we fetch the same timestamp as a previous
attempt.  The module presented could be switched to do something
similar by storing into the heap table used by the sequence a bit more
data than just the sequence counter.  Well, the point I want to make
at this stage is what can be done with sequence AMs, so let's discuss
about that later.

Finally, custom types, where I have come up with a list of open
questions:
- Catalog representation.  pg_sequence and pg_sequences switch to
something else than int64.
- The existing functions are also interesting to consider here.
nextval() & co would not be usable as they are for sequence AMs that
use more than int64.  Note that the current design taken in the patch
has a strong dependency on int64 (see sequenceam.h).  So the types
would need to reflect.  With identity columns, the change would not be
that hard as the executor has NextValueExpr.  Perhaps each sequence AM
should just have callback equivalents for currval(), nextval() and
lastval().  This hits with the fact that this makes sequence AMs less
transparent to applications because custom data types means different
functions than the native ones.
- Option representation.
- I have polled Twitter and Fosstodon with four choices:
-- No need for that, 64b representation is enough.
-- Possibility to have integer-like types (MSSQL does something like
that).
-- Support for 128b or larger (UUIDs, etc, with text-like
representation or varlenas).
-- Support for binary representation, which comes down to the
possibility of having sequence values even larger than 128b.

Based on the first estimations, 50%-ish of people mentioned than 64b
is more than enough, while Jelte mentioned that Citus has tackled this
problem with an equivalent of 128b (64b for the sequence values, some
more for machine states).  Then there's a trend of 25%-ish in favor of
128b and 25%-ish for more than that.  The results are far from being
final, but that's something.

My own take after pondering about it is that 64b is still more than
enough for the clustering cases I've seen in the past 15 years or so,
while offering room for implementations even if it comes to thousands
of nodes.  So there's some margin depending on the number of bits
reserved for the "machine" part of the sequence IDs when used in
clusters.

The next plan is to hopefully be able to trigger a discussion at the
next pgconf.dev at the end of May, but let's see how it goes.

Thanks,
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: Row pattern recognition
Next
From: Richard Guo
Date:
Subject: Re: Short-circuit sort_inner_and_outer if there are no mergejoin clauses