Re: Sequence Access Methods, round two - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Sequence Access Methods, round two |
Date | |
Msg-id | Zamz28IYDtidWsCG@paquier.xyz Whole thread Raw |
In response to | Re: Sequence Access Methods, round two (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses |
Re: Sequence Access Methods, round two
|
List | pgsql-hackers |
On Thu, Jan 18, 2024 at 04:54:06PM +0100, Matthias van de Meent wrote: > On Thu, 18 Jan 2024, 16:06 Peter Eisentraut, <peter@eisentraut.org> wrote: >> On 01.12.23 06:00, Michael Paquier wrote: >>> Please find attached a patch set that aims at implementing sequence >>> access methods, with callbacks following a model close to table and >>> index AMs, with a few cases in mind: >>> - Global sequences (including range-allocation, local caching). >>> - Local custom computations (a-la-snowflake). >> >> That's a lot of code, but the use cases are summarized in two lines?!? >> >> I would like to see a lot more elaboration what these uses cases are (I >> recognize the words, but do we have the same interpretation of them?) >> and how they would be addressed by what you are proposing, and better >> yet an actual implementation of something useful, rather than just a >> dummy test module. > > At $prevjob we had a use case for PRNG to generate small, > non-sequential "random" numbers without the birthday problem occurring > in sqrt(option space) because that'd increase the printed length of > the numbers beyond a set limit. The sequence API proposed here > would've been a great alternative to the solution we found, as it > would allow a sequence to be backed by an Linear Congruential > Generator directly, rather than the implementation of our own > transactional random_sequence table. Interesting. Yes, one of the advantages of this API layer is that all the computation is hidden behind a sequence object at the PostgreSQL level, hence applications just need to set a GUC to select a given computation method *while* still using the same DDLs from their application, or just append USING to their CREATE SEQUENCE but I've heard that applications would just do the former and forget about it. The reason why this stuff has bumped into my desk is that we have no good solution in-core for globally-distributed transactions for active-active deployments. First, anything we have needs to be plugged into default expressions of attributes like with [1] or [2], or a tweak is to use sequence values that are computed with different increments to avoid value overlaps across nodes. Both of these require application changes, which is meh for a bunch of users. The second approach with integer-based values can be become particularly a pain if one has to fix value conflicts across nodes as they'd usually require extra tweaks with the sequence definitions, especially if it blocks applications in the middle of the night. Sequence AMs offer more control on that. For example, snowflake IDs can rely on a GUC to set a specific machine ID to force some of the bits of a 64-bit integer to be the same for a single node in an active-active deployment, ensuring that any value computed across *all* the nodes of a cluster are always unique, while being maintained behind a sequence object in-core. (I can post a module to demonstrate that based on the sequence AM APIs, just wait a min.. Having more than a test module and/or a contrib is a separate discussion.) By the way, patches 0001 to 0004 are just refactoring pieces. Particularly, 0001 redesigns pg_sequence_last_value() to work across the board for upgrades and dumps, while avoiding a scan of the sequence "heap" relation in pg_dump. These are improvements for the core code in any case. [1]: https://github.com/pgEdge/snowflake [2]: https://www.postgresql.org/message-id/TY3PR01MB988983D23E4F1DA10567BC5BF5B9A%40TY3PR01MB9889.jpnprd01.prod.outlook.com -- Michael
Attachment
pgsql-hackers by date: