Re: Sequence Access Methods, round two - Mailing list pgsql-hackers
From: Michael Paquier
Subject: Re: Sequence Access Methods, round two
Msg-id: ZnPIUPMmp5TzBPC2@paquier.xyz
In response to: Re: Sequence Access Methods, round two (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers
On Fri, Apr 26, 2024 at 03:21:29PM +0900, Michael Paquier wrote:
> The next plan is to hopefully be able to trigger a discussion at the
> next pgconf.dev at the end of May, but let's see how it goes.

I am a bit behind on an update of this thread, but there was an unconference session on the topic at the last pgconf.dev. This is based on my own notes written down after the session, so if there are gaps, feel free to correct me.

The session was called "Sequences & Clusters", and was in two parts: the first part covering this thread, and the second part covering the problem of sequences with logical replication for upgrade cases. I took a lot of time with the first part (sorry about that, Amit K.!), still the second part reached an agreement about what to do next there, and this is covered by this thread these days:
https://www.postgresql.org/message-id/CAA4eK1LC%2BKJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ%40mail.gmail.com

My overall feeling before this session was that folks had not really grasped the problem I was trying to solve, and, while the end of the session did not completely fill those gaps, at least folks finished with some idea of why I have been trying something here.

First, I spoke for a few minutes about the use cases I have been trying to solve. Part of that involves Postgres-XC, a self-proclaimed multi-master fork of Postgres, where sequences are handled by patching src/backend/commands/sequence.c to retrieve values from a GTM (global transaction manager, the source of truth for value uniqueness shared by all the nodes), something I got my hands on between 2009~2012 (spoiler: people like scaling out clusters even more 12 years later). Then I explained why Postgres is not good in this area. The original idea is that we want to be able, for some applications, to scale out Postgres across many hosts while making it transparent to the user's applications.
By that, imagine a huge big box where users can connect to a single point, but underneath any connection could involve a connection to a cluster of N PostgreSQL nodes, N being rather large (say N > 10k?).

Why would we want that? One problem with such configurations is that there is no way to make the values transparent to the application without applying schema changes (attribute defaults, or UUIDs, but these are large, for example), meaning that schemas cannot really be migrated as-they-are from one space (be it a Postgres cluster of 1 or more nodes) to a second space (with fewer or more nodes). And having to manipulate clusters with ALTER SEQUENCE commands to ensure that there is no overlap in values does not help much to avoid support calls at 3AM when values suddenly conflict because an application has gone wild, especially if the node fleet needs to be elastic and flexible (yep, there's also that). Note that there are also limits with generated columns that feed from the in-core sequence computation of Postgres, where all the sequence data is stored in a pseudo-heap table, relying on buffer locks to make sure that in-place updates are concurrency-safe.

So this thread is about extending the set of possibilities in this area for application developers to control how sequences are computed. First, here is a summary of the use cases that have been mentioned where a custom computation is handy, based on the properties I've understood from the conversation:
- Control of the computation of values on a node and cluster basis, usually coming with three properties (put a snowflake ID here):
-- Global component, usually put in the first bits to force an ordering of the values across all the nodes. For snowflakes, this is covered by a timestamp, to which an offset can be applied.
-- Local component, where a portion of the value bits is decided depending on the node where the value is computed.
-- Local incrementation, where the last bits in the value are used to loop if the first two components happen to be equal, to ensure uniqueness.
- Caching a range of values at node level or session level, retrieved from a unique source shared by multiple nodes. The range of values is retrieved from a single source (a PostgreSQL node itself), cached in a shared pool on a node or just in a backend context for consumption by a session.
- Transactional behavior to minimize value gaps, which is something I have mentioned, but I'm a bit meh on this property as value uniqueness is key, while users have learnt to live with value gaps. Still, the APIs can make that possible if autovacuum is able to understand that some cleanup needs to happen.

Another set of things that have been mentioned:
- Is it even correct to call this concept an access method? Should a different keyword be used? This depends on the stack layer where the callbacks associated with a sequence are added, I assume. Still, based on the infrastructure that we already have in place for tables and indexes (commands, GUCs), this is still kind of the correct concept to me because we can rely on a lot of existing infrastructure, but I also get that opinions diverge depending on one's view.
- More pluggable layers. The final picture will most likely involve multiple layers of APIs, and not only what's proposed here, with extra points mentioned about:
-- Custom data types. My answer on this one is that this will need to be controlled by a different clause. I think that this is a different feature from the "access method" approach proposed here, one that would need to happen on top of what's here, where the point is to control the computation (and anything I've seen lately would unlock up to 64 bits of computation space hidden behind integer-like data types). Other cluster products out there also have a concept of user-related data types, which have to be integer-like.
-- Custom nextval() functions.
Here we are going to need a split between the in-core portion of sequences related to system catalogs and the facilities that can be accessed once a sequence OID is known. The proposed patch plugs into nextval_internal() for two reasons: being able to let CACHE be handled by the core code and not the AM, and easier support for generated columns with the existing types, where nextval_internal() is called from the executor. This part, also, is going to require a new SQL clause. Perhaps something will happen at some point in the SQL specification itself to put some guidelines in place, who knows.

While on it, I have noticed a couple of conflicts while rebasing, so attached is a refreshed patch set.

Thanks,
--
Michael
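To make the three snowflake properties described above (global timestamp component, local node component, local incrementation) concrete, here is a minimal sketch in Python. It is not the patch's implementation; the class name, epoch, and bit widths (41/10/12, as popularized by Twitter's snowflake scheme) are illustrative assumptions.

```python
import threading
import time

# Illustrative custom epoch (Twitter's snowflake epoch); any fixed offset works.
EPOCH_MS = 1_288_834_974_657

class SnowflakeGenerator:
    """Hypothetical 64-bit snowflake-style ID:
    41 bits timestamp (global ordering) | 10 bits node ID (local
    component) | 12 bits counter (local incrementation)."""

    def __init__(self, node_id: int):
        assert 0 <= node_id < 1024  # must fit in 10 bits
        self.node_id = node_id
        self.last_ms = -1
        self.counter = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                # Same millisecond as the previous value: bump the 12-bit
                # counter; if it wraps, spin until the clock advances.
                self.counter = (self.counter + 1) & 0xFFF
                if self.counter == 0:
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.counter = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.node_id << 12) | self.counter
```

Because the timestamp occupies the most significant bits, IDs generated on any node sort roughly by creation time, while the node bits guarantee that two nodes can never emit the same value in the same millisecond.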
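The range-caching use case above can also be sketched briefly. In this hedged illustration (class and method names are invented, not from the patch), a single central source hands out contiguous blocks of values, and each node or session consumes its cached block locally, returning to the source only when the block is exhausted:

```python
import threading

class CentralSource:
    """Stands in for the single node that owns the sequence's state."""
    def __init__(self, start: int = 1):
        self._next = start
        self._lock = threading.Lock()

    def allocate_block(self, size: int) -> range:
        # Hand out a contiguous, never-reused block of values.
        with self._lock:
            lo = self._next
            self._next += size
            return range(lo, lo + size)

class NodeCache:
    """Per-node (or per-session) cache consuming one block at a time."""
    def __init__(self, source: CentralSource, block_size: int = 100):
        self._source = source
        self._block_size = block_size
        self._iter = iter(())  # empty: first call fetches a block

    def nextval(self) -> int:
        try:
            return next(self._iter)
        except StopIteration:
            self._iter = iter(self._source.allocate_block(self._block_size))
            return next(self._iter)
```

The trade-off is the usual one for cached sequences: values stay globally unique and cheap to produce, but they are no longer globally ordered across nodes, and a crashed node leaves a gap the size of its unconsumed block.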
Attachment
- v6-0001-Switch-pg_sequence_last_value-to-report-a-tuple-a.patch
- v6-0002-Remove-FormData_pg_sequence_data-from-init_params.patch
- v6-0003-Integrate-addition-of-attributes-for-sequences-wi.patch
- v6-0004-Refactor-code-for-in-core-local-sequences.patch
- v6-0005-Sequence-access-methods-backend-support.patch
- v6-0006-Sequence-access-methods-dump-restore-support.patch
- v6-0007-Sequence-access-methods-core-documentation.patch
- v6-0008-snowflake-Add-sequence-AM-based-on-it.patch
- signature.asc