Re: replication identifier format - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: replication identifier format |
Date | |
Msg-id | 20140623152832.GH3968@awork2.anarazel.de |
In response to | Re: replication identifier format (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: replication identifier format |
List | pgsql-hackers |
On 2014-06-23 10:45:51 -0400, Robert Haas wrote:
> On Mon, Jun 23, 2014 at 10:11 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> > Why? Users and other systems only ever see the external ID. Everything
> >> > leaving the system is converted to the external form. The short id
> >> > basically is only used in shared memory and in wal records. For both
> >> > using longer strings would be problematic.
> >> >
> >> > In the patch I have the user can actually see them as they're stored in
> >> > pg_replication_identifier, but there should never be a need for that.
> >>
> >> Hmm, so there's no requirement that the short IDs are consistent
> >> across different clusters that are replication to each other?
> >
> > Nope. That seemed to be a hard requirement in the earlier discussions we
> > had (~2 years ago).
>
> Oh, great. Somehow I missed the fact that that had been addressed. I
> had assumed that we still needed global identifiers in which case I
> think they'd need to be 64+ bits (preferably more like 128). If they
> only need to be locally significant that makes things much better.

Well, I was just talking about the 'short ids' here and how they are used
in crash recovery/shmem et al. Those indeed don't need to be
coordinated. If you ever use logical decoding on a system that receives
changes from other systems (cascading replication, multimaster) you'll
likely want to add the *long* form of that identifier to the output in
the output plugin so the downstream nodes can identify the source.

How one specific replication solution deals with coordinating this
between systems is essentially that suite's problem. The external
identifier currently is a 'text' column, so essentially unlimited. (Well,
I just noticed that the table currently doesn't have a toast table
assigned, so it's only a couple kb right now, but ...)
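To illustrate the split described above between locally significant short IDs and the external (long) identifier, here is a minimal Python sketch. It is not code from the patch: the class name, the method names, the 16-bit limit, the choice to leave 0 unused, and the "upstream-x"/"upstream-y" identifiers are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NodeIdentifierMap:
    """Hypothetical per-cluster map between external identifiers and short IDs."""

    max_short_id: int = 0xFFFF  # assumed width; the thread does not state one
    by_external: Dict[str, int] = field(default_factory=dict)
    by_short: Dict[int, str] = field(default_factory=dict)

    def get_or_assign(self, external_id: str) -> int:
        """Return the local short ID for external_id, assigning one if needed."""
        if external_id in self.by_external:
            return self.by_external[external_id]
        short_id = len(self.by_short) + 1  # 0 is left unused here (assumption)
        if short_id > self.max_short_id:
            raise OverflowError("no free short IDs left on this node")
        self.by_external[external_id] = short_id
        self.by_short[short_id] = external_id
        return short_id

    def to_external(self, short_id: int) -> str:
        """Convert a short ID to the external form before it leaves the node."""
        return self.by_short[short_id]


# Two nodes learn about the same upstreams in a different order, so the
# short IDs they assign differ, while the external form stays comparable.
node_a = NodeIdentifierMap()
node_b = NodeIdentifierMap()

a_x = node_a.get_or_assign("upstream-x")
node_a.get_or_assign("upstream-y")

node_b.get_or_assign("upstream-y")
b_x = node_b.get_or_assign("upstream-x")

print(a_x, b_x)
print(node_a.to_external(a_x), node_b.to_external(b_x))
```

In this sketch only the value returned by `to_external` would ever be written into an output stream; the short IDs stay private to each node, which is why they need no cross-cluster coordination.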
> Is there any real reason to add a pg_replication_identifier table, or
> should we just let individual replication solutions manage the
> identifiers within their own configuration tables?

I don't think that'd work. During crash recovery the short/internal IDs
are read from WAL records and need to be unique across *all*
databases. Since there's no way for different replication solutions, or
even the same one, to coordinate this across databases (as there's no way
to add shared relations), it has to be built in.

It's also useful so we can have stuff like the
'pg_replication_identifier_progress' view which tells you internal_id,
external_id, remote_lsn, local_lsn. Just showing the internal ID would
imo be bad.

> I guess one
> question is: What happens if there are multiple replication solutions
> in use on a single server? How do they coordinate?

What's your concern here? You're wondering how they can make sure the
identifiers they create are non-overlapping?

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
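As a rough illustration of why a progress view benefits from a built-in identifier table, the following Python sketch joins a cluster-wide identifier map with per-origin replay state to produce rows with the four columns named above. Only the column names come from the message; the function name, the data layout, the identifiers, and the LSN values are invented for the example.

```python
from typing import Dict, List, NamedTuple, Tuple


class ProgressRow(NamedTuple):
    internal_id: int
    external_id: str
    remote_lsn: str
    local_lsn: str


def progress_rows(
    identifiers: Dict[int, str],
    replay_state: Dict[int, Tuple[str, str]],
) -> List[ProgressRow]:
    """Join short IDs with their external form, ordered by internal ID."""
    return [
        ProgressRow(internal_id, identifiers[internal_id], remote_lsn, local_lsn)
        for internal_id, (remote_lsn, local_lsn) in sorted(replay_state.items())
    ]


# Cluster-wide identifier table (hypothetical contents).
identifiers = {1: "upstream-x", 2: "upstream-y"}

# Replay state keyed by short ID: (remote_lsn, local_lsn), made-up values.
replay_state = {
    2: ("0/16B3748", "0/2000060"),
    1: ("0/3000028", "0/2000140"),
}

rows = progress_rows(identifiers, replay_state)
for row in rows:
    print(row)
```

Without the shared `identifiers` map, such a view could only show the internal ID, which is meaningless outside the local cluster.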