Support logical replication of large objects - Mailing list pgsql-hackers
From | Borui Yang |
---|---|
Subject | Support logical replication of large objects |
Date | |
Msg-id | CAOiV-_9_9yZbaaH-NXCOf-547RfThN182-A8kGRj8Tq4LCWu5Q@mail.gmail.com Whole thread Raw |
List | pgsql-hackers |
Hello, I’m working on a patch to support logical replication of large objects (LOBs). This is a useful feature when a database in logical replication has lots of tables, functions and other objects that change over time, such as in online cross major version upgrade. As an example, this lets users replicate large objects between different PostgreSQL major versions. The topic was previously discussed in [1]. Moreover, we need to address the following 3 challenges. I worked on some designs and appreciate feedback : 1. Replication of the change stream of LOBs My suggestion is that we can just add a check when any LOB function or API is called and executed in the backend, and then add a simple SQL command in WAL files to do the replication . Take lo_unlink as example[2] : we can create a “simple” SQL like SELECT lo_unlink(<PID>); and log it in WAL, so that replica only needs to replay the “simple” SQL command. We can unlink the LOBs in replica accordingly. Pros : a. We do not need to add much additional functionality. b. For most of the LOBs related APIs, we don’t need to touch whole LOBs, except for the case creation of LOBs. Cons: a. For the case creation of LOBs, we may need to split the whole LOB content into WAL files which will increase volume of WAL and replicated writes dramatically. This could be prevented if we can make sure the original file is publicly shared, like a url from cloud storage, or exists on host on replica as well. 2. Initializing replication of LOBs When a subscription is established, LOBs in the source should be replicated even if they are not created in replica. Here are two options to approach this problem: Option 1 : Support LOB related system catalogs in logical replication We can make an exception in this line[3] in the “check_publication_add_schema” function. Pros: All required LOBs can be transferred to replica. Cons: There is currently no support for allowing logical replication of system catalogs. Option 2 : Run a function or a script from source instance when it detects logical replication is established. The point is that we can ask the source to replicate whole LOBs when a new subscription is created. Maybe we can copy the whole LOBs related system catalogs and replicate the copy to replica, then restore the original LOBs into replica from the copy. Cons : This will increase the volume of WAL and replicated writes dramatically. I currently do not have a preference on either option. I would like to see if others have thoughts on how we could approach this. 3. OID conflicts A common case is that the OID we want to publish is already used in subscriber. Option 1 (My recommendation) : Create/Update existing System catalog for mapping the OID if conflict happens Maybe we could add another column naming like mapping_oid in system catalog pg_largeobject_metaqdate on the replica. When the replica detects the OID (E.g. 16400) that source is replicating is already used in replica, replica could store the 16400 as mapping_oid and create a new OID (E.g. 16500) as oid to be used in replica, so whatever operation is done on 16400 in source, in replica we just need to perform on 16500. Cons : We would need to add additional columns to the system catalog Option 2 : Prompt error message in Replica and let user handle it manually Cons : This is not user-friendly. Please let me know your thoughts. Borui Yang Amazon RDS/Aurora for PostgreSQL [1] https://www.postgresql.org/message-id/VisenaEmail.16.35d9e854e3626e81.15cd0de93df%40tc7-visena [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/libpq/be-fsstubs.c#l312 [3] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/catalog/pg_publication.c#l98
pgsql-hackers by date: