Re: [PATCH] Support automatic sequence replication - Mailing list pgsql-hackers
| From | shveta malik |
|---|---|
| Subject | Re: [PATCH] Support automatic sequence replication |
| Date | |
| Msg-id | CAJpy0uCY=+xRfkJ29abQawZ+wDb_uOasJiDiOL+Yt2iZs5jqkg@mail.gmail.com |
| In response to | Re: [PATCH] Support automatic sequence replication (Ajin Cherian <itsajin@gmail.com>) |
| List | pgsql-hackers |
We revisited the design of this patch. Sharing my thoughts and analysis here. Any feedback is appreciated.

Background:
-----------------------
Previously, sequence synchronization was triggered during CREATE SUBSCRIPTION, ALTER SUBSCRIPTION REFRESH PUBLICATION, and REFRESH SEQUENCES. A sequence-sync worker was started whenever a sequence entered the INIT state, which could also occur if a previous sync failed. Therefore, a mechanism was required to continuously scan pg_subscription_rel and start a sequence-sync worker for a subscription whenever any sequence was found in INIT. Since the apply worker already performs this role for table-sync workers, the same infrastructure was reused for sequence-sync workers.

Using the launcher for this purpose was rejected, as it would have required overloading the launcher with logic to repeatedly inspect pg_subscription_rel and decide whether to start a worker for each sequence (see discussion at [1]).

Current scenario:
-----------------------
The requirement is different now: the sequence-sync worker is now expected to run continuously, independent of sequence state. This makes us revisit our design choices and re-analyze whether we can do it in the launcher.

The primary benefit of starting the sequence-sync worker from the launcher would be avoiding an extra apply worker for sequence-only subscriptions. However, this approach introduces challenges. The launcher currently accesses only global pg_subscription and does not establish a database connection (see [2]). To decide whether to start an apply worker, a sequence-sync worker, or both, the launcher would need to access pg_subscription_rel, which requires a database connection. It is unclear which database the launcher should connect to, since subscriptions can target different databases.

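To make the trade-off above concrete, here is a minimal sketch of which workers each design would start for a subscription. It is not PostgreSQL code: `SubRelSummary`, the `WORKER_*` flags, and both functions are hypothetical names, and the rule that an empty subscription keeps its apply worker under the launcher-driven design is my assumption.

```c
/*
 * Hypothetical summary of one subscription's entries in
 * pg_subscription_rel. Not an actual PostgreSQL structure.
 */
typedef struct SubRelSummary
{
    int         ntables;        /* number of subscribed tables */
    int         nsequences;     /* number of subscribed sequences */
} SubRelSummary;

/* Illustrative bit flags for the workers a subscription needs. */
enum
{
    WORKER_NONE = 0,
    WORKER_APPLY = 1 << 0,
    WORKER_SEQSYNC = 1 << 1
};

/*
 * Current design: the launcher always starts an apply worker, and the
 * apply worker starts the sequence-sync worker when sequences exist.
 */
static int
workers_current_design(const SubRelSummary *s)
{
    int         workers = WORKER_APPLY;

    if (s->nsequences > 0)
        workers |= WORKER_SEQSYNC;
    return workers;
}

/*
 * Launcher-driven alternative: the launcher itself decides, so a
 * sequence-only subscription needs no apply worker. Assumption: a
 * subscription with no relations at all still gets an apply worker.
 */
static int
workers_launcher_design(const SubRelSummary *s)
{
    int         workers = WORKER_NONE;

    if (s->ntables > 0 || s->nsequences == 0)
        workers |= WORKER_APPLY;
    if (s->nsequences > 0)
        workers |= WORKER_SEQSYNC;
    return workers;
}
```

For a sequence-only subscription (`ntables = 0`, `nsequences > 0`), the first function yields both workers and the second yields only the sequence-sync worker. That one-worker difference per subscription is the overhead being weighed here; note that the second function needs exactly the per-subscription information that the launcher cannot read today.
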
Another option would be to explicitly feed this information to the launcher during CREATE SUBSCRIPTION and REFRESH PUBLICATION by having an additional column in pg_subscription indicating object_type: table_only, seq_only, both. This would undoubtedly add complexity. Also, I am unsure if it is a good idea to add an additional field to the global catalog pg_subscription for this purpose.

That said, it is reasonable to expect that users who create a publication for ALL SEQUENCES will typically have only a single publication-subscription pair. In such cases, the overhead of an extra apply worker per subscription, along with a sequence-sync worker, is likely acceptable.

~~

Considering the above, starting the sequence-sync worker from the launcher seems feasible (though it would require a more detailed analysis), but it comes with its own complexities. OTOH, (potentially) significant extra worker overhead, which could impact the system, would only occur if a large number of 'sequence-only' subscriptions were created. It is unclear whether there is ever a need for multiple ALL SEQUENCES subscriptions, or whether business requirements would call for subscribing to multiple machines for ALL SEQUENCES, which would necessitate multiple such subscriptions.

Given this, it seems reasonable to continue with the current design of starting the sequence-sync worker from the apply worker. We may think of other approaches if there is any objection or user feedback on this approach.

~~

[1]: https://www.postgresql.org/message-id/CAA4eK1%2Bp%3DM%2B5NAq5VSxD4_XyE1MBTKwU40RD1cL9PgpbELKBRQ%40m…

[2]:
    /*
     * Establish connection to nailed catalogs (we only ever access
     * pg_subscription).
     */
    BackgroundWorkerInitializeConnection(NULL, NULL, 0);

thanks
Shveta