Re: pg_upgrade: optimize replication slot caught-up check - Mailing list pgsql-hackers
| From | Chao Li |
|---|---|
| Subject | Re: pg_upgrade: optimize replication slot caught-up check |
| Date | |
| Msg-id | F9ED040D-4F1C-404D-80EC-52A5C65290C6@gmail.com |
| In response to | Re: pg_upgrade: optimize replication slot caught-up check (Masahiko Sawada <sawada.mshk@gmail.com>) |
| List | pgsql-hackers |
> On Jan 8, 2026, at 04:18, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jan 6, 2026 at 12:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Mon, Jan 5, 2026 at 6:55 PM Chao Li <li.evan.chao@gmail.com> wrote:
>>>
>>>> On Jan 6, 2026, at 02:02, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Commit 29d0a77fa6 improved pg_upgrade to allow migrating logical
>>>> slots. Currently, to check if the slots are ready to be migrated, we
>>>> call binary_upgrade_logical_slot_has_caught_up() for every single
>>>> slot. This checks if there are any unconsumed WAL records. However, we
>>>> noticed a performance issue. If there are many slots (e.g., 100 or
>>>> more) or if there is a WAL backlog, checking all slots one by one
>>>> takes a long time.
>>>>
>>>> Here are some test results from my environment:
>>>> With an empty cluster: 1.55s
>>>> With 200 slots and 30MB backlog: 15.51s
>>>>
>>>> Commit 6d3d2e8e5 introduced parallel checks per database, but a single
>>>> job might still have to check too many slots, causing delays.
>>>>
>>>> Since binary_upgrade_logical_slot_has_caught_up() essentially checks
>>>> if any decodable record exists in the database, IIUC it is not
>>>> necessary to check every slot. We can optimize this by checking only
>>>> the slot with the minimum confirmed_flush_lsn. If that slot is caught
>>>> up, we can assume the others are too. The attached patch implements
>>>> this optimization. With the patch, the test with 200 slots finished in
>>>> 2.512s. The execution time is now stable regardless of the number of
>>>> slots.
>>>>
>>>> One thing to note is that DecodeTXNNeedSkip() also considers
>>>> replication origin filters. Theoretically, a plugin could filter out
>>>> specific origins, which might lead to different results. However, this
>>>> is a very rare case. Even if it happens, it would just result in a
>>>> false positive (the upgrade fails safely), so the impact is minimal.
>>>> Therefore, the patch simplifies the check to be per-database instead
>>>> of per-slot.
>>>>
>>>> Feedback is very welcome.
>>>>
>>>> Regards,
>>>>
>>>> --
>>>> Masahiko Sawada
>>>> Amazon Web Services: https://aws.amazon.com
>>>> <v1-0001-pg_upgrade-Optimize-replication-slot-caught-up-ch.patch>
>>>
>>> Hi Masahiko-san,
>>>
>>> The optimization idea basically makes sense to me. Reducing the check
>>> from O(slots × WAL) to O(databases × WAL) sounds like a win. Just a few
>>> small comments:
>>
>> Thank you for reviewing the patch!
>>
>>> 1
>>> ```
>>> +static void process_old_cluter_logical_slot_catchup_infos(DbInfo *dbinfo, PGresult *res, void *arg);
>>> ```
>>>
>>> Typo in the function name: cluter -> cluster
>>>
>>> 2
>>> ```
>>> - logical_slot_infos_query = get_old_cluster_logical_slot_infos_query();
>>> + const char *slot_info_query = "SELECT slot_name, plugin, two_phase, failover, "
>>> ```
>>>
>>> logical_slot_infos_query is no longer used and should be removed.
>>
>> Fixed.
>>
>>> 3
>>> ```
>>> + " ORDER BY confirmed_flush_lsn ASC "
>>> ```
>>>
>>> I am wondering whether confirmed_flush_lsn can be NULL. If it can, we
>>> may want to add "NULLS LAST".
>>
>> Given that the query is not executed during a live check, I think
>> confirmed_flush cannot be NULL. temporary should also not be true, but
>> it's kept for consistency with slot_info_query.
>>
>>> 4
>>> ```
>>> + Assert(num_slots == dbinfo->slot_arr.nslots);
>>> +
>>> + for (int i = 0; i < num_slots; i++)
>>> + {
>>> + char *slotname;
>>> + bool caught_up;
>>> +
>>> + slotname = PQgetvalue(res, i, PQfnumber(res, "slot_name"));
>>> + caught_up = (strcmp(PQgetvalue(res, i, PQfnumber(res, "caught_up")), "t") == 0);
>>> +
>>> + for (int slotnum = 0; slotnum < dbinfo->slot_arr.nslots; slotnum++)
>>> + {
>>> ```
>>>
>>> Since num_slots == dbinfo->slot_arr.nslots, this is still an O(num_slots^2) check.
>>> Given this patch's goal is to eliminate the burden of a large number
>>> of slots, maybe we can build a hash table to make the check faster.
>>
>> Or, if we sort both queries in the same order, we can simply update the
>> slots sequentially.
>>
>> One problem I can see in this approach is that we could wrongly report
>> invalid slots in invalid_logical_slots.txt. Even if the slot with the
>> oldest confirmed_flush_lsn has unconsumed decodable records, other
>> slots might have already caught up. I'll think about how to deal with
>> this problem.
>
> I came up with a simple idea; instead of
> binary_upgrade_logical_slot_has_caught_up() returning true/false, we
> can have it return the last decodable WAL record's LSN, and consider a
> logical slot as caught up if its confirmed_flush_lsn has already passed
> that LSN. This way, we can keep scanning at most one logical slot per
> database while reporting the same contents as today. The function now
> needs to scan the whole WAL backlog, but it's still much better than
> the current method.

Sounds better, but I need more time to look into the details. I will spend time on this tomorrow.

> I've implemented this idea in the v2 patch with some updates:
>
> - incorporated the review comments.
> - added logic to support PG18 or older.
> - added regression tests.
>
> Feedback is very welcome.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
> <v2-0001-pg_upgrade-Optimize-replication-slot-caught-up-ch.patch>

Best regards,

--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
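[Editor's note] The v2 idea discussed in the thread — have the server return the last decodable WAL record's LSN once per database, then classify each slot with a single comparison — can be sketched in standalone C. This is a hypothetical illustration, not the actual patch code: the struct and function names (`SlotInfo`, `classify_slots`) are invented, and PostgreSQL's `XLogRecPtr` (a 64-bit WAL position) is modeled as a plain `uint64_t`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* In PostgreSQL, XLogRecPtr is a 64-bit WAL position; modeled directly here. */
typedef uint64_t XLogRecPtr;

typedef struct
{
	const char *slot_name;
	XLogRecPtr	confirmed_flush_lsn;
	bool		caught_up;		/* filled in by classify_slots() */
} SlotInfo;

/*
 * Sketch of the v2 approach: the WAL is scanned once per database to find
 * the last decodable record's LSN.  A slot has caught up iff its
 * confirmed_flush_lsn is at or past that LSN, so each slot is classified
 * with one comparison instead of its own WAL scan.
 */
static void
classify_slots(SlotInfo *slots, size_t nslots, XLogRecPtr last_decodable_lsn)
{
	for (size_t i = 0; i < nslots; i++)
		slots[i].caught_up = (slots[i].confirmed_flush_lsn >= last_decodable_lsn);
}
```

For example, with slots at LSNs 0x100 and 0x200 and a last decodable LSN of 0x180, only the second slot is considered caught up, while the first would be reported in invalid_logical_slots.txt, matching today's per-slot reporting.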
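[Editor's note] On review comment 4, the thread mentions two ways to avoid the O(num_slots^2) nested loops: a hash table, or sorting both the query result and the in-memory slot array in the same order and updating sequentially. A minimal sketch of the latter, with invented types (`SlotResult`, `LogicalSlotInfo`) standing in for the libpq result rows and pg_upgrade's slot array:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
	const char *slot_name;
	bool		caught_up;
} SlotResult;				/* stands in for one row of the query result */

typedef struct
{
	const char *slot_name;
	bool		caught_up;
} LogicalSlotInfo;			/* stands in for pg_upgrade's in-memory slot entry */

/*
 * If both arrays are ordered identically (e.g., ORDER BY slot_name in the
 * query and the same sort applied to the slot array), one linear pass
 * matches them up, replacing the O(n^2) nested loops with O(n) work.
 */
static void
update_slots_sequentially(const SlotResult *res, LogicalSlotInfo *slots, size_t n)
{
	for (size_t i = 0; i < n; i++)
	{
		/* Both arrays share one order, so entry i must describe slot i. */
		assert(strcmp(res[i].slot_name, slots[i].slot_name) == 0);
		slots[i].caught_up = res[i].caught_up;
	}
}
```

A hash table keyed by slot name achieves the same O(n) behavior without requiring a shared sort order, at the cost of extra allocation.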