Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication - Mailing list pgsql-hackers

From shveta malik
Subject Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date
Msg-id CAJpy0uAo+AWTMU9s7bJUXJuyd-NvMM290Au2BMtQMfpmviXLNA@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Melih Mutlu <m.melihmutlu@gmail.com>)
Responses Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
List pgsql-hackers
On Thu, Jan 26, 2023 at 7:53 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:
>
> If a relation is currently being synced by a tablesync worker and uses a replication slot/origin for that operation,
thensrrelslotname and srreloriginname fields will have values.
 
> When a relation is done with its replication slot/origin, their info gets removed from related catalog row, so that
slot/origincan be reused for another table or dropped if not needed anymore.
 
> In your case, all relations are in READY state so it's expected that srrelslotname and srreloriginname are empty.
READYrelations do not need a replication slot/origin anymore.
 
>
> Tables are probably synced so quickly that you're missing the moments when a tablesync worker copies a relation and
storesits rep. slot/origin in the catalog.
 
> If initial sync is long enough, then you should be able to see the columns get updated. I follow [1] to make it
longerand test if the patch really updates the catalog.
 
>

Thank You for the details. It is clear now.
>

>
> Rebased and resolved conflicts. Please check the new version
>
Please find my suggestions on v9:

1.
--Can we please add a few more points to the Summary to make it more clear.
a) something telling that reusability of workers is for tables under
one subscription and not across multiple subscriptions.
b) Since we are reusing both workers and slots, can we add:
--when do we actually end up spawning a new worker
--when do we actually end up creating a new slot in a worker rather
than using existing one.
--if we reuse existing slots, what happens to the snapshot?


2.
+       The last used ID for tablesync workers. This ID is used to
+       create replication slots. The last used ID needs to be stored
+       to make logical replication can safely proceed after any interruption.
+       If sublastusedid is 0, then no table has been synced yet.

--typo:
 to make logical replication can safely proceed ==> to make logical
replication safely proceed

--Also, does it sound better:
The last used ID for tablesync workers. It acts as an unique
identifier for replication slots
which are created by table-sync workers. The last used ID needs to be
persisted...


3.
is_first_run;
move_to_next_rel;
--Do you think one variable is enough here as we do not get any extra
info by using 2 variables? Can we have 1 which is more generic like
'ready_to_reuse'. Otherwise, please let me know if we must use 2.


4.
/* missin_ok = true, since the origin might be already dropped. */
typo: missing_ok


5. GetReplicationSlotNamesBySubId:
errmsg("not tuple returned."));

Can we have a better error msg:
                ereport(ERROR,
                        errmsg("could not receive list of slots
associated with subscription %d, error: %s", subid, res->err));

6.
static void
clean_sync_worker(void)

--can we please add introductory comment for this function.

7.
    /*
     * Pick the table for the next run if there is not another worker
     * already picked that table.
     */
Pick the table for the next run if it is not already picked up by
another worker.

8.
process_syncing_tables_for_sync()

/* Cleanup before next run or ending the worker. */
--can we please improve this comment:
if there is no more work left for this worker, stop the worker
gracefully, else do clean-up and make it ready for the next
relation/run.

9.
LogicalRepSyncTableStart:
         * Read previous slot name from the catalog, if exists.
         */
        prev_slotname = (char *) palloc0(NAMEDATALEN);
Do we need to free this at the end?


10.
                if (strlen(prev_slotname) == 0)
                {
                        elog(ERROR, "Replication slot could not be
found for relation %u",
                                 MyLogicalRepWorker->relid);
                }
shall we mention subid also in error msg.

I am reviewing further...
thanks
Shveta



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: improving user.c error messages
Next
From: vignesh C
Date:
Subject: Re: Deadlock between logrep apply worker and tablesync worker