Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication - Mailing list pgsql-hackers

From shveta malik
Subject Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date
Msg-id CAJpy0uD-Kn0-rva5iFGXO6jWbhyBd6shyieSUuS7PODc6ySC6Q@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication  (Melih Mutlu <m.melihmutlu@gmail.com>)
Responses Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
List pgsql-hackers
On Wed, Feb 1, 2023 at 5:37 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:
>
> Hi,
> Please see attached patches.
>
> Thanks,
> --
> Melih Mutlu
> Microsoft

Hi Melih,

Few suggestions on v10-0002-Reuse patch

1)
        for (int64 i = 1; i <= lastusedid; i++)
        {
                char            originname_to_drop[NAMEDATALEN] = {0};
                snprintf(originname_to_drop,
sizeof(originname_to_drop), "pg_%u_%lld", subid, (long long) i);
         .......
          }

--Is it better to use the function
'ReplicationOriginNameForLogicalRep' here instead of sprintf, just to
be consistent everywhere to construct origin-name?


2)
pa_launch_parallel_worker:
launched = logicalrep_worker_launch(MyLogicalRepWorker->dbid,
                                                            MySubscription->oid,

MySubscription->name,

MyLogicalRepWorker->userid,
                                                             InvalidOid,

dsm_segment_handle(winfo->dsm_seg),
                                                            0);

--Can we please define 'InvalidRepSlotId' macro and pass it here as
the last arg to make it more readable.
#define InvalidRepSlotId 0
Same in ApplyLauncherMain where we are passing 0 as last arg to
logicalrep_worker_launch.

3)
We are safe to drop the replication trackin origin after this
--typo: trackin -->tracking

4)
process_syncing_tables_for_sync:
if (MyLogicalRepWorker->slot_name && strcmp(syncslotname,
MyLogicalRepWorker->slot_name) != 0)
{
          ..............
ReplicationOriginNameForLogicalRep(MyLogicalRepWorker->subid,

MyLogicalRepWorker->relid,
                                                              originname,

sizeof(originname));

/* Drop replication origin */
replorigin_drop_by_name(originname, true, false);
}

--Are we passing missing_ok as true (second arg) intentionally here in
replorigin_drop_by_name? Once we fix the issue reported  in my earlier
email (ASSERT), do you think it makes sense to  pass missing_ok as
false here?

5)
process_syncing_tables_for_sync:
                foreach(lc, rstates)
                {

                        rstate = (SubscriptionRelState *)
palloc(sizeof(SubscriptionRelState));
                        memcpy(rstate, lfirst(lc),
sizeof(SubscriptionRelState));
                        /*
                         * Pick the table for the next run if it is
not already picked up
                         * by another worker.
                         */
                        LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
                        if (rstate->state != SUBREL_STATE_SYNCDONE &&

!logicalrep_worker_find(MySubscription->oid, rstate->relid, false))

                        {
                           .........
                        }
                        LWLockRelease(LogicalRepWorkerLock);
                }

--Do we need to palloc for each relation separately? Shall we do it
once outside the loop and reuse it? Also pfree is not done for rstate
here.



6)
LogicalRepSyncTableStart:
1385         slotname = (char *) palloc(NAMEDATALEN);
1413         prev_slotname = (char *) palloc(NAMEDATALEN);
1481         slotname = prev_slotname;
1502         pfree(prev_slotname);
1512         UpdateSubscriptionRel(MyLogicalRepWorker->subid,
1513
MyLogicalRepWorker->relid,
1514
MyLogicalRepWorker->relstate,
1515
MyLogicalRepWorker->relstate_lsn,
1516                                                   slotname,
1517                                                   originname);

Can you please review the above flow (I have given line# along with),
I think it could be problematic. We alloced prev_slotname, assigned it
to slotname, freed prev_slotname and used slotname after freeing the
prev_slotname.
Also slotname is allocated some memory too, that is not freed.

Reviewing further....

JFYI, I get below while applying patch:

========================
shveta@shveta-vm:~/repos/postgres1/postgres$ git am
~/Desktop/shared/reuse/v10-0002-Reuse-Logical-Replication-Background-worker.patch
Applying: Reuse Logical Replication Background worker
.git/rebase-apply/patch:142: trailing whitespace.
        values[Anum_pg_subscription_rel_srrelslotname - 1] =
.git/rebase-apply/patch:692: indent with spaces.
                    errmsg("could not receive list of slots associated
with the subscription %u, error: %s",
.git/rebase-apply/patch:1055: trailing whitespace.
        /*
.git/rebase-apply/patch:1057: trailing whitespace.
         * relations.
.git/rebase-apply/patch:1059: trailing whitespace.
         * and origin. Then stop the worker gracefully.
warning: 5 lines add whitespace errors.
 ========================



thanks
Shveta



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Deadlock between logrep apply worker and tablesync worker
Next
From: Dean Rasheed
Date:
Subject: Re: run pgindent on a regular basis / scripted manner