Re: Skip partition tuple routing with constant partition key - Mailing list pgsql-hackers
From | Masahiko Sawada
Subject | Re: Skip partition tuple routing with constant partition key
Date |
Msg-id | CAD21AoCkiVnesVyQcUggWtuJqGhn1GCFQ6_s6=c1BU+arK8Zjw@mail.gmail.com
In response to | Re: Skip partition tuple routing with constant partition key (Amit Kapila <amit.kapila16@gmail.com>)
List | pgsql-hackers
On Wed, Apr 13, 2022 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Apr 12, 2022 at 6:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Hi,
> >
> > On Thu, Apr 7, 2022 at 4:37 PM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > Hi,
> > >
> > > On 2022-04-06 00:07:07 -0400, Tom Lane wrote:
> > > > Amit Langote <amitlangote09@gmail.com> writes:
> > > > > On Sun, Apr 3, 2022 at 10:31 PM Greg Stark <stark@mit.edu> wrote:
> > > > >> Is this a problem with the patch or its tests?
> > > > >> [18:14:20.798] Test Summary Report
> > > > >> [18:14:20.798] -------------------
> > > > >> [18:14:20.798] t/013_partition.pl (Wstat: 15360 Tests: 31 Failed: 0)
> > > >
> > > > > Hmm, make check-world passes for me after rebasing the patch (v10) to
> > > > > the latest HEAD (clean), nor do I see a failure on cfbot:
> > > > > http://cfbot.cputube.org/amit-langote.html
> > > >
> > > > 013_partition.pl has been failing regularly in the buildfarm,
> > > > most recently here:
> > > >
> > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2022-03-31%2000%3A49%3A45
> > >
> > > Just failed locally on my machine as well.
> > >
> > >
> > > > I don't think there's room to blame any uncommitted patches
> > > > for that. Somebody broke it a short time before here:
> > > >
> > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-03-17%2016%3A08%3A19
> > >
> > > The obvious thing to point a finger at is
> > >
> > > commit c91f71b9dc91ef95e1d50d6d782f477258374fc6
> > > Author: Tomas Vondra <tomas.vondra@postgresql.org>
> > > Date: 2022-03-16 16:42:47 +0100
> > >
> > >     Fix publish_as_relid with multiple publications
> > >
> >
> > I've not managed to reproduce this issue on my machine but while
> > reviewing the code and the server logs[1] I may have found possible
> > bugs:
> >
> > 2022-04-08 12:59:30.701 EDT [91997:1] LOG: logical replication apply
> > worker for subscription "sub2" has started
> > 2022-04-08 12:59:30.702 EDT [91998:3] 013_partition.pl LOG:
> > statement: ALTER SUBSCRIPTION sub2 SET PUBLICATION pub_lower_level,
> > pub_all
> > 2022-04-08 12:59:30.733 EDT [91998:4] 013_partition.pl LOG:
> > disconnection: session time: 0:00:00.036 user=buildfarm
> > database=postgres host=[local]
> > 2022-04-08 12:59:30.740 EDT [92001:1] LOG: logical replication table
> > synchronization worker for subscription "sub2", table "tab4_1" has
> > started
> > 2022-04-08 12:59:30.744 EDT [91997:2] LOG: logical replication apply
> > worker for subscription "sub2" will restart because of a parameter
> > change
> > 2022-04-08 12:59:30.750 EDT [92003:1] LOG: logical replication table
> > synchronization worker for subscription "sub2", table "tab3" has
> > started
> >
> > The logs say that the apply worker for "sub2" finished whereas the
> > tablesync workers for "tab4_1" and "tab3" started. After these logs,
> > there are no logs that these tablesync workers finished and the apply
> > worker for "sub2" restarted, until the timeout. While reviewing the
> > code, I realized that the tablesync workers can advance its relstate
> > even without the apply worker intervention.
> >
> > After a tablesync worker copies the table it sets
> > SUBREL_STATE_SYNCWAIT to its relstate, then it waits for the apply
> > worker to update the relstate to SUBREL_STATE_CATCHUP. If the apply
> > worker has already died, it breaks from the wait loop and returns
> > false:
> >
> > wait_for_worker_state_change():
> >
> >     for (;;)
> >     {
> >         LogicalRepWorker *worker;
> >
> >         :
> >
> >         /*
> >          * Bail out if the apply worker has died, else signal it we're
> >          * waiting.
> >          */
> >         LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> >         worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
> >                                         InvalidOid, false);
> >         if (worker && worker->proc)
> >             logicalrep_worker_wakeup_ptr(worker);
> >         LWLockRelease(LogicalRepWorkerLock);
> >         if (!worker)
> >             break;
> >
> >         :
> >     }
> >
> >     return false;
> >
> > However, the caller doesn't check the return value at all:
> >
> >     /*
> >      * We are done with the initial data synchronization, update the state.
> >      */
> >     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
> >     MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT;
> >     MyLogicalRepWorker->relstate_lsn = *origin_startpos;
> >     SpinLockRelease(&MyLogicalRepWorker->relmutex);
> >
> >     /*
> >      * Finally, wait until the main apply worker tells us to catch up and then
> >      * return to let LogicalRepApplyLoop do it.
> >      */
> >     wait_for_worker_state_change(SUBREL_STATE_CATCHUP);
> >     return slotname;
> >
> > Therefore, the tablesync worker started logical replication while
> > keeping its relstate as SUBREL_STATE_SYNCWAIT.
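(To make the gap concrete, here is a rough, untested sketch of what it
might look like if the caller shown above honored that return value, for
example by exiting so that the launcher starts a fresh tablesync worker
later. This is only an illustration of the idea, not a proposed patch.)

    /* Untested illustration only, not a proposed patch. */
    if (!wait_for_worker_state_change(SUBREL_STATE_CATCHUP))
    {
        /*
         * The apply worker went away while we were waiting, so our
         * relstate is still SUBREL_STATE_SYNCWAIT. Rather than starting
         * to stream changes in that state, exit and let the launcher
         * start a new tablesync worker later.
         */
        ereport(LOG,
                (errmsg("apply worker for subscription \"%s\" exited while tablesync worker was waiting for catchup",
                        MySubscription->name)));
        proc_exit(0);
    }

    return slotname;

(Whether exiting is the right reaction, as opposed to retrying the wait,
is a separate question; the point is only that the SYNCWAIT state should
not leak into the apply loop.)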
> >
> > Given the server logs, it's likely that both tablesync workers for
> > "tab4_1" and "tab3" were in this situation. That is, there were two
> > tablesync workers who were applying changes for the target relation
> > but the relstate was SUBREL_STATE_SYNCWAIT.
> >
> > When it comes to starting the apply worker, probably it didn't happen
> > since there are already running tablesync workers as much as
> > max_sync_workers_per_subscription (2 by default):
> >
> > logicalrep_worker_launch():
> >
> >     /*
> >      * If we reached the sync worker limit per subscription, just exit
> >      * silently as we might get here because of an otherwise harmless race
> >      * condition.
> >      */
> >     if (nsyncworkers >= max_sync_workers_per_subscription)
> >     {
> >         LWLockRelease(LogicalRepWorkerLock);
> >         return;
> >     }
> >
> > This scenario seems possible in principle but I've not managed to
> > reproduce this issue so I might be wrong.
> >
>
> This is exactly the same analysis I have done in the original thread
> where that patch was committed. I have found some crude ways to
> reproduce it with a different test as well. See emails [1][2][3].

Great. I didn't realize there is a discussion there.

>
> > Especially, according to the
> > server logs, it seems like the tablesync workers were launched before
> > the apply worker restarted due to parameter change and this is a
> > common pattern among other failure logs. But I'm not sure how it could
> > really happen. IIUC the apply worker always re-reads subscription (and
> > exits if there is parameter change) and then requests to launch
> > tablesync workers accordingly.
> >
>
> Is there any rule/documentation which ensures that we must re-read the
> subscription parameter change before trying to launch sync workers?

No, but as far as I read the code I could not find any path of that.

>
> Actually, it would be better if we discuss this problem on another
> thread [1] to avoid hijacking this thread. So, it would be good if you
> respond there with your thoughts.

Thanks for looking into this. Agreed. I'll respond there.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/