RE: Failed transaction statistics to measure the logical replication progress - Mailing list pgsql-hackers
From | osumi.takamichi@fujitsu.com |
---|---|
Subject | RE: Failed transaction statistics to measure the logical replication progress |
Date | |
Msg-id | TYCPR01MB837343C96B6045E97F311393ED379@TYCPR01MB8373.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Failed transaction statistics to measure the logical replication progress (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Failed transaction statistics to measure the logical replication progress
|
List | pgsql-hackers |
On Thursday, February 17, 2022 6:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, Feb 17, 2022 at 3:13 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Tue, Jan 4, 2022 at 5:22 PM osumi.takamichi@fujitsu.com > > <osumi.takamichi@fujitsu.com> wrote: > > > > > > On Friday, December 31, 2021 10:12 AM Tang, Haiying/唐 海英 > <tanghy.fnst@fujitsu.com> wrote: > > > > 4) > > > > +void > > > > +pgstat_send_subworker_xact_stats(LogicalRepWorker *repWorker, > > > > +bool > > > > +force) { > > > > + static TimestampTz last_report = 0; > > > > + PgStat_MsgSubWorkerXactEnd msg; > > > > + > > > > + if (!force) > > > > + { > > > > ... > > > > + if (!TimestampDifferenceExceeds(last_report, now, > > > > PGSTAT_STAT_INTERVAL)) > > > > + return; > > > > + last_report = now; > > > > + } > > > > + > > > > ... > > > > + if (repWorker->commit_count == 0 && repWorker->abort_count > > > > + == > > > > 0) > > > > + return; > > > > ... > > > > > > > > I think it's better to check commit_count and abort_count first, > > > > then check if reach PGSTAT_STAT_INTERVAL. > > > > Otherwise if commit_count and abort_count are 0, it is possible > > > > that the value of last_report has been updated but it didn't send > > > > stats in fact. In this case, last_report is not the real time that send last > message. > > > Yeah, agreed. This fix is right in terms of the variable name aspect. > > > > > > > Can't we use pgstat_report_stat() here? Basically, you can update xact > > completetion counters during apply, and then from > > pgstat_report_stat(), you can invoke a logical replication worker > > stats-related function. > > > > If we can do this then we can save the logic this patch is trying to introduce for > PGSTAT_STAT_INTERVAL. Hi, I've encounter a couple of questions during my modification, following your advice. In the pgstat_report_stat, we refer to the return value of GetCurrentTransactionStopTimestamp to compare the time different from the last time. (In my previous patch, I used GetCurrentTimestamp) This time is updated in apply_handle_commit_internal's CommitTransactionCommand for the apply worker. Then, if I update the subscription worker stats(commit_count/abort_count) immediately after this CommitTransactionCommand and immediately call pgstat_report_stat in the apply_handle_commit_internal, the time difference becomes too small (falls short of PGSTAT_STAT_INTERVAL). Also, the time of GetCurrentTransactionStopTimestamp is not updated even when I keep calling pgstat_report_stat repeatedly. Then, IIUC the next possible timing that message of commit_count or abort_count is sent to the stats collector would become the time when we execute another transaction by the apply worker and update the time for GetCurrentTransactionStopTimestamp and rerun pgstat_report_stat again. So, if we keep GetCurrentTransactionStopTimestamp without change, an update of stats depends on another new subsequent transaction if we simply merge those. (this leads to users cannot see the latest stats information ?) At least, I got a test failure because of this function for streaming commit case because it uses poll_query_until to wait for stats update. On the other hand, replacing GetCurrentTransactionStopTimestamp with GetCurrentTimestamp in case of apply worker looks have another negative impact. If we do so, it becomes possible that we go into the code to scan TabStatusArray with PgStat_TableStatus's trans with non-null values, because of the timing change. I might be able to avoid this kind of assert failure if I write the message send function of this patch before other existing functions to send various type of messages and return if the process is apply worker in pgstat_report_stat. But, I can't be convinced that this way of modification is OK. What did you think about those issues ? Best Regards, Takamichi Osumi
pgsql-hackers by date: