RE: Failed transaction statistics to measure the logical replication progress - Mailing list pgsql-hackers

From osumi.takamichi@fujitsu.com
Subject RE: Failed transaction statistics to measure the logical replication progress
Date
Msg-id TYCPR01MB83737C689F8F310C87C19C1EED379@TYCPR01MB8373.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Failed transaction statistics to measure the logical replication progress  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Friday, February 18, 2022 8:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Feb 18, 2022 at 2:04 PM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
> >
> > On Thursday, February 17, 2022 6:45 PM Amit Kapila
> <amit.kapila16@gmail.com> wrote:
> > > On Thu, Feb 17, 2022 at 3:13 PM Amit Kapila
> > > <amit.kapila16@gmail.com>
> > > wrote:
> > > > Can't we use pgstat_report_stat() here? Basically, you can update
> > > > xact completetion counters during apply, and then from
> > > > pgstat_report_stat(), you can invoke a logical replication worker
> > > > stats-related function.
> > > >
> > >
> > > If we can do this then we can save the logic this patch is trying to
> > > introduce for PGSTAT_STAT_INTERVAL.
> > Hi, I've encounter a couple of questions during my modification, following
> your advice.
> >
> > In the pgstat_report_stat, we refer to the return value of
> > GetCurrentTransactionStopTimestamp to compare the time different from
> the last time.
> > (In my previous patch, I used GetCurrentTimestamp)
> >
> > This time is updated in apply_handle_commit_internal's
> CommitTransactionCommand for the apply worker.
> > Then, if I update the subscription worker
> > stats(commit_count/abort_count) immediately after this
> > CommitTransactionCommand and immediately call pgstat_report_stat in the
> apply_handle_commit_internal, the time difference becomes too small (falls
> short of PGSTAT_STAT_INTERVAL).
> > Also, the time of GetCurrentTransactionStopTimestamp is not updated
> > even when I keep calling pgstat_report_stat repeatedly.
> > Then, IIUC the next possible timing that message of commit_count or
> > abort_count is sent to the stats collector would become the time when
> > we execute another transaction by the apply worker and update the time
> > for GetCurrentTransactionStopTimestamp
> > and rerun pgstat_report_stat again.
> >
> 
> I think but same is true in the case of the transaction in the backend where we
> increment commit counter via AtEOXact_PgStat after updating the transaction
> time. After that, we call pgstat_report_stat() at later point. How is this case
> different?
> 
> > So, if we keep GetCurrentTransactionStopTimestamp without change, an
> > update of stats depends on another new subsequent transaction if we
> simply merge those.
> > (this leads to users cannot see the latest stats information ?)
> >
> 
> I think this should be okay as these don't need to be accurate.
> 
> > At least, I got a test failure because of this function for streaming
> > commit case because it uses poll_query_until to wait for stats update.
> >
> 
> I feel it is not a good idea to wait for the accurate update of these counters.
Ah, then I had wrote tests based on totally wrong direction and made noises for it.
Sorry for that. I don't see tests for existing xact_commit/rollback count,
so I'll follow the same way.

Attached a new patch that addresses three major improvements I've got so far as comments.
1. skip increment of stats counter by empty transaction, on the subscriber side
   (except for commit prepared)
2. utilize the existing pgstat_report_stat, instead of having a similar logic newly.
3. remove the wrong tests.


Best Regards,
    Takamichi Osumi


Attachment

pgsql-hackers by date:

Previous
From: Yugo NAGATA
Date:
Subject: Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Next
From: Robert Haas
Date:
Subject: Re: adding 'zstd' as a compression algorithm