RE: Failed transaction statistics to measure the logical replication progress - Mailing list pgsql-hackers

From osumi.takamichi@fujitsu.com
Subject RE: Failed transaction statistics to measure the logical replication progress
Date
Msg-id OSBPR01MB4888BE4718225778D1F3F42BEDB89@OSBPR01MB4888.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: Failed transaction statistics to measure the logical replication progress  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Responses RE: Failed transaction statistics to measure the logical replication progress
List pgsql-hackers
On Thursday, October 14, 2021 12:54 PM Hou, Zhijie/侯 志杰 <houzj.fnst@fujitsu.com> wrote:
> On Thursday, September 30, 2021 12:15 PM Amit Kapila
> <amit.kapila16@gmail.com>
> > On Thu, Sep 30, 2021 at 8:22 AM Hou, Zhijie/侯 志杰
> <houzj.fnst@fujitsu.com> wrote:
> > >
> > > On Tues, Sep 28, 2021 6:05 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > > >
> > > > Can't we keep the current and new stats both in-memory and persist
> > > > on disk? So, the persistent stats data will be used to fill the
> > > > in-memory counters after restarting of workers, otherwise, we will
> > > > always refer to in-memory values.
> > >
> > > I think this approach works, but I have another concern about it.
> > >
> > > The current pg_stat_subscription view is listed as "Dynamic Statistics
> Views"
> > in
> > > the document, the data in it seems about the worker process, and the
> > > view
> > data
> > > shows what the current worker did. But if we keep the new xact stat
> > > persist, then it's not what the current worker did, it looks more
> > > related to the subscription historic data.
> > >
> >
> > I see your point.
> >
> > > Adding a new view seems resonalble, but it will bring another
> > > subscription related view which might be too much. OTOH, I can see
> > > there are already
> > some
> > > different views[1] including xact stat, maybe adding another one is
> > accepatble ?
> > >
> >
> > These all views are related to untransmitted to the collector but what
> > we really need is a view similar to pg_stat_archiver or
> > pg_stat_bgwriter which gives information about background workers.
> > Now, the problem as I see is if we go that route then
> > pg_stat_subscription will no longer remain dynamic view and one might
> > consider that as a compatibility break. The other idea I have shared
> > is that we display these stats under the new view introduced by
> > Sawada-San's patch [1] and probably rename that view as
> > pg_stat_subscription_worker where all the stats (xact info and last
> > failure information) about each worker will be displayed. Do you have
> > any opinion on that idea or do you see any problem with it?
> 
> Personally, I think it seems reasonable to merge the xact stat into the view from
> sawada-san's patch.
> 
> One problem I noticed is that pg_stat_subscription_error currently have a
> 'count' column which show how many times the last error happened. The xact
> stat here also have a similar value 'xact_error'. I think we might need to rename
> it or merge them into one in some way.
> 
> Besides, if we decide to merge xact stat into pg_stat_subscription_error, some
> column seems need to be renamed. Maybe like:
> error_message => Last_error_message, command=> last_error_command..
Yeah, we must make them distinguished clearly.

I guessed that you are concerned about
amount of renaming codes that could be a bit large
or you come up with a necessity to consider
the all column names of the pg_stat_subscription_worker
together all at once in advance.

It's because my instant impression is,
when we go with the current xact stats column definitions
(xact_commit, xact_commit_bytes, xact_error, xact_error_bytes,
xact_abort, xact_abort_bytes), the renaming problem can be solved
if I write one additional patch or extend the main patch of
xact stats to handle renaming.
(This can work to keep both threads independent
from each other).

Did you have some concern that cannot be handled by this way ?


Best Regards,
    Takamichi Osumi


pgsql-hackers by date:

Previous
From: Andrey Lepikhov
Date:
Subject: Re: Make query ID more portable
Next
From: Michael Paquier
Date:
Subject: Re: Reset snapshot export state on the transaction abort