RE: Failed transaction statistics to measure the logical replication progress - Mailing list pgsql-hackers

From osumi.takamichi@fujitsu.com
Subject RE: Failed transaction statistics to measure the logical replication progress
Date
Msg-id OSBPR01MB488876F715BBB0DFC56F0129EDBD9@OSBPR01MB4888.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: Failed transaction statistics to measure the logical replication progress  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Monday, October 18, 2021 11:52 AM Hou, Zhijie/侯 志杰 <houzj.fnst@fujitsu.com> wrote:
> On Thursday, October 14, 2021 2:13 PM Osumi, Takamichi wrote:
> > On Thursday, October 14, 2021 12:54 PM Hou,
> Zhijie<houzj.fnst@fujitsu.com> wrote:
> > > On Thursday, September 30, 2021 12:15 PM Amit Kapila
> > > > On Thu, Sep 30, 2021 at 8:22 AM Hou, Zhijie/侯 志杰wrote:
> > > > >
> > > > > On Tues, Sep 28, 2021 6:05 PM Amit Kapila
> > > > > <amit.kapila16@gmail.com>
> > > wrote:
> > > > > >
> > > > > > Can't we keep the current and new stats both in-memory and
> > > > > > persist on disk? So, the persistent stats data will be used to
> > > > > > fill the in-memory counters after restarting of workers,
> > > > > > otherwise, we will always refer to in-memory values.
> > > > >
> > > > > I think this approach works, but I have another concern about it.
> > > > >
> > > > > The current pg_stat_subscription view is listed as "Dynamic
> > > > > Statistics Views" in the document, the data in it seems about
> > > > > the worker process, and the view data shows what the current
> > > > > worker did. But if we keep the new xact stat persist, then it's
> > > > > not what the current worker did, it looks more related to the
> > > > > subscription historic data.
> > > > >
> > > >
> > > > I see your point.
> > > >
> > > > > Adding a new view seems resonalble, but it will bring another
> > > > > subscription related view which might be too much. OTOH, I can
> > > > > see there are already some different views[1] including xact
> > > > > stat, maybe adding another one is accepatble ?
> > > > >
> > > >
> > > > These all views are related to untransmitted to the collector but
> > > > what we really need is a view similar to pg_stat_archiver or
> > > > pg_stat_bgwriter which gives information about background workers.
> > > > Now, the problem as I see is if we go that route then
> > > > pg_stat_subscription will no longer remain dynamic view and one
> > > > might consider that as a compatibility break. The other idea I
> > > > have shared is that we display these stats under the new view
> > > > introduced by Sawada-San's patch [1] and probably rename that view
> > > > as pg_stat_subscription_worker where all the stats (xact info and
> > > > last failure information) about each worker will be displayed. Do
> > > > you have any opinion on that idea or do you see any problem with it?
> > >
> > > Personally, I think it seems reasonable to merge the xact stat into
> > > the view from sawada-san's patch.
> > >
> > > One problem I noticed is that pg_stat_subscription_error currently
> > > have a 'count' column which show how many times the last error
> > > happened. The xact stat here also have a similar value 'xact_error'.
> > > I think we might need to rename it or merge them into one in some way.
> > >
> > > Besides, if we decide to merge xact stat into
> > > pg_stat_subscription_error, some column seems need to be renamed.
> > Maybe like:
> > > error_message => Last_error_message, command=>
> last_error_command..
> > Yeah, we must make them distinguished clearly.
> >
> > I guessed that you are concerned about amount of renaming codes that
> > could be a bit large or you come up with a necessity to consider the
> > all column names of the pg_stat_subscription_worker together all at
> > once in advance.
> >
> > It's because my instant impression is, when we go with the current
> > xact stats column definitions (xact_commit, xact_commit_bytes,
> > xact_error, xact_error_bytes, xact_abort, xact_abort_bytes), the
> > renaming problem can be solved if I write one additional patch or
> > extend the main patch of xact stats to handle renaming.
> > (This can work to keep both threads independent from each other).
> >
> > Did you have some concern that cannot be handled by this way ?
> Hi,
> 
> Currently, I don't find some unsolvable issues in this approach.
Okay. Glad to hear that.
Then, I can restart my implementation with this direction.


Best Regards,
    Takamichi Osumi

pgsql-hackers by date:

Previous
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: Data is copied twice when specifying both child and parent table in publication
Next
From: Masahiko Sawada
Date:
Subject: Re: Skipping logical replication transactions on subscriber side