RE: Failed transaction statistics to measure the logical replication progress - Mailing list pgsql-hackers

From osumi.takamichi@fujitsu.com
Subject RE: Failed transaction statistics to measure the logical replication progress
Msg-id OSBPR01MB4888B9F62644397398240DC3EDA99@OSBPR01MB4888.jpnprd01.prod.outlook.com
In response to Re: Failed transaction statistics to measure the logical replication progress  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Failed transaction statistics to measure the logical replication progress
List pgsql-hackers
Hi,


Thank you, Amit-san and Sawada-san for the discussion.
On Tuesday, September 28, 2021 7:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Another idea could be to have a separate view, say
> > pg_stat_subscription_xact but I'm not sure it's a better idea.
> >
> 
> Yeah, that is another idea but I am afraid that having three different
> views for subscription stats will be too much. I think it would be
> better if we can display these additional stats via the existing view
> pg_stat_subscription or the new view pg_stat_subscription_errors (or
> whatever name we want to give it).
pg_stat_subscription_errors is specialized for showing an error record.
So, it would be awkward to combine it with the other, normal xact stats.


> > > > Then, if, we proceed in this direction, the place to implement
> > > > those stats would be on the LogicalRepWorker struct, instead ?
> > > >
> > >
> > > Or, we can make existing stats persistent and then add these stats
> > > on top of it. Sawada-San, do you have any thoughts on this matter?
> >
> > I think that making existing stats including received_lsn and
> > last_msg_receipt_time persistent by using stats collector could cause
> > massive reporting messages. We can report these messages with a
> > certain interval to reduce the amount of messages but we will end up
> > seeing old stats on the view.
> >
> 
> Can't we keep the current and new stats both in-memory and persist on disk?
> So, the persistent stats data will be used to fill the in-memory counters after
> restarting of workers, otherwise, we will always refer to in-memory values.
I think this is feasible.
The points at which we have to update the xact stats are
the end of applying a message such as COMMIT, COMMIT PREPARED or STREAM_ABORT,
and the time when an error happens during apply. So, if we want,
we can update the xact stats values at those moments.
I'm thinking of a hash table whose key is the pair of subid + relid
and whose entry is the proposed stats structure, and we update the entry
at the timings described above.
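
As a rough illustration of the hash table idea above, here is a standalone
sketch. It is not PostgreSQL code: every name in it (XactStatsEntry,
xact_stats_report, the counter fields, the fixed table size) is hypothetical,
and a real patch would presumably use the backend's own hash table facilities
rather than this toy linear-probing table. I also assume here that relid 0
stands for the main apply worker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical and only illustrate the idea. */
typedef uint32_t Oid;

typedef enum { XACT_COMMIT, XACT_ABORT, XACT_ERROR } XactEvent;

typedef struct XactStatsEntry
{
    Oid      subid;         /* key part 1: subscription */
    Oid      relid;         /* key part 2: 0 is assumed to mean apply worker */
    int      used;          /* slot occupied? */
    uint64_t commit_count;  /* COMMIT, COMMIT PREPARED, ... */
    uint64_t abort_count;   /* STREAM_ABORT, ... */
    uint64_t error_count;   /* errors raised during apply */
} XactStatsEntry;

#define XACT_STATS_SLOTS 64 /* toy fixed size; entries are never deleted */

typedef struct XactStatsTable
{
    XactStatsEntry slots[XACT_STATS_SLOTS];
} XactStatsTable;

void
xact_stats_init(XactStatsTable *t)
{
    memset(t, 0, sizeof(*t));
}

/* Find the entry for (subid, relid); optionally create it. NULL if absent or full. */
XactStatsEntry *
xact_stats_lookup(XactStatsTable *t, Oid subid, Oid relid, int create)
{
    size_t start = ((size_t) subid * 31u + relid) % XACT_STATS_SLOTS;

    for (size_t i = 0; i < XACT_STATS_SLOTS; i++)
    {
        XactStatsEntry *e = &t->slots[(start + i) % XACT_STATS_SLOTS];

        if (!e->used)
        {
            if (!create)
                return NULL;
            memset(e, 0, sizeof(*e));
            e->subid = subid;
            e->relid = relid;
            e->used = 1;
            return e;
        }
        if (e->subid == subid && e->relid == relid)
            return e;
    }
    return NULL;                /* table full */
}

/* Called at the end of applying a message, or when apply hits an error. */
int
xact_stats_report(XactStatsTable *t, Oid subid, Oid relid, XactEvent ev)
{
    XactStatsEntry *e;

    assert(t != NULL);
    e = xact_stats_lookup(t, subid, relid, 1);
    if (e == NULL)
        return -1;

    switch (ev)
    {
        case XACT_COMMIT: e->commit_count++; break;
        case XACT_ABORT:  e->abort_count++;  break;
        case XACT_ERROR:  e->error_count++;  break;
    }
    return 0;
}
```

In this sketch the apply code would call xact_stats_report() once per finished
or failed transaction, and the view function would read the counters with
xact_stats_lookup(..., 0).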

One thing that is still a bit unclear to me is
whether we should move the existing stats of pg_stat_subscription
(such as last_lsn and reply_lsn) into the hash entry or not.
Currently, in pg_stat_get_subscription() for the pg_stat_subscription view,
the current stats values are referenced directly from (a copy of)
the existing LogicalRepCtx->workers. I think we need to avoid
a situation where some existing data are fetched from LogicalRepWorker
while the new xact stats come from the hash table in the same function,
in order to keep the function consistent. Is this understanding correct?

Another thing we need to discuss is where to put the new file
that stores the contents of pg_stat_subscription. I'm thinking of
pg_logical/, because the above idea no longer interacts with
the stats collector.
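
To make the "keep the counters in memory, persist them on disk, and refill
them after a worker restart" idea a bit more concrete, here is a minimal
standalone sketch. It is only an assumption about how this could look: the
record layout, the function names and the file name are all hypothetical, it
uses plain stdio instead of the backend's file APIs, and it does not fsync.
It also relies on rename() replacing an existing file, as it does on POSIX
systems.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk record; the field names are illustrative only. */
typedef struct PersistedXactStats
{
    uint32_t subid;
    uint32_t relid;
    uint64_t commit_count;
    uint64_t abort_count;
    uint64_t error_count;
} PersistedXactStats;

/*
 * Write all records to "<path>.tmp" and rename it over path, so that a
 * crash in the middle of writing never leaves a truncated stats file.
 * Returns 0 on success, -1 on failure.
 */
int
stats_save(const char *path, const PersistedXactStats *recs, uint32_t n)
{
    char  tmp[1024];
    FILE *fp;
    int   len;
    int   ok;

    assert(path != NULL);
    len = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (len < 0 || len >= (int) sizeof(tmp))
        return -1;

    fp = fopen(tmp, "wb");
    if (fp == NULL)
        return -1;

    /* Record count first, then the records themselves. */
    ok = fwrite(&n, sizeof(n), 1, fp) == 1 &&
         fwrite(recs, sizeof(*recs), n, fp) == n;
    if (fclose(fp) != 0)
        ok = 0;

    if (!ok || rename(tmp, path) != 0)
    {
        remove(tmp);
        return -1;
    }
    return 0;
}

/*
 * Refill the in-memory counters after a restart. Returns the number of
 * records loaded, 0 if the file cannot be opened (treated as "nothing
 * persisted yet"), or -1 if the file is short or holds more than max records.
 */
int
stats_load(const char *path, PersistedXactStats *recs, uint32_t max)
{
    FILE    *fp;
    uint32_t n;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;

    if (fread(&n, sizeof(n), 1, fp) != 1 || n > max ||
        fread(recs, sizeof(*recs), n, fp) != n)
    {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return (int) n;
}
```

With something like this, the view would always read the in-memory values,
and the file would be consulted only once, when a worker starts.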

Let me know if I'm missing something.


Best Regards,
    Takamichi Osumi

