Re: Failed transaction statistics to measure the logical replication progress - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Failed transaction statistics to measure the logical replication progress |
Date | |
Msg-id | CALDaNm0AdASg4Hy0iyJBCCd+sqaPuO19YgmN2qUs+0AC16eUQQ@mail.gmail.com Whole thread Raw |
In response to | Re: Failed transaction statistics to measure the logical replication progress (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
On Fri, Nov 19, 2021 at 7:41 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Nov 18, 2021 at 12:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Nov 17, 2021 at 7:12 PM osumi.takamichi@fujitsu.com > > <osumi.takamichi@fujitsu.com> wrote: > > > > > > On Wednesday, November 17, 2021 10:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > Can you please tell us why you think the names in your proposed patch are > > > > better than the existing names proposed in Sawada-San's patch? Is it because > > > > those fields always contain the information of the last or latest error that > > > > occurred in the corresponding subscription worker? > > > This is one reason. > > > > > > Another big reason comes from the final alignment when we list up all columns of both patches. > > > The patches in this thread is trying to introduce a column that indicates > > > cumulative count of error to show all error counts that the worker got in the past. > > > > > > > Okay, I see your point and it makes sense to rename columns after > > these other stats. I am not able to come up with any better names than > > what is being used here. Sawada-San, do you agree with this, or do let > > us know if you have any better ideas? > > > > I'm concerned that these new names will introduce confusion; if we > have last_error_relid, last_error_command, last_error_message, > last_error_time, and last_error_xid, I think users might think that > first_error_time is the timestamp at which an error occurred for the > first time in the subscription worker. Also, I'm not sure > last_error_count is not clear to me (it looks like showing something > count but the only "last" one?). An alternative idea would be to add > total_error_count by this patch, resulting in having both error_count > and total_error_count. Regarding commit_count and abort_count, I > personally think xact_commit and xact_rollback would be better since > they’re more consistent with pg_stat_database view, although we might > already have discussed that. > > Besides that, I’m not sure how useful commit_bytes, abort_bytes, and > error_bytes are. I originally thought these statistics track the size > of received data, i.g., how much data is transferred from the > publisher and processed on the subscriber. But what the view currently > has is how much memory is used in the subscription worker. The > subscription worker emulates ReorderBufferChangeSize() on the > subscriber side but, as the comment of update_apply_change_size() > mentions, the size in the view is not accurate: > > + * The byte size of transaction on the publisher is calculated by > + * ReorderBufferChangeSize() based on the ReorderBufferChange structure. > + * But on the subscriber, consumed resources are not same as the > + * publisher's decoding processsing and required to be computed in > + * different way. Therefore, the exact same byte size is not restored on > + * the subscriber usually. > > Also, it seems to take into account the size of FlushPosition that is > not taken into account by ReorderBufferChangeSize(). Let's keep the size calculation similar to the publisher side to avoid any confusion, we can try to keep it the same as ReorderBufferChangeSize wherever possible. Regards, Vignesh
pgsql-hackers by date: