Re: Failed transaction statistics to measure the logical replication progress - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Failed transaction statistics to measure the logical replication progress
Date
Msg-id CAA4eK1LWYc15=ASj1tMTEFsXtxu=02aGoMwq9YanUVr9-QMhdQ@mail.gmail.com
Whole thread Raw
In response to RE: Failed transaction statistics to measure the logical replication progress  ("tanghy.fnst@fujitsu.com" <tanghy.fnst@fujitsu.com>)
Responses RE: Failed transaction statistics to measure the logical replication progress  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
List pgsql-hackers
On Tue, Feb 22, 2022 at 6:45 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
>
> I found a problem when using it. When a replication workers exits, the
> transaction stats should be sent to stats collector if they were not sent before
> because it didn't reach PGSTAT_STAT_INTERVAL. But I saw that the stats weren't
> updated as expected.
>
> I looked into it and found that the replication worker would send the
> transaction stats (if any) before it exits. But it got invalid subid in
> pgstat_send_subworker_xact_stats(), which led to the following result:
>
> postgres=# select pg_stat_get_subscription_worker(0, null);
>  pg_stat_get_subscription_worker
> ---------------------------------
>  (0,,2,0,0,,,,0,"",)
> (1 row)
>
> I think that's because subid has already been cleaned when trying to send the
> stats. I printed the value of before_shmem_exit_list, the functions in this list
> would be called in shmem_exit() when the worker exits.
> logicalrep_worker_onexit() would clean up the worker info (including subid), and
> pgstat_shutdown_hook() would send stats if any.  logicalrep_worker_onexit() was
> called before calling pgstat_shutdown_hook().
>

Yeah, I think that is a problem and maybe we can think of solving it
by sending the stats via logicalrep_worker_onexit before subid is
cleared but not sure that is a good idea. I feel we need to go back to
the idea of v21 for sending stats instead of using pgstat_report_stat.
I think the one thing which we could improve is to avoid trying to
send it each time before receiving each message by walrcv_receive and
rather try to send it before we try to wait (WaitLatchOrSocket).
Trying after each message doesn't seem to be required and could lead
to some overhead as well. What do you think?

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Handle infinite recursion in logical replication setup
Next
From: Etsuro Fujita
Date:
Subject: Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit