Re: Design of pg_stat_subscription_workers vs pgstats - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Design of pg_stat_subscription_workers vs pgstats
Date
Msg-id CAD21AoCM0RZuLk+dOvTOGWJwxtgzSmTOKkvmGGWC6gytWSERFg@mail.gmail.com
Whole thread Raw
In response to Design of pg_stat_subscription_workers vs pgstats  (Andres Freund <andres@anarazel.de>)
Responses Re: Design of pg_stat_subscription_workers vs pgstats
List pgsql-hackers
Hi,

On Tue, Jan 25, 2022 at 3:31 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> I was looking the shared memory stats patch again. The rebase of which
> collided fairly heavily with the addition of pg_stat_subscription_workers.
>
> I'm concerned about the design of pg_stat_subscription_workers. The view was
> introduced in
>
>
> commit 8d74fc96db5fd547e077bf9bf4c3b67f821d71cd
> Author: Amit Kapila <akapila@postgresql.org>
> Date:   2021-11-30 08:54:30 +0530
>
>     Add a view to show the stats of subscription workers.
>
>     This commit adds a new system view pg_stat_subscription_workers, that
>     shows information about any errors which occur during the application of
>     logical replication changes as well as during performing initial table
>     synchronization. The subscription statistics entries are removed when the
>     corresponding subscription is removed.
>
>     It also adds an SQL function pg_stat_reset_subscription_worker() to reset
>     single subscription errors.
>
>     The contents of this view can be used by an upcoming patch that skips the
>     particular transaction that conflicts with the existing data on the
>     subscriber.
>
>     This view can be extended in the future to track other xact related
>     statistics like the number of xacts committed/aborted for subscription
>     workers.
>
>     Author: Masahiko Sawada
>     Reviewed-by: Greg Nancarrow, Hou Zhijie, Tang Haiying, Vignesh C, Dilip Kumar, Takamichi Osumi, Amit Kapila
>     Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
>
>
> I tried to skim-read the discussion leading to its introduction, but it's
> extraordinarily long: 474 messages in [1], 131 messages in [2], as well as a
> few other associated threads.
>
>
> From the commit message alone I am concerned that this appears to be intended
> to be used to store important state in pgstats. For which pgstats is
> fundamentally unsuitable (pgstat can loose state during normal operation,
> always looses state during crash restarts, the state can be reset).

The information on pg_stat_subscription_workers view, especially
last_error_xid, can be used to specify XID to "ALTER SUBSCRIPTION ...
SKIP (xid = XXX)" command which is proposed on the same thread, but it
doesn't mean that the new SKIP command relies on this information. The
failure XID is written in the server logs as well and the user
specifies XID manually.

>
> I don't really understand the name "pg_stat_subscription_workers" - what
> workers are stats kept about exactly? The columns don't obviously refer to a
> single worker or such? From the contents it should be name
> pg_stat_subscription_table_stats or such. But no, that'd not quite right,
> because apply errors are stored per-susbcription, while initial sync stuff is
> per-(subscription, table).

This stores stats for subscription workers namely apply and tablesync
worker, so named as pg_stat_subscription_workers.

Also, there is another proposal to add transaction statistics for
logical replication subscribers[1], and it's reasonable to merge these
statistics and this error information rather than having separate
views[2]. There also was an idea to add the transaction statistics to
pg_stat_subscription view, but it doesn't seem a good idea because the
pg_stat_subscription shows dynamic statistics whereas the transaction
statistics are accumulative statistics[3].

>
> The pgstat entries are quite wide (292 bytes), because of the error message
> stored. That's nearly twice the size of PgStat_StatTabEntry. And as far as I
> can tell, once there was an error, we'll just keep the stats entry around
> until the subscription is dropped.

We can drop the particular statistics by
pg_stat_reset_subscription_worker() function.

> And that includes stats for long dropped
> tables, as far as I can see - except that they're hidden from view, due to a
> join to pg_subscription_rel.

We are planning to drop this after successfully apply[4].

> To me this looks like it's using pgstat as an extremely poor IPC mechanism.
>
>
> Why isn't this just storing data in pg_subscription_rel?

These need to be updated on error which means for a failed xact and we
don't want to update the system catalog in that state. There will be
some challenges in a case where updating pg_subscription_rel also
failed too (what to report to the user, etc.). And moreover, we don't
want to consume space for temporary information in the system catalog.

Regards,

[1]
https://www.postgresql.org/message-id/OSBPR01MB48887CA8F40C8D984A6DC00CED199%40OSBPR01MB4888.jpnprd01.prod.outlook.com
[2] https://www.postgresql.org/message-id/CAD21AoDF7LmSALzMfmPshRw_xFcRz3WvB-me8T2gO6Ht%3D3zL2w%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAA4eK1JqwpsvjhLxV8CMYQ3NrhimZ8AFhWHh0Qn1FrL%3DLXfY6Q%40mail.gmail.com
[4] https://www.postgresql.org/message-id/CAA4eK1%2B9yXkWkJSNtWYV2rG7QNAnoAt%2BeNH0PexoSP9ZQmXKPg%40mail.gmail.com

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: [PATCH] Full support for index LP_DEAD hint bits on standby
Next
From: Andrei Zubkov
Date:
Subject: Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements