Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAD21AoAZ76=YB_QyQuDNc-NBdGfQ_zbiee3aw7MUVFFmTZPB6A@mail.gmail.com
In response to Re: Skipping logical replication transactions on subscriber side  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side
List pgsql-hackers
On Sat, Sep 25, 2021 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Sep 24, 2021 at 6:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Sep 24, 2021 at 8:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > 6.
> > > +typedef struct PgStat_StatSubEntry
> > > +{
> > > + Oid subid; /* hash table key */
> > > +
> > > + /*
> > > + * Statistics of errors that occurred during logical replication.  While
> > > + * having the hash table for table sync errors we have a separate
> > > + * statistics value for apply error (apply_error), because we can avoid
> > > + * building a nested hash table for table sync errors in the case where
> > > + * there is no table sync error, which is the common case in practice.
> > > + *
> > >
> > > The above comment is not clear to me. Why do you need to have a
> > > separate hash table for table sync errors? And what makes it avoid
> > > building nested hash table?
> >
> > In the previous patch, a subscription stats entry
> > (PgStat_StatSubEntry) had one hash table that had error entries of
> > both apply and table sync. Since a subscription can have one apply
> > worker and multiple table sync workers it makes sense to me to have
> > the subscription entry have a hash table for them.
> >
>
> Sure, but each tablesync worker must have a separate relid. Why can't
> we have a single hash table for both apply and table sync workers
> which are hashed by sub_id + rel_id? For apply worker, the rel_id will
> always be zero (InvalidOid) and tablesync workers will have a unique
> OID for rel_id, so we should be able to uniquely identify each of
> apply and table sync workers.

What I have in mind is extending the subscription statistics, for
instance with transaction stats [1]. With a hash table of
subscriptions, we can store those statistics in each subscription's
entry, and we can treat subscription errors as statistics of the
subscription as well. So each entry of the subscription hash table can
have its own hash table for errors. For example, the subscription
entry struct would be something like:

typedef struct PgStat_StatSubEntry
{
    Oid subid; /* hash key */

    HTAB *errors;    /* apply and table sync errors */

    /* transaction stats of subscription */
    PgStat_Counter xact_commit;
    PgStat_Counter xact_commit_bytes;
    PgStat_Counter xact_error;
    PgStat_Counter xact_error_bytes;
    PgStat_Counter xact_abort;
    PgStat_Counter xact_abort_bytes;
    PgStat_Counter failure_count;
} PgStat_StatSubEntry;

When a subscription is dropped, we can simply remove its entry from
the hash table, which discards all of its statistics, including the
errors, in one step.
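
To make the nesting concrete, here is a minimal standalone sketch of
that layout. It is only an illustration and not code from the patch:
fixed-size arrays stand in for the HTABs, and the names SubEntry,
SubErrEntry, sub_lookup, sub_report_error, sub_error_entries and
sub_drop are hypothetical, not PostgreSQL APIs. Following the
convention discussed above, relid == InvalidOid denotes the apply
worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;           /* stand-in for PostgreSQL's Oid */
#define InvalidOid ((Oid) 0)
#define MAX_SUBS 8                  /* arbitrary sizes for the sketch */
#define MAX_ERRS 8

/* One error entry; relid == InvalidOid denotes the apply worker. */
typedef struct SubErrEntry
{
    bool used;
    Oid  relid;
    long failure_count;
} SubErrEntry;

/* Simplified subscription entry: errors are nested inside it. */
typedef struct SubEntry
{
    bool        used;
    Oid         subid;              /* lookup key */
    long        xact_commit;        /* example of a non-error statistic */
    SubErrEntry errors[MAX_ERRS];   /* stands in for "HTAB *errors" */
} SubEntry;

static SubEntry sub_table[MAX_SUBS];

/* Find the entry for subid, creating it if requested and space allows. */
SubEntry *
sub_lookup(Oid subid, bool create)
{
    SubEntry *free_slot = NULL;

    assert(subid != InvalidOid);
    for (int i = 0; i < MAX_SUBS; i++)
    {
        if (sub_table[i].used && sub_table[i].subid == subid)
            return &sub_table[i];
        if (!sub_table[i].used && free_slot == NULL)
            free_slot = &sub_table[i];
    }
    if (!create || free_slot == NULL)
        return NULL;
    memset(free_slot, 0, sizeof(*free_slot));
    free_slot->used = true;
    free_slot->subid = subid;
    return free_slot;
}

/* Record one error for (subid, relid); returns the new count, -1 if full. */
long
sub_report_error(Oid subid, Oid relid)
{
    SubEntry    *sub = sub_lookup(subid, true);
    SubErrEntry *free_slot = NULL;

    if (sub == NULL)
        return -1;
    for (int i = 0; i < MAX_ERRS; i++)
    {
        SubErrEntry *e = &sub->errors[i];

        if (e->used && e->relid == relid)
            return ++e->failure_count;
        if (!e->used && free_slot == NULL)
            free_slot = e;
    }
    if (free_slot == NULL)
        return -1;
    free_slot->used = true;
    free_slot->relid = relid;
    free_slot->failure_count = 1;
    return 1;
}

/* Number of error entries held by a subscription (0 if it is unknown). */
int
sub_error_entries(Oid subid)
{
    SubEntry *sub = sub_lookup(subid, false);
    int       n = 0;

    if (sub == NULL)
        return 0;
    for (int i = 0; i < MAX_ERRS; i++)
        if (sub->errors[i].used)
            n++;
    return n;
}

/* Dropping a subscription discards its counters and all nested errors. */
bool
sub_drop(Oid subid)
{
    SubEntry *sub = sub_lookup(subid, false);

    if (sub == NULL)
        return false;
    memset(sub, 0, sizeof(*sub));
    return true;
}
```

In this sketch sub_drop() never has to search for the error entries
that belong to the subscription, because they live inside its entry.
With a single table keyed by sub_id + rel_id, dropping a subscription
would instead mean finding and removing every entry with that sub_id.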

Regards,

[1]
https://www.postgresql.org/message-id/OSBPR01MB48887CA8F40C8D984A6DC00CED199%40OSBPR01MB4888.jpnprd01.prod.outlook.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


