Re: Replication slot stats misgivings - Mailing list pgsql-hackers

From vignesh C
Subject Re: Replication slot stats misgivings
Date
Msg-id CALDaNm021AOgnc=Sx3bno_E_+riTedvcb-3HXp9Tpbp7fNZVFg@mail.gmail.com
Whole thread Raw
In response to Re: Replication slot stats misgivings  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Replication slot stats misgivings  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Apr 20, 2021 at 7:22 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Tue, Apr 20, 2021 at 9:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Mon, Apr 19, 2021 at 4:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Mon, Apr 19, 2021 at 2:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Mon, Apr 19, 2021 at 9:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > 4.
> > > > > > > +CREATE VIEW pg_stat_replication_slots AS
> > > > > > > +    SELECT
> > > > > > > +            s.slot_name,
> > > > > > > +            s.spill_txns,
> > > > > > > +            s.spill_count,
> > > > > > > +            s.spill_bytes,
> > > > > > > +            s.stream_txns,
> > > > > > > +            s.stream_count,
> > > > > > > +            s.stream_bytes,
> > > > > > > +            s.total_txns,
> > > > > > > +            s.total_bytes,
> > > > > > > +            s.stats_reset
> > > > > > > +    FROM pg_replication_slots as r,
> > > > > > > +        LATERAL pg_stat_get_replication_slot(slot_name) as s
> > > > > > > +    WHERE r.datoid IS NOT NULL; -- excluding physical slots
> > > > > > > ..
> > > > > > > ..
> > > > > > >
> > > > > > > -/* Get the statistics for the replication slots */
> > > > > > > +/* Get the statistics for the replication slot */
> > > > > > >  Datum
> > > > > > > -pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
> > > > > > > +pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
> > > > > > >  {
> > > > > > >  #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
> > > > > > > - ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> > > > > > > + text *slotname_text = PG_GETARG_TEXT_P(0);
> > > > > > > + NameData slotname;
> > > > > > >
> > > > > > > I think with the above changes getting all the slot stats has become
> > > > > > > much costlier. Is there any reason why can't we get all the stats from
> > > > > > > the new hash_table in one shot and return them to the user?
> > > > > >
> > > > > > I think the advantage of this approach would be that it can avoid
> > > > > > showing the stats for already-dropped slots. Like other statistics
> > > > > > views such as pg_stat_all_tables and pg_stat_all_functions, searching
> > > > > > the stats by the name got from pg_replication_slots can show only
> > > > > > available slot stats even if the hash table has garbage slot stats.
> > > > > >
> > > > >
> > > > > Sounds reasonable. However, if the create_slot message is missed, it
> > > > > will show an empty row for it. See below:
> > > > >
> > > > > postgres=# select slot_name, total_txns from pg_stat_replication_slots;
> > > > >  slot_name | total_txns
> > > > > -----------+------------
> > > > >  s1        |          0
> > > > >  s2        |          0
> > > > >            |
> > > > > (3 rows)
> > > > >
> > > > > Here, I have manually via debugger skipped sending the create_slot
> > > > > message for the third slot and we are showing an empty for it. This
> > > > > won't happen for pg_stat_all_tables, as it will set 0 or other initial
> > > > > values in such a case. I think we need to address this case.
> > > >
> > > > Good catch. I think it's better to set 0 to all counters and NULL to
> > > > reset_stats.
> > > >
> > > > >
> > > > > > Given that pg_stat_replication_slots doesn’t show garbage slot stats
> > > > > > even if it has, I thought we can avoid checking those garbage stats
> > > > > > frequently. It should not essentially be a problem for the hash table
> > > > > > to have entries up to max_replication_slots regardless of live or
> > > > > > already-dropped.
> > > > > >
> > > > >
> > > > > Yeah, but I guess we still might not save much by not doing it,
> > > > > especially because for the other cases like tables/functions, we are
> > > > > doing it without any threshold limit.
> > > >
> > > > Agreed.
> > > >
> > > > I've attached the updated patch, please review it.
> > >
> > > I've attached the new version patch that fixed the compilation error
> > > reported off-line by Amit.
> >
> > Thanks for the updated patch, few comments:
>
> Thank you for the review comments.
>
> > 1) We can change "slotent = pgstat_get_replslot_entry(slotname,
> > false);" to "return pgstat_get_replslot_entry(slotname, false);" and
> > remove the slotent variable.
> >
> > +       PgStat_StatReplSlotEntry *slotent = NULL;
> > +
> >         backend_read_statsfile();
> >
> > -       *nslots_p = nReplSlotStats;
> > -       return replSlotStats;
> > +       slotent = pgstat_get_replslot_entry(slotname, false);
> > +
> > +       return slotent;
>
> Fixed.
>
> >
> > 2) Should we change PGSTAT_FILE_FORMAT_ID as the statistic file format
> > has changed for replication statistics?
>
> The struct name is changed but I think the statistics file format has
> not changed by this patch. No?

I tried to create stats on head and then applied this patch and tried
reading the stats, it could not get the values, the backtrace for the
same is:
(gdb) bt
#0  0x000055fe12f8a93d in pg_detoast_datum (datum=0x7f7f7f7f7f7f7f7f)
at fmgr.c:1727
#1  0x000055fe12ec2a03 in pg_stat_get_replication_slot
(fcinfo=0x55fe1357e150) at pgstatfuncs.c:2316
#2  0x000055fe12b6af23 in ExecMakeTableFunctionResult
(setexpr=0x55fe13563c28, econtext=0x55fe13563b90,
argContext=0x55fe1357e030, expectedDesc=0x55fe13564968,
    randomAccess=false) at execSRF.c:234
#3  0x000055fe12b87ba3 in FunctionNext (node=0x55fe13563a78) at
nodeFunctionscan.c:95
#4  0x000055fe12b6c929 in ExecScanFetch (node=0x55fe13563a78,
accessMtd=0x55fe12b87aee <FunctionNext>, recheckMtd=0x55fe12b87eea
<FunctionRecheck>) at execScan.c:133
#5  0x000055fe12b6c9a2 in ExecScan (node=0x55fe13563a78,
accessMtd=0x55fe12b87aee <FunctionNext>, recheckMtd=0x55fe12b87eea
<FunctionRecheck>) at execScan.c:182
#6  0x000055fe12b87f40 in ExecFunctionScan (pstate=0x55fe13563a78) at
nodeFunctionscan.c:270
#7  0x000055fe12b687eb in ExecProcNodeFirst (node=0x55fe13563a78) at
execProcnode.c:462
#8  0x000055fe12b5c713 in ExecProcNode (node=0x55fe13563a78) at
../../../src/include/executor/executor.h:257
#9  0x000055fe12b5f147 in ExecutePlan (estate=0x55fe135635f0,
planstate=0x55fe13563a78, use_parallel_mode=false,
operation=CMD_SELECT, sendTuples=true, numberTuples=0,
    direction=ForwardScanDirection, dest=0x55fe13579558,
execute_once=true) at execMain.c:1551
#10 0x000055fe12b5cded in standard_ExecutorRun
(queryDesc=0x55fe1349acd0, direction=ForwardScanDirection, count=0,
execute_once=true) at execMain.c:361
#11 0x000055fe12b5cbfc in ExecutorRun (queryDesc=0x55fe1349acd0,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:305
#12 0x000055fe12dca9ce in PortalRunSelect (portal=0x55fe134ed2f0,
forward=true, count=0, dest=0x55fe13579558) at pquery.c:912
#13 0x000055fe12dca607 in PortalRun (portal=0x55fe134ed2f0,
count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x55fe13579558, altdest=0x55fe13579558,
    qc=0x7ffefa53cd30) at pquery.c:756
#14 0x000055fe12dc3915 in exec_simple_query
(query_string=0x55fe134796e0 "select * from pg_stat_replication_slots
;") at postgres.c:1196

I feel we can change CATALOG_VERSION_NO so that we will get this error
"The database cluster was initialized with CATALOG_VERSION_NO
2021XXXXX, but the server was compiled with CATALOG_VERSION_NO
2021XXXXX." which will prevent the above issue.

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Replication slot stats misgivings
Next
From: Amit Kapila
Date:
Subject: Re: Replication slot stats misgivings