Thread: pg_replslotdata - a tool for displaying replication slot information

pg_replslotdata - a tool for displaying replication slot information

From
Bharath Rupireddy
Date:
Hi,

The replication slots data is stored in binary format on the disk under the pg_replslot/<<slot_name>> directory which isn't human readable. If the server is crashed/down (for whatever reasons) and unable to come up, currently there's no way for the user/admin/developer to know what were all the replication slots available at the time of server crash/down to figure out what's the restart lsn, xid, two phase info or types of slots etc.

pg_replslotdata is a tool that interprets the replication slots information and displays it onto the stdout even if the server is crashed/down. The design of this tool is similar to other tools available in the core today i.e. pg_controldata, pg_waldump.

Attaching initial patch herewith. I will improve it with documentation and other stuff a bit later.

Please see the attached picture for the sample output.

Thoughts?

Regards,
Bharath Rupireddy.
Attachment

Re: pg_replslotdata - a tool for displaying replication slot information

From
Bharath Rupireddy
Date:
On Tue, Nov 23, 2021 at 10:39 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> The replication slots data is stored in binary format on the disk under the pg_replslot/<<slot_name>> directory which
isn'thuman readable. If the server is crashed/down (for whatever reasons) and unable to come up, currently there's no
wayfor the user/admin/developer to know what were all the replication slots available at the time of server crash/down
tofigure out what's the restart lsn, xid, two phase info or types of slots etc. 
>
> pg_replslotdata is a tool that interprets the replication slots information and displays it onto the stdout even if
theserver is crashed/down. The design of this tool is similar to other tools available in the core today i.e.
pg_controldata,pg_waldump. 
>
> Attaching initial patch herewith. I will improve it with documentation and other stuff a bit later.
>
> Please see the attached picture for the sample output.
>
> Thoughts?

Attaching the rebased v2 patch.

Regards,
Bharath Rupireddy.

Attachment

Re: pg_replslotdata - a tool for displaying replication slot information

From
Bharath Rupireddy
Date:
On Wed, Nov 24, 2021 at 9:09 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Nov 23, 2021 at 10:39 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> > The replication slots data is stored in binary format on the disk under the pg_replslot/<<slot_name>> directory
whichisn't human readable. If the server is crashed/down (for whatever reasons) and unable to come up, currently
there'sno way for the user/admin/developer to know what were all the replication slots available at the time of server
crash/downto figure out what's the restart lsn, xid, two phase info or types of slots etc. 
> >
> > pg_replslotdata is a tool that interprets the replication slots information and displays it onto the stdout even if
theserver is crashed/down. The design of this tool is similar to other tools available in the core today i.e.
pg_controldata,pg_waldump. 
> >
> > Attaching initial patch herewith. I will improve it with documentation and other stuff a bit later.
> >
> > Please see the attached picture for the sample output.
> >
> > Thoughts?
>
> Attaching the rebased v2 patch.

On windows the previous patches were failing, fixed that in the v3
patch. I'm really sorry for the noise.

Regards,
Bharath Rupireddy.

Attachment

Re: pg_replslotdata - a tool for displaying replication slot information

From
Japin Li
Date:
On Wed, 24 Nov 2021 at 23:59, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Wed, Nov 24, 2021 at 9:09 PM Bharath Rupireddy
>> > Thoughts?
>>
>> Attaching the rebased v2 patch.
>
> On windows the previous patches were failing, fixed that in the v3
> patch. I'm really sorry for the noise.
>

Cool!  When I try to use it, there is an error for -v, --verbose option.

px@ubuntu:~/Codes/postgres/Debug$ pg_replslotdata -v
pg_replslotdata: invalid option -- 'v'
Try "pg_replslotdata --help" for more information.

This is because the getopt_long() missing 'v' in the third parameter.

    while ((c = getopt_long(argc, argv, "D:v", long_options, NULL)) != -1)

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: pg_replslotdata - a tool for displaying replication slot information

From
Bharath Rupireddy
Date:
On Wed, Nov 24, 2021 at 9:40 PM Japin Li <japinli@hotmail.com> wrote:
>
>
> On Wed, 24 Nov 2021 at 23:59, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> > On Wed, Nov 24, 2021 at 9:09 PM Bharath Rupireddy
> >> > Thoughts?
> >>
> >> Attaching the rebased v2 patch.
> >
> > On windows the previous patches were failing, fixed that in the v3
> > patch. I'm really sorry for the noise.
> >
>
> Cool!  When I try to use it, there is an error for -v, --verbose option.
>
> px@ubuntu:~/Codes/postgres/Debug$ pg_replslotdata -v
> pg_replslotdata: invalid option -- 'v'
> Try "pg_replslotdata --help" for more information.
>
> This is because the getopt_long() missing 'v' in the third parameter.
>
>     while ((c = getopt_long(argc, argv, "D:v", long_options, NULL)) != -1)

Thanks for taking a look at the patch, attaching v4.

There are many things that I could do in the patch, for instance, more
comments, documentation, code improvements etc. I would like to first
know what hackers think about this tool, and then start spending more
time on it.

Regards,
Bharath Rupireddy.

Attachment

Re: pg_replslotdata - a tool for displaying replication slot information

From
Peter Eisentraut
Date:
On 23.11.21 06:09, Bharath Rupireddy wrote:
> The replication slots data is stored in binary format on the disk under 
> the pg_replslot/<<slot_name>> directory which isn't human readable. If 
> the server is crashed/down (for whatever reasons) and unable to come up, 
> currently there's no way for the user/admin/developer to know what were 
> all the replication slots available at the time of server crash/down to 
> figure out what's the restart lsn, xid, two phase info or types of slots 
> etc.

What do you need that for?  You can't do anything with a replication 
slot while the server is down.



Re: pg_replslotdata - a tool for displaying replication slot information

From
"Bossart, Nathan"
Date:
On 11/30/21, 6:14 AM, "Peter Eisentraut" <peter.eisentraut@enterprisedb.com> wrote:
> On 23.11.21 06:09, Bharath Rupireddy wrote:
>> The replication slots data is stored in binary format on the disk under
>> the pg_replslot/<<slot_name>> directory which isn't human readable. If
>> the server is crashed/down (for whatever reasons) and unable to come up,
>> currently there's no way for the user/admin/developer to know what were
>> all the replication slots available at the time of server crash/down to
>> figure out what's the restart lsn, xid, two phase info or types of slots
>> etc.
>
> What do you need that for?  You can't do anything with a replication
> slot while the server is down.

One use-case might be to discover the value you need to set for
max_replication_slots, although it's pretty trivial to discover the
number of replication slots by looking at the folder directly.
However, you also need to know how many replication origins there are,
and AFAIK there isn't an easy way to read the replorigin_checkpoint
file at the moment.  IMO a utility like this should also show details
for the replication origins.  I don't have any other compelling use-
cases at the moment, but I will say that it is typically nice from an
administrative standpoint to be able to inspect things like this
without logging into a running server.

Nathan


Re: pg_replslotdata - a tool for displaying replication slot information

From
Bharath Rupireddy
Date:
On Wed, Dec 1, 2021 at 12:13 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 11/30/21, 6:14 AM, "Peter Eisentraut" <peter.eisentraut@enterprisedb.com> wrote:
> > On 23.11.21 06:09, Bharath Rupireddy wrote:
> >> The replication slots data is stored in binary format on the disk under
> >> the pg_replslot/<<slot_name>> directory which isn't human readable. If
> >> the server is crashed/down (for whatever reasons) and unable to come up,
> >> currently there's no way for the user/admin/developer to know what were
> >> all the replication slots available at the time of server crash/down to
> >> figure out what's the restart lsn, xid, two phase info or types of slots
> >> etc.
> >
> > What do you need that for?  You can't do anything with a replication
> > slot while the server is down.
>
> One use-case might be to discover the value you need to set for
> max_replication_slots, although it's pretty trivial to discover the
> number of replication slots by looking at the folder directly.

Apart from the above use-case, one can do some exploratory analysis on
the replication slot information after the server crash, this may be
useful for RCA or debugging purposes, for instance:
1) to look at the restart_lsn of the slots to get to know why there
were many WAL files filled up on the disk (because of the restart_lsn
being low)
2) to know how many replication slots available at the time of crash,
if required, one can choose to drop selective replication slots or the
ones that were falling behind to make the server up
3) if we persist active_pid info of the replication slot to the
disk(currently we don't have this info in the disk), one can get to
know the inactive replication slots at the time of crash
4) if the primary server is down and failover were to happen on to the
standby, by looking at the replication slot information on the
primary, one can easily recreate the slots on the standby

> However, you also need to know how many replication origins there are,
> and AFAIK there isn't an easy way to read the replorigin_checkpoint
> file at the moment.  IMO a utility like this should also show details
> for the replication origins.  I don't have any other compelling use-
> cases at the moment, but I will say that it is typically nice from an
> administrative standpoint to be able to inspect things like this
> without logging into a running server.

Yeah, this can be added too, probably as an extra option to the
proposed pg_replslotdata tool. But for now, let's deal with the
replication slot information alone and once this gets committed, we
can extend it further for replication origin info.

Regards,
Bharath Rupireddy.



Re: pg_replslotdata - a tool for displaying replication slot information

From
SATYANARAYANA NARLAPURAM
Date:


On Tue, Nov 30, 2021 at 9:47 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Dec 1, 2021 at 12:13 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 11/30/21, 6:14 AM, "Peter Eisentraut" <peter.eisentraut@enterprisedb.com> wrote:
> > On 23.11.21 06:09, Bharath Rupireddy wrote:
> >> The replication slots data is stored in binary format on the disk under
> >> the pg_replslot/<<slot_name>> directory which isn't human readable. If
> >> the server is crashed/down (for whatever reasons) and unable to come up,
> >> currently there's no way for the user/admin/developer to know what were
> >> all the replication slots available at the time of server crash/down to
> >> figure out what's the restart lsn, xid, two phase info or types of slots
> >> etc.
> >
> > What do you need that for?  You can't do anything with a replication
> > slot while the server is down.
>
> One use-case might be to discover the value you need to set for
> max_replication_slots, although it's pretty trivial to discover the
> number of replication slots by looking at the folder directly.

Apart from the above use-case, one can do some exploratory analysis on
the replication slot information after the server crash, this may be
useful for RCA or debugging purposes, for instance:
1) to look at the restart_lsn of the slots to get to know why there
were many WAL files filled up on the disk (because of the restart_lsn
being low)

In a disk full scenario because of WAL, this tool comes handy identifying which WAL files to delete  to free up the space and also help assess the accidental delete of the WAL files. I am not sure if there is a tool to help cleanup the WAL (may be invoking the archive_command too?) without impacting physical / logical slots, and respecting last checkpoint location but if one exist that will be handy

Re: pg_replslotdata - a tool for displaying replication slot information

From
Andres Freund
Date:
Hi,

On 2021-11-30 18:43:23 +0000, Bossart, Nathan wrote:
> On 11/30/21, 6:14 AM, "Peter Eisentraut" <peter.eisentraut@enterprisedb.com> wrote:
> > On 23.11.21 06:09, Bharath Rupireddy wrote:
> >> The replication slots data is stored in binary format on the disk under
> >> the pg_replslot/<<slot_name>> directory which isn't human readable. If
> >> the server is crashed/down (for whatever reasons) and unable to come up,
> >> currently there's no way for the user/admin/developer to know what were
> >> all the replication slots available at the time of server crash/down to
> >> figure out what's the restart lsn, xid, two phase info or types of slots
> >> etc.
> >
> > What do you need that for?  You can't do anything with a replication
> > slot while the server is down.

Yes, I don't think there's sufficient need for this.


> I don't have any other compelling use- cases at the moment, but I will say
> that it is typically nice from an administrative standpoint to be able to
> inspect things like this without logging into a running server.

Weighed against the cost of maintaining (including documentation) a separate
tool this doesn't seem sufficient reason.

Greetings,

Andres Freund



Re: pg_replslotdata - a tool for displaying replication slot information

From
Bharath Rupireddy
Date:
On Thu, Dec 2, 2021 at 4:22 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-11-30 18:43:23 +0000, Bossart, Nathan wrote:
> > On 11/30/21, 6:14 AM, "Peter Eisentraut" <peter.eisentraut@enterprisedb.com> wrote:
> > > On 23.11.21 06:09, Bharath Rupireddy wrote:
> > >> The replication slots data is stored in binary format on the disk under
> > >> the pg_replslot/<<slot_name>> directory which isn't human readable. If
> > >> the server is crashed/down (for whatever reasons) and unable to come up,
> > >> currently there's no way for the user/admin/developer to know what were
> > >> all the replication slots available at the time of server crash/down to
> > >> figure out what's the restart lsn, xid, two phase info or types of slots
> > >> etc.
> > >
> > > What do you need that for?  You can't do anything with a replication
> > > slot while the server is down.
>
> Yes, I don't think there's sufficient need for this.

Thanks. The idea of the pg_replslotdata is emanated from the real-time
working experience with the customer issues and answering some of
their questions. Given the fact that replication slots are used in
almost every major production servers, and they are likely to cause
problems, postgres having a core tool like pg_replslotdata to
interpret the replication slot info without contacting the server,
will definitely be useful for all the other postgres vendors out
there. Having some important tool in the core, can avoid duplicate
efforts.

> > I don't have any other compelling use- cases at the moment, but I will say
> > that it is typically nice from an administrative standpoint to be able to
> > inspect things like this without logging into a running server.
>
> Weighed against the cost of maintaining (including documentation) a separate
> tool this doesn't seem sufficient reason.

IMHO, this shouldn't be a reason to not get something useful (useful
IMO and few others in this thread) into the core. The maintenance of
the tools generally is low compared to the core server features once
they get reviewed and committed.

Having said that, other hackers may have better thoughts.

Regards,
Bharath Rupireddy.



Re: pg_replslotdata - a tool for displaying replication slot information

From
Michael Paquier
Date:
On Thu, Dec 02, 2021 at 08:32:08AM +0530, Bharath Rupireddy wrote:
> On Thu, Dec 2, 2021 at 4:22 AM Andres Freund <andres@anarazel.de> wrote:
>>> I don't have any other compelling use- cases at the moment, but I will say
>>> that it is typically nice from an administrative standpoint to be able to
>>> inspect things like this without logging into a running server.
>>
>> Weighed against the cost of maintaining (including documentation) a separate
>> tool this doesn't seem sufficient reason.
>
> IMHO, this shouldn't be a reason to not get something useful (useful
> IMO and few others in this thread) into the core. The maintenance of
> the tools generally is low compared to the core server features once
> they get reviewed and committed.

Well, a bit less maintenance is always better than more maintenance.
An extra cost that you may be missing is related to the translation of
the documentation, as well as the translation of any new strings this
would require.  FWIW, I don't directly see a use for this tool that
could not be solved with an online server.
--
Michael

Attachment

Re: pg_replslotdata - a tool for displaying replication slot information

From
"Bossart, Nathan"
Date:
On 12/5/21, 11:10 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Thu, Dec 02, 2021 at 08:32:08AM +0530, Bharath Rupireddy wrote:
>> On Thu, Dec 2, 2021 at 4:22 AM Andres Freund <andres@anarazel.de> wrote:
>>>> I don't have any other compelling use- cases at the moment, but I will say
>>>> that it is typically nice from an administrative standpoint to be able to
>>>> inspect things like this without logging into a running server.
>>>
>>> Weighed against the cost of maintaining (including documentation) a separate
>>> tool this doesn't seem sufficient reason.
>> 
>> IMHO, this shouldn't be a reason to not get something useful (useful
>> IMO and few others in this thread) into the core. The maintenance of
>> the tools generally is low compared to the core server features once
>> they get reviewed and committed.
>
> Well, a bit less maintenance is always better than more maintenance.
> An extra cost that you may be missing is related to the translation of
> the documentation, as well as the translation of any new strings this
> would require.  FWIW, I don't directly see a use for this tool that
> could not be solved with an online server.

Bharath, perhaps you should maintain this outside of core PostgreSQL
for now.  If some compelling use-cases ever surface that make it seem
worth the added maintenance burden, this thread could probably be
revisited.

Nathan


Re: pg_replslotdata - a tool for displaying replication slot information

From
Julien Rouhaud
Date:
Hi,

On Mon, Dec 06, 2021 at 07:16:12PM +0000, Bossart, Nathan wrote:
> On 12/5/21, 11:10 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> > On Thu, Dec 02, 2021 at 08:32:08AM +0530, Bharath Rupireddy wrote:
> >> On Thu, Dec 2, 2021 at 4:22 AM Andres Freund <andres@anarazel.de> wrote:
> >>>> I don't have any other compelling use- cases at the moment, but I will say
> >>>> that it is typically nice from an administrative standpoint to be able to
> >>>> inspect things like this without logging into a running server.
> >>>
> >>> Weighed against the cost of maintaining (including documentation) a separate
> >>> tool this doesn't seem sufficient reason.
> >> 
> >> IMHO, this shouldn't be a reason to not get something useful (useful
> >> IMO and few others in this thread) into the core. The maintenance of
> >> the tools generally is low compared to the core server features once
> >> they get reviewed and committed.
> >
> > Well, a bit less maintenance is always better than more maintenance.
> > An extra cost that you may be missing is related to the translation of
> > the documentation, as well as the translation of any new strings this
> > would require.  FWIW, I don't directly see a use for this tool that
> > could not be solved with an online server.
> 
> Bharath, perhaps you should maintain this outside of core PostgreSQL
> for now.  If some compelling use-cases ever surface that make it seem
> worth the added maintenance burden, this thread could probably be
> revisited.

Ironically, the patch is currently failing due to translation problem:

https://cirrus-ci.com/task/5467034313031680
[19:12:28.179] su postgres -c "make -s -j${BUILD_JOBS} world-bin"
[19:12:44.270] make[3]: *** No rule to make target 'po/cs.po', needed by 'po/cs.mo'.  Stop.
[19:12:44.270] make[2]: *** [Makefile:44: all-pg_replslotdata-recurse] Error 2
[19:12:44.270] make[2]: *** Waiting for unfinished jobs....
[19:12:44.499] make[1]: *** [Makefile:42: all-bin-recurse] Error 2
[19:12:44.499] make: *** [GNUmakefile:21: world-bin-src-recurse] Error 2

Looking at the thread, I see support from 3 people:

- Bharath
- Japin
- Satyanarayana

while 3 committers think that the extra maintenance effort isn't worth the
usage:

- Peter E.
- Andres
- Michael

and a +0.5 from Nathan IIUC.

I also personally don't think that this worth the maintenance effort.  This
tool being entirely client side, there's no problem with maintaining it on a
separate repository, as mentioned by Nathan, including using it on the cloud
providers that provides access to at least the data file.  Another pro of the
external repo is that the tool can be made available immediately and for older
releases.

Since 3 committers voted against it I think that the patch should be closed
as "Rejected".  I will do that in a few days unless there's some compelling
objection by then.



Re: pg_replslotdata - a tool for displaying replication slot information

From
Bharath Rupireddy
Date:
On Sat, Jan 15, 2022 at 2:20 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > Bharath, perhaps you should maintain this outside of core PostgreSQL
> > for now.  If some compelling use-cases ever surface that make it seem
> > worth the added maintenance burden, this thread could probably be
> > revisited.
>
> Ironically, the patch is currently failing due to translation problem:
>
> https://cirrus-ci.com/task/5467034313031680
> [19:12:28.179] su postgres -c "make -s -j${BUILD_JOBS} world-bin"
> [19:12:44.270] make[3]: *** No rule to make target 'po/cs.po', needed by 'po/cs.mo'.  Stop.
> [19:12:44.270] make[2]: *** [Makefile:44: all-pg_replslotdata-recurse] Error 2
> [19:12:44.270] make[2]: *** Waiting for unfinished jobs....
> [19:12:44.499] make[1]: *** [Makefile:42: all-bin-recurse] Error 2
> [19:12:44.499] make: *** [GNUmakefile:21: world-bin-src-recurse] Error 2

Thanks Juilen. I'm okay if the patch gets dropped. But, I'm curious to
know why the above error occurred. Is it because I included the nls.mk
file in the patch which I'm not supposed to? Are these nls.mk files
generated as part of the commit that does translation changes?

Regards,
Bharath Rupireddy.



Re: pg_replslotdata - a tool for displaying replication slot information

From
Julien Rouhaud
Date:
Hi,

On Mon, Jan 17, 2022 at 04:10:13PM +0530, Bharath Rupireddy wrote:
> 
> Thanks Juilen. I'm okay if the patch gets dropped.

Ok, I will take care of that soon.

> But, I'm curious to
> know why the above error occurred. Is it because I included the nls.mk
> file in the patch which I'm not supposed to? Are these nls.mk files
> generated as part of the commit that does translation changes?

Not exactly.  I think it's a good thing to take care of the translatability in
the initial submission, at least for the infrastructure part.  So the nlk.mk
and the _() function are fine, but you should have added an empty
AVAIL_LANGUAGES in your nlk.mk to avoid those errors.  The translation is being
done at a later stage by the various teams on babel ([1]) and then synced
periodically (usually by PeterE, thanks a lot to him!) in the tree.

[1] https://babel.postgresql.org/



Re: pg_replslotdata - a tool for displaying replication slot information

From
"Gurjeet"
Date:
On Mon Jan 17, 2022 at 5:11 AM PST, Julien Rouhaud wrote:
> On Mon, Jan 17, 2022 at 04:10:13PM +0530, Bharath Rupireddy wrote:
> >
> > Thanks Juilen. I'm okay if the patch gets dropped.
>
> Ok, I will take care of that soon.

I find this utility interesting and useful, especially for the reason
that it can provide information about the replication slots without
consuming a connection. I would be willing to continue the work on it.

Just checking here if, a year later, anyone has seen any more, or
interesting use-cases that would make it a candidate for its inclusion
in Postgres.

Best regards,
Gurjeet, http://Gurje.et
Postgres Contributors Team, https://aws.amazon.com/opensource