Thread: pg_replslotdata - a tool for displaying replication slot information
Hi,
The replication slots data is stored in binary format on the disk under the pg_replslot/<<slot_name>> directory which isn't human readable. If the server is crashed/down (for whatever reasons) and unable to come up, currently there's no way for the user/admin/developer to know what were all the replication slots available at the time of server crash/down to figure out what's the restart lsn, xid, two phase info or types of slots etc.
pg_replslotdata is a tool that interprets the replication slots information and displays it onto the stdout even if the server is crashed/down. The design of this tool is similar to other tools available in the core today i.e. pg_controldata, pg_waldump.
Attaching initial patch herewith. I will improve it with documentation and other stuff a bit later.
Please see the attached picture for the sample output.
Thoughts?
Regards,
Bharath Rupireddy.
The replication slots data is stored in binary format on the disk under the pg_replslot/<<slot_name>> directory which isn't human readable. If the server is crashed/down (for whatever reasons) and unable to come up, currently there's no way for the user/admin/developer to know what were all the replication slots available at the time of server crash/down to figure out what's the restart lsn, xid, two phase info or types of slots etc.
pg_replslotdata is a tool that interprets the replication slots information and displays it onto the stdout even if the server is crashed/down. The design of this tool is similar to other tools available in the core today i.e. pg_controldata, pg_waldump.
Attaching initial patch herewith. I will improve it with documentation and other stuff a bit later.
Please see the attached picture for the sample output.
Thoughts?
Regards,
Bharath Rupireddy.
Attachment
Re: pg_replslotdata - a tool for displaying replication slot information
From
Bharath Rupireddy
Date:
On Tue, Nov 23, 2021 at 10:39 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > Hi, > > The replication slots data is stored in binary format on the disk under the pg_replslot/<<slot_name>> directory which isn'thuman readable. If the server is crashed/down (for whatever reasons) and unable to come up, currently there's no wayfor the user/admin/developer to know what were all the replication slots available at the time of server crash/down tofigure out what's the restart lsn, xid, two phase info or types of slots etc. > > pg_replslotdata is a tool that interprets the replication slots information and displays it onto the stdout even if theserver is crashed/down. The design of this tool is similar to other tools available in the core today i.e. pg_controldata,pg_waldump. > > Attaching initial patch herewith. I will improve it with documentation and other stuff a bit later. > > Please see the attached picture for the sample output. > > Thoughts? Attaching the rebased v2 patch. Regards, Bharath Rupireddy.
Attachment
Re: pg_replslotdata - a tool for displaying replication slot information
From
Bharath Rupireddy
Date:
On Wed, Nov 24, 2021 at 9:09 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Tue, Nov 23, 2021 at 10:39 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > Hi, > > > > The replication slots data is stored in binary format on the disk under the pg_replslot/<<slot_name>> directory whichisn't human readable. If the server is crashed/down (for whatever reasons) and unable to come up, currently there'sno way for the user/admin/developer to know what were all the replication slots available at the time of server crash/downto figure out what's the restart lsn, xid, two phase info or types of slots etc. > > > > pg_replslotdata is a tool that interprets the replication slots information and displays it onto the stdout even if theserver is crashed/down. The design of this tool is similar to other tools available in the core today i.e. pg_controldata,pg_waldump. > > > > Attaching initial patch herewith. I will improve it with documentation and other stuff a bit later. > > > > Please see the attached picture for the sample output. > > > > Thoughts? > > Attaching the rebased v2 patch. On windows the previous patches were failing, fixed that in the v3 patch. I'm really sorry for the noise. Regards, Bharath Rupireddy.
Attachment
On Wed, 24 Nov 2021 at 23:59, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > On Wed, Nov 24, 2021 at 9:09 PM Bharath Rupireddy >> > Thoughts? >> >> Attaching the rebased v2 patch. > > On windows the previous patches were failing, fixed that in the v3 > patch. I'm really sorry for the noise. > Cool! When I try to use it, there is an error for -v, --verbose option. px@ubuntu:~/Codes/postgres/Debug$ pg_replslotdata -v pg_replslotdata: invalid option -- 'v' Try "pg_replslotdata --help" for more information. This is because the getopt_long() missing 'v' in the third parameter. while ((c = getopt_long(argc, argv, "D:v", long_options, NULL)) != -1) -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: pg_replslotdata - a tool for displaying replication slot information
From
Bharath Rupireddy
Date:
On Wed, Nov 24, 2021 at 9:40 PM Japin Li <japinli@hotmail.com> wrote: > > > On Wed, 24 Nov 2021 at 23:59, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Nov 24, 2021 at 9:09 PM Bharath Rupireddy > >> > Thoughts? > >> > >> Attaching the rebased v2 patch. > > > > On windows the previous patches were failing, fixed that in the v3 > > patch. I'm really sorry for the noise. > > > > Cool! When I try to use it, there is an error for -v, --verbose option. > > px@ubuntu:~/Codes/postgres/Debug$ pg_replslotdata -v > pg_replslotdata: invalid option -- 'v' > Try "pg_replslotdata --help" for more information. > > This is because the getopt_long() missing 'v' in the third parameter. > > while ((c = getopt_long(argc, argv, "D:v", long_options, NULL)) != -1) Thanks for taking a look at the patch, attaching v4. There are many things that I could do in the patch, for instance, more comments, documentation, code improvements etc. I would like to first know what hackers think about this tool, and then start spending more time on it. Regards, Bharath Rupireddy.
Attachment
Re: pg_replslotdata - a tool for displaying replication slot information
From
Peter Eisentraut
Date:
On 23.11.21 06:09, Bharath Rupireddy wrote: > The replication slots data is stored in binary format on the disk under > the pg_replslot/<<slot_name>> directory which isn't human readable. If > the server is crashed/down (for whatever reasons) and unable to come up, > currently there's no way for the user/admin/developer to know what were > all the replication slots available at the time of server crash/down to > figure out what's the restart lsn, xid, two phase info or types of slots > etc. What do you need that for? You can't do anything with a replication slot while the server is down.
Re: pg_replslotdata - a tool for displaying replication slot information
From
"Bossart, Nathan"
Date:
On 11/30/21, 6:14 AM, "Peter Eisentraut" <peter.eisentraut@enterprisedb.com> wrote: > On 23.11.21 06:09, Bharath Rupireddy wrote: >> The replication slots data is stored in binary format on the disk under >> the pg_replslot/<<slot_name>> directory which isn't human readable. If >> the server is crashed/down (for whatever reasons) and unable to come up, >> currently there's no way for the user/admin/developer to know what were >> all the replication slots available at the time of server crash/down to >> figure out what's the restart lsn, xid, two phase info or types of slots >> etc. > > What do you need that for? You can't do anything with a replication > slot while the server is down. One use-case might be to discover the value you need to set for max_replication_slots, although it's pretty trivial to discover the number of replication slots by looking at the folder directly. However, you also need to know how many replication origins there are, and AFAIK there isn't an easy way to read the replorigin_checkpoint file at the moment. IMO a utility like this should also show details for the replication origins. I don't have any other compelling use- cases at the moment, but I will say that it is typically nice from an administrative standpoint to be able to inspect things like this without logging into a running server. Nathan
Re: pg_replslotdata - a tool for displaying replication slot information
From
Bharath Rupireddy
Date:
On Wed, Dec 1, 2021 at 12:13 AM Bossart, Nathan <bossartn@amazon.com> wrote: > > On 11/30/21, 6:14 AM, "Peter Eisentraut" <peter.eisentraut@enterprisedb.com> wrote: > > On 23.11.21 06:09, Bharath Rupireddy wrote: > >> The replication slots data is stored in binary format on the disk under > >> the pg_replslot/<<slot_name>> directory which isn't human readable. If > >> the server is crashed/down (for whatever reasons) and unable to come up, > >> currently there's no way for the user/admin/developer to know what were > >> all the replication slots available at the time of server crash/down to > >> figure out what's the restart lsn, xid, two phase info or types of slots > >> etc. > > > > What do you need that for? You can't do anything with a replication > > slot while the server is down. > > One use-case might be to discover the value you need to set for > max_replication_slots, although it's pretty trivial to discover the > number of replication slots by looking at the folder directly. Apart from the above use-case, one can do some exploratory analysis on the replication slot information after the server crash, this may be useful for RCA or debugging purposes, for instance: 1) to look at the restart_lsn of the slots to get to know why there were many WAL files filled up on the disk (because of the restart_lsn being low) 2) to know how many replication slots available at the time of crash, if required, one can choose to drop selective replication slots or the ones that were falling behind to make the server up 3) if we persist active_pid info of the replication slot to the disk(currently we don't have this info in the disk), one can get to know the inactive replication slots at the time of crash 4) if the primary server is down and failover were to happen on to the standby, by looking at the replication slot information on the primary, one can easily recreate the slots on the standby > However, you also need to know how many replication origins there are, > and AFAIK there isn't an easy way to read the replorigin_checkpoint > file at the moment. IMO a utility like this should also show details > for the replication origins. I don't have any other compelling use- > cases at the moment, but I will say that it is typically nice from an > administrative standpoint to be able to inspect things like this > without logging into a running server. Yeah, this can be added too, probably as an extra option to the proposed pg_replslotdata tool. But for now, let's deal with the replication slot information alone and once this gets committed, we can extend it further for replication origin info. Regards, Bharath Rupireddy.
Re: pg_replslotdata - a tool for displaying replication slot information
From
SATYANARAYANA NARLAPURAM
Date:
On Tue, Nov 30, 2021 at 9:47 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Dec 1, 2021 at 12:13 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 11/30/21, 6:14 AM, "Peter Eisentraut" <peter.eisentraut@enterprisedb.com> wrote:
> > On 23.11.21 06:09, Bharath Rupireddy wrote:
> >> The replication slots data is stored in binary format on the disk under
> >> the pg_replslot/<<slot_name>> directory which isn't human readable. If
> >> the server is crashed/down (for whatever reasons) and unable to come up,
> >> currently there's no way for the user/admin/developer to know what were
> >> all the replication slots available at the time of server crash/down to
> >> figure out what's the restart lsn, xid, two phase info or types of slots
> >> etc.
> >
> > What do you need that for? You can't do anything with a replication
> > slot while the server is down.
>
> One use-case might be to discover the value you need to set for
> max_replication_slots, although it's pretty trivial to discover the
> number of replication slots by looking at the folder directly.
Apart from the above use-case, one can do some exploratory analysis on
the replication slot information after the server crash, this may be
useful for RCA or debugging purposes, for instance:
1) to look at the restart_lsn of the slots to get to know why there
were many WAL files filled up on the disk (because of the restart_lsn
being low)
In a disk full scenario because of WAL, this tool comes handy identifying which WAL files to delete to free up the space and also help assess the accidental delete of the WAL files. I am not sure if there is a tool to help cleanup the WAL (may be invoking the archive_command too?) without impacting physical / logical slots, and respecting last checkpoint location but if one exist that will be handy
Hi, On 2021-11-30 18:43:23 +0000, Bossart, Nathan wrote: > On 11/30/21, 6:14 AM, "Peter Eisentraut" <peter.eisentraut@enterprisedb.com> wrote: > > On 23.11.21 06:09, Bharath Rupireddy wrote: > >> The replication slots data is stored in binary format on the disk under > >> the pg_replslot/<<slot_name>> directory which isn't human readable. If > >> the server is crashed/down (for whatever reasons) and unable to come up, > >> currently there's no way for the user/admin/developer to know what were > >> all the replication slots available at the time of server crash/down to > >> figure out what's the restart lsn, xid, two phase info or types of slots > >> etc. > > > > What do you need that for? You can't do anything with a replication > > slot while the server is down. Yes, I don't think there's sufficient need for this. > I don't have any other compelling use- cases at the moment, but I will say > that it is typically nice from an administrative standpoint to be able to > inspect things like this without logging into a running server. Weighed against the cost of maintaining (including documentation) a separate tool this doesn't seem sufficient reason. Greetings, Andres Freund
Re: pg_replslotdata - a tool for displaying replication slot information
From
Bharath Rupireddy
Date:
On Thu, Dec 2, 2021 at 4:22 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2021-11-30 18:43:23 +0000, Bossart, Nathan wrote: > > On 11/30/21, 6:14 AM, "Peter Eisentraut" <peter.eisentraut@enterprisedb.com> wrote: > > > On 23.11.21 06:09, Bharath Rupireddy wrote: > > >> The replication slots data is stored in binary format on the disk under > > >> the pg_replslot/<<slot_name>> directory which isn't human readable. If > > >> the server is crashed/down (for whatever reasons) and unable to come up, > > >> currently there's no way for the user/admin/developer to know what were > > >> all the replication slots available at the time of server crash/down to > > >> figure out what's the restart lsn, xid, two phase info or types of slots > > >> etc. > > > > > > What do you need that for? You can't do anything with a replication > > > slot while the server is down. > > Yes, I don't think there's sufficient need for this. Thanks. The idea of the pg_replslotdata is emanated from the real-time working experience with the customer issues and answering some of their questions. Given the fact that replication slots are used in almost every major production servers, and they are likely to cause problems, postgres having a core tool like pg_replslotdata to interpret the replication slot info without contacting the server, will definitely be useful for all the other postgres vendors out there. Having some important tool in the core, can avoid duplicate efforts. > > I don't have any other compelling use- cases at the moment, but I will say > > that it is typically nice from an administrative standpoint to be able to > > inspect things like this without logging into a running server. > > Weighed against the cost of maintaining (including documentation) a separate > tool this doesn't seem sufficient reason. IMHO, this shouldn't be a reason to not get something useful (useful IMO and few others in this thread) into the core. The maintenance of the tools generally is low compared to the core server features once they get reviewed and committed. Having said that, other hackers may have better thoughts. Regards, Bharath Rupireddy.
On Thu, Dec 02, 2021 at 08:32:08AM +0530, Bharath Rupireddy wrote: > On Thu, Dec 2, 2021 at 4:22 AM Andres Freund <andres@anarazel.de> wrote: >>> I don't have any other compelling use- cases at the moment, but I will say >>> that it is typically nice from an administrative standpoint to be able to >>> inspect things like this without logging into a running server. >> >> Weighed against the cost of maintaining (including documentation) a separate >> tool this doesn't seem sufficient reason. > > IMHO, this shouldn't be a reason to not get something useful (useful > IMO and few others in this thread) into the core. The maintenance of > the tools generally is low compared to the core server features once > they get reviewed and committed. Well, a bit less maintenance is always better than more maintenance. An extra cost that you may be missing is related to the translation of the documentation, as well as the translation of any new strings this would require. FWIW, I don't directly see a use for this tool that could not be solved with an online server. -- Michael
Attachment
Re: pg_replslotdata - a tool for displaying replication slot information
From
"Bossart, Nathan"
Date:
On 12/5/21, 11:10 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > On Thu, Dec 02, 2021 at 08:32:08AM +0530, Bharath Rupireddy wrote: >> On Thu, Dec 2, 2021 at 4:22 AM Andres Freund <andres@anarazel.de> wrote: >>>> I don't have any other compelling use- cases at the moment, but I will say >>>> that it is typically nice from an administrative standpoint to be able to >>>> inspect things like this without logging into a running server. >>> >>> Weighed against the cost of maintaining (including documentation) a separate >>> tool this doesn't seem sufficient reason. >> >> IMHO, this shouldn't be a reason to not get something useful (useful >> IMO and few others in this thread) into the core. The maintenance of >> the tools generally is low compared to the core server features once >> they get reviewed and committed. > > Well, a bit less maintenance is always better than more maintenance. > An extra cost that you may be missing is related to the translation of > the documentation, as well as the translation of any new strings this > would require. FWIW, I don't directly see a use for this tool that > could not be solved with an online server. Bharath, perhaps you should maintain this outside of core PostgreSQL for now. If some compelling use-cases ever surface that make it seem worth the added maintenance burden, this thread could probably be revisited. Nathan
Hi, On Mon, Dec 06, 2021 at 07:16:12PM +0000, Bossart, Nathan wrote: > On 12/5/21, 11:10 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > > On Thu, Dec 02, 2021 at 08:32:08AM +0530, Bharath Rupireddy wrote: > >> On Thu, Dec 2, 2021 at 4:22 AM Andres Freund <andres@anarazel.de> wrote: > >>>> I don't have any other compelling use- cases at the moment, but I will say > >>>> that it is typically nice from an administrative standpoint to be able to > >>>> inspect things like this without logging into a running server. > >>> > >>> Weighed against the cost of maintaining (including documentation) a separate > >>> tool this doesn't seem sufficient reason. > >> > >> IMHO, this shouldn't be a reason to not get something useful (useful > >> IMO and few others in this thread) into the core. The maintenance of > >> the tools generally is low compared to the core server features once > >> they get reviewed and committed. > > > > Well, a bit less maintenance is always better than more maintenance. > > An extra cost that you may be missing is related to the translation of > > the documentation, as well as the translation of any new strings this > > would require. FWIW, I don't directly see a use for this tool that > > could not be solved with an online server. > > Bharath, perhaps you should maintain this outside of core PostgreSQL > for now. If some compelling use-cases ever surface that make it seem > worth the added maintenance burden, this thread could probably be > revisited. Ironically, the patch is currently failing due to translation problem: https://cirrus-ci.com/task/5467034313031680 [19:12:28.179] su postgres -c "make -s -j${BUILD_JOBS} world-bin" [19:12:44.270] make[3]: *** No rule to make target 'po/cs.po', needed by 'po/cs.mo'. Stop. [19:12:44.270] make[2]: *** [Makefile:44: all-pg_replslotdata-recurse] Error 2 [19:12:44.270] make[2]: *** Waiting for unfinished jobs.... [19:12:44.499] make[1]: *** [Makefile:42: all-bin-recurse] Error 2 [19:12:44.499] make: *** [GNUmakefile:21: world-bin-src-recurse] Error 2 Looking at the thread, I see support from 3 people: - Bharath - Japin - Satyanarayana while 3 committers think that the extra maintenance effort isn't worth the usage: - Peter E. - Andres - Michael and a +0.5 from Nathan IIUC. I also personally don't think that this worth the maintenance effort. This tool being entirely client side, there's no problem with maintaining it on a separate repository, as mentioned by Nathan, including using it on the cloud providers that provides access to at least the data file. Another pro of the external repo is that the tool can be made available immediately and for older releases. Since 3 committers voted against it I think that the patch should be closed as "Rejected". I will do that in a few days unless there's some compelling objection by then.
Re: pg_replslotdata - a tool for displaying replication slot information
From
Bharath Rupireddy
Date:
On Sat, Jan 15, 2022 at 2:20 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Bharath, perhaps you should maintain this outside of core PostgreSQL > > for now. If some compelling use-cases ever surface that make it seem > > worth the added maintenance burden, this thread could probably be > > revisited. > > Ironically, the patch is currently failing due to translation problem: > > https://cirrus-ci.com/task/5467034313031680 > [19:12:28.179] su postgres -c "make -s -j${BUILD_JOBS} world-bin" > [19:12:44.270] make[3]: *** No rule to make target 'po/cs.po', needed by 'po/cs.mo'. Stop. > [19:12:44.270] make[2]: *** [Makefile:44: all-pg_replslotdata-recurse] Error 2 > [19:12:44.270] make[2]: *** Waiting for unfinished jobs.... > [19:12:44.499] make[1]: *** [Makefile:42: all-bin-recurse] Error 2 > [19:12:44.499] make: *** [GNUmakefile:21: world-bin-src-recurse] Error 2 Thanks Juilen. I'm okay if the patch gets dropped. But, I'm curious to know why the above error occurred. Is it because I included the nls.mk file in the patch which I'm not supposed to? Are these nls.mk files generated as part of the commit that does translation changes? Regards, Bharath Rupireddy.
Hi, On Mon, Jan 17, 2022 at 04:10:13PM +0530, Bharath Rupireddy wrote: > > Thanks Juilen. I'm okay if the patch gets dropped. Ok, I will take care of that soon. > But, I'm curious to > know why the above error occurred. Is it because I included the nls.mk > file in the patch which I'm not supposed to? Are these nls.mk files > generated as part of the commit that does translation changes? Not exactly. I think it's a good thing to take care of the translatability in the initial submission, at least for the infrastructure part. So the nlk.mk and the _() function are fine, but you should have added an empty AVAIL_LANGUAGES in your nlk.mk to avoid those errors. The translation is being done at a later stage by the various teams on babel ([1]) and then synced periodically (usually by PeterE, thanks a lot to him!) in the tree. [1] https://babel.postgresql.org/
On Mon Jan 17, 2022 at 5:11 AM PST, Julien Rouhaud wrote: > On Mon, Jan 17, 2022 at 04:10:13PM +0530, Bharath Rupireddy wrote: > > > > Thanks Juilen. I'm okay if the patch gets dropped. > > Ok, I will take care of that soon. I find this utility interesting and useful, especially for the reason that it can provide information about the replication slots without consuming a connection. I would be willing to continue the work on it. Just checking here if, a year later, anyone has seen any more, or interesting use-cases that would make it a candidate for its inclusion in Postgres. Best regards, Gurjeet, http://Gurje.et Postgres Contributors Team, https://aws.amazon.com/opensource