Thread: pg_system_identifier()
After someone in IRC asked if there was an equivalent to MySQL's server_id, it was noted that we do have a system identifier but it's not very accessible. The attached patch implements a pg_system_identifier() function that exposes it. Shall I add this to the next commitfest? -- Vik
Attachment
On Thu, Aug 22, 2013 at 6:43 AM, Vik Fearing <vik.fearing@dalibo.com> wrote: > After someone in IRC asked if there was an equivalent to MySQL's > server_id, it was noted that we do have a system identifier but it's not > very accessible. > > The attached patch implements a pg_system_identifier() function that > exposes it. What's the use case? IIUC, PostgreSQL's system identifier is not equal to MySQL server-id. In PostgreSQL, in replication, the master and all the standbys must have the same system identifier. OTOH, in MySQL, they have the different server-ids. No? Regards, -- Fujii Masao
On 08/22/2013 02:51 PM, Fujii Masao wrote: > On Thu, Aug 22, 2013 at 6:43 AM, Vik Fearing <vik.fearing@dalibo.com> wrote: >> After someone in IRC asked if there was an equivalent to MySQL's >> server_id, it was noted that we do have a system identifier but it's not >> very accessible. >> >> The attached patch implements a pg_system_identifier() function that >> exposes it. > What's the use case? It's information about the server that's only accessible through pg_controldata. I don't know if that's justification enough, which is why I didn't add it to the commitfest yet. > IIUC, PostgreSQL's system identifier is not equal to MySQL server-id. > In PostgreSQL, in replication, the master and all the standbys must > have the same system identifier. OTOH, in MySQL, they have the different > server-ids. No? I have zero experience with MySQL. -- Vik
On Thu, Aug 22, 2013 at 9:53 PM, Vik Fearing <vik.fearing@dalibo.com> wrote: > On 08/22/2013 02:51 PM, Fujii Masao wrote: >> On Thu, Aug 22, 2013 at 6:43 AM, Vik Fearing <vik.fearing@dalibo.com> wrote: >>> After someone in IRC asked if there was an equivalent to MySQL's >>> server_id, it was noted that we do have a system identifier but it's not >>> very accessible. >>> >>> The attached patch implements a pg_system_identifier() function that >>> exposes it. >> What's the use case? > > It's information about the server that's only accessible through > pg_controldata. BTW, you can see the system identifier by executing IDENTIFY_SYSTEM command in replication connection as follows: 1. Change the server settings so that the server can accept the replication connection 2. Connect to the server in replication mode 3. Execute IDENTIFY_SYSTEM command in replication connection $ psql "replication=1" =# IDENTIFY_SYSTEM; systemid | timeline | xlogpos ---------------------+----------+-----------5914930202950905854 | 1 | 0/183F720 (1 row) This is not good way for a user, though ;P > I don't know if that's justification enough, which is > why I didn't add it to the commitfest yet. You can add the patch to CF, and then hear the opinions from other people during CF. Regards, -- Fujii Masao
On 08/22/2013 03:21 PM, Fujii Masao wrote: >> > I don't know if that's justification enough, which is >> > why I didn't add it to the commitfest yet. > You can add the patch to CF, and then hear the opinions from other people > during CF. Added. -- Vik
On 2013-08-22 21:51:22 +0900, Fujii Masao wrote: > On Thu, Aug 22, 2013 at 6:43 AM, Vik Fearing <vik.fearing@dalibo.com> wrote: > > After someone in IRC asked if there was an equivalent to MySQL's > > server_id, it was noted that we do have a system identifier but it's not > > very accessible. > > > > The attached patch implements a pg_system_identifier() function that > > exposes it. > > What's the use case? > > IIUC, PostgreSQL's system identifier is not equal to MySQL server-id. > In PostgreSQL, in replication, the master and all the standbys must > have the same system identifier. OTOH, in MySQL, they have the different > server-ids. No? FWIW I've wished for that function repeatedly. Mostly just to make sure I am actually connected to the same "network" of replicas and not some other. It's also useful if you're providing support for a limited number of machines and you want some form of identifying a node. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
<div dir="ltr"><div class="gmail_extra"><br /><div class="gmail_quote">On Thu, Aug 22, 2013 at 6:42 AM, Andres Freund <spandir="ltr"><<a href="mailto:andres@2ndquadrant.com" target="_blank">andres@2ndquadrant.com</a>></span> wrote:<br/><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><divclass="im"><br /></div>FWIW I've wished for that function repeatedly. Mostly justto make sure<br /> I am actually connected to the same "network" of replicas and not some<br /> other.<br /> It's alsouseful if you're providing support for a limited number of<br /> machines and you want some form of identifying a node.<br/></blockquote></div><br /></div><div class="gmail_extra">There's a "hostname" function at PGXN which serves someuse-cases: <a href="http://pgxn.org/dist/hostname/">http://pgxn.org/dist/hostname/</a></div></div>
All, Vik's feature would be useful for detecting an accidental split in a replication cluster. That is, it would be another tool for detecting if you've made a mistake and created two masters. So +1 from me. It will also be useful for me for sharding. Right now, I'm doing a hackish version of Vik's function, so I'd be glad to have it in core. However, given that the value is the same for all servers in a replication set, are we sure we want to call it system_identifier? Is there a better name? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
bricklen <bricklen@gmail.com> writes: > On Thu, Aug 22, 2013 at 6:42 AM, Andres Freund <andres@2ndquadrant.com>wrote: >> FWIW I've wished for that function repeatedly. Mostly just to make sure >> I am actually connected to the same "network" of replicas and not some >> other. >> It's also useful if you're providing support for a limited number of >> machines and you want some form of identifying a node. > There's a "hostname" function at PGXN which serves some use-cases: > http://pgxn.org/dist/hostname/ I have a very vague recollection that we might've intentionally decided not to expose the system identifier at the SQL level. This could be all wet, but it'd be worth trolling the archives to see if there was such a conversation and if so whether the arguments still have merit. See also recent discussion about changing how the identifier is computed --- it'd be a good idea to fix that before we expose the identifier to users, if we decide to do so. regards, tom lane
On 2013-08-22 08:45:38 -0700, Josh Berkus wrote: > All, > > Vik's feature would be useful for detecting an accidental split in a > replication cluster. That is, it would be another tool for detecting if > you've made a mistake and created two masters. So +1 from me. > > It will also be useful for me for sharding. Right now, I'm doing a > hackish version of Vik's function, so I'd be glad to have it in core. > > However, given that the value is the same for all servers in a > replication set, are we sure we want to call it system_identifier? Is > there a better name? Given it's been named that and visible via pg_controldata for years I am against introducing confusion by renaming it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Josh Berkus <josh@agliodbs.com> writes: > Vik's feature would be useful for detecting an accidental split in a > replication cluster. That is, it would be another tool for detecting if > you've made a mistake and created two masters. So +1 from me. We don't change the ID when promoting a slave to master, do we? So how would this help for that? > However, given that the value is the same for all servers in a > replication set, are we sure we want to call it system_identifier? Is > there a better name? I think there are definitely use cases for a system identifier of some kind, I'm just not sure that what we have got right now is especially useful to users. I'd rather see some thought go into what's needed before we expose the existing definition (and consume the valuable namespace of "pg_system_identifier"). regards, tom lane
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-08-22 08:45:38 -0700, Josh Berkus wrote: >> However, given that the value is the same for all servers in a >> replication set, are we sure we want to call it system_identifier? Is >> there a better name? > Given it's been named that and visible via pg_controldata for years I am > against introducing confusion by renaming it. I agree that if we have a function named pg_system_identifier(), it ought to return the same value that pg_controldata prints under that name. But that doesn't really answer any questions about how that value is obtained. I think the question on the table right now is whether we like the way that value behaves, in the context of a user-visible system ID. In particular, do we want to think about changing things so that (1) a slave has a different ID than the master, and/or (2) a slave's ID changes on promotion to master. I don't know the answers to these things --- but once we make it user visible it's going to be too late to change its behavior, so now's the time to consider. regards, tom lane
On 2013-08-22 12:06:03 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2013-08-22 08:45:38 -0700, Josh Berkus wrote: > >> However, given that the value is the same for all servers in a > >> replication set, are we sure we want to call it system_identifier? Is > >> there a better name? > > > Given it's been named that and visible via pg_controldata for years I am > > against introducing confusion by renaming it. > > I agree that if we have a function named pg_system_identifier(), it ought > to return the same value that pg_controldata prints under that name. > But that doesn't really answer any questions about how that value is > obtained. I think the question on the table right now is whether we like > the way that value behaves, in the context of a user-visible system ID. > In particular, do we want to think about changing things so tha > (1) a slave has a different ID than the master, and/or We currently use the system identifier to know that we're replicating between the same/compatible systems (c.f. libpqwalreceiver.c: libpqrcv_identify_system()), so I don't think a change of definition like that is realistic. > (2) a slave's ID changes on promotion to master. We also cannot change the identifier here, because then other standbys won't be able to follow the promotion because the identifier doesn't match anymore But essentially we already have something like that by the combination of system identifier and timeline id. Admittedly there's the weakness that the timelineid can increase the same on several machines in the cluster but that's a weakness we ought to fix sometime independent of this. So maybe the answer is to also expose the current timeline? An alternative would be to have a pg_controldata_values() SRF... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Aug 22, 2013 at 06:18:39PM +0200, Andres Freund wrote: > But essentially we already have something like that by the combination > of system identifier and timeline id. Admittedly there's the weakness > that the timelineid can increase the same on several machines in the > cluster but that's a weakness we ought to fix sometime independent of > this. > > So maybe the answer is to also expose the current timeline? > > An alternative would be to have a pg_controldata_values() SRF... It seems the value is more of a _cluster_ identifier than a system identifier. We don't allow cross-major-version replication, so I am confused why we can't rename it in 9.4. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Andres Freund wrote: > On 2013-08-22 12:06:03 -0400, Tom Lane wrote: > > I agree that if we have a function named pg_system_identifier(), it ought > > to return the same value that pg_controldata prints under that name. > > But that doesn't really answer any questions about how that value is > > obtained. I think the question on the table right now is whether we like > > the way that value behaves, in the context of a user-visible system ID. > > In particular, do we want to think about changing things so tha > > (1) a slave has a different ID than the master, and/or > > We currently use the system identifier to know that we're replicating > between the same/compatible systems (c.f. libpqwalreceiver.c: > libpqrcv_identify_system()), so I don't think a change of definition > like that is realistic. We could split the value; make sure that the first, way, 96 bits are identical in master and slaves (and change the code to only compare those bits); but the last 32 bits are system specific and cannot appear twice in the same replica network. Also, perhaps we should reserve the last (say) 4 bits, so that 0000 means master and 0001 means standby (it changes on promotion), and the rest of the values are reserved for future use. Not necessarily that exact encoding, but hopefully you get my point. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2013-08-22 12:18:41 -0400, Bruce Momjian wrote: > On Thu, Aug 22, 2013 at 06:18:39PM +0200, Andres Freund wrote: > > But essentially we already have something like that by the combination > > of system identifier and timeline id. Admittedly there's the weakness > > that the timelineid can increase the same on several machines in the > > cluster but that's a weakness we ought to fix sometime independent of > > this. > > > > So maybe the answer is to also expose the current timeline? > > > > An alternative would be to have a pg_controldata_values() SRF... > > It seems the value is more of a _cluster_ identifier than a system > identifier. We don't allow cross-major-version replication, so I am > confused why we can't rename it in 9.4. For one, it would introduce confusion for the not inconsiderable number of people already knowing the variable. For another, it's exposed via the replication protocol's IDENTIFY SYSTEM. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2013-08-22 12:20:19 -0400, Alvaro Herrera wrote: > Andres Freund wrote: > > On 2013-08-22 12:06:03 -0400, Tom Lane wrote: > > > > I agree that if we have a function named pg_system_identifier(), it ought > > > to return the same value that pg_controldata prints under that name. > > > But that doesn't really answer any questions about how that value is > > > obtained. I think the question on the table right now is whether we like > > > the way that value behaves, in the context of a user-visible system ID. > > > In particular, do we want to think about changing things so tha > > > (1) a slave has a different ID than the master, and/or > > > > We currently use the system identifier to know that we're replicating > > between the same/compatible systems (c.f. libpqwalreceiver.c: > > libpqrcv_identify_system()), so I don't think a change of definition > > like that is realistic. > > We could split the value; make sure that the first, way, 96 bits are > identical in master and slaves (and change the code to only compare > those bits); but the last 32 bits are system specific and cannot appear > twice in the same replica network. Also, perhaps we should reserve the > last (say) 4 bits, so that 0000 means master and 0001 means standby (it > changes on promotion), and the rest of the values are reserved for > future use. Why? This seems to be making a simple thing into something way much more complex? Imo this proposal is about further exposing an already existing, already exposed (via pg_controldata, via replication protocol) variable, not more. It seems better to make sure the other datapoints are *also* exposed if they aren't yet. Some are: * port (SHOW port;) * standby/primary (SELECT pg_is_in_recovery();) Some are not easily: * system identifier (pg_controldata, replication protocol) * current timeline identifier (pg_controldata, replication protocol) * host identifier/hostname (which actually is hard) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-08-22 12:20:19 -0400, Alvaro Herrera wrote: >> We could split the value; make sure that the first, way, 96 bits are >> identical in master and slaves (and change the code to only compare >> those bits); but the last 32 bits are system specific and cannot appear >> twice in the same replica network. Also, perhaps we should reserve the >> last (say) 4 bits, so that 0000 means master and 0001 means standby (it >> changes on promotion), and the rest of the values are reserved for >> future use. > Why? This seems to be making a simple thing into something way much more > complex? Imo this proposal is about further exposing an already > existing, already exposed (via pg_controldata, via replication protocol) > variable, not more. > It seems better to make sure the other datapoints are *also* exposed if > they aren't yet. Agreed, this seems like overloading the identifier too much. Currently we consider it an 8-byte value with unspecified internal structure, and I think we should probably maintain that approach rather than defining APIs that assume it can be subdivided. For one thing, reducing the width of the "unique" part increases our risk of chance collisions. Do we have a reliable way of generating a unique identifier for each slave (independently of how that might be exposed)? regards, tom lane
On 2013-08-22 12:37:36 -0400, Tom Lane wrote: > Do we have a reliable way of generating a unique identifier for each slave > (independently of how that might be exposed)? I'd like one, but it's not easy. The best I can think of is to mash together: * system_identifier * mac address of primary interface we're listening to * port * primary/standby in some pseudo-cryptographic manner. But that's less than convincing because it can change after simple configuration or hardware changes or even reboot :(. Really identifying a particular host seems hard in anything resembling a portable solution. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
> After someone in IRC asked if there was an equivalent to MySQL's > server_id, it was noted that we do have a system identifier but it's not > very accessible. > > The attached patch implements a pg_system_identifier() function that > exposes it. Would it make sense for such identifiers be standard UUID (http://en.wikipedia.org/wiki/UUID)? Should there be a UUID per cluster? and/or per database, possibly deduce from the cluster one? Should it be configurable, say from "postgresql.conf"? get_pg_uuid() get_pg_uuid('template0') Note that there is a set of uuid functions provided as a module that may help. -- Fabien.
Vik Fearing <vik.fearing@dalibo.com> writes: >>> The attached patch implements a pg_system_identifier() function that >>> exposes it. > > It's information about the server that's only accessible through > pg_controldata. I don't know if that's justification enough, which is > why I didn't add it to the commitfest yet. We already have two extensions to expose pg_controldata output at the SQL level, and I think we should just include the facility in core. https://github.com/jconway/pg_controldata http://labs.omniti.com/labs/pgtreats/browser/trunk/contrib/control Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 08/22/2013 06:37 PM, Tom Lane wrote: > A > Do we have a reliable way of generating a unique identifier for each slave > (independently of how that might be exposed)? Probably we could just generate an unique UUID when we first detect that we are replicating from the master with same UUID. This of course requires this master UUID to be present in some way in the replication stream Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
<div dir="ltr">This doesn't generate a unique id. You could back up a standby and restore it and point it at the originalmaster and end up with two standbies with the same id.<br /></div><div class="gmail_extra"><br /><br /><div class="gmail_quote">OnFri, Aug 23, 2013 at 4:08 PM, Hannu Krosing <span dir="ltr"><<a href="mailto:hannu@2ndquadrant.com"target="_blank">hannu@2ndquadrant.com</a>></span> wrote:<br /><blockquote class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 08/22/2013 06:37PM, Tom Lane wrote:<br /> > A<br /> > Do we have a reliable way of generating a unique identifier for each slave<br/> > (independently of how that might be exposed)?<br /></div>Probably we could just generate an unique UUID whenwe first detect<br /> that we are replicating from the master with same UUID.<br /><br /> This of course requires thismaster UUID to be present in some way<br /> in the replication stream<br /><br /> Cheers<br /><span class="HOEnZb"><fontcolor="#888888"><br /> --<br /> Hannu Krosing<br /> PostgreSQL Consultant<br /> Performance, Scalabilityand High Availability<br /> 2ndQuadrant Nordic OÜ<br /></font></span><div class="HOEnZb"><div class="h5"><br /><br/><br /> --<br /> Sent via pgsql-hackers mailing list (<a href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers" target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></div></div></blockquote></div><br /><br clear="all"/><br />-- <br />greg<br /></div>
On Fri, Aug 23, 2013 at 3:10 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > >> After someone in IRC asked if there was an equivalent to MySQL's >> server_id, it was noted that we do have a system identifier but it's not >> very accessible. >> >> The attached patch implements a pg_system_identifier() function that >> exposes it. > > > Would it make sense for such identifiers be standard UUID > (http://en.wikipedia.org/wiki/UUID)? > > Should there be a UUID per cluster? and/or per database, possibly deduce > from the cluster one? Should it be configurable, say from "postgresql.conf"? > > get_pg_uuid() > get_pg_uuid('template0') > > Note that there is a set of uuid functions provided as a module that may > help. There is sense to this, sure. I'd think that constructing a Type 5 (SHA-1) UUID based on some local information would make a lot of sense. In effect, based on constructing SHA-1 on a string looking like: "Database system identifier: 5651554613500795646 Maximum data alignment: 8 Database block size: 8192 WAL block size: 8192 Maximum length of identifiers: 64 Date/time type storage: 64-bit integers Version: PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.6.1-4) 4.6.1, 64-bit" ==> SHA-1 of b1b012cc85149d2fe4bf0fc18c38dcf1218e95a5 (Note that I didn't put anything into that which is mutable such as port numbers, MAC addresses, or IP addresses - seems to introduce risk of false-negatives to me...) -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
Christopher Browne <cbbrowne@gmail.com> writes: > On Fri, Aug 23, 2013 at 3:10 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> Would it make sense for such identifiers be standard UUID >> (http://en.wikipedia.org/wiki/UUID)? > There is sense to this, sure. That ship's already sailed, though. As was pointed out upthread, we don't really want to change the way that pg_controldata prints the system ID, and we don't want this SQL function printing something different either. > I'd think that constructing a Type 5 (SHA-1) UUID based on some local > information would make a lot of sense. > In effect, based on constructing SHA-1 on a string looking like: > "Database system identifier: 5651554613500795646 > Maximum data alignment: 8 > Database block size: 8192 > WAL block size: 8192 > Maximum length of identifiers: 64 > Date/time type storage: 64-bit integers > Version: PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by > gcc (Debian 4.6.1-4) 4.6.1, 64-bit" > ==> SHA-1 of b1b012cc85149d2fe4bf0fc18c38dcf1218e95a5 Including the version string would be a seriously bad idea --- you don't want the sys ID to change just because you did a minor version upgrade, or even recompiled the same version with a newer compiler, do you? There might be some point in factoring in those other values, but I'm not terribly excited about them either. regards, tom lane
On 8/23/13 11:23 AM, Greg Stark wrote: > This doesn't generate a unique id. You could back up a standby and restore it and point it at the original master and endup with two standbies with the same id. If you want to enforce something unique throughout a cluster, I think we're stuck with having the cluster communicate IDsacross an entire cluster. AFAIK that's how both Slony and londiste 3 do it. I think it's also noteworthy that Slony and londiste both rely on the user specifying node identifiers. They don't try tobe magic about it. I think there's 2 advantages there: - Code is simpler - Users can choose a naming schema that makes sense for them -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Mon, Aug 26, 2013 at 7:47 AM, Jim Nasby <jim@nasby.net> wrote: > On 8/23/13 11:23 AM, Greg Stark wrote: >> >> This doesn't generate a unique id. You could back up a standby and restore >> it and point it at the original master and end up with two standbies with >> the same id. > > > If you want to enforce something unique throughout a cluster, I think we're > stuck with having the cluster communicate IDs across an entire cluster. > AFAIK that's how both Slony and londiste 3 do it. The same applies to Postgres-XC for node identifiers. Users can adapt the settings of their cluster to their own needs. > I think it's also noteworthy that Slony and londiste both rely on the user > specifying node identifiers. They don't try to be magic about it. I think > there's 2 advantages there: > > - Code is simpler > - Users can choose a naming schema that makes sense for them Definitely agreed on that. -- Michael
On 08/26/2013 12:47 AM, Jim Nasby wrote: > On 8/23/13 11:23 AM, Greg Stark wrote: >> This doesn't generate a unique id. You could back up a standby and >> restore it and point it at the original master and end up with two >> standbies with the same id. Yeah, not as easy as I imagined. It will fix itself once the 2nd slave starts to follow the 1st, bt this has the disadvantage that for a connected client a running system suddenly changes its "unique id". If we want it happen automatically we have to allow erring on "too often" or "not often enough" side for some users/usages. > > If you want to enforce something unique throughout a cluster, I think > we're stuck with having the cluster communicate IDs across an entire > cluster. AFAIK that's how both Slony and londiste 3 do it. > > I think it's also noteworthy that Slony and londiste both rely on the > user specifying node identifiers. They don't try to be magic about it. > I think there's 2 advantages there: > > - Code is simpler > - Users can choose a naming schema that makes sense for them 3rd - really only users can determine when a "system" is unique and when it is a copy of another. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On Mon, Aug 26, 2013 at 1:12 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Mon, Aug 26, 2013 at 7:47 AM, Jim Nasby <jim@nasby.net> wrote: >> On 8/23/13 11:23 AM, Greg Stark wrote: >>> >>> This doesn't generate a unique id. You could back up a standby and restore >>> it and point it at the original master and end up with two standbies with >>> the same id. >> >> >> If you want to enforce something unique throughout a cluster, I think we're >> stuck with having the cluster communicate IDs across an entire cluster. >> AFAIK that's how both Slony and londiste 3 do it. > The same applies to Postgres-XC for node identifiers. Users can adapt > the settings of their cluster to their own needs. > >> I think it's also noteworthy that Slony and londiste both rely on the user >> specifying node identifiers. They don't try to be magic about it. I think >> there's 2 advantages there: >> >> - Code is simpler >> - Users can choose a naming schema that makes sense for them > Definitely agreed on that. A user can already specify the unique standby name by using application_name in primary_conninfo. So, the remaining thing that we should do is to expose the primary_conninfo, i.e., commit the merge-recovery.conf-into-postgresql.conf patch ;P Regards, -- Fujii Masao
All, I think we've gotten a bit off the track of the proposed patch, which is designed to report the system_identifier string currently in pg_controldata, nothing more. I'm going to reverse my vote, and vote against this patch. The reason why is that I think we should instead have a function: pg_controldata(parameter text) ... which would report *all* strings in pg_controldata. Hence, you'd do instead: pg_controldata('system identifier') This will hopefully spare us from 15 patches incrementally adding all of the individual items in controldata. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Hi, On 2013-08-26 10:10:54 -0700, Josh Berkus wrote: > I think we've gotten a bit off the track of the proposed patch, which is > designed to report the system_identifier string currently in > pg_controldata, nothing more. Agreed. > I'm going to reverse my vote, and vote against this patch. The reason > why is that I think we should instead have a function: > > pg_controldata(parameter text) > > ... which would report *all* strings in pg_controldata. Hence, you'd do > instead: > > pg_controldata('system identifier') > > This will hopefully spare us from 15 patches incrementally adding all of > the individual items in controldata. If anything but the proposed feature, it should be an SRF - passing in text parameters isn't very discoverable. And I can't imagine where an SRF would have too high overhead. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > If anything but the proposed feature, it should be an SRF - passing in > text parameters isn't very discoverable. And I can't imagine where an > SRF would have too high overhead. The two existing extensions doing that are SRFs, yes. One of them has been made by our committer Joe Conway. Let's please have a look how to best include existing code into core. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 2013-08-26 21:13:49 +0200, Dimitri Fontaine wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > If anything but the proposed feature, it should be an SRF - passing in > > text parameters isn't very discoverable. And I can't imagine where an > > SRF would have too high overhead. > > The two existing extensions doing that are SRFs, yes. One of them has > been made by our committer Joe Conway. Let's please have a look how to > best include existing code into core. I think the discussion so far is about whether we want to include something like it into core (tentative yes so far?) and what the userinterface should look like. Whether we use the proposed patch, include the Joe's code or write something new seems to be the smaller part. Last time round there were some arguments made *against* including all the fields: http://www.postgresql.org/message-id/7590.1267824977@sss.pgh.pa.us Personally I don't see much problem with either exposing all of pg_control or exposing the individual fields as individual functions. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-08-26 10:10:54 -0700, Josh Berkus wrote: >> I'm going to reverse my vote, and vote against this patch. The reason >> why is that I think we should instead have a function: >> >> pg_controldata(parameter text) >> >> ... which would report *all* strings in pg_controldata. Hence, you'd do >> instead: >> >> pg_controldata('system identifier') >> >> This will hopefully spare us from 15 patches incrementally adding all of >> the individual items in controldata. > If anything but the proposed feature, it should be an SRF - passing in > text parameters isn't very discoverable. I'm not pleased with the idea that we'd have to dumb all the relevant datatypes down to text so that we could push them through this single function. Also, what about i18n? pg_controldata localizes all the strings it prints, but I doubt that's a great idea for either the input or the output of this proposed function. regards, tom lane
On 2013-08-26 16:35:57 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2013-08-26 10:10:54 -0700, Josh Berkus wrote: > >> I'm going to reverse my vote, and vote against this patch. The reason > >> why is that I think we should instead have a function: > >> > >> pg_controldata(parameter text) > >> > >> ... which would report *all* strings in pg_controldata. Hence, you'd do > >> instead: > >> > >> pg_controldata('system identifier') > >> > >> This will hopefully spare us from 15 patches incrementally adding all of > >> the individual items in controldata. > > > If anything but the proposed feature, it should be an SRF - passing in > > text parameters isn't very discoverable. > > I'm not pleased with the idea that we'd have to dumb all the relevant > datatypes down to text so that we could push them through this single > function. We came to the same conclusion in an IM discussion some minutes ago. There doesn't seem much to be going for anything but plain functions that expose a single value each. a) greppability b) discoverability c) data types. The interesting data points around pg_control we could think of were: * system identifier (text pg_system_identifier()) * current timeline id (bigint? pg_current_timeline()) * last checkpoint time (timestamptz pg_last_checkpoint_timestamp()) * last checkpoint location (text pg_last_checkpoint_location()) Those might also be interesting, but I am not 100% sure: * minimum recovery point (text pg_minimum_recovery_location()) * minimum recovery timeline (bigint? pg_minimum_recovery_timeline()) * backup starting point (text pg_backup_start_location()) * backup end point (text pg_backup_end_location()) * backup end required? (bool pg_backup_end_required()) All the other variables are either already exposed, don't seem to be all that interesting or not necessary accurate for a running cluster. I'd vote for doing things piecemal here, otherwise it seems to be too likely that we never get anywhere. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 08/26/2013 01:50 PM, Andres Freund wrote: > All the other variables are either already exposed, don't seem to be all > that interesting or not necessary accurate for a running cluster. > > I'd vote for doing things piecemal here, otherwise it seems to be too > likely that we never get anywhere. Ok, that sounds like a vote to accept the pg_system_identifier patch, then. Given that the one thing I'd like to do is add to the docs on that patch, cautioning users that system_identifier doesn't mean what they think it means. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 8/26/13 8:48 AM, Fujii Masao wrote: >>> I think it's also noteworthy that Slony and londiste both rely on the user >>> >>specifying node identifiers. They don't try to be magic about it. I think >>> >>there's 2 advantages there: >>> >> >>> >>- Code is simpler >>> >>- Users can choose a naming schema that makes sense for them >> >Definitely agreed on that. > A user can already specify the unique standby name by using > application_name in primary_conninfo. So, the remaining thing > that we should do is to expose the primary_conninfo, i.e., > commit the merge-recovery.conf-into-postgresql.conf patch ;P Is uniqueness actually enforced there? I believe that was part of the original problem... -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Hi, I'm catching up with the discussion as a reviewer... (2013/08/27 5:35), Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: >> On 2013-08-26 10:10:54 -0700, Josh Berkus wrote: >>> I'm going to reverse my vote, and vote against this patch. The reason >>> why is that I think we should instead have a function: >>> >>> pg_controldata(parameter text) >>> >>> ... which would report *all* strings in pg_controldata. Hence, you'd do >>> instead: >>> >>> pg_controldata('system identifier') >>> >>> This will hopefully spare us from 15 patches incrementally adding all of >>> the individual items in controldata. > >> If anything but the proposed feature, it should be an SRF - passing in >> text parameters isn't very discoverable. > > I'm not pleased with the idea that we'd have to dumb all the relevant > datatypes down to text so that we could push them through this single > function. > > Also, what about i18n? pg_controldata localizes all the strings it > prints, but I doubt that's a great idea for either the input or the > output of this proposed function. How about adding new system view with new function which returns a single pg_controldata value in text type, and using a cast for each column in the view definition? CREATE VIEW pg_catalog.pg_controldata AS SELECT pg_controldata('control_version')::integer AS control_version, pg_controldata('catalog_version')::integerAS catalog_version, pg_controldata('system_identifier')::bigint AS system_identifier, ... pg_controldata('next_xlog_file')::char(25) AS next_xlog_file, ... pg_controldata('encoding')::textAS encoding; Given that the view can work like a SRF, and it allows us to retrieve all the values of pg_controldata with appropriate types in single record from the view: select * from pg_catalog.pg_controldata; To get the system identifier value: select system_identifier from pg_catalog.pg_controldata; Regards, -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
On Mon, Sep 16, 2013 at 1:25 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote: > How about adding new system view with new function which returns > a single pg_controldata value in text type, and using a cast for > each column in the view definition? > > CREATE VIEW pg_catalog.pg_controldata AS > SELECT pg_controldata('control_version')::integer AS control_version, > pg_controldata('catalog_version')::integer AS catalog_version, > pg_controldata('system_identifier')::bigint AS system_identifier, > ... > pg_controldata('next_xlog_file')::char(25) AS next_xlog_file, > ... > pg_controldata('encoding')::text AS encoding; > > Given that the view can work like a SRF, and it allows us to retrieve > all the values of pg_controldata with appropriate types in single > record from the view: I like this idea. I think having an easy way to get the values with the right types will be a plus. But adding a separate function for each field seems excessive, so I think this is a good compromise. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2013-09-17 10:57:46 -0400, Robert Haas wrote: > On Mon, Sep 16, 2013 at 1:25 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote: > > How about adding new system view with new function which returns > > a single pg_controldata value in text type, and using a cast for > > each column in the view definition? > > > > CREATE VIEW pg_catalog.pg_controldata AS > > SELECT pg_controldata('control_version')::integer AS control_version, > > pg_controldata('catalog_version')::integer AS catalog_version, > > pg_controldata('system_identifier')::bigint AS system_identifier, > > ... > > pg_controldata('next_xlog_file')::char(25) AS next_xlog_file, > > ... > > pg_controldata('encoding')::text AS encoding; > > > > Given that the view can work like a SRF, and it allows us to retrieve > > all the values of pg_controldata with appropriate types in single > > record from the view: > > I like this idea. I think having an easy way to get the values with > the right types will be a plus. But adding a separate function for > each field seems excessive, so I think this is a good compromise. Why not add a single function returning a composite type then? That'd at least have a chance of returning consistent values for the individual values that change during runtime. It would also produce proper errors when you load a view using columns that don't exist anymore instead of just at runtime. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Sep 17, 2013 at 10:59 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-09-17 10:57:46 -0400, Robert Haas wrote: >> On Mon, Sep 16, 2013 at 1:25 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote: >> > How about adding new system view with new function which returns >> > a single pg_controldata value in text type, and using a cast for >> > each column in the view definition? >> > >> > CREATE VIEW pg_catalog.pg_controldata AS >> > SELECT pg_controldata('control_version')::integer AS control_version, >> > pg_controldata('catalog_version')::integer AS catalog_version, >> > pg_controldata('system_identifier')::bigint AS system_identifier, >> > ... >> > pg_controldata('next_xlog_file')::char(25) AS next_xlog_file, >> > ... >> > pg_controldata('encoding')::text AS encoding; >> > >> > Given that the view can work like a SRF, and it allows us to retrieve >> > all the values of pg_controldata with appropriate types in single >> > record from the view: >> >> I like this idea. I think having an easy way to get the values with >> the right types will be a plus. But adding a separate function for >> each field seems excessive, so I think this is a good compromise. > > Why not add a single function returning a composite type then? That'd at > least have a chance of returning consistent values for the individual > values that change during runtime. It would also produce proper errors > when you load a view using columns that don't exist anymore instead of > just at runtime. Hmm. Yeah, that might be better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Sep 17, 2013 at 11:43 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Sep 17, 2013 at 10:59 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> On 2013-09-17 10:57:46 -0400, Robert Haas wrote: >>> On Mon, Sep 16, 2013 at 1:25 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote: >>> > How about adding new system view with new function which returns >>> > a single pg_controldata value in text type, and using a cast for >>> > each column in the view definition? >>> > >>> > CREATE VIEW pg_catalog.pg_controldata AS >>> > SELECT pg_controldata('control_version')::integer AS control_version, >>> > pg_controldata('catalog_version')::integer AS catalog_version, >>> > pg_controldata('system_identifier')::bigint AS system_identifier, >>> > ... >>> > pg_controldata('next_xlog_file')::char(25) AS next_xlog_file, >>> > ... >>> > pg_controldata('encoding')::text AS encoding; >>> > >>> > Given that the view can work like a SRF, and it allows us to retrieve >>> > all the values of pg_controldata with appropriate types in single >>> > record from the view: >>> >>> I like this idea. I think having an easy way to get the values with >>> the right types will be a plus. But adding a separate function for >>> each field seems excessive, so I think this is a good compromise. >> >> Why not add a single function returning a composite type then? That'd at >> least have a chance of returning consistent values for the individual >> values that change during runtime. It would also produce proper errors >> when you load a view using columns that don't exist anymore instead of >> just at runtime. > > Hmm. Yeah, that might be better. Nobody's objected to this design, so I think it's the way to go. But since that's not what the patch implements, I'm marking this "Returned with Feedback" in the CF app. Please feel free to submit an updated patch for the next CommitFest (perhaps on a new thread with a name more accurately reflecting the outcome of this design discussion). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
<p dir="ltr"><br /> Dne 17. 9. 2013 17:01 "Andres Freund" <<a href="mailto:andres@2ndquadrant.com">andres@2ndquadrant.com</a>>napsal(a):<br /> ><br /> > On 2013-09-17 10:57:46-0400, Robert Haas wrote:<br /> > > On Mon, Sep 16, 2013 at 1:25 AM, Satoshi Nagayasu <<a href="mailto:snaga@uptime.jp">snaga@uptime.jp</a>>wrote:<br /> > > > How about adding new system view with newfunction which returns<br /> > > > a single pg_controldata value in text type, and using a cast for<br /> >> > each column in the view definition?<br /> > > ><br /> > > > CREATE VIEW pg_catalog.pg_controldataAS<br /> > > > SELECT pg_controldata('control_version')::integer AS control_version,<br/> > > > pg_controldata('catalog_version')::integer AS catalog_version,<br /> > >> pg_controldata('system_identifier')::bigint AS system_identifier,<br /> > > > ...<br/> > > > pg_controldata('next_xlog_file')::char(25) AS next_xlog_file,<br /> > > > ...<br /> > > > pg_controldata('encoding')::text AS encoding;<br /> > > ><br /> > >> Given that the view can work like a SRF, and it allows us to retrieve<br /> > > > all the values of pg_controldatawith appropriate types in single<br /> > > > record from the view:<br /> > ><br /> > >I like this idea. I think having an easy way to get the values with<br /> > > the right types will be a plus. But adding a separate function for<br /> > > each field seems excessive, so I think this is a good compromise.<br/> ><br /> > Why not add a single function returning a composite type then? That'd at<br /> > leasthave a chance of returning consistent values for the individual<br /> > values that change during runtime. It wouldalso produce proper errors<br /> > when you load a view using columns that don't exist anymore instead of<br /> >just at runtime.<br /> ><br /><p dir="ltr">+1<p dir="ltr">Pavel<p dir="ltr">> Greetings,<br /> ><br /> >Andres Freund<br /> ><br /> > --<br /> > Andres Freund <a href="http://www.2ndQuadrant.com/">http://www.2ndQuadrant.com/</a><br/> > PostgreSQL Development, 24x7 Support, Training& Services<br /> ><br /> ><br /> > --<br /> > Sent via pgsql-hackers mailing list (<a href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> > To make changes to your subscription:<br/> > <a href="http://www.postgresql.org/mailpref/pgsql-hackers">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/>