Thread: pg_system_identifier()

pg_system_identifier()

From
Vik Fearing
Date:
After someone in IRC asked if there was an equivalent to MySQL's
server_id, it was noted that we do have a system identifier but it's not
very accessible.

The attached patch implements a pg_system_identifier() function that
exposes it.

Shall I add this to the next commitfest?

--
Vik


Attachment

Re: pg_system_identifier()

From
Fujii Masao
Date:
On Thu, Aug 22, 2013 at 6:43 AM, Vik Fearing <vik.fearing@dalibo.com> wrote:
> After someone in IRC asked if there was an equivalent to MySQL's
> server_id, it was noted that we do have a system identifier but it's not
> very accessible.
>
> The attached patch implements a pg_system_identifier() function that
> exposes it.

What's the use case?

IIUC, PostgreSQL's system identifier is not equal to MySQL server-id.
In PostgreSQL, in replication, the master and all the standbys must
have the same system identifier. OTOH, in MySQL, they have the different
server-ids. No?

Regards,

-- 
Fujii Masao



Re: pg_system_identifier()

From
Vik Fearing
Date:
On 08/22/2013 02:51 PM, Fujii Masao wrote:
> On Thu, Aug 22, 2013 at 6:43 AM, Vik Fearing <vik.fearing@dalibo.com> wrote:
>> After someone in IRC asked if there was an equivalent to MySQL's
>> server_id, it was noted that we do have a system identifier but it's not
>> very accessible.
>>
>> The attached patch implements a pg_system_identifier() function that
>> exposes it.
> What's the use case?

It's information about the server that's only accessible through
pg_controldata.  I don't know if that's justification enough, which is
why I didn't add it to the commitfest yet.

> IIUC, PostgreSQL's system identifier is not equal to MySQL server-id.
> In PostgreSQL, in replication, the master and all the standbys must
> have the same system identifier. OTOH, in MySQL, they have the different
> server-ids. No?

I have zero experience with MySQL.

-- 
Vik




Re: pg_system_identifier()

From
Fujii Masao
Date:
On Thu, Aug 22, 2013 at 9:53 PM, Vik Fearing <vik.fearing@dalibo.com> wrote:
> On 08/22/2013 02:51 PM, Fujii Masao wrote:
>> On Thu, Aug 22, 2013 at 6:43 AM, Vik Fearing <vik.fearing@dalibo.com> wrote:
>>> After someone in IRC asked if there was an equivalent to MySQL's
>>> server_id, it was noted that we do have a system identifier but it's not
>>> very accessible.
>>>
>>> The attached patch implements a pg_system_identifier() function that
>>> exposes it.
>> What's the use case?
>
> It's information about the server that's only accessible through
> pg_controldata.

BTW, you can see the system identifier by executing IDENTIFY_SYSTEM
command in replication connection as follows:

1. Change the server settings so that the server can accept the  replication connection
2. Connect to the server in replication mode
3. Execute IDENTIFY_SYSTEM command in replication connection

$ psql "replication=1"
=# IDENTIFY_SYSTEM;     systemid       | timeline |  xlogpos
---------------------+----------+-----------5914930202950905854 |        1 | 0/183F720
(1 row)

This is not good way for a user, though ;P

> I don't know if that's justification enough, which is
> why I didn't add it to the commitfest yet.

You can add the patch to CF, and then hear the opinions from other people
during CF.

Regards,

-- 
Fujii Masao



Re: pg_system_identifier()

From
Vik Fearing
Date:
On 08/22/2013 03:21 PM, Fujii Masao wrote:
>> > I don't know if that's justification enough, which is
>> > why I didn't add it to the commitfest yet.
> You can add the patch to CF, and then hear the opinions from other people
> during CF.

Added.

-- 
Vik




Re: pg_system_identifier()

From
Andres Freund
Date:
On 2013-08-22 21:51:22 +0900, Fujii Masao wrote:
> On Thu, Aug 22, 2013 at 6:43 AM, Vik Fearing <vik.fearing@dalibo.com> wrote:
> > After someone in IRC asked if there was an equivalent to MySQL's
> > server_id, it was noted that we do have a system identifier but it's not
> > very accessible.
> >
> > The attached patch implements a pg_system_identifier() function that
> > exposes it.
> 
> What's the use case?
> 
> IIUC, PostgreSQL's system identifier is not equal to MySQL server-id.
> In PostgreSQL, in replication, the master and all the standbys must
> have the same system identifier. OTOH, in MySQL, they have the different
> server-ids. No?

FWIW I've wished for that function repeatedly. Mostly just to make sure
I am actually connected to the same "network" of replicas and not some
other.
It's also useful if you're providing support for a limited number of
machines and you want some form of identifying a node.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_system_identifier()

From
bricklen
Date:
<div dir="ltr"><div class="gmail_extra"><br /><div class="gmail_quote">On Thu, Aug 22, 2013 at 6:42 AM, Andres Freund
<spandir="ltr"><<a href="mailto:andres@2ndquadrant.com" target="_blank">andres@2ndquadrant.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"><divclass="im"><br /></div>FWIW I've wished for that function repeatedly. Mostly
justto make sure<br /> I am actually connected to the same "network" of replicas and not some<br /> other.<br /> It's
alsouseful if you're providing support for a limited number of<br /> machines and you want some form of identifying a
node.<br/></blockquote></div><br /></div><div class="gmail_extra">There's a "hostname" function at PGXN which serves
someuse-cases: <a href="http://pgxn.org/dist/hostname/">http://pgxn.org/dist/hostname/</a></div></div> 

Re: pg_system_identifier()

From
Josh Berkus
Date:
All,

Vik's feature would be useful for detecting an accidental split in a
replication cluster.  That is, it would be another tool for detecting if
you've made a mistake and created two masters.  So +1 from me.

It will also be useful for me for sharding.  Right now, I'm doing a
hackish version of Vik's function, so I'd be glad to have it in core.

However, given that the value is the same for all servers in a
replication set, are we sure we want to call it system_identifier?  Is
there a better name?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: pg_system_identifier()

From
Tom Lane
Date:
bricklen <bricklen@gmail.com> writes:
> On Thu, Aug 22, 2013 at 6:42 AM, Andres Freund <andres@2ndquadrant.com>wrote:
>> FWIW I've wished for that function repeatedly. Mostly just to make sure
>> I am actually connected to the same "network" of replicas and not some
>> other.
>> It's also useful if you're providing support for a limited number of
>> machines and you want some form of identifying a node.

> There's a "hostname" function at PGXN which serves some use-cases:
> http://pgxn.org/dist/hostname/

I have a very vague recollection that we might've intentionally decided
not to expose the system identifier at the SQL level.  This could be all
wet, but it'd be worth trolling the archives to see if there was such a
conversation and if so whether the arguments still have merit.

See also recent discussion about changing how the identifier is computed
--- it'd be a good idea to fix that before we expose the identifier to
users, if we decide to do so.
        regards, tom lane



Re: pg_system_identifier()

From
Andres Freund
Date:
On 2013-08-22 08:45:38 -0700, Josh Berkus wrote:
> All,
> 
> Vik's feature would be useful for detecting an accidental split in a
> replication cluster.  That is, it would be another tool for detecting if
> you've made a mistake and created two masters.  So +1 from me.
> 
> It will also be useful for me for sharding.  Right now, I'm doing a
> hackish version of Vik's function, so I'd be glad to have it in core.
> 
> However, given that the value is the same for all servers in a
> replication set, are we sure we want to call it system_identifier?  Is
> there a better name?

Given it's been named that and visible via pg_controldata for years I am
against introducing confusion by renaming it.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_system_identifier()

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Vik's feature would be useful for detecting an accidental split in a
> replication cluster.  That is, it would be another tool for detecting if
> you've made a mistake and created two masters.  So +1 from me.

We don't change the ID when promoting a slave to master, do we?
So how would this help for that?

> However, given that the value is the same for all servers in a
> replication set, are we sure we want to call it system_identifier?  Is
> there a better name?

I think there are definitely use cases for a system identifier of some
kind, I'm just not sure that what we have got right now is especially
useful to users.  I'd rather see some thought go into what's needed
before we expose the existing definition (and consume the valuable
namespace of "pg_system_identifier").
        regards, tom lane



Re: pg_system_identifier()

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-08-22 08:45:38 -0700, Josh Berkus wrote:
>> However, given that the value is the same for all servers in a
>> replication set, are we sure we want to call it system_identifier?  Is
>> there a better name?

> Given it's been named that and visible via pg_controldata for years I am
> against introducing confusion by renaming it.

I agree that if we have a function named pg_system_identifier(), it ought
to return the same value that pg_controldata prints under that name.
But that doesn't really answer any questions about how that value is
obtained.  I think the question on the table right now is whether we like
the way that value behaves, in the context of a user-visible system ID.
In particular, do we want to think about changing things so that (1) a
slave has a different ID than the master, and/or (2) a slave's ID changes
on promotion to master.  I don't know the answers to these things ---
but once we make it user visible it's going to be too late to change
its behavior, so now's the time to consider.
        regards, tom lane



Re: pg_system_identifier()

From
Andres Freund
Date:
On 2013-08-22 12:06:03 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-08-22 08:45:38 -0700, Josh Berkus wrote:
> >> However, given that the value is the same for all servers in a
> >> replication set, are we sure we want to call it system_identifier?  Is
> >> there a better name?
> 
> > Given it's been named that and visible via pg_controldata for years I am
> > against introducing confusion by renaming it.
> 
> I agree that if we have a function named pg_system_identifier(), it ought
> to return the same value that pg_controldata prints under that name.
> But that doesn't really answer any questions about how that value is
> obtained.  I think the question on the table right now is whether we like
> the way that value behaves, in the context of a user-visible system ID.
> In particular, do we want to think about changing things so tha
> (1) a slave has a different ID than the master, and/or

We currently use the system identifier to know that we're replicating
between the same/compatible systems (c.f. libpqwalreceiver.c:
libpqrcv_identify_system()), so I don't think a change of definition
like that is realistic.


> (2) a slave's ID changes  on promotion to master.

We also cannot change the identifier here, because then other standbys
won't be able to follow the promotion because the identifier doesn't
match anymore

But essentially we already have something like that by the combination
of system identifier and timeline id. Admittedly there's the weakness
that the timelineid can increase the same on several machines in the
cluster but that's a weakness we ought to fix sometime independent of
this.

So maybe the answer is to also expose the current timeline?

An alternative would be to have a pg_controldata_values() SRF...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_system_identifier()

From
Bruce Momjian
Date:
On Thu, Aug 22, 2013 at 06:18:39PM +0200, Andres Freund wrote:
> But essentially we already have something like that by the combination
> of system identifier and timeline id. Admittedly there's the weakness
> that the timelineid can increase the same on several machines in the
> cluster but that's a weakness we ought to fix sometime independent of
> this.
> 
> So maybe the answer is to also expose the current timeline?
> 
> An alternative would be to have a pg_controldata_values() SRF...

It seems the value is more of a _cluster_ identifier than a system
identifier.  We don't allow cross-major-version replication, so I am
confused why we can't rename it in 9.4.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: pg_system_identifier()

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2013-08-22 12:06:03 -0400, Tom Lane wrote:

> > I agree that if we have a function named pg_system_identifier(), it ought
> > to return the same value that pg_controldata prints under that name.
> > But that doesn't really answer any questions about how that value is
> > obtained.  I think the question on the table right now is whether we like
> > the way that value behaves, in the context of a user-visible system ID.
> > In particular, do we want to think about changing things so tha
> > (1) a slave has a different ID than the master, and/or
> 
> We currently use the system identifier to know that we're replicating
> between the same/compatible systems (c.f. libpqwalreceiver.c:
> libpqrcv_identify_system()), so I don't think a change of definition
> like that is realistic.

We could split the value; make sure that the first, way, 96 bits are
identical in master and slaves (and change the code to only compare
those bits); but the last 32 bits are system specific and cannot appear
twice in the same replica network.  Also, perhaps we should reserve the
last (say) 4 bits, so that 0000 means master and 0001 means standby (it
changes on promotion), and the rest of the values are reserved for
future use.

Not necessarily that exact encoding, but hopefully you get my point.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_system_identifier()

From
Andres Freund
Date:
On 2013-08-22 12:18:41 -0400, Bruce Momjian wrote:
> On Thu, Aug 22, 2013 at 06:18:39PM +0200, Andres Freund wrote:
> > But essentially we already have something like that by the combination
> > of system identifier and timeline id. Admittedly there's the weakness
> > that the timelineid can increase the same on several machines in the
> > cluster but that's a weakness we ought to fix sometime independent of
> > this.
> > 
> > So maybe the answer is to also expose the current timeline?
> > 
> > An alternative would be to have a pg_controldata_values() SRF...
> 
> It seems the value is more of a _cluster_ identifier than a system
> identifier.  We don't allow cross-major-version replication, so I am
> confused why we can't rename it in 9.4.

For one, it would introduce confusion for the not inconsiderable number
of people already knowing the variable. For another, it's exposed via
the replication protocol's IDENTIFY SYSTEM.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_system_identifier()

From
Andres Freund
Date:
On 2013-08-22 12:20:19 -0400, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2013-08-22 12:06:03 -0400, Tom Lane wrote:
> 
> > > I agree that if we have a function named pg_system_identifier(), it ought
> > > to return the same value that pg_controldata prints under that name.
> > > But that doesn't really answer any questions about how that value is
> > > obtained.  I think the question on the table right now is whether we like
> > > the way that value behaves, in the context of a user-visible system ID.
> > > In particular, do we want to think about changing things so tha
> > > (1) a slave has a different ID than the master, and/or
> > 
> > We currently use the system identifier to know that we're replicating
> > between the same/compatible systems (c.f. libpqwalreceiver.c:
> > libpqrcv_identify_system()), so I don't think a change of definition
> > like that is realistic.
> 
> We could split the value; make sure that the first, way, 96 bits are
> identical in master and slaves (and change the code to only compare
> those bits); but the last 32 bits are system specific and cannot appear
> twice in the same replica network.  Also, perhaps we should reserve the
> last (say) 4 bits, so that 0000 means master and 0001 means standby (it
> changes on promotion), and the rest of the values are reserved for
> future use.

Why? This seems to be making a simple thing into something way much more
complex? Imo this proposal is about further exposing an already
existing, already exposed (via pg_controldata, via replication protocol)
variable, not more.

It seems better to make sure the other datapoints are *also* exposed if
they aren't yet.
Some are:
* port (SHOW port;)
* standby/primary (SELECT pg_is_in_recovery();)

Some are not easily:
* system identifier (pg_controldata, replication protocol)
* current timeline identifier (pg_controldata, replication protocol)
* host identifier/hostname (which actually is hard)


Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_system_identifier()

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-08-22 12:20:19 -0400, Alvaro Herrera wrote:
>> We could split the value; make sure that the first, way, 96 bits are
>> identical in master and slaves (and change the code to only compare
>> those bits); but the last 32 bits are system specific and cannot appear
>> twice in the same replica network.  Also, perhaps we should reserve the
>> last (say) 4 bits, so that 0000 means master and 0001 means standby (it
>> changes on promotion), and the rest of the values are reserved for
>> future use.

> Why? This seems to be making a simple thing into something way much more
> complex? Imo this proposal is about further exposing an already
> existing, already exposed (via pg_controldata, via replication protocol)
> variable, not more.

> It seems better to make sure the other datapoints are *also* exposed if
> they aren't yet.

Agreed, this seems like overloading the identifier too much.  Currently we
consider it an 8-byte value with unspecified internal structure, and I
think we should probably maintain that approach rather than defining
APIs that assume it can be subdivided.  For one thing, reducing the width
of the "unique" part increases our risk of chance collisions.

Do we have a reliable way of generating a unique identifier for each slave
(independently of how that might be exposed)?
        regards, tom lane



Re: pg_system_identifier()

From
Andres Freund
Date:
On 2013-08-22 12:37:36 -0400, Tom Lane wrote:
> Do we have a reliable way of generating a unique identifier for each slave
> (independently of how that might be exposed)?

I'd like one, but it's not easy. The best I can think of is to mash
together:
* system_identifier
* mac address of primary interface we're listening to
* port
* primary/standby
in some pseudo-cryptographic manner.

But that's less than convincing because it can change after simple
configuration or hardware changes or even reboot :(.

Really identifying a particular host seems hard in anything resembling a
portable solution.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_system_identifier()

From
Fabien COELHO
Date:
> After someone in IRC asked if there was an equivalent to MySQL's
> server_id, it was noted that we do have a system identifier but it's not
> very accessible.
>
> The attached patch implements a pg_system_identifier() function that
> exposes it.

Would it make sense for such identifiers be standard UUID 
(http://en.wikipedia.org/wiki/UUID)?

Should there be a UUID per cluster? and/or per database, possibly deduce 
from the cluster one? Should it be configurable, say from 
"postgresql.conf"?
  get_pg_uuid()  get_pg_uuid('template0')

Note that there is a set of uuid functions provided as a module that may 
help.

-- 
Fabien.



Re: pg_system_identifier()

From
Dimitri Fontaine
Date:
Vik Fearing <vik.fearing@dalibo.com> writes:
>>> The attached patch implements a pg_system_identifier() function that
>>> exposes it.
>
> It's information about the server that's only accessible through
> pg_controldata.  I don't know if that's justification enough, which is
> why I didn't add it to the commitfest yet.

We already have two extensions to expose pg_controldata output at the
SQL level, and I think we should just include the facility in core.
 https://github.com/jconway/pg_controldata http://labs.omniti.com/labs/pgtreats/browser/trunk/contrib/control

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: pg_system_identifier()

From
Hannu Krosing
Date:
On 08/22/2013 06:37 PM, Tom Lane wrote:
> A
> Do we have a reliable way of generating a unique identifier for each slave
> (independently of how that might be exposed)?
Probably we could just generate an unique UUID when we first detect
that we are replicating from the master with same UUID.

This of course requires this master UUID to be present in some way
in the replication stream

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ




Re: pg_system_identifier()

From
Greg Stark
Date:
<div dir="ltr">This doesn't generate a unique id. You could back up a standby and restore it and point it at the
originalmaster and end up with two standbies with the same id.<br /></div><div class="gmail_extra"><br /><br /><div
class="gmail_quote">OnFri, Aug 23, 2013 at 4:08 PM, Hannu Krosing <span dir="ltr"><<a
href="mailto:hannu@2ndquadrant.com"target="_blank">hannu@2ndquadrant.com</a>></span> wrote:<br /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 08/22/2013
06:37PM, Tom Lane wrote:<br /> > A<br /> > Do we have a reliable way of generating a unique identifier for each
slave<br/> > (independently of how that might be exposed)?<br /></div>Probably we could just generate an unique UUID
whenwe first detect<br /> that we are replicating from the master with same UUID.<br /><br /> This of course requires
thismaster UUID to be present in some way<br /> in the replication stream<br /><br /> Cheers<br /><span
class="HOEnZb"><fontcolor="#888888"><br /> --<br /> Hannu Krosing<br /> PostgreSQL Consultant<br /> Performance,
Scalabilityand High Availability<br /> 2ndQuadrant Nordic OÜ<br /></font></span><div class="HOEnZb"><div class="h5"><br
/><br/><br /> --<br /> Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your
subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></div></div></blockquote></div><br /><br
clear="all"/><br />-- <br />greg<br /></div> 

Re: pg_system_identifier()

From
Christopher Browne
Date:
On Fri, Aug 23, 2013 at 3:10 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> After someone in IRC asked if there was an equivalent to MySQL's
>> server_id, it was noted that we do have a system identifier but it's not
>> very accessible.
>>
>> The attached patch implements a pg_system_identifier() function that
>> exposes it.
>
>
> Would it make sense for such identifiers be standard UUID
> (http://en.wikipedia.org/wiki/UUID)?
>
> Should there be a UUID per cluster? and/or per database, possibly deduce
> from the cluster one? Should it be configurable, say from "postgresql.conf"?
>
>   get_pg_uuid()
>   get_pg_uuid('template0')
>
> Note that there is a set of uuid functions provided as a module that may
> help.

There is sense to this, sure.

I'd think that constructing a Type 5 (SHA-1) UUID based on some local
information would make a lot of sense.

In effect, based on constructing SHA-1 on a string looking like:

"Database system identifier:           5651554613500795646
Maximum data alignment:               8
Database block size:                  8192
WAL block size:                       8192
Maximum length of identifiers:        64
Date/time type storage:               64-bit integers
Version:  PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by
gcc (Debian 4.6.1-4) 4.6.1, 64-bit"

==> SHA-1 of b1b012cc85149d2fe4bf0fc18c38dcf1218e95a5

(Note that I didn't put anything into that which is mutable such as
port numbers, MAC addresses, or IP
addresses - seems to introduce risk of false-negatives to me...)
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



Re: pg_system_identifier()

From
Tom Lane
Date:
Christopher Browne <cbbrowne@gmail.com> writes:
> On Fri, Aug 23, 2013 at 3:10 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> Would it make sense for such identifiers be standard UUID
>> (http://en.wikipedia.org/wiki/UUID)?

> There is sense to this, sure.

That ship's already sailed, though.  As was pointed out upthread, we don't
really want to change the way that pg_controldata prints the system ID,
and we don't want this SQL function printing something different either.

> I'd think that constructing a Type 5 (SHA-1) UUID based on some local
> information would make a lot of sense.

> In effect, based on constructing SHA-1 on a string looking like:

> "Database system identifier:           5651554613500795646
> Maximum data alignment:               8
> Database block size:                  8192
> WAL block size:                       8192
> Maximum length of identifiers:        64
> Date/time type storage:               64-bit integers
> Version:  PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by
> gcc (Debian 4.6.1-4) 4.6.1, 64-bit"

> ==> SHA-1 of b1b012cc85149d2fe4bf0fc18c38dcf1218e95a5

Including the version string would be a seriously bad idea --- you
don't want the sys ID to change just because you did a minor version
upgrade, or even recompiled the same version with a newer compiler,
do you?  There might be some point in factoring in those other values,
but I'm not terribly excited about them either.
        regards, tom lane



Re: pg_system_identifier()

From
Jim Nasby
Date:
On 8/23/13 11:23 AM, Greg Stark wrote:
> This doesn't generate a unique id. You could back up a standby and restore it and point it at the original master and
endup with two standbies with the same id.
 

If you want to enforce something unique throughout a cluster, I think we're stuck with having the cluster communicate
IDsacross an entire cluster. AFAIK that's how both Slony and londiste 3 do it.
 

I think it's also noteworthy that Slony and londiste both rely on the user specifying node identifiers. They don't try
tobe magic about it. I think there's 2 advantages there:
 

- Code is simpler
- Users can choose a naming schema that makes sense for them
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: pg_system_identifier()

From
Michael Paquier
Date:
On Mon, Aug 26, 2013 at 7:47 AM, Jim Nasby <jim@nasby.net> wrote:
> On 8/23/13 11:23 AM, Greg Stark wrote:
>>
>> This doesn't generate a unique id. You could back up a standby and restore
>> it and point it at the original master and end up with two standbies with
>> the same id.
>
>
> If you want to enforce something unique throughout a cluster, I think we're
> stuck with having the cluster communicate IDs across an entire cluster.
> AFAIK that's how both Slony and londiste 3 do it.
The same applies to Postgres-XC for node identifiers. Users can adapt
the settings of their cluster to their own needs.

> I think it's also noteworthy that Slony and londiste both rely on the user
> specifying node identifiers. They don't try to be magic about it. I think
> there's 2 advantages there:
>
> - Code is simpler
> - Users can choose a naming schema that makes sense for them
Definitely agreed on that.
-- 
Michael



Re: pg_system_identifier()

From
Hannu Krosing
Date:
On 08/26/2013 12:47 AM, Jim Nasby wrote:
> On 8/23/13 11:23 AM, Greg Stark wrote:
>> This doesn't generate a unique id. You could back up a standby and
>> restore it and point it at the original master and end up with two
>> standbies with the same id.
Yeah, not as easy as I imagined. It will fix itself once the 2nd slave
starts to follow the 1st, bt this has the disadvantage that for a
connected client a running system suddenly changes its "unique id".

If we want it happen automatically we have to allow erring on "too
often" or "not often enough" side for some users/usages.
>
> If you want to enforce something unique throughout a cluster, I think
> we're stuck with having the cluster communicate IDs across an entire
> cluster. AFAIK that's how both Slony and londiste 3 do it.
>
> I think it's also noteworthy that Slony and londiste both rely on the
> user specifying node identifiers. They don't try to be magic about it.
> I think there's 2 advantages there:
>
> - Code is simpler
> - Users can choose a naming schema that makes sense for them
3rd - really only users can determine when a "system" is unique and when
it is a copy of another.


Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ




Re: pg_system_identifier()

From
Fujii Masao
Date:
On Mon, Aug 26, 2013 at 1:12 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Aug 26, 2013 at 7:47 AM, Jim Nasby <jim@nasby.net> wrote:
>> On 8/23/13 11:23 AM, Greg Stark wrote:
>>>
>>> This doesn't generate a unique id. You could back up a standby and restore
>>> it and point it at the original master and end up with two standbies with
>>> the same id.
>>
>>
>> If you want to enforce something unique throughout a cluster, I think we're
>> stuck with having the cluster communicate IDs across an entire cluster.
>> AFAIK that's how both Slony and londiste 3 do it.
> The same applies to Postgres-XC for node identifiers. Users can adapt
> the settings of their cluster to their own needs.
>
>> I think it's also noteworthy that Slony and londiste both rely on the user
>> specifying node identifiers. They don't try to be magic about it. I think
>> there's 2 advantages there:
>>
>> - Code is simpler
>> - Users can choose a naming schema that makes sense for them
> Definitely agreed on that.

A user can already specify the unique standby name by using
application_name in primary_conninfo. So, the remaining thing
that we should do is to expose the primary_conninfo, i.e.,
commit the merge-recovery.conf-into-postgresql.conf patch ;P

Regards,

-- 
Fujii Masao



Re: pg_system_identifier()

From
Josh Berkus
Date:
All,

I think we've gotten a bit off the track of the proposed patch, which is
designed to report the system_identifier string currently in
pg_controldata, nothing more.

I'm going to reverse my vote, and vote against this patch.  The reason
why is that I think we should instead have a function:

pg_controldata(parameter text)

... which would report *all* strings in pg_controldata.  Hence, you'd do
instead:

pg_controldata('system identifier')

This will hopefully spare us from 15 patches incrementally adding all of
the individual items in controldata.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: pg_system_identifier()

From
Andres Freund
Date:
Hi,

On 2013-08-26 10:10:54 -0700, Josh Berkus wrote:
> I think we've gotten a bit off the track of the proposed patch, which is
> designed to report the system_identifier string currently in
> pg_controldata, nothing more.

Agreed.

> I'm going to reverse my vote, and vote against this patch.  The reason
> why is that I think we should instead have a function:
> 
> pg_controldata(parameter text)
> 
> ... which would report *all* strings in pg_controldata.  Hence, you'd do
> instead:
> 
> pg_controldata('system identifier')
> 
> This will hopefully spare us from 15 patches incrementally adding all of
> the individual items in controldata.

If anything but the proposed feature, it should be an SRF - passing in
text parameters isn't very discoverable. And I can't imagine where an
SRF would have too high overhead.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_system_identifier()

From
Dimitri Fontaine
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> If anything but the proposed feature, it should be an SRF - passing in
> text parameters isn't very discoverable. And I can't imagine where an
> SRF would have too high overhead.

The two existing extensions doing that are SRFs, yes. One of them has
been made by our committer Joe Conway. Let's please have a look how to
best include existing code into core.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: pg_system_identifier()

From
Andres Freund
Date:
On 2013-08-26 21:13:49 +0200, Dimitri Fontaine wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > If anything but the proposed feature, it should be an SRF - passing in
> > text parameters isn't very discoverable. And I can't imagine where an
> > SRF would have too high overhead.
> 
> The two existing extensions doing that are SRFs, yes. One of them has
> been made by our committer Joe Conway. Let's please have a look how to
> best include existing code into core.

I think the discussion so far is about whether we want to include
something like it into core (tentative yes so far?) and what the
userinterface should look like. Whether we use the proposed patch,
include the Joe's code or write something new seems to be the smaller
part.

Last time round there were some arguments made *against* including all
the fields: http://www.postgresql.org/message-id/7590.1267824977@sss.pgh.pa.us

Personally I don't see much problem with either exposing all of
pg_control or exposing the individual fields as individual functions.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_system_identifier()

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-08-26 10:10:54 -0700, Josh Berkus wrote:
>> I'm going to reverse my vote, and vote against this patch.  The reason
>> why is that I think we should instead have a function:
>> 
>> pg_controldata(parameter text)
>> 
>> ... which would report *all* strings in pg_controldata.  Hence, you'd do
>> instead:
>> 
>> pg_controldata('system identifier')
>> 
>> This will hopefully spare us from 15 patches incrementally adding all of
>> the individual items in controldata.

> If anything but the proposed feature, it should be an SRF - passing in
> text parameters isn't very discoverable.

I'm not pleased with the idea that we'd have to dumb all the relevant
datatypes down to text so that we could push them through this single
function.

Also, what about i18n?  pg_controldata localizes all the strings it
prints, but I doubt that's a great idea for either the input or the
output of this proposed function.
        regards, tom lane



Re: pg_system_identifier()

From
Andres Freund
Date:
On 2013-08-26 16:35:57 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-08-26 10:10:54 -0700, Josh Berkus wrote:
> >> I'm going to reverse my vote, and vote against this patch.  The reason
> >> why is that I think we should instead have a function:
> >> 
> >> pg_controldata(parameter text)
> >> 
> >> ... which would report *all* strings in pg_controldata.  Hence, you'd do
> >> instead:
> >> 
> >> pg_controldata('system identifier')
> >> 
> >> This will hopefully spare us from 15 patches incrementally adding all of
> >> the individual items in controldata.
> 
> > If anything but the proposed feature, it should be an SRF - passing in
> > text parameters isn't very discoverable.
> 
> I'm not pleased with the idea that we'd have to dumb all the relevant
> datatypes down to text so that we could push them through this single
> function.

We came to the same conclusion in an IM discussion some minutes
ago. There doesn't seem much to be going for anything but plain
functions that expose a single value each. a) greppability b)
discoverability c) data types.

The interesting data points around pg_control we could think of were:
* system identifier (text pg_system_identifier())
* current timeline id (bigint? pg_current_timeline())
* last checkpoint time (timestamptz pg_last_checkpoint_timestamp())
* last checkpoint location (text pg_last_checkpoint_location())

Those might also be interesting, but I am not 100% sure:
* minimum recovery point (text pg_minimum_recovery_location())
* minimum recovery timeline (bigint? pg_minimum_recovery_timeline())
* backup starting point (text pg_backup_start_location())
* backup end point (text pg_backup_end_location())
* backup end required? (bool pg_backup_end_required())

All the other variables are either already exposed, don't seem to be all
that interesting or not necessary accurate for a running cluster.

I'd vote for doing things piecemal here, otherwise it seems to be too
likely that we never get anywhere.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_system_identifier()

From
Josh Berkus
Date:
On 08/26/2013 01:50 PM, Andres Freund wrote:
> All the other variables are either already exposed, don't seem to be all
> that interesting or not necessary accurate for a running cluster.
> 
> I'd vote for doing things piecemal here, otherwise it seems to be too
> likely that we never get anywhere.

Ok, that sounds like a vote to accept the pg_system_identifier patch, then.

Given that the one thing I'd like to do is add to the docs on that
patch, cautioning users that system_identifier doesn't mean what they
think it means.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: pg_system_identifier()

From
Jim Nasby
Date:
On 8/26/13 8:48 AM, Fujii Masao wrote:
>>> I think it's also noteworthy that Slony and londiste both rely on the user
>>> >>specifying node identifiers. They don't try to be magic about it. I think
>>> >>there's 2 advantages there:
>>> >>
>>> >>- Code is simpler
>>> >>- Users can choose a naming schema that makes sense for them
>> >Definitely agreed on that.
> A user can already specify the unique standby name by using
> application_name in primary_conninfo. So, the remaining thing
> that we should do is to expose the primary_conninfo, i.e.,
> commit the merge-recovery.conf-into-postgresql.conf patch ;P

Is uniqueness actually enforced there? I believe that was part of the original problem...
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: pg_system_identifier()

From
Satoshi Nagayasu
Date:
Hi,

I'm catching up with the discussion as a reviewer...

(2013/08/27 5:35), Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On 2013-08-26 10:10:54 -0700, Josh Berkus wrote:
>>> I'm going to reverse my vote, and vote against this patch.  The reason
>>> why is that I think we should instead have a function:
>>>
>>> pg_controldata(parameter text)
>>>
>>> ... which would report *all* strings in pg_controldata.  Hence, you'd do
>>> instead:
>>>
>>> pg_controldata('system identifier')
>>>
>>> This will hopefully spare us from 15 patches incrementally adding all of
>>> the individual items in controldata.
> 
>> If anything but the proposed feature, it should be an SRF - passing in
>> text parameters isn't very discoverable.
> 
> I'm not pleased with the idea that we'd have to dumb all the relevant
> datatypes down to text so that we could push them through this single
> function.
> 
> Also, what about i18n?  pg_controldata localizes all the strings it
> prints, but I doubt that's a great idea for either the input or the
> output of this proposed function.

How about adding new system view with new function which returns
a single pg_controldata value in text type, and using a cast for
each column in the view definition?

CREATE VIEW pg_catalog.pg_controldata AS SELECT pg_controldata('control_version')::integer AS control_version,
pg_controldata('catalog_version')::integerAS catalog_version,        pg_controldata('system_identifier')::bigint AS
system_identifier,       ...        pg_controldata('next_xlog_file')::char(25) AS next_xlog_file,        ...
pg_controldata('encoding')::textAS encoding;
 

Given that the view can work like a SRF, and it allows us to retrieve
all the values of pg_controldata with appropriate types in single
record from the view:
 select * from pg_catalog.pg_controldata;

To get the system identifier value:
 select system_identifier from pg_catalog.pg_controldata;

Regards,
-- 
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp



Re: pg_system_identifier()

From
Robert Haas
Date:
On Mon, Sep 16, 2013 at 1:25 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> How about adding new system view with new function which returns
> a single pg_controldata value in text type, and using a cast for
> each column in the view definition?
>
> CREATE VIEW pg_catalog.pg_controldata AS
>   SELECT pg_controldata('control_version')::integer AS control_version,
>          pg_controldata('catalog_version')::integer AS catalog_version,
>          pg_controldata('system_identifier')::bigint AS system_identifier,
>          ...
>          pg_controldata('next_xlog_file')::char(25) AS next_xlog_file,
>          ...
>          pg_controldata('encoding')::text AS encoding;
>
> Given that the view can work like a SRF, and it allows us to retrieve
> all the values of pg_controldata with appropriate types in single
> record from the view:

I like this idea.  I think having an easy way to get the values with
the right types will be a plus.  But adding a separate function for
each field seems excessive, so I think this is a good compromise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_system_identifier()

From
Andres Freund
Date:
On 2013-09-17 10:57:46 -0400, Robert Haas wrote:
> On Mon, Sep 16, 2013 at 1:25 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> > How about adding new system view with new function which returns
> > a single pg_controldata value in text type, and using a cast for
> > each column in the view definition?
> >
> > CREATE VIEW pg_catalog.pg_controldata AS
> >   SELECT pg_controldata('control_version')::integer AS control_version,
> >          pg_controldata('catalog_version')::integer AS catalog_version,
> >          pg_controldata('system_identifier')::bigint AS system_identifier,
> >          ...
> >          pg_controldata('next_xlog_file')::char(25) AS next_xlog_file,
> >          ...
> >          pg_controldata('encoding')::text AS encoding;
> >
> > Given that the view can work like a SRF, and it allows us to retrieve
> > all the values of pg_controldata with appropriate types in single
> > record from the view:
> 
> I like this idea.  I think having an easy way to get the values with
> the right types will be a plus.  But adding a separate function for
> each field seems excessive, so I think this is a good compromise.

Why not add a single function returning a composite type then? That'd at
least have a chance of returning consistent values for the individual
values that change during runtime. It would also produce proper errors
when you load a view using columns that don't exist anymore instead of
just at runtime.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_system_identifier()

From
Robert Haas
Date:
On Tue, Sep 17, 2013 at 10:59 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-09-17 10:57:46 -0400, Robert Haas wrote:
>> On Mon, Sep 16, 2013 at 1:25 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>> > How about adding new system view with new function which returns
>> > a single pg_controldata value in text type, and using a cast for
>> > each column in the view definition?
>> >
>> > CREATE VIEW pg_catalog.pg_controldata AS
>> >   SELECT pg_controldata('control_version')::integer AS control_version,
>> >          pg_controldata('catalog_version')::integer AS catalog_version,
>> >          pg_controldata('system_identifier')::bigint AS system_identifier,
>> >          ...
>> >          pg_controldata('next_xlog_file')::char(25) AS next_xlog_file,
>> >          ...
>> >          pg_controldata('encoding')::text AS encoding;
>> >
>> > Given that the view can work like a SRF, and it allows us to retrieve
>> > all the values of pg_controldata with appropriate types in single
>> > record from the view:
>>
>> I like this idea.  I think having an easy way to get the values with
>> the right types will be a plus.  But adding a separate function for
>> each field seems excessive, so I think this is a good compromise.
>
> Why not add a single function returning a composite type then? That'd at
> least have a chance of returning consistent values for the individual
> values that change during runtime. It would also produce proper errors
> when you load a view using columns that don't exist anymore instead of
> just at runtime.

Hmm.  Yeah, that might be better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_system_identifier()

From
Robert Haas
Date:
On Tue, Sep 17, 2013 at 11:43 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 17, 2013 at 10:59 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> On 2013-09-17 10:57:46 -0400, Robert Haas wrote:
>>> On Mon, Sep 16, 2013 at 1:25 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>>> > How about adding new system view with new function which returns
>>> > a single pg_controldata value in text type, and using a cast for
>>> > each column in the view definition?
>>> >
>>> > CREATE VIEW pg_catalog.pg_controldata AS
>>> >   SELECT pg_controldata('control_version')::integer AS control_version,
>>> >          pg_controldata('catalog_version')::integer AS catalog_version,
>>> >          pg_controldata('system_identifier')::bigint AS system_identifier,
>>> >          ...
>>> >          pg_controldata('next_xlog_file')::char(25) AS next_xlog_file,
>>> >          ...
>>> >          pg_controldata('encoding')::text AS encoding;
>>> >
>>> > Given that the view can work like a SRF, and it allows us to retrieve
>>> > all the values of pg_controldata with appropriate types in single
>>> > record from the view:
>>>
>>> I like this idea.  I think having an easy way to get the values with
>>> the right types will be a plus.  But adding a separate function for
>>> each field seems excessive, so I think this is a good compromise.
>>
>> Why not add a single function returning a composite type then? That'd at
>> least have a chance of returning consistent values for the individual
>> values that change during runtime. It would also produce proper errors
>> when you load a view using columns that don't exist anymore instead of
>> just at runtime.
>
> Hmm.  Yeah, that might be better.

Nobody's objected to this design, so I think it's the way to go.  But
since that's not what the patch implements, I'm marking this "Returned
with Feedback" in the CF app.  Please feel free to submit an updated
patch for the next CommitFest (perhaps on a new thread with a name
more accurately reflecting the outcome of this design discussion).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_system_identifier()

From
Pavel Stehule
Date:
<p dir="ltr"><br /> Dne 17. 9. 2013 17:01 "Andres Freund" <<a
href="mailto:andres@2ndquadrant.com">andres@2ndquadrant.com</a>>napsal(a):<br /> ><br /> > On 2013-09-17
10:57:46-0400, Robert Haas wrote:<br /> > > On Mon, Sep 16, 2013 at 1:25 AM, Satoshi Nagayasu <<a
href="mailto:snaga@uptime.jp">snaga@uptime.jp</a>>wrote:<br /> > > > How about adding new system view with
newfunction which returns<br /> > > > a single pg_controldata value in text type, and using a cast for<br />
>> > each column in the view definition?<br /> > > ><br /> > > > CREATE VIEW
pg_catalog.pg_controldataAS<br /> > > >   SELECT pg_controldata('control_version')::integer AS
control_version,<br/> > > >          pg_controldata('catalog_version')::integer AS catalog_version,<br /> >
>>          pg_controldata('system_identifier')::bigint AS system_identifier,<br /> > > >        
 ...<br/> > > >          pg_controldata('next_xlog_file')::char(25) AS next_xlog_file,<br /> > > >  
      ...<br /> > > >          pg_controldata('encoding')::text AS encoding;<br /> > > ><br /> >
>> Given that the view can work like a SRF, and it allows us to retrieve<br /> > > > all the values of
pg_controldatawith appropriate types in single<br /> > > > record from the view:<br /> > ><br /> >
>I like this idea.  I think having an easy way to get the values with<br /> > > the right types will be a
plus. But adding a separate function for<br /> > > each field seems excessive, so I think this is a good
compromise.<br/> ><br /> > Why not add a single function returning a composite type then? That'd at<br /> >
leasthave a chance of returning consistent values for the individual<br /> > values that change during runtime. It
wouldalso produce proper errors<br /> > when you load a view using columns that don't exist anymore instead of<br />
>just at runtime.<br /> ><br /><p dir="ltr">+1<p dir="ltr">Pavel<p dir="ltr">> Greetings,<br /> ><br />
>Andres Freund<br /> ><br /> > --<br /> >  Andres Freund                     <a
href="http://www.2ndQuadrant.com/">http://www.2ndQuadrant.com/</a><br/> >  PostgreSQL Development, 24x7 Support,
Training& Services<br /> ><br /> ><br /> > --<br /> > Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> > To make changes to your
subscription:<br/> > <a
href="http://www.postgresql.org/mailpref/pgsql-hackers">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/>