Thread: 9.4 Proposal: Initdb creates a single table
We start with a database called Postgres and a schema called Public. Yet we don't start up with any usable tables. I propose we add a single table called Postgres when we Initdb CREATE TABLE Postgres (Id Integer, Data Jsonb); COMMENT ON TABLE Postgres IS 'Single table for quick start usage - design your database'; The purpose of this is to make the database immediately usable. By including this table in the default initdb it will mean that programs can rely on the existence of this table and begin working quicker. By now, some of you will be doubled over laughing as if this is an April fool joke. I don't mean it to be at all. The idea is to have a stupidly obvious and easy table that will potentially be usable by just about everyone, in any language. If you don't like it, don't use it. If you really dislike it, drop it. But for new people coming to Postgres, they will have a data object to access and begin using the database immediately. Their code will work, their examples will work. OK, so they need to go back and think about the design, but at least they got it to work and will be encouraged to do more. Remember when we didn't have a database called Postgres? Remember how much simpler life is now? Remember that now. We can also insert a single row, Id = 0 with "Postgres sample data" in it, but that seems a step too far. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > By now, some of you will be doubled over laughing as if this is an > April fool joke. Indeed. regards, tom lane
On 04/23/2014 02:11 PM, Simon Riggs wrote: > I propose we add a single table called Postgres when we Initdb > > CREATE TABLE Postgres (Id Integer, Data Jsonb); Without particular comment on the need for the table, I'd be concerned about calling it "postgres". My personal impression from Stack Overflow etc has been that users are readily confused by the fact that we have: - Database engine/system "postgres" - backend binary "postgres" (they see it in ps) - unix user "postgres" - Pg superuser "postgres" - database "postgres" Sure, there's an argument for running with the theme here, but I suspect using the name "postgres" for a default table will just muddy the waters a bit more. Even "postgres_table" would help. It *absolutely must* be lower case, whatever it is, IMO. If you're going for newest-of-the-newbies, the last thing you want to do is having them dealing with it being just Postgres in some places, and having to be "Postgres" in others. Personally, don't know if I'm convinced it's overly worth doing - but I think it's silly to dismiss without actually corralling up some users who're unfamiliar with Pg and watching them get started. I'd love to see some properly conducted usability studies of Pg, and something like this would fit in well. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hello
if you are thinking about this direction, then store there some demo project.Pavel
2014-04-23 8:45 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 04/23/2014 02:11 PM, Simon Riggs wrote:
> I propose we add a single table called Postgres when we Initdb
>
> CREATE TABLE Postgres (Id Integer, Data Jsonb);
Without particular comment on the need for the table, I'd be concerned
about calling it "postgres".
My personal impression from Stack Overflow etc has been that users are
readily confused by the fact that we have:
- Database engine/system "postgres"
- backend binary "postgres" (they see it in ps)
- unix user "postgres"
- Pg superuser "postgres"
- database "postgres"
Sure, there's an argument for running with the theme here, but I suspect
using the name "postgres" for a default table will just muddy the waters
a bit more.
Even "postgres_table" would help.
It *absolutely must* be lower case, whatever it is, IMO. If you're going
for newest-of-the-newbies, the last thing you want to do is having them
dealing with it being just Postgres in some places, and having to be
"Postgres" in others.
Personally, don't know if I'm convinced it's overly worth doing - but I
think it's silly to dismiss without actually corralling up some users
who're unfamiliar with Pg and watching them get started. I'd love to see
some properly conducted usability studies of Pg, and something like this
would fit in well.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Simon Riggs wrote: > I propose we add a single table called Postgres when we Initdb > > CREATE TABLE Postgres (Id Integer, Data Jsonb); > COMMENT ON TABLE Postgres IS 'Single table for quick start usage - > design your database'; > > The purpose of this is to make the database immediately usable. By > including this table in the default initdb it will mean that programs > can rely on the existence of this table and begin working quicker. > > By now, some of you will be doubled over laughing as if this is an > April fool joke. I don't mean it to be at all. > > The idea is to have a stupidly obvious and easy table that will > potentially be usable by just about everyone, in any language. I am a PostgreSQL newbie. How is this table useful for me? I want to develop a database application. I want to store personal data like name and birth date! Actually, I feel confused. What should I do with this table? Is it part of the database system? Will the database be broken if I drop it? Do I have to ship it with my application? > If you don't like it, don't use it. If you really dislike it, drop it. No, I'm not the kind of person who reads a manual to figure out what to do with this table. I want to start coding *right now*. > But for new people coming to Postgres, they will have a data object to > access and begin using the database immediately. Their code will work, > their examples will work. OK, so they need to go back and think about > the design, but at least they got it to work and will be encouraged to > do more. I have found a sample application for personal data on the internet. How can I make it work with this table? > Remember when we didn't have a database called Postgres? Remember how > much simpler life is now? Remember that now. Good that you mention that! I have wondered what to do with it. When I first connected to PostgreSQL, I created a sample table, but the senior developer from the other office told me that this is the "postgres" database and that I shouldn't create any objects there. What is it good for? Can I delete it? Yours, Laurenz Albe
On 04/23/2014 03:20 PM, Albe Laurenz wrote: > Good that you mention that! I have wondered what to do with it. > When I first connected to PostgreSQL, I created a sample table, but the > senior developer from the other office told me that this is the "postgres" > database and that I shouldn't create any objects there. > > What is it good for? Can I delete it? A key difference between the "postgres" DB and a default table is that the "postgres" DB is very convenient with PostgreSQL's default of connecting to a DB of the same name as the user. We don't have a corresponding INSERT VALUES (42); SELECT fred; where there is some invisible implicit table name. Personally I wish Pg was able to start w/o "connecting" to any specific DB, but that's just not how the architecture works, and with that limitation the "postgres" DB seemed like a good compromise. (That said, it's really weird that the username of the superuser defaults to the current unix user when initdb'ing, but the db created by default is still always "postgres"). -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 23 April 2014 07:14, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> By now, some of you will be doubled over laughing as if this is an >> April fool joke. > > Indeed. I do like to share the odd joke now and then, it has to be said. So I'm glad I enriched your day. I was taught that ideas are accepted in this order: first we think them a joke, then we perceive them as a threat, then they become obvious. I can't find a Wikipedia article to give that idea more weight (OK, that was a joke). Just trying to think about how to improve the out of the box experience in ways that others already consider obvious. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer wrote: >> Good that you mention that! I have wondered what to do with it. >> When I first connected to PostgreSQL, I created a sample table, but the >> senior developer from the other office told me that this is the "postgres" >> database and that I shouldn't create any objects there. >> >> What is it good for? Can I delete it? > > A key difference between the "postgres" DB and a default table is that > the "postgres" DB is very convenient with PostgreSQL's default of > connecting to a DB of the same name as the user. I did not seriously want to dispute the value of the "postgres" DB, I just think that making things easier to understand for the newbie is *not* its greatest merit. It is mostly for the convenience of the administrator, right? Yours, Laurenz Albe
On Wed, Apr 23, 2014 at 6:11 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
I propose we add a single table called Postgres when we Initdb
CREATE TABLE Postgres (Id Integer, Data Jsonb);
COMMENT ON TABLE Postgres IS 'Single table for quick start usage -
design your database';
The purpose of this is to make the database immediately usable. By
including this table in the default initdb it will mean that programs
can rely on the existence of this table and begin working quicker.
I'm not quite sure it would serve the same purpose as to what you're proposing here, but for a long time I've thought that it would be nice if PostgreSQL came with an example database that had a number of tables, perhaps that mock up some easy to relate to real-world application. These would be very useful to use as examples in the documents instead of inventing them in the ad-hoc way that we currently do. Like here: http://www.postgresql.org/docs/9.3/static/tutorial-window.html
In the above link we have some table called empsalary, but the new user can't go an execute that query to test it without first working out how to first create a table and insert some data into that table.
It would be really nice if new users could create this example database somehow and then they could play around with the example queries we put in the manual.
Regards
David Rowley
On 04/23/2014 08:11 AM, Simon Riggs wrote: > We start with a database called Postgres and a schema called Public. > Yet we don't start up with any usable tables. > > I propose we add a single table called Postgres when we Initdb > > CREATE TABLE Postgres (Id Integer, Data Jsonb); > COMMENT ON TABLE Postgres IS 'Single table for quick start usage - > design your database'; > > The purpose of this is to make the database immediately usable. With default access controls it still aint :( And most of complaints I have heard are about the need of fidgeting with pg_hba.conf to get *anything* started. So maybe we could start with something like this at the end of initdb: > createdb example > psql example sample# create table stuff(id serial primary key, data jsonb); sample# grant all on stuff to public; And also have the following lines in pg_hba.conf for it local all all trust host all all 127.0.0.1/32 trust host all all ::1/128 trust This would solve most of the frustration with starting pg development for newcomers Cheers Hannu > By > including this table in the default initdb it will mean that programs > can rely on the existence of this table and begin working quicker. > > By now, some of you will be doubled over laughing as if this is an > April fool joke. I don't mean it to be at all. > > The idea is to have a stupidly obvious and easy table that will > potentially be usable by just about everyone, in any language. > > If you don't like it, don't use it. If you really dislike it, drop it. > > But for new people coming to Postgres, they will have a data object to > access and begin using the database immediately. Their code will work, > their examples will work. OK, so they need to go back and think about > the design, but at least they got it to work and will be encouraged to > do more. > > Remember when we didn't have a database called Postgres? Remember how > much simpler life is now? Remember that now. > > We can also insert a single row, Id = 0 with "Postgres sample data" in > it, but that seems a step too far. > -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 23 April 2014 09:26, David Rowley <dgrowleyml@gmail.com> wrote: > On Wed, Apr 23, 2014 at 6:11 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> >> I propose we add a single table called Postgres when we Initdb >> >> CREATE TABLE Postgres (Id Integer, Data Jsonb); >> COMMENT ON TABLE Postgres IS 'Single table for quick start usage - >> design your database'; >> >> The purpose of this is to make the database immediately usable. By >> including this table in the default initdb it will mean that programs >> can rely on the existence of this table and begin working quicker. >> > > I'm not quite sure it would serve the same purpose as to what you're > proposing here, but for a long time I've thought that it would be nice if > PostgreSQL came with an example database that had a number of tables, > perhaps that mock up some easy to relate to real-world application. These > would be very useful to use as examples in the documents instead of > inventing them in the ad-hoc way that we currently do. Like here: > http://www.postgresql.org/docs/9.3/static/tutorial-window.html +1 to the idea of an example database, used throughout the docs Sounds like a summer of code project. Since we don't have that now, it won't work for 9.4. I still like the idea of a database installed by default on initdb, by default. Packagers can of course do what they like. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
This seems like a much better idea - whereas a single table, related to nothing - on the other hand, is at best not very helpful (and it could be argued, might contribute to teaching poor data data design). Regards Mark On 23/04/14 19:13, Pavel Stehule wrote: > Hello > > if you are thinking about this direction, then store there some demo > project. > > I am don't think so isolated table has significant price. >
* Simon Riggs (simon@2ndQuadrant.com) wrote: > +1 to the idea of an example database, used throughout the docs > Sounds like a summer of code project. Agreed. I'll add it to the GSoC ideas page. > Since we don't have that now, it won't work for 9.4. None of this is on the table for 9.4 as far as I'm concerned.. > I still like the idea of a database installed by default on initdb, by > default. Packagers can of course do what they like. I fail to see the point of adding something that's targetted at novice / end-users which 90% (yes, it's a random # that I pulled, but it's surely the majority, at least) of installs won't have. For my 2c, it'd also be a disservice to our users and to ourselves to encourage a design that minimizes the database's understanding of the data and greatly reduces the set of PG's capabilities that can be used. Thanks, Stephen
On 04/23/2014 03:28 PM, Stephen Frost wrote: > * Simon Riggs (simon@2ndQuadrant.com) wrote: >> >+1 to the idea of an example database, used throughout the docs >> >Sounds like a summer of code project. > Agreed. I'll add it to the GSoC ideas page. I don't think this is a good GSoC project. Documentation-only projects are not eligible for GSoC (https://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2014/help_page#12._Are_proposals_for_documentation_work). Perhaps you can argue that a sample database is not documentation, but it's certainly not in the spirit of the program. - Heikki
On 23 April 2014 13:46, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 04/23/2014 03:28 PM, Stephen Frost wrote: >> >> * Simon Riggs (simon@2ndQuadrant.com) wrote: >>> >>> >+1 to the idea of an example database, used throughout the docs >>> >Sounds like a summer of code project. >> >> Agreed. I'll add it to the GSoC ideas page. > > > I don't think this is a good GSoC project. Documentation-only projects are > not eligible for GSoC > (https://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2014/help_page#12._Are_proposals_for_documentation_work). > Perhaps you can argue that a sample database is not documentation, but it's > certainly not in the spirit of the program. Out of curiosity, had anyone seen this which some of us were briefly working on last year at PgCon's unconference? https://github.com/pvh/postgresql-sample-database -- Thom
Heikki, * Heikki Linnakangas (hlinnakangas@vmware.com) wrote: > On 04/23/2014 03:28 PM, Stephen Frost wrote: > >Agreed. I'll add it to the GSoC ideas page. > > I don't think this is a good GSoC project. Documentation-only > projects are not eligible for GSoC (https://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2014/help_page#12._Are_proposals_for_documentation_work). Ah, right, of course. Sorry, I've gotten to thinking of the GSoC page as being a "TODO" but for small projects. :) > Perhaps you can argue that a sample database is not documentation, > but it's certainly not in the spirit of the program. Nah, that wasn't my intent at all. Thanks, Stephen
On Wed, Apr 23, 2014 at 10:11 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
We start with a database called Postgres and a schema called Public.
Yet we don't start up with any usable tables.
I propose we add a single table called Postgres when we Initdb
CREATE TABLE Postgres (Id Integer, Data Jsonb);
COMMENT ON TABLE Postgres IS 'Single table for quick start usage -
design your database';
The purpose of this is to make the database immediately usable. By
including this table in the default initdb it will mean that programs
can rely on the existence of this table and begin working quicker.
By now, some of you will be doubled over laughing as if this is an
April fool joke. I don't mean it to be at all.
I can propose contrib PostgreNoSQL providing following:
1) Table postgres as you proposed.
2) Functions: get_postgres(id intgeger) returns jsonb, set_postgres(id integer, data jsonb) returns void, search_postgres(query jsonb) returns setof postgres. search_postgres will have semantics of @> jsonb operator
3) Background workers which provides HTTP wrapper over those functions.
------
With best regards,
Alexander Korotkov.
With best regards,
Alexander Korotkov.
<div dir="ltr"><div class="gmail_extra"><br />On Wed, Apr 23, 2014 at 4:50 AM, Simon Riggs <<a href="mailto:simon@2ndquadrant.com">simon@2ndquadrant.com</a>>wrote:<br />><br />> On 23 April 2014 07:14, Tom Lane<<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br /> > > Simon Riggs <simon@2ndQuadrant.com>writes:<br />> >> By now, some of you will be doubled over laughing as if this is an<br/>> >> April fool joke.<br />> ><br />> > Indeed.<br />><br /> > I do like to share the oddjoke now and then, it has to be said. So<br />> I'm glad I enriched your day.<br />><br />> I was taught thatideas are accepted in this order: first we think<br />> them a joke, then we perceive them as a threat, then theybecome<br /> > obvious. I can't find a Wikipedia article to give that idea more<br />> weight (OK, that was a joke).<br/>><br />> Just trying to think about how to improve the out of the box<br />> experience in ways thatothers already consider obvious.<br /> ><br /><br />Mahatma Gandhi said:<br /><br />"First they ignore you, then theylaugh at you, then they fight you, then you win."<br /><br />;-)<br /><br /></div><div class="gmail_extra">More infoin [1] [2].</div><div class="gmail_extra"><br /><br />Regards,<br /><br />[1] <a href="http://en.wikiquote.org/wiki/Mahatma_Gandhi">http://en.wikiquote.org/wiki/Mahatma_Gandhi</a><br/>[2] <a href="http://www.brainyquote.com/quotes/quotes/m/mahatmagan103630.html">http://www.brainyquote.com/quotes/quotes/m/mahatmagan103630.html</a><br /><br/>--<br />Fabrízio de Royes Mello<br />Consultoria/Coaching PostgreSQL<br />>> Timbira: <a href="http://www.timbira.com.br">http://www.timbira.com.br</a><br/>>> Blog sobre TI: <a href="http://fabriziomello.blogspot.com">http://fabriziomello.blogspot.com</a><br/> >> Perfil Linkedin: <a href="http://br.linkedin.com/in/fabriziomello">http://br.linkedin.com/in/fabriziomello</a><br/>>> Twitter: <a href="http://twitter.com/fabriziomello">http://twitter.com/fabriziomello</a></div></div>
On 04/23/2014 07:43 AM, Alexander Korotkov wrote: > I can propose contrib PostgreNoSQL providing following: > 1) Table postgres as you proposed. > 2) Functions: get_postgres(id intgeger) returns jsonb, set_postgres(id > integer, data jsonb) returns void, search_postgres(query jsonb) returns > setof postgres. search_postgres will have semantics of @> jsonb operator > 3) Background workers which provides HTTP wrapper over those functions. You're forgetting ... sharding/replication over multiple masters. Also, the id should be a text value so that folks can use hash keys, or whatever. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote: > On 04/23/2014 07:43 AM, Alexander Korotkov wrote: > > I can propose contrib PostgreNoSQL providing following: > > 1) Table postgres as you proposed. > > 2) Functions: get_postgres(id intgeger) returns jsonb, set_postgres(id > > integer, data jsonb) returns void, search_postgres(query jsonb) returns > > setof postgres. search_postgres will have semantics of @> jsonb operator > > 3) Background workers which provides HTTP wrapper over those functions. > > You're forgetting ... sharding/replication over multiple masters. > > Also, the id should be a text value so that folks can use hash keys, or > whatever. We should totally have a type "uuidserial". -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Apr 23, 2014 at 02:26:50PM -0300, Alvaro Herrera wrote: > Josh Berkus wrote: > > On 04/23/2014 07:43 AM, Alexander Korotkov wrote: > > > I can propose contrib PostgreNoSQL providing following: > > > 1) Table postgres as you proposed. > > > 2) Functions: get_postgres(id intgeger) returns jsonb, set_postgres(id > > > integer, data jsonb) returns void, search_postgres(query jsonb) returns > > > setof postgres. search_postgres will have semantics of @> jsonb operator > > > 3) Background workers which provides HTTP wrapper over those functions. > > > > You're forgetting ... sharding/replication over multiple masters. > > > > Also, the id should be a text value so that folks can use hash keys, or > > whatever. > > We should totally have a type "uuidserial". Something like it, certainly. One thing SQL Server does right is to have an opaque identity column, for which UUID would do an admirable job. We would need to build UUID functionality in, and I don't see this as a hard task. Should I draft it up as a self-contained extension? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, Apr 23, 2014 at 12:24:21PM -0700, David Fetter wrote: > On Wed, Apr 23, 2014 at 02:26:50PM -0300, Alvaro Herrera wrote: > > Josh Berkus wrote: > > > On 04/23/2014 07:43 AM, Alexander Korotkov wrote: > > > > I can propose contrib PostgreNoSQL providing following: > > > > 1) Table postgres as you proposed. > > > > 2) Functions: get_postgres(id intgeger) returns jsonb, set_postgres(id > > > > integer, data jsonb) returns void, search_postgres(query jsonb) returns > > > > setof postgres. search_postgres will have semantics of @> jsonb operator > > > > 3) Background workers which provides HTTP wrapper over those functions. > > > > > > You're forgetting ... sharding/replication over multiple masters. > > > > > > Also, the id should be a text value so that folks can use hash keys, or > > > whatever. > > > > We should totally have a type "uuidserial". > > Something like it, certainly. One thing SQL Server does right is to > have an opaque identity column, for which UUID would do an admirable > job. We would need to build UUID functionality in, and I don't see > this as a hard task. Should I draft it up as a self-contained > extension? So I did a little research, and it appears that there's a part of util-linux that does UUIDs and is available under the 3-clause BSDL. It only does time- and urandom-based UUIDs, but that's probably a better start than nothing. Is there any good reason not to roll native UUID generation into our standard distribution? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > Is there any good reason not to roll native UUID generation into our > standard distribution? It's already there (as of 9.4) in pg_crypto. regards, tom lane
On Wed, Apr 23, 2014 at 08:27:52PM -0400, Tom Lane wrote: > David Fetter <david@fetter.org> writes: > > Is there any good reason not to roll native UUID generation into > > our standard distribution? > > It's already there (as of 9.4) in pg_crypto. Sorry I wasn't clear enough. Since contrib/pgcrypto is a module that might well not be installed, people can't just build software for PostgreSQL and have UUIDs available, certainly not in the sense that, for example, BIGSERIAL is. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 24 April 2014 05:32, David Fetter <david@fetter.org> wrote: > On Wed, Apr 23, 2014 at 08:27:52PM -0400, Tom Lane wrote: >> David Fetter <david@fetter.org> writes: >> > Is there any good reason not to roll native UUID generation into >> > our standard distribution? >> >> It's already there (as of 9.4) in pg_crypto. > > Sorry I wasn't clear enough. > > Since contrib/pgcrypto is a module that might well not be installed, > people can't just build software for PostgreSQL and have UUIDs > available, certainly not in the sense that, for example, BIGSERIAL is. +1 to include in core - strange to have a UUID datatype in core but no means to generate -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Apr 24, 2014 at 3:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 24 April 2014 05:32, David Fetter <david@fetter.org> wrote: >> Since contrib/pgcrypto is a module that might well not be installed, >> people can't just build software for PostgreSQL and have UUIDs >> available, certainly not in the sense that, for example, BIGSERIAL is. > > +1 to include in core - strange to have a UUID datatype in core but no > means to generate +1. -- Michael
On Wed, Apr 23, 2014 at 11:26 AM, David Rowley <dgrowleyml@gmail.com> wrote: > but for a long time I've thought that it would be nice if > PostgreSQL came with an example database that had a number of tables, > perhaps that mock up some easy to relate to real-world application. These > would be very useful to use as examples in the documents instead of > inventing them in the ad-hoc way that we currently do. Like here: > http://www.postgresql.org/docs/9.3/static/tutorial-window.html I think that's a great idea. I'm not convinced it should be created by default in initdb, but a "CREATE EXTENSION sample_data" seems easy enough for newbies to use and has a good chance of getting merged into contrib. Regards, Marti
2014-04-24 15:40 GMT+02:00 Marti Raudsepp <marti@juffo.org>:
On Wed, Apr 23, 2014 at 11:26 AM, David Rowley <dgrowleyml@gmail.com> wrote:
> but for a long time I've thought that it would be nice if
> PostgreSQL came with an example database that had a number of tables,
> perhaps that mock up some easy to relate to real-world application. These
> would be very useful to use as examples in the documents instead of
> inventing them in the ad-hoc way that we currently do. Like here:
> http://www.postgresql.org/docs/9.3/static/tutorial-window.html
I think that's a great idea. I'm not convinced it should be created by
default in initdb, but a "CREATE EXTENSION sample_data" seems easy
enough for newbies to use and has a good chance of getting merged into
contrib.
Good idea
Pavel
Regards,
Marti
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Simon Riggs <simon@2ndQuadrant.com> writes: > On 24 April 2014 05:32, David Fetter <david@fetter.org> wrote: >> Since contrib/pgcrypto is a module that might well not be installed, >> people can't just build software for PostgreSQL and have UUIDs >> available, certainly not in the sense that, for example, BIGSERIAL is. > +1 to include in core - strange to have a UUID datatype in core but no > means to generate The reason why there's no generation function in core is that there is no standardized, guaranteed-to-produce-a-universally-unique-value generation algorithm. That was the reason for not putting something in core when the type was first created, and I do not see that the technology has advanced. regards, tom lane
On Thu, Apr 24, 2014 at 11:21 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
+1
-- 2014-04-24 15:40 GMT+02:00 Marti Raudsepp <marti@juffo.org>:On Wed, Apr 23, 2014 at 11:26 AM, David Rowley <dgrowleyml@gmail.com> wrote:
> but for a long time I've thought that it would be nice if
> PostgreSQL came with an example database that had a number of tables,
> perhaps that mock up some easy to relate to real-world application. These
> would be very useful to use as examples in the documents instead of
> inventing them in the ad-hoc way that we currently do. Like here:
> http://www.postgresql.org/docs/9.3/static/tutorial-window.html
I think that's a great idea. I'm not convinced it should be created by
default in initdb, but a "CREATE EXTENSION sample_data" seems easy
enough for newbies to use and has a good chance of getting merged into
contrib.Good idea
+1
This "sample data" can be very useful to many things:
* trainning
* automated tests
* benchmark ???
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
On 04/24/2014 04:57 PM, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 24 April 2014 05:32, David Fetter <david@fetter.org> wrote: >>> Since contrib/pgcrypto is a module that might well not be installed, >>> people can't just build software for PostgreSQL and have UUIDs >>> available, certainly not in the sense that, for example, BIGSERIAL is. >> +1 to include in core - strange to have a UUID datatype in core but no >> means to generate > The reason why there's no generation function in core is that there is no > standardized, guaranteed-to-produce-a-universally-unique-value generation > algorithm. That was the reason for not putting something in core when the > type was first created, and I do not see that the technology has advanced. Why can't we implement all 5 variants from http://en.wikipedia.org/wiki/Universally_unique_identifier and just warn about the dangers in documentation ? we could expose it something like next_uuid(<version nr>); As the article points out " Since the identifiers have a finite size, it is possible for two differing items to share the same identifier." so it is a known limitation of UUID and not something PostgreSQL specific. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Hannu Krosing <hannu@2ndQuadrant.com> writes: > On 04/24/2014 04:57 PM, Tom Lane wrote: >> The reason why there's no generation function in core is that there is no >> standardized, guaranteed-to-produce-a-universally-unique-value generation >> algorithm. That was the reason for not putting something in core when the >> type was first created, and I do not see that the technology has advanced. > Why can't we implement all 5 variants from > http://en.wikipedia.org/wiki/Universally_unique_identifier > and just warn about the dangers in documentation ? Essentially, that would mean carrying around our own implementation of libuuid --- which includes a bunch of not-terribly-portable stuff, such as discovering the machine's MAC address(es). That's not really something I want to see us putting project manpower into. I wonder what it would take to adapt contrib/uuid-ossp to work on top of some other popular implementation of that code. We pretty much bet on the wrong horse when we picked the OSSP library to depend on, but otherwise I think the principle of using an external library was good. regards, tom lane
On Thu, Apr 24, 2014 at 11:30:15AM -0400, Tom Lane wrote: > Hannu Krosing <hannu@2ndQuadrant.com> writes: > > On 04/24/2014 04:57 PM, Tom Lane wrote: > >> The reason why there's no generation function in core is that > >> there is no standardized, > >> guaranteed-to-produce-a-universally-unique-value generation > >> algorithm. That was the reason for not putting something in core > >> when the type was first created, and I do not see that the > >> technology has advanced. > > > Why can't we implement all 5 variants from > > http://en.wikipedia.org/wiki/Universally_unique_identifier and > > just warn about the dangers in documentation ? > > Essentially, that would mean carrying around our own implementation > of libuuid --- which includes a bunch of not-terribly-portable > stuff, such as discovering the machine's MAC address(es). That's > not really something I want to see us putting project manpower into. We don't need to do the not-terribly-portable stuff in the first round. For that, there could still be a bundled extension. The point is that UUIDs are nowhere near as usable as users have the right to expect, and we should fix that. > I wonder what it would take to adapt contrib/uuid-ossp to work on > top of some other popular implementation of that code. We pretty > much bet on the wrong horse when we picked the OSSP library to > depend on, but otherwise I think the principle of using an external > library was good. So long as we can pick another horse later, sure. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > On Thu, Apr 24, 2014 at 11:30:15AM -0400, Tom Lane wrote: >> Essentially, that would mean carrying around our own implementation >> of libuuid --- which includes a bunch of not-terribly-portable >> stuff, such as discovering the machine's MAC address(es). That's >> not really something I want to see us putting project manpower into. > We don't need to do the not-terribly-portable stuff in the first > round. For that, there could still be a bundled extension. > The point is that UUIDs are nowhere near as usable as users have the > right to expect, and we should fix that. The reason that UUIDs aren't as usable as users "have a right to expect" is that the underlying technology doesn't meet their (your) expectations. Just because it's easy to imagine that there are universally unique identifiers doesn't mean that there actually *are* universally unique identifiers. There are only approximations with varying failure modes. This is not our fault, and I don't want us to get caught up in trying to fix a fundamentally broken concept --- which is what a generic "uuidserial" API would be. If you try to paper over the difficulties here, they'll just bite you on the rear someday. regards, tom lane
Tom Lane wrote: > This is not our fault, and I don't want us to get caught up in trying > to fix a fundamentally broken concept --- which is what a generic > "uuidserial" API would be. If you try to paper over the difficulties > here, they'll just bite you on the rear someday. But we have non-colliding generation technology for OIDs in system catalogs. We could try to reuse the idea in a UUID generator: grab one value, try to insert; if it fails generate a new one, lather, rinse, repeat. This would make uuidserial not quite as simple as SERIAL pseudotype, of course. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 04/24/2014 08:00 PM, Alvaro Herrera wrote: > Tom Lane wrote: > >> This is not our fault, and I don't want us to get caught up in trying >> to fix a fundamentally broken concept --- which is what a generic >> "uuidserial" API would be. If you try to paper over the difficulties >> here, they'll just bite you on the rear someday. > > But we have non-colliding generation technology for OIDs in system > catalogs. We could try to reuse the idea in a UUID generator: grab one > value, try to insert; if it fails generate a new one, lather, rinse, > repeat. Umm, UUID stands for Universally Unique IDentifier. That would hardly be *universally* unique. - Heikki
On 04/24/2014 10:00 AM, Alvaro Herrera wrote: > Tom Lane wrote: > >> This is not our fault, and I don't want us to get caught up in trying >> to fix a fundamentally broken concept --- which is what a generic >> "uuidserial" API would be. If you try to paper over the difficulties >> here, they'll just bite you on the rear someday. > > But we have non-colliding generation technology for OIDs in system > catalogs. We could try to reuse the idea in a UUID generator: grab one > value, try to insert; if it fails generate a new one, lather, rinse, > repeat. > > This would make uuidserial not quite as simple as SERIAL pseudotype, of > course. Tangentally related to this, I believe that PostgreSQL may be the only current user of the uuid-ossp code at this point. So we're going to end up maintaining it for the UUID contrib module, or dropping that module. Regardless, this all sounds like speculative 9.5ish conversation, and only vaguely related to Simon's original proposal. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Heikki Linnakangas wrote: > On 04/24/2014 08:00 PM, Alvaro Herrera wrote: > >Tom Lane wrote: > > > >>This is not our fault, and I don't want us to get caught up in trying > >>to fix a fundamentally broken concept --- which is what a generic > >>"uuidserial" API would be. If you try to paper over the difficulties > >>here, they'll just bite you on the rear someday. > > > >But we have non-colliding generation technology for OIDs in system > >catalogs. We could try to reuse the idea in a UUID generator: grab one > >value, try to insert; if it fails generate a new one, lather, rinse, > >repeat. > > Umm, UUID stands for Universally Unique IDentifier. That would > hardly be *universally* unique. I don't understand your point. I'm only replying to Tom's assertion that UUID generation might not be all that unique after all (or, in other words, AIUI, that the "universally unique" part of the name is wishful thinking and not an actual property of the real thing.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera wrote: > Heikki Linnakangas wrote: > > On 04/24/2014 08:00 PM, Alvaro Herrera wrote: > > >Tom Lane wrote: > > > > > >>This is not our fault, and I don't want us to get caught up in trying > > >>to fix a fundamentally broken concept --- which is what a generic > > >>"uuidserial" API would be. If you try to paper over the difficulties > > >>here, they'll just bite you on the rear someday. > > > > > >But we have non-colliding generation technology for OIDs in system > > >catalogs. We could try to reuse the idea in a UUID generator: grab one > > >value, try to insert; if it fails generate a new one, lather, rinse, > > >repeat. > > > > Umm, UUID stands for Universally Unique IDentifier. That would > > hardly be *universally* unique. > > I don't understand your point. I'm only replying to Tom's assertion > that UUID generation might not be all that unique after all (or, in > other words, AIUI, that the "universally unique" part of the name is > wishful thinking and not an actual property of the real thing.) Oh, I think I see your point: it's that no matter what we do here, there would be no way to guarantee that a value we generate does not collide with any other value elsewhere (either on other uuidserial columns, or on other servers). Is that it? Because if it is, then I think the problem is that the UUID concept might be flawed yet users still want to use it, and we do no service by refusing to provide it on those grounds. Now I certainly don't want my rear bitten more than it already is, anyway. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 04/24/2014 08:23 PM, Alvaro Herrera wrote: > Alvaro Herrera wrote: >> Heikki Linnakangas wrote: >>> On 04/24/2014 08:00 PM, Alvaro Herrera wrote: >>>> Tom Lane wrote: >>>> >>>>> This is not our fault, and I don't want us to get caught up in trying >>>>> to fix a fundamentally broken concept --- which is what a generic >>>>> "uuidserial" API would be. If you try to paper over the difficulties >>>>> here, they'll just bite you on the rear someday. >>>> >>>> But we have non-colliding generation technology for OIDs in system >>>> catalogs. We could try to reuse the idea in a UUID generator: grab one >>>> value, try to insert; if it fails generate a new one, lather, rinse, >>>> repeat. >>> >>> Umm, UUID stands for Universally Unique IDentifier. That would >>> hardly be *universally* unique. >> >> I don't understand your point. I'm only replying to Tom's assertion >> that UUID generation might not be all that unique after all (or, in >> other words, AIUI, that the "universally unique" part of the name is >> wishful thinking and not an actual property of the real thing.) > > Oh, I think I see your point: it's that no matter what we do here, there > would be no way to guarantee that a value we generate does not collide > with any other value elsewhere (either on other uuidserial columns, or > on other servers). > > Is that it? Yep. > Because if it is, then I think the problem is that the UUID concept > might be flawed yet users still want to use it, and we do no service by > refusing to provide it on those grounds. Well, we should make a reasonable effort to make them unique. If there is a reliable-enough way to generate UUIDs that doesn't depend on external libraries, by all means lets have it in core. I believe the reason we put gen_random_uuid() in pgcrypto is that it needs a good random number generator, and we don't trust plain old random() to be good enough for that. - Heikki
Alvaro, >> I don't understand your point. I'm only replying to Tom's assertion >> that UUID generation might not be all that unique after all (or, in >> other words, AIUI, that the "universally unique" part of the name is >> wishful thinking and not an actual property of the real thing.) > > Oh, I think I see your point: it's that no matter what we do here, there > would be no way to guarantee that a value we generate does not collide > with any other value elsewhere (either on other uuidserial columns, or > on other servers). > > Is that it? > > Because if it is, then I think the problem is that the UUID concept > might be flawed yet users still want to use it, and we do no service by > refusing to provide it on those grounds. It's more than that: 1) the concept of UUIDs is fundamentally flawed, to the extent that if we have a UUID type in core its flaws become our flaws, to be handled in bug reports forever. 2) Because the concept of UUIDs is flawed, there are multiple competing implementations, none of which is clearly dominant and durable. As such, any UUID algorithm we adopt for core stands a significant risk of being later abandoned by everyone else and becoming a PostgreSQL wart. 3) In general, users who want UUIDs don't want a generic concept of UUIDs; they want the specific UUIDs which work with their individual programming languages, web frameworks, or queueing platforms. So, see competing implementations above. As case in point for (2), as I said upthread: uuid-ossp, which has been our option for UUID in contrib since originally it was the only OSS implementation, is now abandoned by everyone but us. Additionally, were I to adopt a UUID scheme for PostgreSQL, I would want to to be *for postgresql*, with components indicating server, table and schema of origin for each ID. A pseudo-random UUID is frankly pretty useless to me because (a) it's not really unique, and (b) it doesn't help me route data at all. Alternatively, what would be *really* useful is to have a way for an extension to plug into the "serial" concept, so that it gets all of the benefits of serial (permissions, dependancies, etc.) while being able to call a custom generator function. Oh, and: 4) IIRC, Andres has already worked out a scheme for distributed serials to support BDR. So this is a solved problem for the only really interesting use case ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Oh, I think I see your point: it's that no matter what we do here, there > would be no way to guarantee that a value we generate does not collide > with any other value elsewhere (either on other uuidserial columns, or > on other servers). Not that way, anyway. > Because if it is, then I think the problem is that the UUID concept > might be flawed yet users still want to use it, and we do no service by > refusing to provide it on those grounds. My point is a bit more subtle than that. It's perfectly reasonable to want a universally-unique identifier in some applications. The problem is that there is no perfect implementation, and so people really need to stop and think about which generator algorithm they're going to use and whether its particular failure modes are acceptable in their context. So I'm not eager to provide some "easy to use" API that encourages users to believe that no thought is required on their part. Having said that, though, the argument around whether such facilities belong in core seems to devolve to whether you want to buy into maintaining libuuid for ourselves (in fact worse than that: it's not clear that libuuid ever has worked on all our platforms, so we might be buying into some up-front porting work too). As Josh notes, we may be forced into that anyway if we don't want to walk away from contrib/uuid-ossp. But first I'd like to see a bit more research into what the alternatives might be... regards, tom lane
Last year, I built a pl/pgsql generator of "version 1-ish" UUIDs, which would combine timestamps with local information to construct data that kind of emulated the timestamp+MAC address that is version #1 of UUID.
Note that there are several versions of UUIDs:a) Having a sequence feeding some local uniqueness would fit with the "clock seq" bits (e.g. - the octets in RFC 4122 entitled clock-seq-and-reserved and clock-seq-low)
b) NOW() provides data for time-low, time-mid, time-high-and-version
c) We'd need 6 hex octets for "node"; I seem to recall there being something established by initdb that might be usable.
The only piece that's directly troublesome, for UUID Type 1, is the "node" value. I'll observe that it isn't unusual for UUID implementations to generate random values for that.
Note that for the other UUID versions, there's NO non-portable data needed.
It seems to me that a "UUIDserial" type, which combined:
a) A sequence, to be the 'clock';
b) Possibly another sequence to store local node ID, which might get seeded from DB internals
would provide a "PostgreSQL-flavoured" version of UUID Type 1.
would provide a "PostgreSQL-flavoured" version of UUID Type 1.
On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus <josh@agliodbs.com> wrote: > A pseudo-random UUID is frankly pretty > useless to me because (a) it's not really unique This is FUD. A pseudorandom UUID contains 122 bits of randomness. As long as you can trust the random number generator, the chances of a value occurring twice can be estimated using the birthday paradox: there's a 50% chance of having *one* collision in a set of 2^61 items. Storing this amount of UUIDs alone requires 32 exabytes of storage. Factor in the tuple and indexing overheads and you'd be needing close to all the hard disk space ever manufactured in the world. If you believe there's a chance of ever seeing a pseudorandom UUID collision in practice, you should be buying lottery tickets. To the contrary. Combined with the fact that pseudorandom UUID generation doesn't require any configuration (node ID), doesn't leak any private data (MAC address) and relies on infrastructure that's ubiquitous anyway (cryptographic PRNG) it's almost always the right answer. > (b) it doesn't help me route data at all. That's really out of scope for UUIDs. They're about generating identifiers, not describing what the identifier means. UUIDs also don't happen to cure cancer. Regards, Marti
Marti Raudsepp <marti@juffo.org> writes: > On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus <josh@agliodbs.com> wrote: >> A pseudo-random UUID is frankly pretty >> useless to me because (a) it's not really unique > This is FUD. A pseudorandom UUID contains 122 bits of randomness. As > long as you can trust the random number generator, the chances of a > value occurring twice can be estimated using the birthday paradox: > there's a 50% chance of having *one* collision in a set of 2^61 items. Of course, the weak spot in this analysis is the assumption that there are actually 122 independent bits in the value. It's not difficult to imagine that systems with crummy random() implementations might only have something like 32 bits worth of real randomness. Or less. Seeding your PRNG from gettimeofday(), for instance, is highly likely to lead to collisions ... no matter how good the PRNG itself is. > If you believe there's a chance of ever seeing a pseudorandom UUID > collision in practice, you should be buying lottery tickets. Now *that*, I'd call FUD. The issue here is not whether collisions are improbable under ideal circumstances. The issue is how much work does it take to have some confidence that you're anywhere near the ideal case. regards, tom lane
On Fri, Apr 25, 2014 at 3:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Of course, the weak spot in this analysis is the assumption that there > are actually 122 independent bits in the value. It's not difficult to > imagine that systems with crummy random() implementations might only have > something like 32 bits worth of real randomness. Obviously you can't use random(). That's why I talked about cryptographic PRNGs, crypto libraries do proper seeding and generate reliably random numbers all the time. Regards, Marti
On Thu, 2014-04-24 at 13:45 -0400, Tom Lane wrote: > Having said that, though, the argument around whether such facilities > belong in core seems to devolve to whether you want to buy into > maintaining libuuid for ourselves (in fact worse than that: it's not > clear > that libuuid ever has worked on all our platforms, so we might be > buying > into some up-front porting work too). As Josh notes, we may be forced > into that anyway if we don't want to walk away from contrib/uuid-ossp. I think we arrived at the conclusion a short while go that we wanted to stick a deprecation notice on contrib/uuid-ossp for the 9.4 release. I wrote an extension that wraps libuuid: https://github.com/petere/pglibuuid Between pgcrypto and that, is there anything missing that users of uuid-ossp would have had?
On Thu, Apr 24, 2014 at 8:43 PM, Marti Raudsepp <marti@juffo.org> wrote:
Cryptography is *hard*; treating it as certainty that things will be gotten correct
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
On Fri, Apr 25, 2014 at 3:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:Obviously you can't use random(). That's why I talked about
> Of course, the weak spot in this analysis is the assumption that there
> are actually 122 independent bits in the value. It's not difficult to
> imagine that systems with crummy random() implementations might only have
> something like 32 bits worth of real randomness.
cryptographic PRNGs, crypto libraries do proper seeding and generate
reliably random numbers all the time.
... And we can't be certain that there won't be some internal characteristic weakness.
Cryptography is *hard*; treating it as certainty that things will be gotten correct
tends to be a foolish assumption.
Which is why UUID "type 1" resolves this by combining multiple sorts of
anti-correlations, the combination of:
a) Node-specific information (e.g. - in the standard form, parts of the MAC address), so no collisions between node A and node B.
b) Timestamp, so that things that happen at different times will be kept unique.
c) An extra sequence, so that if there are multiple events on the same node at the same time, they *still* don't collide.
I trust the combination to work pretty well, and that's why it was designed that way.
A RNG, however good, can't provide the same guarantees of lack of conflicts.
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
On Fri, Apr 25, 2014 at 1:43 AM, Marti Raudsepp <marti@juffo.org> wrote: > Obviously you can't use random(). That's why I talked about > cryptographic PRNGs, crypto libraries do proper seeding and generate > reliably random numbers all the time. The difficulty lies not really in the PRNG implementation (which is hard but well enough understood that it's not much of an issue these days). The difficulty lies in obtaining enough entropy. There are ways of obtaining enough entropy and they are available. But they're not free. Obtaining enough entropy requires access to hardware devices which means a kernel system call. Kernel system calls are relatively slow when you're talking about generating sequential IDs. They also deplete the available entropy pool for other sources which may means they have security consequences. Which isn't to say they're a bad idea but like everything else in engineering there are tradeoffs and no such thing as a free lunch. You can avoid depleting the entropy pool by including data you expect to be unique as a kind of fake entropy -- which quickly gets you back to looking for things like MAC address to avoid duplicates across systems. -- greg
On 04/24/2014 05:23 PM, Marti Raudsepp wrote: > On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus <josh@agliodbs.com> wrote: >> A pseudo-random UUID is frankly pretty >> useless to me because (a) it's not really unique > > This is FUD. A pseudorandom UUID contains 122 bits of randomness. As > long as you can trust the random number generator, the chances of a > value occurring twice can be estimated using the birthday paradox: > there's a 50% chance of having *one* collision in a set of 2^61 items. > Storing this amount of UUIDs alone requires 32 exabytes of storage. > Factor in the tuple and indexing overheads and you'd be needing close > to all the hard disk space ever manufactured in the world. Well, I've already had collisions with UUID-OSSP, in production, with only around 20 billion values. So clearly there aren't 122bits of true randomness in OSSP. I can't speak for other implementations because I haven't tried them. >> (b) it doesn't help me route data at all. > > That's really out of scope for UUIDs. They're about generating > identifiers, not describing what the identifier means. UUIDs also > don't happen to cure cancer. http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327 On the contrary, I would argue that an object identifier which is completely random is possibly the worst way to form an ID of all possible concepts; there's no relationship whatsoever between the ID, the application stack, and the application data; you don't even get the pseudo-time indexing you get with Serials. The only reason to do it is because you're too lazy do implement a better way. Or to put it another way: a value which is truly random is no identifier at all. Compare this with a composite identifier which carries information about the node, table, and schema of origin for the tuple. Not only does this help ensure uniqueness, but it also supports intelligent sharding and multi-master replication systems. I don't speak hypothetically; we've done this in the past and will do it again in the future. I would love to have some machinery inside PostgreSQL to make this easier (for example, a useful unique database ID), but I suspect that acutal implementation will always remain application-specific. You may say "oh, that's not the job of the identifer", but if it's not, WTF is the identifer for, then? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Fri, Apr 25, 2014 at 10:58:29AM -0700, Josh Berkus wrote: > On 04/24/2014 05:23 PM, Marti Raudsepp wrote: > > On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus <josh@agliodbs.com> wrote: > >> A pseudo-random UUID is frankly pretty > >> useless to me because (a) it's not really unique > > > > This is FUD. A pseudorandom UUID contains 122 bits of randomness. As > > long as you can trust the random number generator, the chances of a > > value occurring twice can be estimated using the birthday paradox: > > there's a 50% chance of having *one* collision in a set of 2^61 items. > > Storing this amount of UUIDs alone requires 32 exabytes of storage. > > Factor in the tuple and indexing overheads and you'd be needing close > > to all the hard disk space ever manufactured in the world. > > Well, I've already had collisions with UUID-OSSP, in production, with > only around 20 billion values. So clearly there aren't 122bits of true > randomness in OSSP. I can't speak for other implementations because I > haven't tried them. > > >> (b) it doesn't help me route data at all. > > > > That's really out of scope for UUIDs. They're about generating > > identifiers, not describing what the identifier means. UUIDs also > > don't happen to cure cancer. > > http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327 > > On the contrary, I would argue that an object identifier which is > completely random is possibly the worst way to form an ID of all > possible concepts; there's no relationship whatsoever between the ID, > the application stack, and the application data; you don't even get the > pseudo-time indexing you get with Serials. The only reason to do it is > because you're too lazy do implement a better way. > > Or to put it another way: a value which is truly random is no identifier > at all. Not exactly. It's at least potentially hiding information an attacker could use, with all the caveats that carries. > Compare this with a composite identifier which carries information about > the node, table, and schema of origin for the tuple. Not only does this > help ensure uniqueness, but it also supports intelligent sharding and > multi-master replication systems. I don't speak hypothetically; we've > done this in the past and will do it again in the future. This is an excellent idea, but I don't think it's in scope for UUIDs. > I would love to have some machinery inside PostgreSQL to make this > easier (for example, a useful unique database ID), but I suspect that > acutal implementation will always remain application-specific. > > You may say "oh, that's not the job of the identifer", but if it's not, > WTF is the identifer for, then? Frequently, it's to provide some kind of opacity in the sense of not have an obvious predecessor or successor. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table
From
Martijn van Oosterhout
Date:
On Fri, Apr 25, 2014 at 04:18:18PM +0100, Greg Stark wrote: > Which isn't to say they're a bad idea but like everything else in > engineering there are tradeoffs and no such thing as a free lunch. > You can avoid depleting the entropy pool by including data you expect > to be unique as a kind of fake entropy -- which quickly gets you back > to looking for things like MAC address to avoid duplicates across > systems. ISTM you could use the database identifier we already have to at least produce "UUID"s which are unique amongst PostgreSQL instances. That might be something worth aiming for? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer
Martijn van Oosterhout <kleptog@svana.org> writes: > On Fri, Apr 25, 2014 at 04:18:18PM +0100, Greg Stark wrote: >> Which isn't to say they're a bad idea but like everything else in >> engineering there are tradeoffs and no such thing as a free lunch. >> You can avoid depleting the entropy pool by including data you expect >> to be unique as a kind of fake entropy -- which quickly gets you back >> to looking for things like MAC address to avoid duplicates across >> systems. > ISTM you could use the database identifier we already have to at least > produce "UUID"s which are unique amongst PostgreSQL instances. That > might be something worth aiming for? It's worth noting in this connection that we've never tried hard to ensure that "database identifiers" are actually unique. One potentially serious issue is that slave servers will have the same identifier as their master. Also, I think there's a still-open issue that creation of the identifier has a thinko about using OR instead of XOR, resulting in way few bits of freedom than it should have even with the limited amount of entropy used. regards, tom lane
On 04/26/2014 11:18 AM, Tom Lane wrote: > It's worth noting in this connection that we've never tried hard to ensure > that "database identifiers" are actually unique. One potentially serious > issue is that slave servers will have the same identifier as their master. Yeah, this is one of those things I've been thinking about. The proble is that we need a "node" ID, which identifies the PostgreSQL instance, and a "dataset ID", which identifies the chain of data, especially when combined with the timeline ID. So a master and replica would have different node IDs, but the same dataset ID, until the replica is promoted, at which point its dataset ID + timeline No. would change. This would allow for relatively easy management of large clusters by allowing automated identification of databases and their mirrors. However, there's a fundamental problem with the concept of the dataset ID in that there's absolutely no way for PostgreSQL to know when it has a unique dataset. Consider a downtime database file cloning for example; the two databases would have the same identifier and yet both be standalones which quickly diverge. So I haven't thought of a good solution to that. We could implement a NodeID, though, based on some combination of IP/MAC address and port though. Not entirely reliable, but better than nothing ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 04/25/2014 11:46 AM, David Fetter wrote: > On Fri, Apr 25, 2014 at 10:58:29AM -0700, Josh Berkus wrote: >> You may say "oh, that's not the job of the identifer", but if it's not, >> WTF is the identifer for, then? > > Frequently, it's to provide some kind of opacity in the sense of not > have an obvious predecessor or successor. A far better solution to that is to not share the unadorned ID with the user. Basically, there's two different reasons to offer UUIDs in PostgreSQL: 1) because they actually serve a useful purpose in providing a globally "unique" identifier; 2) because they work well with existing platforms and frameworks. Given the state of the art, the above two goals are separate and exclusive, apologists for poorly conceived UUID algorithms nonwithstanding. So either we provide a UUID type which actually helps identify unique entities between database servers, OR we supply a UUID which "just works" with popular web frameworks, or we supply both *as two or more different types*. But claiming that types chosen because they're popular are also technically sound is misleading at best. Further, based on our experience with OSSP, if we're going to make a UUId type in core because it's currently popular, we'd better be pretty sure that it's still going to be popular in 5 or 10 years from now. Otherwise we're better off keeping it an extension. I personally am interested in a UUID type which would support doing multi-master replication of JSON databases built on PostgreSQL, and will probably write one if nobody else does first, and I don't see existing, naive randomization-based UUIDS as ever filling that role adequately. Although, as I said, Andres' work in this area may have already taken care of this. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 4/25/14, 12:58 PM, Josh Berkus wrote: > Well, I've already had collisions with UUID-OSSP, in production, with > only around 20 billion values. So clearly there aren't 122bits of true > randomness in OSSP. I can't speak for other implementations because I > haven't tried them. Or perhaps you should be buying lottery tickets? ;) Can you write this up in a blog post? I've argued with people more than once about why it's a bad idea to trust on "1 ina bazillion" odds to protect your data (though, usually in the context of SHA1), and it'd be good to be able to point ata real world example of this failing. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Sat, Apr 26, 2014 at 8:58 PM, Josh Berkus <josh@agliodbs.com> wrote: > However, there's a fundamental problem with the concept of the dataset > ID in that there's absolutely no way for PostgreSQL to know when it has > a unique dataset. Consider a downtime database file cloning for > example; the two databases would have the same identifier and yet both > be standalones which quickly diverge. So I haven't thought of a good > solution to that. If you're content to use random numbers then you could generate one from system entropy on every startup. If you generated a new timeline for every startup then the pair of system id and random startup id (which would be the new timelineid) would let you look at any two instances and determine if they're related and where they diverged even if it was from a database clone. I don't think MAC address or other hardware identifiers really saves you from using system entropy anyways. You might very well install a clone on the same machine and in an environment like Heroku you could very easily end up restoring a database onto the same VM twice entirely by accident. I actually think using /dev/urandom is a better idea than depending on things like MAC address almost always. -- greg
On Fri, Apr 25, 2014 at 8:58 PM, Josh Berkus <josh@agliodbs.com> wrote: > Well, I've already had collisions with UUID-OSSP, in production, with > only around 20 billion values. So clearly there aren't 122bits of true > randomness in OSSP. I can't speak for other implementations because I > haven't tried them. Interesting. The statistical chances of this happening should be approximately 4e-17. Are you certain that this was due to uuid-ossp and not an application bug? Can you say what kind of operating system and environment that was? I skimmed the sources of uuid-ossp 1.6.2 and it seems to be doing the right thing, using /dev/urandom or /dev/random on Unixes and CryptGenRandom on Windows. Barring any bugs, of course. However, if these fail for whatever reason (e.g. out of file descriptors), it falls back to libc random(), which is clearly broken. On Fri, Apr 25, 2014 at 6:18 PM, Greg Stark <stark@mit.edu> wrote: > The difficulty lies not really in the PRNG implementation (which is > hard but well enough understood that it's not much of an issue these > days). The difficulty lies in obtaining enough entropy. There are ways > of obtaining enough entropy and they are available. > Obtaining enough entropy requires access to hardware devices which > means a kernel system call. This is a solved problem in most environments, too. The kernel collects entropy from unpredictable events and then seeds a global CSPRNG with that. This collection happens always regardless of whether you request random numbers or not, so essentially comes for "free". Applications can then request output from this CSPRNG. Reason being, this infrastructure is necessary for more critical tasks than generating UUIDs: pretty much all of cryptography requires random numbers. > They also deplete > the available entropy pool for other sources which may means they have > security consequences. This only affects the Linux /dev/random, which is discouraged these days for that reason. Applications should use urandom instead. To my knowledge, there are no other operating systems that have this "depletion" behavior. Regards, Marti