Thread: 9.4 Proposal: Initdb creates a single table

9.4 Proposal: Initdb creates a single table

From

Simon Riggs

Date:

23 April 2014, 06:11:47

We start with a database called Postgres and a schema called Public.
Yet we don't start up with any usable tables.

I propose we add a single table called Postgres when we Initdb
 CREATE TABLE Postgres (Id Integer, Data Jsonb); COMMENT ON TABLE Postgres IS 'Single table for quick start usage -
design your database';

The purpose of this is to make the database immediately usable. By
including this table in the default  initdb it will mean that programs
can rely on the existence of this table and begin working quicker.

By now, some of you will be doubled over laughing as if this is an
April fool joke.  I don't mean it to be at all.

The idea is to have a stupidly obvious and easy table that will
potentially be usable by just about everyone, in any language.

If you don't like it, don't use it. If you really dislike it, drop it.

But for new people coming to Postgres, they will have a data object to
access and begin using the database immediately. Their code will work,
their examples will work. OK, so they need to go back and think about
the design, but at least they got it to work and will be encouraged to
do more.

Remember when we didn't have a database called Postgres? Remember how
much simpler life is now? Remember that now.

We can also insert a single row, Id = 0 with "Postgres sample data" in
it, but that seems a step too far.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.4 Proposal: Initdb creates a single table

From

Tom Lane

Date:

23 April 2014, 06:14:43

Simon Riggs <simon@2ndQuadrant.com> writes:
> By now, some of you will be doubled over laughing as if this is an
> April fool joke.

Indeed.
        regards, tom lane

Re: 9.4 Proposal: Initdb creates a single table

From

Craig Ringer

Date:

23 April 2014, 06:45:38

On 04/23/2014 02:11 PM, Simon Riggs wrote:
> I propose we add a single table called Postgres when we Initdb
> 
>   CREATE TABLE Postgres (Id Integer, Data Jsonb);

Without particular comment on the need for the table, I'd be concerned
about calling it "postgres".

My personal impression from Stack Overflow etc has been that users are
readily confused by the fact that we have:

- Database engine/system "postgres"
- backend binary "postgres" (they see it in ps)
- unix user "postgres"
- Pg superuser "postgres"
- database "postgres"

Sure, there's an argument for running with the theme here, but I suspect
using the name "postgres" for a default table will just muddy the waters
a bit more.

Even "postgres_table" would help.

It *absolutely must* be lower case, whatever it is, IMO. If you're going
for newest-of-the-newbies, the last thing you want to do is having them
dealing with it being just Postgres in some places, and having to be
"Postgres" in others.

Personally, don't know if I'm convinced it's overly worth doing - but I
think it's silly to dismiss without actually corralling up some users
who're unfamiliar with Pg and watching them get started. I'd love to see
some properly conducted usability studies of Pg, and something like this
would fit in well.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.4 Proposal: Initdb creates a single table

From

Pavel Stehule

Date:

23 April 2014, 07:14:28

Hello

if you are thinking about this direction, then store there some demo project.

I am don't think so isolated table has significant price.

Regards

Pavel

2014-04-23 8:45 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

On 04/23/2014 02:11 PM, Simon Riggs wrote:
> I propose we add a single table called Postgres when we Initdb
>
> CREATE TABLE Postgres (Id Integer, Data Jsonb);

Without particular comment on the need for the table, I'd be concerned
about calling it "postgres".

My personal impression from Stack Overflow etc has been that users are
readily confused by the fact that we have:

- Database engine/system "postgres"
- backend binary "postgres" (they see it in ps)
- unix user "postgres"
- Pg superuser "postgres"
- database "postgres"

Sure, there's an argument for running with the theme here, but I suspect
using the name "postgres" for a default table will just muddy the waters
a bit more.

Even "postgres_table" would help.

It *absolutely must* be lower case, whatever it is, IMO. If you're going
for newest-of-the-newbies, the last thing you want to do is having them
dealing with it being just Postgres in some places, and having to be
"Postgres" in others.

Personally, don't know if I'm convinced it's overly worth doing - but I
think it's silly to dismiss without actually corralling up some users
who're unfamiliar with Pg and watching them get started. I'd love to see
some properly conducted usability studies of Pg, and something like this
would fit in well.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: 9.4 Proposal: Initdb creates a single table

From

Albe Laurenz

Date:

23 April 2014, 07:21:00

Simon Riggs wrote:
> I propose we add a single table called Postgres when we Initdb
> 
>   CREATE TABLE Postgres (Id Integer, Data Jsonb);
>   COMMENT ON TABLE Postgres IS 'Single table for quick start usage -
> design your database';
> 
> The purpose of this is to make the database immediately usable. By
> including this table in the default  initdb it will mean that programs
> can rely on the existence of this table and begin working quicker.
> 
> By now, some of you will be doubled over laughing as if this is an
> April fool joke.  I don't mean it to be at all.
> 
> The idea is to have a stupidly obvious and easy table that will
> potentially be usable by just about everyone, in any language.

I am a PostgreSQL newbie.
How is this table useful for me?
I want to develop a database application.
I want to store personal data like name and birth date!

Actually, I feel confused.  What should I do with this table?
Is it part of the database system?  Will the database be broken if
I drop it?  Do I have to ship it with my application?

> If you don't like it, don't use it. If you really dislike it, drop it.

No, I'm not the kind of person who reads a manual to figure out what to
do with this table.  I want to start coding *right now*.

> But for new people coming to Postgres, they will have a data object to
> access and begin using the database immediately. Their code will work,
> their examples will work. OK, so they need to go back and think about
> the design, but at least they got it to work and will be encouraged to
> do more.

I have found a sample application for personal data on the internet.
How can I make it work with this table?

> Remember when we didn't have a database called Postgres? Remember how
> much simpler life is now? Remember that now.

Good that you mention that!  I have wondered what to do with it.
When I first connected to PostgreSQL, I created a sample table, but the
senior developer from the other office told me that this is the "postgres"
database and that I shouldn't create any objects there.

What is it good for?  Can I delete it?

Yours,
Laurenz Albe

Re: 9.4 Proposal: Initdb creates a single table

From

Craig Ringer

Date:

23 April 2014, 07:39:36

On 04/23/2014 03:20 PM, Albe Laurenz wrote:
> Good that you mention that!  I have wondered what to do with it.
> When I first connected to PostgreSQL, I created a sample table, but the
> senior developer from the other office told me that this is the "postgres"
> database and that I shouldn't create any objects there.
> 
> What is it good for?  Can I delete it?

A key difference between the "postgres" DB and a default table is that
the "postgres" DB is very convenient with PostgreSQL's default of
connecting to a DB of the same name as the user.

We don't have a corresponding
   INSERT VALUES (42);
   SELECT fred;

where there is some invisible implicit table name.

Personally I wish Pg was able to start w/o "connecting" to any specific
DB, but that's just not how the architecture works, and with that
limitation the "postgres" DB seemed like a good compromise. (That said,
it's really weird that the username of the superuser defaults to the
current unix user when initdb'ing, but the db created by default is
still always "postgres").

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.4 Proposal: Initdb creates a single table

From

Simon Riggs

Date:

23 April 2014, 07:51:01

On 23 April 2014 07:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> By now, some of you will be doubled over laughing as if this is an
>> April fool joke.
>
> Indeed.

I do like to share the odd joke now and then, it has to be said. So
I'm glad I enriched your day.

I was taught that ideas are accepted in this order: first we think
them a joke, then we perceive them as a threat, then they become
obvious. I can't find a Wikipedia article to give that idea more
weight (OK, that was a joke).

Just trying to think about how to improve the out of the box
experience in ways that others already consider obvious.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.4 Proposal: Initdb creates a single table

From

Albe Laurenz

Date:

23 April 2014, 07:58:38

Craig Ringer wrote:
>> Good that you mention that!  I have wondered what to do with it.
>> When I first connected to PostgreSQL, I created a sample table, but the
>> senior developer from the other office told me that this is the "postgres"
>> database and that I shouldn't create any objects there.
>>
>> What is it good for?  Can I delete it?
> 
> A key difference between the "postgres" DB and a default table is that
> the "postgres" DB is very convenient with PostgreSQL's default of
> connecting to a DB of the same name as the user.

I did not seriously want to dispute the value of the "postgres" DB,
I just think that making things easier to understand for the newbie
is *not* its greatest merit.

It is mostly for the convenience of the administrator, right?

Yours,
Laurenz Albe

Re: 9.4 Proposal: Initdb creates a single table

From

David Rowley

Date:

23 April 2014, 08:26:46

On Wed, Apr 23, 2014 at 6:11 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

I propose we add a single table called Postgres when we Initdb

CREATE TABLE Postgres (Id Integer, Data Jsonb);
COMMENT ON TABLE Postgres IS 'Single table for quick start usage -
design your database';

The purpose of this is to make the database immediately usable. By
including this table in the default initdb it will mean that programs
can rely on the existence of this table and begin working quicker.

I'm not quite sure it would serve the same purpose as to what you're proposing here, but for a long time I've thought that it would be nice if PostgreSQL came with an example database that had a number of tables, perhaps that mock up some easy to relate to real-world application. These would be very useful to use as examples in the documents instead of inventing them in the ad-hoc way that we currently do. Like here: http://www.postgresql.org/docs/9.3/static/tutorial-window.html

In the above link we have some table called empsalary, but the new user can't go an execute that query to test it without first working out how to first create a table and insert some data into that table.

It would be really nice if new users could create this example database somehow and then they could play around with the example queries we put in the manual.

Regards

David Rowley

Re: 9.4 Proposal: Initdb creates a single table

From

Hannu Krosing

Date:

23 April 2014, 09:37:20

On 04/23/2014 08:11 AM, Simon Riggs wrote:
> We start with a database called Postgres and a schema called Public.
> Yet we don't start up with any usable tables.
>
> I propose we add a single table called Postgres when we Initdb
>
>   CREATE TABLE Postgres (Id Integer, Data Jsonb);
>   COMMENT ON TABLE Postgres IS 'Single table for quick start usage -
> design your database';
>
> The purpose of this is to make the database immediately usable.
With default access controls it still aint :(

And most of complaints I have heard are about the need of
fidgeting with pg_hba.conf to get *anything* started.

So maybe we could start with something like this at the end of initdb:

> createdb example

> psql example

sample# create table stuff(id serial primary key, data jsonb);
sample# grant all on stuff to public;

And also have the following lines in pg_hba.conf for it

local   all             all                                         trust
host    all             all             127.0.0.1/32        trust
host    all             all             ::1/128                 trust


This would solve most of the frustration with starting pg development
for newcomers

Cheers
Hannu

>  By
> including this table in the default  initdb it will mean that programs
> can rely on the existence of this table and begin working quicker.
>
> By now, some of you will be doubled over laughing as if this is an
> April fool joke.  I don't mean it to be at all.
>
> The idea is to have a stupidly obvious and easy table that will
> potentially be usable by just about everyone, in any language.
>
> If you don't like it, don't use it. If you really dislike it, drop it.
>
> But for new people coming to Postgres, they will have a data object to
> access and begin using the database immediately. Their code will work,
> their examples will work. OK, so they need to go back and think about
> the design, but at least they got it to work and will be encouraged to
> do more.
>
> Remember when we didn't have a database called Postgres? Remember how
> much simpler life is now? Remember that now.
>
> We can also insert a single row, Id = 0 with "Postgres sample data" in
> it, but that seems a step too far.
>


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: 9.4 Proposal: Initdb creates a single table

From

Simon Riggs

Date:

23 April 2014, 11:39:51

On 23 April 2014 09:26, David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, Apr 23, 2014 at 6:11 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>> I propose we add a single table called Postgres when we Initdb
>>
>>   CREATE TABLE Postgres (Id Integer, Data Jsonb);
>>   COMMENT ON TABLE Postgres IS 'Single table for quick start usage -
>> design your database';
>>
>> The purpose of this is to make the database immediately usable. By
>> including this table in the default  initdb it will mean that programs
>> can rely on the existence of this table and begin working quicker.
>>
>
> I'm not quite sure it would serve the same purpose as to what you're
> proposing here, but for a long time I've thought that it would be nice if
> PostgreSQL came with an example database that had a number of tables,
> perhaps that mock up some easy to relate to real-world application. These
> would be very useful to use as examples in the documents instead of
> inventing them in the ad-hoc way that we currently do. Like here:
> http://www.postgresql.org/docs/9.3/static/tutorial-window.html

+1 to the idea of an example database, used throughout the docs

Sounds like a summer of code project.

Since we don't have that now, it won't work for 9.4.

I still like the idea of a database installed by default on initdb, by
default. Packagers can of course do what they like.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.4 Proposal: Initdb creates a single table

From

Mark Kirkwood

Date:

23 April 2014, 11:46:47

This seems like a much better idea - whereas a single table, related to 
nothing - on the other hand, is at best not very helpful (and it could 
be argued, might contribute to teaching poor data data design).

Regards

Mark

On 23/04/14 19:13, Pavel Stehule wrote:
> Hello
>
> if you are thinking about this direction, then store there some demo
> project.
>
> I am don't think so isolated table has significant price.
>

Re: 9.4 Proposal: Initdb creates a single table

From

Stephen Frost

Date:

23 April 2014, 12:25:22

* Simon Riggs (simon@2ndQuadrant.com) wrote:
> +1 to the idea of an example database, used throughout the docs
> Sounds like a summer of code project.

Agreed.  I'll add it to the GSoC ideas page.

> Since we don't have that now, it won't work for 9.4.

None of this is on the table for 9.4 as far as I'm concerned..

> I still like the idea of a database installed by default on initdb, by
> default. Packagers can of course do what they like.

I fail to see the point of adding something that's targetted at novice /
end-users which 90% (yes, it's a random # that I pulled, but it's surely
the majority, at least) of installs won't have.

For my 2c, it'd also be a disservice to our users and to ourselves to
encourage a design that minimizes the database's understanding of the
data and greatly reduces the set of PG's capabilities that can be used.
Thanks,
    Stephen

Re: 9.4 Proposal: Initdb creates a single table

From

Heikki Linnakangas

Date:

23 April 2014, 12:46:43

On 04/23/2014 03:28 PM, Stephen Frost wrote:
> * Simon Riggs (simon@2ndQuadrant.com) wrote:
>> >+1 to the idea of an example database, used throughout the docs
>> >Sounds like a summer of code project.
> Agreed.  I'll add it to the GSoC ideas page.

I don't think this is a good GSoC project. Documentation-only projects 
are not eligible for GSoC 

(https://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2014/help_page#12._Are_proposals_for_documentation_work).

Perhaps you can argue that a sample database is not documentation, but 
it's certainly not in the spirit of the program.

- Heikki

Re: 9.4 Proposal: Initdb creates a single table

From

Thom Brown

Date:

23 April 2014, 12:57:10

On 23 April 2014 13:46, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 04/23/2014 03:28 PM, Stephen Frost wrote:
>>
>> * Simon Riggs (simon@2ndQuadrant.com) wrote:
>>>
>>> >+1 to the idea of an example database, used throughout the docs
>>> >Sounds like a summer of code project.
>>
>> Agreed.  I'll add it to the GSoC ideas page.
>
>
> I don't think this is a good GSoC project. Documentation-only projects are
> not eligible for GSoC
>
(https://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2014/help_page#12._Are_proposals_for_documentation_work).
> Perhaps you can argue that a sample database is not documentation, but it's
> certainly not in the spirit of the program.

Out of curiosity, had anyone seen this which some of us were briefly
working on last year at PgCon's unconference?

https://github.com/pvh/postgresql-sample-database

-- 
Thom

Re: 9.4 Proposal: Initdb creates a single table

From

Stephen Frost

Date:

23 April 2014, 12:59:02

Heikki,

* Heikki Linnakangas (hlinnakangas@vmware.com) wrote:
> On 04/23/2014 03:28 PM, Stephen Frost wrote:
> >Agreed.  I'll add it to the GSoC ideas page.
>
> I don't think this is a good GSoC project. Documentation-only
> projects are not eligible for GSoC
(https://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2014/help_page#12._Are_proposals_for_documentation_work).

Ah, right, of course.  Sorry, I've gotten to thinking of the GSoC page
as being a "TODO" but for small projects. :)

> Perhaps you can argue that a sample database is not documentation,
> but it's certainly not in the spirit of the program.

Nah, that wasn't my intent at all.
Thanks,
    Stephen

Re: 9.4 Proposal: Initdb creates a single table

From

Alexander Korotkov

Date:

23 April 2014, 14:43:27

On Wed, Apr 23, 2014 at 10:11 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

We start with a database called Postgres and a schema called Public.
Yet we don't start up with any usable tables.

I propose we add a single table called Postgres when we Initdb

CREATE TABLE Postgres (Id Integer, Data Jsonb);
COMMENT ON TABLE Postgres IS 'Single table for quick start usage -
design your database';

The purpose of this is to make the database immediately usable. By
including this table in the default initdb it will mean that programs
can rely on the existence of this table and begin working quicker.

By now, some of you will be doubled over laughing as if this is an
April fool joke. I don't mean it to be at all.

I can propose contrib PostgreNoSQL providing following:

1) Table postgres as you proposed.

2) Functions: get_postgres(id intgeger) returns jsonb, set_postgres(id integer, data jsonb) returns void, search_postgres(query jsonb) returns setof postgres. search_postgres will have semantics of @> jsonb operator

3) Background workers which provides HTTP wrapper over those functions.

------
With best regards,
Alexander Korotkov.

Re: 9.4 Proposal: Initdb creates a single table

From

Fabrízio de Royes Mello

Date:

23 April 2014, 15:37:52

<div dir="ltr"><div class="gmail_extra"><br />On Wed, Apr 23, 2014 at 4:50 AM, Simon Riggs <<a
href="mailto:simon@2ndquadrant.com">simon@2ndquadrant.com</a>>wrote:<br />><br />> On 23 April 2014 07:14, Tom
Lane<<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br /> > > Simon Riggs
<simon@2ndQuadrant.com>writes:<br />> >> By now, some of you will be doubled over laughing as if this is
an<br/>> >> April fool joke.<br />> ><br />> > Indeed.<br />><br /> > I do like to share the
oddjoke now and then, it has to be said. So<br />> I'm glad I enriched your day.<br />><br />> I was taught
thatideas are accepted in this order: first we think<br />> them a joke, then we perceive them as a threat, then
theybecome<br /> > obvious. I can't find a Wikipedia article to give that idea more<br />> weight (OK, that was a
joke).<br/>><br />> Just trying to think about how to improve the out of the box<br />> experience in ways
thatothers already consider obvious.<br /> ><br /><br />Mahatma Gandhi said:<br /><br />"First they ignore you, then
theylaugh at you, then they fight you, then you win."<br /><br />;-)<br /><br /></div><div class="gmail_extra">More
infoin [1] [2].</div><div class="gmail_extra"><br /><br />Regards,<br /><br />[1] <a
href="http://en.wikiquote.org/wiki/Mahatma_Gandhi">http://en.wikiquote.org/wiki/Mahatma_Gandhi</a><br/>[2] <a
href="http://www.brainyquote.com/quotes/quotes/m/mahatmagan103630.html">http://www.brainyquote.com/quotes/quotes/m/mahatmagan103630.html</a><br
/><br/>--<br />Fabrízio de Royes Mello<br />Consultoria/Coaching PostgreSQL<br />>> Timbira: <a
href="http://www.timbira.com.br">http://www.timbira.com.br</a><br/>>> Blog sobre TI: <a
href="http://fabriziomello.blogspot.com">http://fabriziomello.blogspot.com</a><br/> >> Perfil Linkedin: <a
href="http://br.linkedin.com/in/fabriziomello">http://br.linkedin.com/in/fabriziomello</a><br/>>> Twitter: <a
href="http://twitter.com/fabriziomello">http://twitter.com/fabriziomello</a></div></div>

Re: 9.4 Proposal: Initdb creates a single table

From

Josh Berkus

Date:

23 April 2014, 17:23:25

On 04/23/2014 07:43 AM, Alexander Korotkov wrote:
> I can propose contrib PostgreNoSQL providing following:
> 1) Table postgres as you proposed.
> 2) Functions: get_postgres(id intgeger) returns jsonb, set_postgres(id
> integer, data jsonb) returns void, search_postgres(query jsonb) returns
> setof postgres. search_postgres will have semantics of @> jsonb operator
> 3) Background workers which provides HTTP wrapper over those functions.

You're forgetting ... sharding/replication over multiple masters.

Also, the id should be a text value so that folks can use hash keys, or
whatever.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: 9.4 Proposal: Initdb creates a single table

From

Alvaro Herrera

Date:

23 April 2014, 17:27:02

Josh Berkus wrote:
> On 04/23/2014 07:43 AM, Alexander Korotkov wrote:
> > I can propose contrib PostgreNoSQL providing following:
> > 1) Table postgres as you proposed.
> > 2) Functions: get_postgres(id intgeger) returns jsonb, set_postgres(id
> > integer, data jsonb) returns void, search_postgres(query jsonb) returns
> > setof postgres. search_postgres will have semantics of @> jsonb operator
> > 3) Background workers which provides HTTP wrapper over those functions.
> 
> You're forgetting ... sharding/replication over multiple masters.
> 
> Also, the id should be a text value so that folks can use hash keys, or
> whatever.

We should totally have a type "uuidserial".

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.4 Proposal: Initdb creates a single table

From

David Fetter

Date:

23 April 2014, 19:24:35

On Wed, Apr 23, 2014 at 02:26:50PM -0300, Alvaro Herrera wrote:
> Josh Berkus wrote:
> > On 04/23/2014 07:43 AM, Alexander Korotkov wrote:
> > > I can propose contrib PostgreNoSQL providing following:
> > > 1) Table postgres as you proposed.
> > > 2) Functions: get_postgres(id intgeger) returns jsonb, set_postgres(id
> > > integer, data jsonb) returns void, search_postgres(query jsonb) returns
> > > setof postgres. search_postgres will have semantics of @> jsonb operator
> > > 3) Background workers which provides HTTP wrapper over those functions.
> > 
> > You're forgetting ... sharding/replication over multiple masters.
> > 
> > Also, the id should be a text value so that folks can use hash keys, or
> > whatever.
> 
> We should totally have a type "uuidserial".

Something like it, certainly.  One thing SQL Server does right is to
have an opaque identity column, for which UUID would do an admirable
job.  We would need to build UUID functionality in, and I don't see
this as a hard task.  Should I draft it up as a self-contained
extension?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: 9.4 Proposal: Initdb creates a single table

From

David Fetter

Date:

23 April 2014, 23:25:23

On Wed, Apr 23, 2014 at 12:24:21PM -0700, David Fetter wrote:
> On Wed, Apr 23, 2014 at 02:26:50PM -0300, Alvaro Herrera wrote:
> > Josh Berkus wrote:
> > > On 04/23/2014 07:43 AM, Alexander Korotkov wrote:
> > > > I can propose contrib PostgreNoSQL providing following:
> > > > 1) Table postgres as you proposed.
> > > > 2) Functions: get_postgres(id intgeger) returns jsonb, set_postgres(id
> > > > integer, data jsonb) returns void, search_postgres(query jsonb) returns
> > > > setof postgres. search_postgres will have semantics of @> jsonb operator
> > > > 3) Background workers which provides HTTP wrapper over those functions.
> > > 
> > > You're forgetting ... sharding/replication over multiple masters.
> > > 
> > > Also, the id should be a text value so that folks can use hash keys, or
> > > whatever.
> > 
> > We should totally have a type "uuidserial".
> 
> Something like it, certainly.  One thing SQL Server does right is to
> have an opaque identity column, for which UUID would do an admirable
> job.  We would need to build UUID functionality in, and I don't see
> this as a hard task.  Should I draft it up as a self-contained
> extension?

So I did a little research, and it appears that there's a part of
util-linux that does UUIDs and is available under the 3-clause BSDL.
It only does time- and urandom-based UUIDs, but that's probably a
better start than nothing.

Is there any good reason not to roll native UUID generation into our
standard distribution?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: 9.4 Proposal: Initdb creates a single table

From

Tom Lane

Date:

24 April 2014, 00:28:01

David Fetter <david@fetter.org> writes:
> Is there any good reason not to roll native UUID generation into our
> standard distribution?

It's already there (as of 9.4) in pg_crypto.
        regards, tom lane

Re: 9.4 Proposal: Initdb creates a single table

From

David Fetter

Date:

24 April 2014, 04:32:47

On Wed, Apr 23, 2014 at 08:27:52PM -0400, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > Is there any good reason not to roll native UUID generation into
> > our standard distribution?
> 
> It's already there (as of 9.4) in pg_crypto.

Sorry I wasn't clear enough.

Since contrib/pgcrypto is a module that might well not be installed,
people can't just build software for PostgreSQL and have UUIDs
available, certainly not in the sense that, for example, BIGSERIAL is.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: 9.4 Proposal: Initdb creates a single table

From

Simon Riggs

Date:

24 April 2014, 06:59:14

On 24 April 2014 05:32, David Fetter <david@fetter.org> wrote:
> On Wed, Apr 23, 2014 at 08:27:52PM -0400, Tom Lane wrote:
>> David Fetter <david@fetter.org> writes:
>> > Is there any good reason not to roll native UUID generation into
>> > our standard distribution?
>>
>> It's already there (as of 9.4) in pg_crypto.
>
> Sorry I wasn't clear enough.
>
> Since contrib/pgcrypto is a module that might well not be installed,
> people can't just build software for PostgreSQL and have UUIDs
> available, certainly not in the sense that, for example, BIGSERIAL is.

+1 to include in core - strange to have a UUID datatype in core but no
means to generate

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.4 Proposal: Initdb creates a single table

From

Michael Paquier

Date:

24 April 2014, 11:55:01

On Thu, Apr 24, 2014 at 3:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 24 April 2014 05:32, David Fetter <david@fetter.org> wrote:
>> Since contrib/pgcrypto is a module that might well not be installed,
>> people can't just build software for PostgreSQL and have UUIDs
>> available, certainly not in the sense that, for example, BIGSERIAL is.
>
> +1 to include in core - strange to have a UUID datatype in core but no
> means to generate
+1.
-- 
Michael

Re: 9.4 Proposal: Initdb creates a single table

From

Marti Raudsepp

Date:

24 April 2014, 13:40:58

On Wed, Apr 23, 2014 at 11:26 AM, David Rowley <dgrowleyml@gmail.com> wrote:
> but for a long time I've thought that it would be nice if
> PostgreSQL came with an example database that had a number of tables,
> perhaps that mock up some easy to relate to real-world application. These
> would be very useful to use as examples in the documents instead of
> inventing them in the ad-hoc way that we currently do. Like here:
> http://www.postgresql.org/docs/9.3/static/tutorial-window.html

I think that's a great idea. I'm not convinced it should be created by
default in initdb, but a "CREATE EXTENSION sample_data" seems easy
enough for newbies to use and has a good chance of getting merged into
contrib.

Regards,
Marti

Re: 9.4 Proposal: Initdb creates a single table

From

Pavel Stehule

Date:

24 April 2014, 14:21:53

2014-04-24 15:40 GMT+02:00 Marti Raudsepp <marti@juffo.org>:

On Wed, Apr 23, 2014 at 11:26 AM, David Rowley <dgrowleyml@gmail.com> wrote:
> but for a long time I've thought that it would be nice if
> PostgreSQL came with an example database that had a number of tables,
> perhaps that mock up some easy to relate to real-world application. These
> would be very useful to use as examples in the documents instead of
> inventing them in the ad-hoc way that we currently do. Like here:
> http://www.postgresql.org/docs/9.3/static/tutorial-window.html

I think that's a great idea. I'm not convinced it should be created by
default in initdb, but a "CREATE EXTENSION sample_data" seems easy
enough for newbies to use and has a good chance of getting merged into
contrib.

Good idea

Pavel

Regards,
Marti

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: 9.4 Proposal: Initdb creates a single table

From

Tom Lane

Date:

24 April 2014, 14:57:49

Simon Riggs <simon@2ndQuadrant.com> writes:
> On 24 April 2014 05:32, David Fetter <david@fetter.org> wrote:
>> Since contrib/pgcrypto is a module that might well not be installed,
>> people can't just build software for PostgreSQL and have UUIDs
>> available, certainly not in the sense that, for example, BIGSERIAL is.

> +1 to include in core - strange to have a UUID datatype in core but no
> means to generate

The reason why there's no generation function in core is that there is no
standardized, guaranteed-to-produce-a-universally-unique-value generation
algorithm.  That was the reason for not putting something in core when the
type was first created, and I do not see that the technology has advanced.
        regards, tom lane

Re: 9.4 Proposal: Initdb creates a single table

From

Fabrízio de Royes Mello

Date:

24 April 2014, 14:58:08

On Thu, Apr 24, 2014 at 11:21 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2014-04-24 15:40 GMT+02:00 Marti Raudsepp <marti@juffo.org>:

On Wed, Apr 23, 2014 at 11:26 AM, David Rowley <dgrowleyml@gmail.com> wrote:
> but for a long time I've thought that it would be nice if
> PostgreSQL came with an example database that had a number of tables,
> perhaps that mock up some easy to relate to real-world application. These
> would be very useful to use as examples in the documents instead of
> inventing them in the ad-hoc way that we currently do. Like here:
> http://www.postgresql.org/docs/9.3/static/tutorial-window.html

I think that's a great idea. I'm not convinced it should be created by
default in initdb, but a "CREATE EXTENSION sample_data" seems easy
enough for newbies to use and has a good chance of getting merged into
contrib.

Good idea

This "sample data" can be very useful to many things:

* trainning

* automated tests

* benchmark ???

Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL

>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello

Re: 9.4 Proposal: Initdb creates a single table

From

Hannu Krosing

Date:

24 April 2014, 15:06:10

On 04/24/2014 04:57 PM, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 24 April 2014 05:32, David Fetter <david@fetter.org> wrote:
>>> Since contrib/pgcrypto is a module that might well not be installed,
>>> people can't just build software for PostgreSQL and have UUIDs
>>> available, certainly not in the sense that, for example, BIGSERIAL is.
>> +1 to include in core - strange to have a UUID datatype in core but no
>> means to generate
> The reason why there's no generation function in core is that there is no
> standardized, guaranteed-to-produce-a-universally-unique-value generation
> algorithm.  That was the reason for not putting something in core when the
> type was first created, and I do not see that the technology has advanced.
Why can't we implement all 5 variants from 
http://en.wikipedia.org/wiki/Universally_unique_identifier
and just warn about the dangers in documentation ?

we could expose it something like next_uuid(<version nr>);

As the article points out " Since the identifiers have a finite size, it is
possible for two differing items to share the same identifier." so it is a
known limitation of UUID and not something PostgreSQL specific.


Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: 9.4 Proposal: Initdb creates a single table

From

Tom Lane

Date:

24 April 2014, 15:30:29

Hannu Krosing <hannu@2ndQuadrant.com> writes:
> On 04/24/2014 04:57 PM, Tom Lane wrote:
>> The reason why there's no generation function in core is that there is no
>> standardized, guaranteed-to-produce-a-universally-unique-value generation
>> algorithm.  That was the reason for not putting something in core when the
>> type was first created, and I do not see that the technology has advanced.

> Why can't we implement all 5 variants from 
> http://en.wikipedia.org/wiki/Universally_unique_identifier
> and just warn about the dangers in documentation ?

Essentially, that would mean carrying around our own implementation
of libuuid --- which includes a bunch of not-terribly-portable stuff,
such as discovering the machine's MAC address(es).  That's not really
something I want to see us putting project manpower into.

I wonder what it would take to adapt contrib/uuid-ossp to work on
top of some other popular implementation of that code.  We pretty
much bet on the wrong horse when we picked the OSSP library to
depend on, but otherwise I think the principle of using an external
library was good.
        regards, tom lane

Re: 9.4 Proposal: Initdb creates a single table

From

David Fetter

Date:

24 April 2014, 16:35:25

On Thu, Apr 24, 2014 at 11:30:15AM -0400, Tom Lane wrote:
> Hannu Krosing <hannu@2ndQuadrant.com> writes:
> > On 04/24/2014 04:57 PM, Tom Lane wrote:
> >> The reason why there's no generation function in core is that
> >> there is no standardized,
> >> guaranteed-to-produce-a-universally-unique-value generation
> >> algorithm.  That was the reason for not putting something in core
> >> when the type was first created, and I do not see that the
> >> technology has advanced.
> 
> > Why can't we implement all 5 variants from
> > http://en.wikipedia.org/wiki/Universally_unique_identifier and
> > just warn about the dangers in documentation ?
> 
> Essentially, that would mean carrying around our own implementation
> of libuuid --- which includes a bunch of not-terribly-portable
> stuff, such as discovering the machine's MAC address(es).  That's
> not really something I want to see us putting project manpower into.

We don't need to do the not-terribly-portable stuff in the first
round.  For that, there could still be a bundled extension.

The point is that UUIDs are nowhere near as usable as users have the
right to expect, and we should fix that.

> I wonder what it would take to adapt contrib/uuid-ossp to work on
> top of some other popular implementation of that code.  We pretty
> much bet on the wrong horse when we picked the OSSP library to
> depend on, but otherwise I think the principle of using an external
> library was good.

So long as we can pick another horse later, sure.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: 9.4 Proposal: Initdb creates a single table

From

Tom Lane

Date:

24 April 2014, 16:53:27

David Fetter <david@fetter.org> writes:
> On Thu, Apr 24, 2014 at 11:30:15AM -0400, Tom Lane wrote:
>> Essentially, that would mean carrying around our own implementation
>> of libuuid --- which includes a bunch of not-terribly-portable
>> stuff, such as discovering the machine's MAC address(es).  That's
>> not really something I want to see us putting project manpower into.

> We don't need to do the not-terribly-portable stuff in the first
> round.  For that, there could still be a bundled extension.

> The point is that UUIDs are nowhere near as usable as users have the
> right to expect, and we should fix that.

The reason that UUIDs aren't as usable as users "have a right to expect"
is that the underlying technology doesn't meet their (your) expectations.
Just because it's easy to imagine that there are universally unique
identifiers doesn't mean that there actually *are* universally unique
identifiers.  There are only approximations with varying failure modes.

This is not our fault, and I don't want us to get caught up in trying
to fix a fundamentally broken concept --- which is what a generic
"uuidserial" API would be.  If you try to paper over the difficulties
here, they'll just bite you on the rear someday.
        regards, tom lane

Re: 9.4 Proposal: Initdb creates a single table

From

Alvaro Herrera

Date:

24 April 2014, 17:00:30

Tom Lane wrote:

> This is not our fault, and I don't want us to get caught up in trying
> to fix a fundamentally broken concept --- which is what a generic
> "uuidserial" API would be.  If you try to paper over the difficulties
> here, they'll just bite you on the rear someday.

But we have non-colliding generation technology for OIDs in system
catalogs.  We could try to reuse the idea in a UUID generator: grab one
value, try to insert; if it fails generate a new one, lather, rinse,
repeat.

This would make uuidserial not quite as simple as SERIAL pseudotype, of
course.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.4 Proposal: Initdb creates a single table

From

Heikki Linnakangas

Date:

24 April 2014, 17:02:46

On 04/24/2014 08:00 PM, Alvaro Herrera wrote:
> Tom Lane wrote:
>
>> This is not our fault, and I don't want us to get caught up in trying
>> to fix a fundamentally broken concept --- which is what a generic
>> "uuidserial" API would be.  If you try to paper over the difficulties
>> here, they'll just bite you on the rear someday.
>
> But we have non-colliding generation technology for OIDs in system
> catalogs.  We could try to reuse the idea in a UUID generator: grab one
> value, try to insert; if it fails generate a new one, lather, rinse,
> repeat.

Umm, UUID stands for Universally Unique IDentifier. That would hardly be 
*universally* unique.

- Heikki

Re: 9.4 Proposal: Initdb creates a single table

From

Josh Berkus

Date:

24 April 2014, 17:07:19

On 04/24/2014 10:00 AM, Alvaro Herrera wrote:
> Tom Lane wrote:
> 
>> This is not our fault, and I don't want us to get caught up in trying
>> to fix a fundamentally broken concept --- which is what a generic
>> "uuidserial" API would be.  If you try to paper over the difficulties
>> here, they'll just bite you on the rear someday.
> 
> But we have non-colliding generation technology for OIDs in system
> catalogs.  We could try to reuse the idea in a UUID generator: grab one
> value, try to insert; if it fails generate a new one, lather, rinse,
> repeat.
> 
> This would make uuidserial not quite as simple as SERIAL pseudotype, of
> course.

Tangentally related to this, I believe that PostgreSQL may be the only
current user of the uuid-ossp code at this point.  So we're going to end
up maintaining it for the UUID contrib module, or dropping that module.

Regardless, this all sounds like speculative 9.5ish conversation, and
only vaguely related to Simon's original proposal.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: 9.4 Proposal: Initdb creates a single table

From

Alvaro Herrera

Date:

24 April 2014, 17:14:56

Heikki Linnakangas wrote:
> On 04/24/2014 08:00 PM, Alvaro Herrera wrote:
> >Tom Lane wrote:
> >
> >>This is not our fault, and I don't want us to get caught up in trying
> >>to fix a fundamentally broken concept --- which is what a generic
> >>"uuidserial" API would be.  If you try to paper over the difficulties
> >>here, they'll just bite you on the rear someday.
> >
> >But we have non-colliding generation technology for OIDs in system
> >catalogs.  We could try to reuse the idea in a UUID generator: grab one
> >value, try to insert; if it fails generate a new one, lather, rinse,
> >repeat.
> 
> Umm, UUID stands for Universally Unique IDentifier. That would
> hardly be *universally* unique.

I don't understand your point.  I'm only replying to Tom's assertion
that UUID generation might not be all that unique after all (or, in
other words, AIUI, that the "universally unique" part of the name is
wishful thinking and not an actual property of the real thing.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.4 Proposal: Initdb creates a single table

From

Alvaro Herrera

Date:

24 April 2014, 17:23:43

Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
> > On 04/24/2014 08:00 PM, Alvaro Herrera wrote:
> > >Tom Lane wrote:
> > >
> > >>This is not our fault, and I don't want us to get caught up in trying
> > >>to fix a fundamentally broken concept --- which is what a generic
> > >>"uuidserial" API would be.  If you try to paper over the difficulties
> > >>here, they'll just bite you on the rear someday.
> > >
> > >But we have non-colliding generation technology for OIDs in system
> > >catalogs.  We could try to reuse the idea in a UUID generator: grab one
> > >value, try to insert; if it fails generate a new one, lather, rinse,
> > >repeat.
> > 
> > Umm, UUID stands for Universally Unique IDentifier. That would
> > hardly be *universally* unique.
> 
> I don't understand your point.  I'm only replying to Tom's assertion
> that UUID generation might not be all that unique after all (or, in
> other words, AIUI, that the "universally unique" part of the name is
> wishful thinking and not an actual property of the real thing.)

Oh, I think I see your point: it's that no matter what we do here, there
would be no way to guarantee that a value we generate does not collide
with any other value elsewhere (either on other uuidserial columns, or
on other servers).

Is that it?

Because if it is, then I think the problem is that the UUID concept
might be flawed yet users still want to use it, and we do no service by
refusing to provide it on those grounds.


Now I certainly don't want my rear bitten more than it already is,
anyway.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.4 Proposal: Initdb creates a single table

From

Heikki Linnakangas

Date:

24 April 2014, 17:37:46

On 04/24/2014 08:23 PM, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>> Heikki Linnakangas wrote:
>>> On 04/24/2014 08:00 PM, Alvaro Herrera wrote:
>>>> Tom Lane wrote:
>>>>
>>>>> This is not our fault, and I don't want us to get caught up in trying
>>>>> to fix a fundamentally broken concept --- which is what a generic
>>>>> "uuidserial" API would be.  If you try to paper over the difficulties
>>>>> here, they'll just bite you on the rear someday.
>>>>
>>>> But we have non-colliding generation technology for OIDs in system
>>>> catalogs.  We could try to reuse the idea in a UUID generator: grab one
>>>> value, try to insert; if it fails generate a new one, lather, rinse,
>>>> repeat.
>>>
>>> Umm, UUID stands for Universally Unique IDentifier. That would
>>> hardly be *universally* unique.
>>
>> I don't understand your point.  I'm only replying to Tom's assertion
>> that UUID generation might not be all that unique after all (or, in
>> other words, AIUI, that the "universally unique" part of the name is
>> wishful thinking and not an actual property of the real thing.)
>
> Oh, I think I see your point: it's that no matter what we do here, there
> would be no way to guarantee that a value we generate does not collide
> with any other value elsewhere (either on other uuidserial columns, or
> on other servers).
>
> Is that it?

Yep.

> Because if it is, then I think the problem is that the UUID concept
> might be flawed yet users still want to use it, and we do no service by
> refusing to provide it on those grounds.

Well, we should make a reasonable effort to make them unique. If there 
is a reliable-enough way to generate UUIDs that doesn't depend on 
external libraries, by all means lets have it in core. I believe the 
reason we put gen_random_uuid() in pgcrypto is that it needs a good 
random number generator, and we don't trust plain old random() to be 
good enough for that.

- Heikki

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Josh Berkus

Date:

24 April 2014, 17:40:28

Alvaro,

>> I don't understand your point.  I'm only replying to Tom's assertion
>> that UUID generation might not be all that unique after all (or, in
>> other words, AIUI, that the "universally unique" part of the name is
>> wishful thinking and not an actual property of the real thing.)
> 
> Oh, I think I see your point: it's that no matter what we do here, there
> would be no way to guarantee that a value we generate does not collide
> with any other value elsewhere (either on other uuidserial columns, or
> on other servers).
> 
> Is that it?
> 
> Because if it is, then I think the problem is that the UUID concept
> might be flawed yet users still want to use it, and we do no service by
> refusing to provide it on those grounds.

It's more than that:

1) the concept of UUIDs is fundamentally flawed, to the extent that if
we have a UUID type in core its flaws become our flaws, to be handled in
bug reports forever.

2) Because the concept of UUIDs is flawed, there are multiple competing
implementations, none of which is clearly dominant and durable.  As
such, any UUID algorithm we adopt for core stands a significant risk of
being later abandoned by everyone else and becoming a PostgreSQL wart.

3) In general, users who want UUIDs don't want a generic concept of
UUIDs; they want the specific UUIDs which work with their individual
programming languages, web frameworks, or queueing platforms.  So, see
competing implementations above.

As case in point for (2), as I said upthread: uuid-ossp, which has been
our option for UUID in contrib since originally it was the only OSS
implementation, is now abandoned by everyone but us.

Additionally, were I to adopt a UUID scheme for PostgreSQL, I would want
to to be *for postgresql*, with components indicating server, table and
schema of origin for each ID.  A pseudo-random UUID is frankly pretty
useless to me because (a) it's not really unique, and (b) it doesn't
help me route data at all.

Alternatively, what would be *really* useful is to have a way for an
extension to plug into the "serial" concept, so that it gets all of the
benefits of serial (permissions, dependancies, etc.) while being able to
call a custom generator function.

Oh, and:

4) IIRC, Andres has already worked out a scheme for distributed serials
to support BDR.  So this is a solved problem for the only really
interesting use case ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: 9.4 Proposal: Initdb creates a single table

From

Tom Lane

Date:

24 April 2014, 17:45:48

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Oh, I think I see your point: it's that no matter what we do here, there
> would be no way to guarantee that a value we generate does not collide
> with any other value elsewhere (either on other uuidserial columns, or
> on other servers).

Not that way, anyway.

> Because if it is, then I think the problem is that the UUID concept
> might be flawed yet users still want to use it, and we do no service by
> refusing to provide it on those grounds.

My point is a bit more subtle than that.  It's perfectly reasonable to
want a universally-unique identifier in some applications.  The problem
is that there is no perfect implementation, and so people really need
to stop and think about which generator algorithm they're going to use
and whether its particular failure modes are acceptable in their
context.  So I'm not eager to provide some "easy to use" API that
encourages users to believe that no thought is required on their part.

Having said that, though, the argument around whether such facilities
belong in core seems to devolve to whether you want to buy into
maintaining libuuid for ourselves (in fact worse than that: it's not clear
that libuuid ever has worked on all our platforms, so we might be buying
into some up-front porting work too).  As Josh notes, we may be forced
into that anyway if we don't want to walk away from contrib/uuid-ossp.
But first I'd like to see a bit more research into what the alternatives
might be...
        regards, tom lane

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Christopher Browne

Date:

24 April 2014, 19:48:01

Last year, I built a pl/pgsql generator of "version 1-ish" UUIDs, which would combine timestamps with local information to construct data that kind of emulated the timestamp+MAC address that is version #1 of UUID.

Note that there are several versions of UUIDs:

1. Combines MAC address, timestamp, random #

2. DCE Security (replaces some bits with user's UID/GID and others with POSIX Domain); I don't think this one is much used...

3. MD5 Hash

4. Purely Random

5. SHA-1 Hash

There are merits to each. The "tough one" is #1, as that requires pulling data that can't generally be accessed portably.

I figured out (and could probably donate some code) how to construct the bits of #1 using the inputs of *my* choice (e.g. - I set up to "make up" my own MAC address surrogate, and transformed PostgreSQL timestamp values into the timestamp, and threw in my own bit of randomness), which provided well-formed UUIDs with nice enough characteristics.

It wouldn't be "out there" to do a somewhat PostgreSQL-flavoured version of this that wouldn't actually use MAC addresses, but rather, would use data we have:

a) Having a sequence feeding some local uniqueness would fit with the "clock seq" bits (e.g. - the octets in RFC 4122 entitled clock-seq-and-reserved and clock-seq-low)

b) NOW() provides data for time-low, time-mid, time-high-and-version

c) We'd need 6 hex octets for "node"; I seem to recall there being something established by initdb that might be usable.

The only piece that's directly troublesome, for UUID Type 1, is the "node" value. I'll observe that it isn't unusual for UUID implementations to generate random values for that.

Note that for the other UUID versions, there's NO non-portable data needed.

It seems to me that a "UUIDserial" type, which combined:

a) A sequence, to be the 'clock';

b) Possibly another sequence to store local node ID, which might get seeded from DB internals
would provide a "PostgreSQL-flavoured" version of UUID Type 1.

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Marti Raudsepp

Date:

25 April 2014, 00:23:51

On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus <josh@agliodbs.com> wrote:
> A pseudo-random UUID is frankly pretty
> useless to me because (a) it's not really unique

This is FUD. A pseudorandom UUID contains 122 bits of randomness. As
long as you can trust the random number generator, the chances of a
value occurring twice can be estimated using the birthday paradox:
there's a 50% chance of having *one* collision in a set of 2^61 items.
Storing this amount of UUIDs alone requires 32 exabytes of storage.
Factor in the tuple and indexing overheads and you'd be needing close
to all the hard disk space ever manufactured in the world.

If you believe there's a chance of ever seeing a pseudorandom UUID
collision in practice, you should be buying lottery tickets.

To the contrary. Combined with the fact that pseudorandom UUID
generation doesn't require any configuration (node ID), doesn't leak
any private data (MAC address) and relies on infrastructure that's
ubiquitous anyway (cryptographic PRNG) it's almost always the right
answer.

> (b) it doesn't help me route data at all.

That's really out of scope for UUIDs. They're about generating
identifiers, not describing what the identifier means. UUIDs also
don't happen to cure cancer.

Regards,
Marti

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Tom Lane

Date:

25 April 2014, 00:36:18

Marti Raudsepp <marti@juffo.org> writes:
> On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> A pseudo-random UUID is frankly pretty
>> useless to me because (a) it's not really unique

> This is FUD. A pseudorandom UUID contains 122 bits of randomness. As
> long as you can trust the random number generator, the chances of a
> value occurring twice can be estimated using the birthday paradox:
> there's a 50% chance of having *one* collision in a set of 2^61 items.

Of course, the weak spot in this analysis is the assumption that there
are actually 122 independent bits in the value.  It's not difficult to
imagine that systems with crummy random() implementations might only have
something like 32 bits worth of real randomness.  Or less.  Seeding your
PRNG from gettimeofday(), for instance, is highly likely to lead to
collisions ... no matter how good the PRNG itself is.

> If you believe there's a chance of ever seeing a pseudorandom UUID
> collision in practice, you should be buying lottery tickets.

Now *that*, I'd call FUD.  The issue here is not whether collisions
are improbable under ideal circumstances.  The issue is how much work
does it take to have some confidence that you're anywhere near the
ideal case.
        regards, tom lane

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Marti Raudsepp

Date:

25 April 2014, 00:43:48

On Fri, Apr 25, 2014 at 3:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Of course, the weak spot in this analysis is the assumption that there
> are actually 122 independent bits in the value.  It's not difficult to
> imagine that systems with crummy random() implementations might only have
> something like 32 bits worth of real randomness.

Obviously you can't use random(). That's why I talked about
cryptographic PRNGs, crypto libraries do proper seeding and generate
reliably random numbers all the time.

Regards,
Marti

Re: 9.4 Proposal: Initdb creates a single table

From

Peter Eisentraut

Date:

25 April 2014, 01:46:54

On Thu, 2014-04-24 at 13:45 -0400, Tom Lane wrote:
> Having said that, though, the argument around whether such facilities
> belong in core seems to devolve to whether you want to buy into
> maintaining libuuid for ourselves (in fact worse than that: it's not
> clear
> that libuuid ever has worked on all our platforms, so we might be
> buying
> into some up-front porting work too).  As Josh notes, we may be forced
> into that anyway if we don't want to walk away from contrib/uuid-ossp.

I think we arrived at the conclusion a short while go that we wanted to
stick a deprecation notice on contrib/uuid-ossp for the 9.4 release.

I wrote an extension that wraps libuuid:
https://github.com/petere/pglibuuid

Between pgcrypto and that, is there anything missing that users of
uuid-ossp would have had?

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Christopher Browne

Date:

25 April 2014, 01:51:27

On Thu, Apr 24, 2014 at 8:43 PM, Marti Raudsepp <marti@juffo.org> wrote:

On Fri, Apr 25, 2014 at 3:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Of course, the weak spot in this analysis is the assumption that there
> are actually 122 independent bits in the value. It's not difficult to
> imagine that systems with crummy random() implementations might only have
> something like 32 bits worth of real randomness.

Obviously you can't use random(). That's why I talked about
cryptographic PRNGs, crypto libraries do proper seeding and generate
reliably random numbers all the time.

... And we can't be certain that there won't be some internal characteristic weakness.

Cryptography is *hard*; treating it as certainty that things will be gotten correct

tends to be a foolish assumption.

Which is why UUID "type 1" resolves this by combining multiple sorts of

anti-correlations, the combination of:

a) Node-specific information (e.g. - in the standard form, parts of the MAC address), so no collisions between node A and node B.

b) Timestamp, so that things that happen at different times will be kept unique.

c) An extra sequence, so that if there are multiple events on the same node at the same time, they *still* don't collide.

I trust the combination to work pretty well, and that's why it was designed that way.

A RNG, however good, can't provide the same guarantees of lack of conflicts.

--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Greg Stark

Date:

25 April 2014, 15:19:08

On Fri, Apr 25, 2014 at 1:43 AM, Marti Raudsepp <marti@juffo.org> wrote:
> Obviously you can't use random(). That's why I talked about
> cryptographic PRNGs, crypto libraries do proper seeding and generate
> reliably random numbers all the time.

The difficulty lies not really in the PRNG implementation (which is
hard but well enough understood that it's not much of an issue these
days). The difficulty lies in obtaining enough entropy. There are ways
of obtaining enough entropy and they are available. But they're not
free.

Obtaining enough entropy requires access to hardware devices which
means a kernel system call. Kernel system calls are relatively slow
when you're talking about generating sequential IDs. They also deplete
the available entropy pool for other sources which may means they have
security consequences.

Which isn't to say they're a bad idea but like everything else in
engineering there are tradeoffs and no such thing as a free lunch.
You can avoid depleting the entropy pool by including data you expect
to be unique as a kind of fake entropy -- which quickly gets you back
to looking for things like MAC address to avoid duplicates across
systems.

-- 
greg

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Josh Berkus

Date:

25 April 2014, 17:58:44

On 04/24/2014 05:23 PM, Marti Raudsepp wrote:
> On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> A pseudo-random UUID is frankly pretty
>> useless to me because (a) it's not really unique
> 
> This is FUD. A pseudorandom UUID contains 122 bits of randomness. As
> long as you can trust the random number generator, the chances of a
> value occurring twice can be estimated using the birthday paradox:
> there's a 50% chance of having *one* collision in a set of 2^61 items.
> Storing this amount of UUIDs alone requires 32 exabytes of storage.
> Factor in the tuple and indexing overheads and you'd be needing close
> to all the hard disk space ever manufactured in the world.

Well, I've already had collisions with UUID-OSSP, in production, with
only around 20 billion values.  So clearly there aren't 122bits of true
randomness in OSSP.  I can't speak for other implementations because I
haven't tried them.

>> (b) it doesn't help me route data at all.
> 
> That's really out of scope for UUIDs. They're about generating
> identifiers, not describing what the identifier means. UUIDs also
> don't happen to cure cancer.

http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327

On the contrary, I would argue that an object identifier which is
completely random is possibly the worst way to form an ID of all
possible concepts; there's no relationship whatsoever between the ID,
the application stack, and the application data; you don't even get the
pseudo-time indexing you get with Serials.   The only reason to do it is
because you're too lazy do implement a better way.

Or to put it another way: a value which is truly random is no identifier
at all.

Compare this with a composite identifier which carries information about
the node, table, and schema of origin for the tuple.  Not only does this
help ensure uniqueness, but it also supports intelligent sharding and
multi-master replication systems.  I don't speak hypothetically; we've
done this in the past and will do it again in the future.

I would love to have some machinery inside PostgreSQL to make this
easier (for example, a useful unique database ID), but I suspect that
acutal implementation will always remain application-specific.

You may say "oh, that's not the job of the identifer", but if it's not,
WTF is the identifer for, then?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

David Fetter

Date:

25 April 2014, 18:46:22

On Fri, Apr 25, 2014 at 10:58:29AM -0700, Josh Berkus wrote:
> On 04/24/2014 05:23 PM, Marti Raudsepp wrote:
> > On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus <josh@agliodbs.com> wrote:
> >> A pseudo-random UUID is frankly pretty
> >> useless to me because (a) it's not really unique
> > 
> > This is FUD. A pseudorandom UUID contains 122 bits of randomness. As
> > long as you can trust the random number generator, the chances of a
> > value occurring twice can be estimated using the birthday paradox:
> > there's a 50% chance of having *one* collision in a set of 2^61 items.
> > Storing this amount of UUIDs alone requires 32 exabytes of storage.
> > Factor in the tuple and indexing overheads and you'd be needing close
> > to all the hard disk space ever manufactured in the world.
> 
> Well, I've already had collisions with UUID-OSSP, in production, with
> only around 20 billion values.  So clearly there aren't 122bits of true
> randomness in OSSP.  I can't speak for other implementations because I
> haven't tried them.
> 
> >> (b) it doesn't help me route data at all.
> > 
> > That's really out of scope for UUIDs. They're about generating
> > identifiers, not describing what the identifier means. UUIDs also
> > don't happen to cure cancer.
> 
> http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327
> 
> On the contrary, I would argue that an object identifier which is
> completely random is possibly the worst way to form an ID of all
> possible concepts; there's no relationship whatsoever between the ID,
> the application stack, and the application data; you don't even get the
> pseudo-time indexing you get with Serials.   The only reason to do it is
> because you're too lazy do implement a better way.
> 
> Or to put it another way: a value which is truly random is no identifier
> at all.

Not exactly.  It's at least potentially hiding information an attacker
could use, with all the caveats that carries.

> Compare this with a composite identifier which carries information about
> the node, table, and schema of origin for the tuple.  Not only does this
> help ensure uniqueness, but it also supports intelligent sharding and
> multi-master replication systems.  I don't speak hypothetically; we've
> done this in the past and will do it again in the future.

This is an excellent idea, but I don't think it's in scope for UUIDs.

> I would love to have some machinery inside PostgreSQL to make this
> easier (for example, a useful unique database ID), but I suspect that
> acutal implementation will always remain application-specific.
> 
> You may say "oh, that's not the job of the identifer", but if it's not,
> WTF is the identifer for, then?

Frequently, it's to provide some kind of opacity in the sense of not
have an obvious predecessor or successor.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Martijn van Oosterhout

Date:

26 April 2014, 17:41:07

On Fri, Apr 25, 2014 at 04:18:18PM +0100, Greg Stark wrote:
> Which isn't to say they're a bad idea but like everything else in
> engineering there are tradeoffs and no such thing as a free lunch.
> You can avoid depleting the entropy pool by including data you expect
> to be unique as a kind of fake entropy -- which quickly gets you back
> to looking for things like MAC address to avoid duplicates across
> systems.

ISTM you could use the database identifier we already have to at least
produce "UUID"s which are unique amongst PostgreSQL instances. That
might be something worth aiming for?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Tom Lane

Date:

26 April 2014, 18:18:53

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Fri, Apr 25, 2014 at 04:18:18PM +0100, Greg Stark wrote:
>> Which isn't to say they're a bad idea but like everything else in
>> engineering there are tradeoffs and no such thing as a free lunch.
>> You can avoid depleting the entropy pool by including data you expect
>> to be unique as a kind of fake entropy -- which quickly gets you back
>> to looking for things like MAC address to avoid duplicates across
>> systems.

> ISTM you could use the database identifier we already have to at least
> produce "UUID"s which are unique amongst PostgreSQL instances. That
> might be something worth aiming for?

It's worth noting in this connection that we've never tried hard to ensure
that "database identifiers" are actually unique.  One potentially serious
issue is that slave servers will have the same identifier as their master.

Also, I think there's a still-open issue that creation of the identifier
has a thinko about using OR instead of XOR, resulting in way few bits of
freedom than it should have even with the limited amount of entropy used.
        regards, tom lane

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Josh Berkus

Date:

26 April 2014, 19:58:47

On 04/26/2014 11:18 AM, Tom Lane wrote:
> It's worth noting in this connection that we've never tried hard to ensure
> that "database identifiers" are actually unique.  One potentially serious
> issue is that slave servers will have the same identifier as their master.

Yeah, this is one of those things I've been thinking about.  The proble
is that we need a "node" ID, which identifies the PostgreSQL instance,
and a "dataset ID", which identifies the chain of data, especially when
combined with the timeline ID.  So a master and replica would have
different node IDs, but the same dataset ID, until the replica is
promoted, at which point its dataset ID + timeline No. would change.
This would allow for relatively easy management of large clusters by
allowing automated identification of databases and their mirrors.

However, there's a fundamental problem with the concept of the dataset
ID in that there's absolutely no way for PostgreSQL to know when it has
a unique dataset.  Consider a downtime database file cloning for
example; the two databases would have the same identifier and yet both
be standalones which quickly diverge.  So I haven't thought of a good
solution to that.

We could implement a NodeID, though, based on some combination of IP/MAC
address and port though.  Not entirely reliable, but better than nothing ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Josh Berkus

Date:

26 April 2014, 20:08:40

On 04/25/2014 11:46 AM, David Fetter wrote:
> On Fri, Apr 25, 2014 at 10:58:29AM -0700, Josh Berkus wrote:
>> You may say "oh, that's not the job of the identifer", but if it's not,
>> WTF is the identifer for, then?
> 
> Frequently, it's to provide some kind of opacity in the sense of not
> have an obvious predecessor or successor.

A far better solution to that is to not share the unadorned ID with the
user.

Basically, there's two different reasons to offer UUIDs in PostgreSQL:

1) because they actually serve a useful purpose in providing a globally
"unique" identifier;

2) because they work well with existing platforms and frameworks.

Given the state of the art, the above two goals are separate and
exclusive, apologists for poorly conceived UUID algorithms
nonwithstanding.  So either we provide a UUID type which actually helps
identify unique entities between database servers, OR we supply a UUID
which "just works" with popular web frameworks, or we supply both *as
two or more different types*.  But claiming that types chosen because
they're popular are also technically sound is misleading at best.

Further, based on our experience with OSSP, if we're going to make a
UUId type in core because it's currently popular, we'd better be pretty
sure that it's still going to be popular in 5 or 10 years from now.
Otherwise we're better off keeping it an extension.

I personally am interested in a UUID type which would support doing
multi-master replication of JSON databases built on PostgreSQL, and will
probably write one if nobody else does first, and I don't see existing,
naive randomization-based UUIDS as ever filling that role adequately.
Although, as I said, Andres' work in this area may have already taken
care of this.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Jim Nasby

Date:

26 April 2014, 23:57:44

On 4/25/14, 12:58 PM, Josh Berkus wrote:
> Well, I've already had collisions with UUID-OSSP, in production, with
> only around 20 billion values.  So clearly there aren't 122bits of true
> randomness in OSSP.  I can't speak for other implementations because I
> haven't tried them.

Or perhaps you should be buying lottery tickets? ;)

Can you write this up in a blog post? I've argued with people more than once about why it's a bad idea to trust on "1
ina bazillion" odds to protect your data (though, usually in the context of SHA1), and it'd be good to be able to point
ata real world example of this failing.

-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Greg Stark

Date:

27 April 2014, 01:53:13

On Sat, Apr 26, 2014 at 8:58 PM, Josh Berkus <josh@agliodbs.com> wrote:
> However, there's a fundamental problem with the concept of the dataset
> ID in that there's absolutely no way for PostgreSQL to know when it has
> a unique dataset.  Consider a downtime database file cloning for
> example; the two databases would have the same identifier and yet both
> be standalones which quickly diverge.  So I haven't thought of a good
> solution to that.

If you're content to use random numbers then you could generate one
from system entropy on every startup. If you generated a new timeline
for every startup then the pair of system id and random startup id
(which would be the new timelineid) would let you look at any two
instances and determine if they're related and where they diverged
even if it was from a database clone.

I don't think MAC address or other hardware identifiers really saves
you from using system entropy anyways. You might very well install a
clone on the same machine and in an environment like Heroku you could
very easily end up restoring a database onto the same VM twice
entirely by accident. I actually think using /dev/urandom is a better
idea than depending on things like MAC address almost always.

-- 
greg

Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

From

Marti Raudsepp

Date:

28 April 2014, 11:45:48

On Fri, Apr 25, 2014 at 8:58 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Well, I've already had collisions with UUID-OSSP, in production, with
> only around 20 billion values.  So clearly there aren't 122bits of true
> randomness in OSSP.  I can't speak for other implementations because I
> haven't tried them.

Interesting. The statistical chances of this happening should be
approximately 4e-17. Are you certain that this was due to uuid-ossp
and not an application bug?

Can you say what kind of operating system and environment that was? I
skimmed the sources of uuid-ossp 1.6.2 and it seems to be doing the
right thing, using /dev/urandom or /dev/random on Unixes and
CryptGenRandom on Windows. Barring any bugs, of course. However, if
these fail for whatever reason (e.g. out of file descriptors), it
falls back to libc random(), which is clearly broken.

On Fri, Apr 25, 2014 at 6:18 PM, Greg Stark <stark@mit.edu> wrote:
> The difficulty lies not really in the PRNG implementation (which is
> hard but well enough understood that it's not much of an issue these
> days). The difficulty lies in obtaining enough entropy. There are ways
> of obtaining enough entropy and they are available.

> Obtaining enough entropy requires access to hardware devices which
> means a kernel system call.

This is a solved problem in most environments, too. The kernel
collects entropy from unpredictable events and then seeds a global
CSPRNG with that. This collection happens always regardless of whether
you request random numbers or not, so essentially comes for "free".
Applications can then request output from this CSPRNG.

Reason being, this infrastructure is necessary for more critical tasks
than generating UUIDs: pretty much all of cryptography requires random
numbers.

> They also deplete
> the available entropy pool for other sources which may means they have
> security consequences.

This only affects the Linux /dev/random, which is discouraged these
days for that reason. Applications should use urandom instead. To my
knowledge, there are no other operating systems that have this
"depletion" behavior.

Regards,
Marti