Thread: Test database for new installs?

Test database for new installs?

From
Josh Berkus
Date:
Folks,

Some issues have come up repeatedly on IRC with new users, enough so that they 
might be worth addressing in the code:
a) new users try just to "psql" as postgres, and get a "no such database 
postgres";
b) new users use template1 as a testing database, and then have to re-initdb 
to clean it up.

Both of these things could be solved by creating an additional, non-template 
database called "postgres" at initdb.    For security reasons, this db would 
be set up in pg_hba.conf as accessable only by postgres via local.   It might 
not seem like it to experienced programmers, but having a "sandbox" database 
which lets you get used to PG commands would be a boon to people how are new 
to both Postgres and SQL databases in general.  

The only reason not to do it is space; each database takes up about 5mb.    
That's nothing to most users but could be a problem for a few.   Also, it 
would create a minor burden on the fsm to track an extra set of relations.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: Test database for new installs?

From
Rod Taylor
Date:
> The only reason not to do it is space; each database takes up about 5mb.    
> That's nothing to most users but could be a problem for a few.   Also, it 
> would create a minor burden on the fsm to track an extra set of relations.

Perhaps it could have an initdb flag to turn it off and be easily
dropped via drop database? Then it's not such a big deal.

As a side note, the database shouldn't be "postgres" but representative
of the username they're installing with. pgsql is another popular
username.

-- 



Re: Test database for new installs?

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Some issues have come up repeatedly on IRC with new users, enough so that they 
> might be worth addressing in the code:
> a) new users try just to "psql" as postgres, and get a "no such database 
> postgres";
> b) new users use template1 as a testing database, and then have to re-initdb 
> to clean it up.

I think this is a documentation thing as much as anything else.  We
could just suggest that the first move after starting the postmaster becreatedb
(they don't even need to give it an argument ... how painless can you
get?)
        regards, tom lane


Re: Test database for new installs?

From
Gavin Sherry
Date:
On Thu, 18 Nov 2004, Tom Lane wrote:

> Josh Berkus <josh@agliodbs.com> writes:
> > Some issues have come up repeatedly on IRC with new users, enough so that they
> > might be worth addressing in the code:
> > a) new users try just to "psql" as postgres, and get a "no such database
> > postgres";
> > b) new users use template1 as a testing database, and then have to re-initdb
> > to clean it up.
>
> I think this is a documentation thing as much as anything else.  We
> could just suggest that the first move after starting the postmaster be
>     createdb
> (they don't even need to give it an argument ... how painless can you
> get?)

Perhaps initdb could spit out a line saying 'to create a test database for
the current user, issue /path/to/createdb'.

*thinks* I don't think it solves the problem Josh and others are seeing on
IRC though. From my experience, at least with reference to (a) above,
the user's OS comes with PostgreSQL installed or allows it to be installed
in some semi-automated way. Generally, the installation process runs
initdb in the background (which is a broken idea in my opinion). An
run level init script brings up the server and the user wants to connect.

It seems as though, if the distribution wants to make it this simple for a
user to get at an SQL console, then they should also create default
databases for users. My personal opinion is, however, that the
administrator of the machine should be forced to initdb which will force
he or her to read at least some of the manual.

Gavin


Re: Test database for new installs?

From
Peter Eisentraut
Date:
Am Donnerstag, 18. November 2004 22:07 schrieb Josh Berkus:
> a) new users try just to "psql" as postgres, and get a "no such database
> postgres";

This "problem" has been recognized before.  I think a possible solution is to 
make psql recognize the error (the error code regime in libpq would have to 
be extended for that), recognize that the user didn't specify a database, and 
then tell him something like:

ERROR: no such database "postgres"
HINT: Since no database was specified explicitly, the name of the current user 
was taken as the database name.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Test database for new installs?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Am Donnerstag, 18. November 2004 22:07 schrieb Josh Berkus:
>> a) new users try just to "psql" as postgres, and get a "no such database
>> postgres";

> This "problem" has been recognized before.  I think a possible solution is to
> make psql recognize the error (the error code regime in libpq would have to 
> be extended for that),

Extended how?  The error you're interested in will come back as
ERRCODE_UNDEFINED_DATABASE.
        regards, tom lane


Re: Test database for new installs?

From
Andrew Dunstan
Date:
I can't get too excited about this, to be honest. What I would like to 
see, either in contrib or on pgfoundry, is one or more moderately 
complete and well populated sample databases.

cheers

andrew

Josh Berkus wrote:

>Folks,
>
>Some issues have come up repeatedly on IRC with new users, enough so that they 
>might be worth addressing in the code:
>a) new users try just to "psql" as postgres, and get a "no such database 
>postgres";
>b) new users use template1 as a testing database, and then have to re-initdb 
>to clean it up.
>
>Both of these things could be solved by creating an additional, non-template 
>database called "postgres" at initdb.    For security reasons, this db would 
>be set up in pg_hba.conf as accessable only by postgres via local.   It might 
>not seem like it to experienced programmers, but having a "sandbox" database 
>which lets you get used to PG commands would be a boon to people how are new 
>to both Postgres and SQL databases in general.  
>
>The only reason not to do it is space; each database takes up about 5mb.    
>That's nothing to most users but could be a problem for a few.   Also, it 
>would create a minor burden on the fsm to track an extra set of relations.
>
>  
>


Re: Test database for new installs?

From
Peter Eisentraut
Date:
Am Freitag, 19. November 2004 15:59 schrieb Tom Lane:
> > is to make psql recognize the error (the error code regime in libpq would
> > have to be extended for that),
>
> Extended how?  The error you're interested in will come back as
> ERRCODE_UNDEFINED_DATABASE.

AFAICT, error codes are only accessible through PGresult.  But if the 
connection attempt fails, you have at best a PGconn.  This is the same kind 
of issue we have with frontends parsing the "no password supplied" message, 
because PQconnect cannot supply an error code in this case.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Test database for new installs?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> AFAICT, error codes are only accessible through PGresult.  But if the 
> connection attempt fails, you have at best a PGconn.  This is the same kind 
> of issue we have with frontends parsing the "no password supplied" message, 
> because PQconnect cannot supply an error code in this case.

Oh, right.  Yes, we should do something about that.  I think we could
invent PQerrorField which is to PQerrorMessage as PQresultErrorField
is to PQresultErrorMessage.  The reason I didn't do so during the 7.4
cycle is that errors generated internally within libpq wouldn't have any
field information, at least not without significant hacking on the libpq
sources.
        regards, tom lane


Re: Test database for new installs?

From
Andreas Pflug
Date:
Andrew Dunstan wrote:
> 
> I can't get too excited about this, to be honest. What I would like to 
> see, either in contrib or on pgfoundry, is one or more moderately 
> complete and well populated sample databases.

How about the tpcw database model, filled with some real world data 
(e.g. pgsql books)? Other proposals? I could spend some minutes on that.

Regards,
Andreas


Re: Test database for new installs?

From
Josh Berkus
Date:
Andrew,

> > I can't get too excited about this, to be honest. What I would like to
> > see, either in contrib or on pgfoundry, is one or more moderately
> > complete and well populated sample databases.
>
> How about the tpcw database model, filled with some real world data
> (e.g. pgsql books)? Other proposals? I could spend some minutes on that.

Hmmm ... sounds like an add-in project.    I'm not sure, I think something 
which demonstrates more general principles than the TPC-W database would be 
useful, sort of a "training database".    Maybe one of the writers of PGSQL 
books has such a thing?   Maybe Bruce?

This could be done on pgFoundry as an add-in, but potentially included with 
the Windows install and major distribution RPMs.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: Test database for new installs?

From
Andreas Pflug
Date:
Josh Berkus wrote:
> Andrew,
> 
> 
>>>I can't get too excited about this, to be honest. What I would like to
>>>see, either in contrib or on pgfoundry, is one or more moderately
>>>complete and well populated sample databases.
>>
>>How about the tpcw database model, filled with some real world data
>>(e.g. pgsql books)? Other proposals? I could spend some minutes on that.
> 
> 
> Hmmm ... sounds like an add-in project.    I'm not sure, I think something 
> which demonstrates more general principles than the TPC-W database would be 
> useful, sort of a "training database".   

I didn't mean to supply the optimal training database with tpc-w; at 
least, it's not ill-designed, and uses some FKs. Better than what we 
have now.

> Maybe one of the writers of PGSQL books has such a thing?   Maybe Bruce?

Maybe Michael? Or other OS projects using pgsql? Unfortunately, I only 
know projects (groupware) which use pgsql with ill-designed db schemas.

Regards,
Andreas


Re: Test database for new installs?

From
"Joshua D. Drake"
Date:
> Hmmm ... sounds like an add-in project.    I'm not sure, I think something
> which demonstrates more general principles than the TPC-W database would be
> useful, sort of a "training database".    Maybe one of the writers of PGSQL
> books has such a thing?   Maybe Bruce?

I have the complete books database that we used in Practical PostgreSQL.
I would want to do a little work on it to get it up to snuff (add
comments etc..) but it would be a start.


>
> This could be done on pgFoundry as an add-in, but potentially included with
> the Windows install and major distribution RPMs.
>


--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment

Re: Test database for new installs?

From
Andreas Pflug
Date:
Joshua D. Drake wrote:
> 
> I have the complete books database that we used in Practical PostgreSQL. 
> I would want to do a little work on it to get it up to snuff (add 
> comments etc..) but it would be a start.

Is the DDL online somewhere to peek at it?

Regards,
Andreas


Re: Test database for new installs?

From
Robert Treat
Date:
On Friday 19 November 2004 13:54, Joshua D. Drake wrote:
> > Hmmm ... sounds like an add-in project.    I'm not sure, I think
> > something which demonstrates more general principles than the TPC-W
> > database would be useful, sort of a "training database".    Maybe one of
> > the writers of PGSQL books has such a thing?   Maybe Bruce?
>
> I have the complete books database that we used in Practical PostgreSQL.
> I would want to do a little work on it to get it up to snuff (add
> comments etc..) but it would be a start.
>

I might have access to one for another book as well if we need it, although 
what I have always thought we should provide is a copy of the database 
generated from the tutorial section of the PostgreSQL documentation.   

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: Test database for new installs?

From
Andreas Pflug
Date:
Robert Treat wrote:
> 
> I might have access to one for another book as well if we need it, although 
> what I have always thought we should provide is a copy of the database 
> generated from the tutorial section of the PostgreSQL documentation.   

This is the kind of database design I definitely would *not* use to 
demonstrate good db design practice:
- no primary key/indexes
- no foreign key
- implicitely WITH OID
- Usage of varchar for key column

Regards,
Andreas




Re: Test database for new installs?

From
Josh Berkus
Date:
Andreas,

> - Usage of varchar for key column

And?  Varchar is somehow implicitly inferior for keys?

Watch it ... you're about to hit one of my "abuses of SQL" pet-peeves, the 
overuse on INT surrogate keys ...

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: Test database for new installs?

From
Andreas Pflug
Date:
Josh Berkus wrote:
> Andreas,
> 
> 
>>- Usage of varchar for key column
> 
> 
> And?  Varchar is somehow implicitly inferior for keys?
> 
> Watch it ... you're about to hit one of my "abuses of SQL" pet-peeves, the 
> overuse on INT surrogate keys ...

You might be right for small dbs, but a temperature database will likely 
contain millions of rows, if filled in the real world. varchar will 
probably make the table several times bigger than needed.

Additionally, I regretted *every* case where I decided to use some text 
data as key, sooner or later.


Regards,
Andreas


Re: Test database for new installs?

From
Greg Stark
Date:
Josh Berkus <josh@agliodbs.com> writes:

> > - Usage of varchar for key column
> 
> And?  Varchar is somehow implicitly inferior for keys?

Yes, it's larger and larger is slower. It's also failure prone when pushed
through various levels of applications prone to uppercasing or misparsing
whitespace. It's also prone to failure in internationalized applications and
applications dealing with multiple character sets.

> Watch it ... you're about to hit one of my "abuses of SQL" pet-peeves, the 
> overuse on INT surrogate keys ...

Oh yeah, and it's also a sign you're failing to use surrogate keys and using
something meaningful in the real world as your primary key and therefore
vulnerable to major problems when the real world fails your assumptions about
uniqueness or immutability.

-- 
greg



Re: Test database for new installs?

From
"Andrew Dunstan"
Date:
Josh Berkus said:
> Andreas,
>
>> - Usage of varchar for key column
>
> And?  Varchar is somehow implicitly inferior for keys?
>
> Watch it ... you're about to hit one of my "abuses of SQL" pet-peeves,
> the  overuse on INT surrogate keys ...
>

We will probably find that *everthing* each of us does will offend somebody
else. I'm not too keen to get into "best practice" wars. That's what IRC is
for ;-)

I would like to see a number of samples, which demonstrate different design
methodologies/philosophies, so we are offending everyone with equal
opportunity.
cheers

andrew




Re: Test database for new installs?

From
Josh Berkus
Date:
Andreas, Andrew,

> You might be right for small dbs, but a temperature database will likely
> contain millions of rows, if filled in the real world. varchar will
> probably make the table several times bigger than needed.

Yeah, INT keys are useful for performance reasons.   It depends on the table.  
I often find that text keys are often more useful for short lookup lists 
because it allows me to avoid linking in dozens of tables in a star-schema 
OLAP database.  There I usually find that the efficiency gained by the 
smaller size of the INT column is more than offset by the cost and decrease 
in estimate accuracy inherent in adding a bunch of lookup tables to the 
query.

Also, for tables where the text key is required in the table, *adding* an 
additional INT column as a key is no improvement in performance.

> Additionally, I regretted *every* case where I decided to use some text
> data as key, sooner or later.

Well, obviously you and I have had different workloads then.

> We will probably find that *everthing* each of us does will offend somebody
> else. I'm not too keen to get into "best practice" wars. That's what IRC is
> for ;-)

<grin>

> I would like to see a number of samples, which demonstrate different design
> methodologies/philosophies, so we are offending everyone with equal
> opportunity.

Yeah, that would be great.

--Josh

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: Test database for new installs?

From
Robert Treat
Date:
On Monday 22 November 2004 11:07, Andreas Pflug wrote:
> Robert Treat wrote:
> > I might have access to one for another book as well if we need it,
> > although what I have always thought we should provide is a copy of the
> > database generated from the tutorial section of the PostgreSQL
> > documentation.
>
> This is the kind of database design I definitely would *not* use to
> demonstrate good db design practice:
> - no primary key/indexes
> - no foreign key
> - implicitely WITH OID
> - Usage of varchar for key column
>

Didn't know good database design was a pre-requisite.  If it is then yeah, I'd 
agree maybe thats not the best example. OTOH maybe we just need a better 
tutorial :-)

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: Test database for new installs?

From
Greg Stark
Date:
Josh Berkus <josh@agliodbs.com> writes:

> Also, for tables where the text key is required in the table, *adding* an 
> additional INT column as a key is no improvement in performance.

Not true. Every table which references the varchar-keyed table needs to have a
complete copy of the varchar key. Plus the indexes on the table (and often
those referencing tables) are all bigger too. 

> > Additionally, I regretted *every* case where I decided to use some text
> > data as key, sooner or later.
> 
> Well, obviously you and I have had different workloads then.

My experience agrees with his.

A good example was using user provided text usernames as a primary key. The
application guaranteed they would be unique, and they couldn't be changed. Had
those requirements changed things would have gotten very nasty. No, ON UPDATE
CASCADE doesn't solve things when you have a few hundred million records
referencing the table on a 24x7 application. Especially when you have a few
hundred million more archived records on tape and in your data warehouse for
doing statistical analyses.

In fact those requirements never changed. And yet we still ended up regretting
that decision for multiple reasons:

. The varchar field spread throughout the database like a slow rot to tables that referenced users. Some of the largest
tablesin the database ended up 10-30% inflated in size due to that field alone. Their indexes were even more inflated.
 

. Later we had to export data to a third party and receive data back from them. Their mainframe uppercased all the text
weprovided for them in the key. It was also fixed position so any trailing spaces were effectively lost.
 

Because of the latter problem we added an integer field. 

Afterwards we started using that to reference the users table on any new
table. The speed difference on index scans was noticable. Actually this was
Oracle. I don't have empirical tests for postgres to know if it would be the
same. Actually I would expect the difference for Postgres would be even
greater. Postgres stores integers directly in Datums but varchars require a
palloc, and any comparison involves strcoll calls which can be quite slow
compared to an integer == call.

> > We will probably find that *everthing* each of us does will offend somebody
> > else. I'm not too keen to get into "best practice" wars. That's what IRC is
> > for ;-)

Now can we discuss naming conventions for primary keys? :)

-- 
greg