Thread: Test database for new installs?
Folks, Some issues have come up repeatedly on IRC with new users, enough so that they might be worth addressing in the code: a) new users try just to "psql" as postgres, and get a "no such database postgres"; b) new users use template1 as a testing database, and then have to re-initdb to clean it up. Both of these things could be solved by creating an additional, non-template database called "postgres" at initdb. For security reasons, this db would be set up in pg_hba.conf as accessable only by postgres via local. It might not seem like it to experienced programmers, but having a "sandbox" database which lets you get used to PG commands would be a boon to people how are new to both Postgres and SQL databases in general. The only reason not to do it is space; each database takes up about 5mb. That's nothing to most users but could be a problem for a few. Also, it would create a minor burden on the fsm to track an extra set of relations. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
> The only reason not to do it is space; each database takes up about 5mb. > That's nothing to most users but could be a problem for a few. Also, it > would create a minor burden on the fsm to track an extra set of relations. Perhaps it could have an initdb flag to turn it off and be easily dropped via drop database? Then it's not such a big deal. As a side note, the database shouldn't be "postgres" but representative of the username they're installing with. pgsql is another popular username. --
Josh Berkus <josh@agliodbs.com> writes: > Some issues have come up repeatedly on IRC with new users, enough so that they > might be worth addressing in the code: > a) new users try just to "psql" as postgres, and get a "no such database > postgres"; > b) new users use template1 as a testing database, and then have to re-initdb > to clean it up. I think this is a documentation thing as much as anything else. We could just suggest that the first move after starting the postmaster becreatedb (they don't even need to give it an argument ... how painless can you get?) regards, tom lane
On Thu, 18 Nov 2004, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > > Some issues have come up repeatedly on IRC with new users, enough so that they > > might be worth addressing in the code: > > a) new users try just to "psql" as postgres, and get a "no such database > > postgres"; > > b) new users use template1 as a testing database, and then have to re-initdb > > to clean it up. > > I think this is a documentation thing as much as anything else. We > could just suggest that the first move after starting the postmaster be > createdb > (they don't even need to give it an argument ... how painless can you > get?) Perhaps initdb could spit out a line saying 'to create a test database for the current user, issue /path/to/createdb'. *thinks* I don't think it solves the problem Josh and others are seeing on IRC though. From my experience, at least with reference to (a) above, the user's OS comes with PostgreSQL installed or allows it to be installed in some semi-automated way. Generally, the installation process runs initdb in the background (which is a broken idea in my opinion). An run level init script brings up the server and the user wants to connect. It seems as though, if the distribution wants to make it this simple for a user to get at an SQL console, then they should also create default databases for users. My personal opinion is, however, that the administrator of the machine should be forced to initdb which will force he or her to read at least some of the manual. Gavin
Am Donnerstag, 18. November 2004 22:07 schrieb Josh Berkus: > a) new users try just to "psql" as postgres, and get a "no such database > postgres"; This "problem" has been recognized before. I think a possible solution is to make psql recognize the error (the error code regime in libpq would have to be extended for that), recognize that the user didn't specify a database, and then tell him something like: ERROR: no such database "postgres" HINT: Since no database was specified explicitly, the name of the current user was taken as the database name. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > Am Donnerstag, 18. November 2004 22:07 schrieb Josh Berkus: >> a) new users try just to "psql" as postgres, and get a "no such database >> postgres"; > This "problem" has been recognized before. I think a possible solution is to > make psql recognize the error (the error code regime in libpq would have to > be extended for that), Extended how? The error you're interested in will come back as ERRCODE_UNDEFINED_DATABASE. regards, tom lane
I can't get too excited about this, to be honest. What I would like to see, either in contrib or on pgfoundry, is one or more moderately complete and well populated sample databases. cheers andrew Josh Berkus wrote: >Folks, > >Some issues have come up repeatedly on IRC with new users, enough so that they >might be worth addressing in the code: >a) new users try just to "psql" as postgres, and get a "no such database >postgres"; >b) new users use template1 as a testing database, and then have to re-initdb >to clean it up. > >Both of these things could be solved by creating an additional, non-template >database called "postgres" at initdb. For security reasons, this db would >be set up in pg_hba.conf as accessable only by postgres via local. It might >not seem like it to experienced programmers, but having a "sandbox" database >which lets you get used to PG commands would be a boon to people how are new >to both Postgres and SQL databases in general. > >The only reason not to do it is space; each database takes up about 5mb. >That's nothing to most users but could be a problem for a few. Also, it >would create a minor burden on the fsm to track an extra set of relations. > > >
Am Freitag, 19. November 2004 15:59 schrieb Tom Lane: > > is to make psql recognize the error (the error code regime in libpq would > > have to be extended for that), > > Extended how? The error you're interested in will come back as > ERRCODE_UNDEFINED_DATABASE. AFAICT, error codes are only accessible through PGresult. But if the connection attempt fails, you have at best a PGconn. This is the same kind of issue we have with frontends parsing the "no password supplied" message, because PQconnect cannot supply an error code in this case. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > AFAICT, error codes are only accessible through PGresult. But if the > connection attempt fails, you have at best a PGconn. This is the same kind > of issue we have with frontends parsing the "no password supplied" message, > because PQconnect cannot supply an error code in this case. Oh, right. Yes, we should do something about that. I think we could invent PQerrorField which is to PQerrorMessage as PQresultErrorField is to PQresultErrorMessage. The reason I didn't do so during the 7.4 cycle is that errors generated internally within libpq wouldn't have any field information, at least not without significant hacking on the libpq sources. regards, tom lane
Andrew Dunstan wrote: > > I can't get too excited about this, to be honest. What I would like to > see, either in contrib or on pgfoundry, is one or more moderately > complete and well populated sample databases. How about the tpcw database model, filled with some real world data (e.g. pgsql books)? Other proposals? I could spend some minutes on that. Regards, Andreas
Andrew, > > I can't get too excited about this, to be honest. What I would like to > > see, either in contrib or on pgfoundry, is one or more moderately > > complete and well populated sample databases. > > How about the tpcw database model, filled with some real world data > (e.g. pgsql books)? Other proposals? I could spend some minutes on that. Hmmm ... sounds like an add-in project. I'm not sure, I think something which demonstrates more general principles than the TPC-W database would be useful, sort of a "training database". Maybe one of the writers of PGSQL books has such a thing? Maybe Bruce? This could be done on pgFoundry as an add-in, but potentially included with the Windows install and major distribution RPMs. -- Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus wrote: > Andrew, > > >>>I can't get too excited about this, to be honest. What I would like to >>>see, either in contrib or on pgfoundry, is one or more moderately >>>complete and well populated sample databases. >> >>How about the tpcw database model, filled with some real world data >>(e.g. pgsql books)? Other proposals? I could spend some minutes on that. > > > Hmmm ... sounds like an add-in project. I'm not sure, I think something > which demonstrates more general principles than the TPC-W database would be > useful, sort of a "training database". I didn't mean to supply the optimal training database with tpc-w; at least, it's not ill-designed, and uses some FKs. Better than what we have now. > Maybe one of the writers of PGSQL books has such a thing? Maybe Bruce? Maybe Michael? Or other OS projects using pgsql? Unfortunately, I only know projects (groupware) which use pgsql with ill-designed db schemas. Regards, Andreas
> Hmmm ... sounds like an add-in project. I'm not sure, I think something > which demonstrates more general principles than the TPC-W database would be > useful, sort of a "training database". Maybe one of the writers of PGSQL > books has such a thing? Maybe Bruce? I have the complete books database that we used in Practical PostgreSQL. I would want to do a little work on it to get it up to snuff (add comments etc..) but it would be a start. > > This could be done on pgFoundry as an add-in, but potentially included with > the Windows install and major distribution RPMs. > -- Command Prompt, Inc., home of PostgreSQL Replication, and plPHP. Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
Attachment
Joshua D. Drake wrote: > > I have the complete books database that we used in Practical PostgreSQL. > I would want to do a little work on it to get it up to snuff (add > comments etc..) but it would be a start. Is the DDL online somewhere to peek at it? Regards, Andreas
On Friday 19 November 2004 13:54, Joshua D. Drake wrote: > > Hmmm ... sounds like an add-in project. I'm not sure, I think > > something which demonstrates more general principles than the TPC-W > > database would be useful, sort of a "training database". Maybe one of > > the writers of PGSQL books has such a thing? Maybe Bruce? > > I have the complete books database that we used in Practical PostgreSQL. > I would want to do a little work on it to get it up to snuff (add > comments etc..) but it would be a start. > I might have access to one for another book as well if we need it, although what I have always thought we should provide is a copy of the database generated from the tutorial section of the PostgreSQL documentation. -- Robert Treat Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Robert Treat wrote: > > I might have access to one for another book as well if we need it, although > what I have always thought we should provide is a copy of the database > generated from the tutorial section of the PostgreSQL documentation. This is the kind of database design I definitely would *not* use to demonstrate good db design practice: - no primary key/indexes - no foreign key - implicitely WITH OID - Usage of varchar for key column Regards, Andreas
Andreas, > - Usage of varchar for key column And? Varchar is somehow implicitly inferior for keys? Watch it ... you're about to hit one of my "abuses of SQL" pet-peeves, the overuse on INT surrogate keys ... -- Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus wrote: > Andreas, > > >>- Usage of varchar for key column > > > And? Varchar is somehow implicitly inferior for keys? > > Watch it ... you're about to hit one of my "abuses of SQL" pet-peeves, the > overuse on INT surrogate keys ... You might be right for small dbs, but a temperature database will likely contain millions of rows, if filled in the real world. varchar will probably make the table several times bigger than needed. Additionally, I regretted *every* case where I decided to use some text data as key, sooner or later. Regards, Andreas
Josh Berkus <josh@agliodbs.com> writes: > > - Usage of varchar for key column > > And? Varchar is somehow implicitly inferior for keys? Yes, it's larger and larger is slower. It's also failure prone when pushed through various levels of applications prone to uppercasing or misparsing whitespace. It's also prone to failure in internationalized applications and applications dealing with multiple character sets. > Watch it ... you're about to hit one of my "abuses of SQL" pet-peeves, the > overuse on INT surrogate keys ... Oh yeah, and it's also a sign you're failing to use surrogate keys and using something meaningful in the real world as your primary key and therefore vulnerable to major problems when the real world fails your assumptions about uniqueness or immutability. -- greg
Josh Berkus said: > Andreas, > >> - Usage of varchar for key column > > And? Varchar is somehow implicitly inferior for keys? > > Watch it ... you're about to hit one of my "abuses of SQL" pet-peeves, > the overuse on INT surrogate keys ... > We will probably find that *everthing* each of us does will offend somebody else. I'm not too keen to get into "best practice" wars. That's what IRC is for ;-) I would like to see a number of samples, which demonstrate different design methodologies/philosophies, so we are offending everyone with equal opportunity. cheers andrew
Andreas, Andrew, > You might be right for small dbs, but a temperature database will likely > contain millions of rows, if filled in the real world. varchar will > probably make the table several times bigger than needed. Yeah, INT keys are useful for performance reasons. It depends on the table. I often find that text keys are often more useful for short lookup lists because it allows me to avoid linking in dozens of tables in a star-schema OLAP database. There I usually find that the efficiency gained by the smaller size of the INT column is more than offset by the cost and decrease in estimate accuracy inherent in adding a bunch of lookup tables to the query. Also, for tables where the text key is required in the table, *adding* an additional INT column as a key is no improvement in performance. > Additionally, I regretted *every* case where I decided to use some text > data as key, sooner or later. Well, obviously you and I have had different workloads then. > We will probably find that *everthing* each of us does will offend somebody > else. I'm not too keen to get into "best practice" wars. That's what IRC is > for ;-) <grin> > I would like to see a number of samples, which demonstrate different design > methodologies/philosophies, so we are offending everyone with equal > opportunity. Yeah, that would be great. --Josh -- --Josh Josh Berkus Aglio Database Solutions San Francisco
On Monday 22 November 2004 11:07, Andreas Pflug wrote: > Robert Treat wrote: > > I might have access to one for another book as well if we need it, > > although what I have always thought we should provide is a copy of the > > database generated from the tutorial section of the PostgreSQL > > documentation. > > This is the kind of database design I definitely would *not* use to > demonstrate good db design practice: > - no primary key/indexes > - no foreign key > - implicitely WITH OID > - Usage of varchar for key column > Didn't know good database design was a pre-requisite. If it is then yeah, I'd agree maybe thats not the best example. OTOH maybe we just need a better tutorial :-) -- Robert Treat Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Josh Berkus <josh@agliodbs.com> writes: > Also, for tables where the text key is required in the table, *adding* an > additional INT column as a key is no improvement in performance. Not true. Every table which references the varchar-keyed table needs to have a complete copy of the varchar key. Plus the indexes on the table (and often those referencing tables) are all bigger too. > > Additionally, I regretted *every* case where I decided to use some text > > data as key, sooner or later. > > Well, obviously you and I have had different workloads then. My experience agrees with his. A good example was using user provided text usernames as a primary key. The application guaranteed they would be unique, and they couldn't be changed. Had those requirements changed things would have gotten very nasty. No, ON UPDATE CASCADE doesn't solve things when you have a few hundred million records referencing the table on a 24x7 application. Especially when you have a few hundred million more archived records on tape and in your data warehouse for doing statistical analyses. In fact those requirements never changed. And yet we still ended up regretting that decision for multiple reasons: . The varchar field spread throughout the database like a slow rot to tables that referenced users. Some of the largest tablesin the database ended up 10-30% inflated in size due to that field alone. Their indexes were even more inflated. . Later we had to export data to a third party and receive data back from them. Their mainframe uppercased all the text weprovided for them in the key. It was also fixed position so any trailing spaces were effectively lost. Because of the latter problem we added an integer field. Afterwards we started using that to reference the users table on any new table. The speed difference on index scans was noticable. Actually this was Oracle. I don't have empirical tests for postgres to know if it would be the same. Actually I would expect the difference for Postgres would be even greater. Postgres stores integers directly in Datums but varchars require a palloc, and any comparison involves strcoll calls which can be quite slow compared to an integer == call. > > We will probably find that *everthing* each of us does will offend somebody > > else. I'm not too keen to get into "best practice" wars. That's what IRC is > > for ;-) Now can we discuss naming conventions for primary keys? :) -- greg