Thread: Sample databases
Is there a good sample database (with decent data volumes) for postgresql? I am interested in one for learning and automated testing. I looked at http://pgfoundry.org/projects/dbsamples/ (specifically at pagila), but it seemed incomplete and not maintained, I also looked at http://dev.mysql.com/doc/index-other.html. Employee data (large dataset, includes data and test/verification suite) looks interesting. Is there anything similar for postgresql? Is there a dataset used by postgresql development for testing? If others are interested, I am willing to spend some time in helping to create one. --Vraj Mohan
On Wed, Jan 16, 2013 at 9:53 AM, Vraj Mohan <r.vrajmohan@gmail.com> wrote: > Is there a good sample database (with decent data volumes) for > postgresql? I am interested in one for learning and automated testing. > > I looked at http://pgfoundry.org/projects/dbsamples/ (specifically at > pagila), but it seemed incomplete and not maintained, I've spent some months searching a year ago and found the same thing as you. All the sample databases for PG are terrible. They're old, not maintained, incomplete, or just useless. I too would love to have a sample database with large data and relational values. Sadly unless you create one yourself, I don't think there's one publicly available that meets our expectations. -- Carlos Mennens
On Wednesday, January 16, 2013, Vraj Mohan wrote:
Is there a good sample database (with decent data volumes) for
postgresql? I am interested in one for learning and automated testing.
What do you mean by decent data volumes? Numbers and units are wonderful things!
What things are you looking to learn and test automatically? I like to learn about and test performance, and I find pgbench often quite good for doing that.
If you want to learn and test a framework like Django or Rails or whatever the latest and greatest is, I would think that those frameworks would provide their own sample data for doing that.
I looked at http://pgfoundry.org/projects/dbsamples/ (specifically at
pagila), but it seemed incomplete and not maintained,
Is there a specific thing about the completeness that concerns you?
Not being maintained doesn't bother me, unless of course it is actually broken in the first place. Data is data. It isn't like programs, which decay over time.
Have you looked at the Dell DVD store? It is also on that site you linked above, but it seems to be more up to date on Dell's own site.
I also looked at http://dev.mysql.com/doc/index-other.html. Employee
data (large dataset, includes data and test/verification suite) looks
interesting. Is there anything similar for postgresql?
I don't know enough about it to say. The data itself should probably not be all that hard to port (easy to say for someone who doesn't have to do it...). The verification suite, on the other hand, might be. Is that your primary interest, the code and not the data?
Is there a dataset used by postgresql development for testing?
There is a regression database used by "make check", but it is designed to test specific features of the database system that people worried might get broken by future work, not to provide a cohesive thing that looks like a web application.
Cheers,
Jeff
---------- Forwarded message ---------- From: Vraj Mohan <r.vrajmohan@gmail.com> Date: Thu, Jan 17, 2013 at 7:22 AM Subject: Re: [GENERAL] Sample databases To: Jeff Janes <jeff.janes@gmail.com> On Wed, Jan 16, 2013 at 10:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > What do you mean by decent data volumes? Numbers and units are wonderful > things! Sorry, I should have been more specific. I was looking for data with enough richness to explore corner cases in SQL and volume to be able to study different execution plans. I am guessing here, but would shoot for something like: - columns with NULLs - columns that are foreign keys with not all the values from the referenced table for exploring outer joins - tables with ~500,000 rows - columns with values such that the indexes are highly selective. I would assume that having over a 100 different values for a column should achieve this. > What things are you looking to learn and test automatically? I like to > learn about and test performance, and I find pgbench often quite good for > doing that. My goal is for a database that can be used for: - learning SQL, specifically postgresql SQL; going beyond simple queries and joins to using subqueries, correlated subqueries, window functions and so on - writing about postgresql - testing applications developed for postgresql; I would think that many of the pgfoundry projects would benefit from this. > Is there a specific thing about the completeness that concerns you? I tried pagila and found that it did not create any rows in the film table when I used pagila-insert-data.sql, After my original post, I have discovered that pagila-data.sql (which used COPY) works correctly. > Have you looked at the Dell DVD store? It is also on that site you linked > above, but it seems to be more up to date on Dell's own site. > Thanks, I'll take a look at the Dell DVD store. I plan to contribute to the http://pgfoundry.org/projects/dbsamples/ project. For starters, I will add the suppliers-parts-projects database from C J Date's "An Introduction to Database Systems". --Vraj Mohan
I have created a couple of PostgreSQL sample databases at https://github.com/vrajmohan/pgsql-sample-data: 1. The supplier-part-project database from C J Date's book. 2. The Employee sample database from the MySQL project. This db occupies 300MB and and has ~4 million total rows. Please let me know if you find this useful. I tried contributing to http://pgfoundry.org/projects/dbsamples/, but pgFoundry appears to be down. I spoke to the project admin and we agreed that it would make sense to migrate the project over to Github. --Vraj Mohan