Thread: Sample databases

Sample databases

From
Vraj Mohan
Date:
Is there a good sample database (with decent data volumes) for
postgresql? I am interested in one for learning and automated testing.

I looked at http://pgfoundry.org/projects/dbsamples/ (specifically at
pagila), but it seemed incomplete and not maintained,

I also looked at http://dev.mysql.com/doc/index-other.html. Employee
data (large dataset, includes data and test/verification suite) looks
interesting. Is there anything similar for postgresql?

Is there a dataset used by postgresql development for testing?

If others are interested, I am willing to spend some time in helping
to create one.

--Vraj Mohan


Re: Sample databases

From
Carlos Mennens
Date:
On Wed, Jan 16, 2013 at 9:53 AM, Vraj Mohan <r.vrajmohan@gmail.com> wrote:
> Is there a good sample database (with decent data volumes) for
> postgresql? I am interested in one for learning and automated testing.
>
> I looked at http://pgfoundry.org/projects/dbsamples/ (specifically at
> pagila), but it seemed incomplete and not maintained,

I've spent some months searching a year ago and found the same thing
as you. All the sample databases for PG are terrible. They're old, not
maintained, incomplete, or just useless. I too would love to have a
sample database with large data and relational values. Sadly unless
you create one yourself, I don't think there's one publicly available
that meets our expectations.

--
Carlos Mennens


Re: Sample databases

From
Jeff Janes
Date:
On Wednesday, January 16, 2013, Vraj Mohan wrote:
Is there a good sample database (with decent data volumes) for
postgresql? I am interested in one for learning and automated testing.

What do you mean by decent data volumes?  Numbers and units are wonderful things!

What things are you looking to learn and test automatically?  I like to learn about and test performance, and I find pgbench often quite good for doing that.

If you want to learn and test a framework like Django or Rails or whatever the latest and greatest is, I would think that those frameworks would provide their own sample data for doing that.

 

I looked at http://pgfoundry.org/projects/dbsamples/ (specifically at
pagila), but it seemed incomplete and not maintained,

Is there a specific thing about the completeness that concerns you?

Not being maintained doesn't bother me, unless of course it is actually broken in the first place.  Data is data.  It isn't like programs, which decay over time. 
 
Have you looked at the Dell DVD store?  It is also on that site you linked above, but it seems to be more up to date on Dell's own site.



I also looked at http://dev.mysql.com/doc/index-other.html. Employee
data (large dataset, includes data and test/verification suite) looks
interesting. Is there anything similar for postgresql?

I don't know enough about it to say.  The data itself should probably not be all that hard to port (easy to say for someone who doesn't have to do it...).  The verification suite, on the other hand, might be.  Is that your primary interest, the code and not the data?
 

Is there a dataset used by postgresql development for testing?


There is a regression database used by "make check", but it is designed to test specific features of the database system that people worried might get broken by future work, not to provide a cohesive thing that looks like a web application. 

Cheers,

Jeff

Fwd: Sample databases

From
Vraj Mohan
Date:
---------- Forwarded message ----------
From: Vraj Mohan <r.vrajmohan@gmail.com>
Date: Thu, Jan 17, 2013 at 7:22 AM
Subject: Re: [GENERAL] Sample databases
To: Jeff Janes <jeff.janes@gmail.com>


On Wed, Jan 16, 2013 at 10:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> What do you mean by decent data volumes?  Numbers and units are wonderful
> things!

Sorry, I should have been more specific. I was looking for data with
enough richness to explore corner cases in SQL and volume to be able
to study different execution plans. I am guessing here, but would
shoot for something like:
- columns with NULLs
- columns that are foreign keys with not all the values from the
referenced table for exploring outer joins
- tables with ~500,000 rows
- columns with values such that the indexes are highly selective. I
would assume that having over a 100 different values for a column
should achieve this.

> What things are you looking to learn and test automatically?  I like to
> learn about and test performance, and I find pgbench often quite good for
> doing that.

My goal is for a database that can be used for:
- learning SQL, specifically postgresql SQL; going beyond simple
queries and joins to using subqueries, correlated subqueries, window
functions and so on
- writing about postgresql
- testing applications developed for postgresql; I would think that
many of the pgfoundry projects would benefit from this.

> Is there a specific thing about the completeness that concerns you?

I tried pagila and found that it did not create any rows in the film
table when I used pagila-insert-data.sql, After my original post, I
have discovered that pagila-data.sql (which used COPY) works
correctly.

> Have you looked at the Dell DVD store?  It is also on that site you linked
> above, but it seems to be more up to date on Dell's own site.
>

Thanks, I'll take a look at the Dell DVD store.

I plan to contribute to the http://pgfoundry.org/projects/dbsamples/
project. For starters, I will add the suppliers-parts-projects
database from C J Date's "An Introduction to Database Systems".

--Vraj Mohan


Re: Sample databases

From
Vraj Mohan
Date:
I have created a couple of PostgreSQL sample databases at
https://github.com/vrajmohan/pgsql-sample-data:
1. The supplier-part-project database from C J Date's book.
2. The Employee sample database from the MySQL project. This db
occupies 300MB and and has ~4 million total rows.

Please let me know if you find this useful.

I tried contributing to http://pgfoundry.org/projects/dbsamples/, but
pgFoundry appears to be down. I spoke to the project admin and we
agreed that it would make sense to migrate the project over to Github.

--Vraj Mohan