Thread: speeding up table creation
I have an interesting performance improvement need. As part of the automatic test suite we run in our development environment, we re-initialize our test database a number of times in order to ensure it is clean before running a test. We currently do this by dropping the public schema and then recreating our tables (roughly 30 tables total). After that we do normal inserts, etc., but never with very much data. My question is: what settings can we tweak to improve performance in this scenario? Specifically, if there were a way to tell Postgres to keep all operations in memory, that would probably be ideal.
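For reference, the reset is essentially the following, run before every test (the table definition here is just a placeholder, not our real schema):

    DROP SCHEMA public CASCADE;
    CREATE SCHEMA public;
    CREATE TABLE widget (id serial PRIMARY KEY, name text);
    -- ...and roughly 30 more CREATE TABLE statements like it...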
We actually tried running Postgres off of a RAM disk, and that did help a reasonable amount. But we're running under Windows, where setting up the RAM disk is a hassle, and every one of our developers would need to do it.
Any tips would be appreciated.
--Rainer
Rainer Mager wrote:
> I have an interesting performance improvement need. As part of the
> automatic test suite we run in our development environment, we
> re-initialize our test database a number of times in order to ensure
> it is clean before running a test. We currently do this by dropping
> the public schema and then recreating our tables (roughly 30 tables
> total). After that we do normal inserts, etc., but never with very much
> data. My question is: what settings can we tweak to improve
> performance in this scenario? Specifically, if there were a way to tell
> Postgres to keep all operations in memory, that would probably be ideal.

What is the test part? In other words, do you start with a known initial database with all empty tables and then run the tests, or is creating those tables itself part of the test? How much data is in the initial database if the tables aren't empty? Creating 30 empty tables should take a trivial amount of time. Also, are there other schemas besides public?

A couple of ideas/comments:

You cannot keep the data in memory (that is, you can't disable writing to the disk). But since you don't care about data loss, you could turn off fsync in postgresql.conf. From a test perspective you should be fine - it will only be an issue in the event of a crash, and then you can just restart with a fresh load. Remember, however, that any performance benchmarks won't translate to production use (of course they don't translate if you are using a ramdisk anyway).

Note that the system tables are updated whenever you add/delete/modify tables. Make sure they are being vacuumed or your performance will slowly degrade.

My approach is to create a database exactly as you want it to be at the start of your tests (fully vacuumed and all) and then use it as a template for creating the test database each time. Then you can just (with appropriate connection options) run "dropdb thetestdb" followed by "createdb --template thetestdbtemplate thetestdb", which is substantially faster than deleting and recreating tables - especially if they contain much data.

Cheers,
Steve
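P.S. Concretely, the SQL side of that would look something like this (database names as in my example above; turn off fsync on development machines only, never in production):

    -- in postgresql.conf on the dev boxes only:
    --   fsync = off

    -- one-time setup: build the reference database with all ~30 tables,
    -- load any fixed data, and VACUUM ANALYZE it
    CREATE DATABASE thetestdbtemplate;

    -- before each test run (nothing may be connected to thetestdb):
    DROP DATABASE IF EXISTS thetestdb;
    CREATE DATABASE thetestdb TEMPLATE thetestdbtemplate;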
On Tue, Oct 14, 2008 at 5:08 PM, Rainer Mager <rainer@vanten.com> wrote:
> I have an interesting performance improvement need. As part of the automatic
> test suite we run in our development environment, we re-initialize our test
> database a number of times in order to ensure it is clean before running a
> test. We currently do this by dropping the public schema and then recreating
> our tables (roughly 30 tables total). After that we do normal inserts, etc.,
> but never with very much data. My question is: what settings can we tweak to
> improve performance in this scenario? Specifically, if there were a way to
> tell Postgres to keep all operations in memory, that would probably be
> ideal.

I'm not sure we've identified the problem just yet. Do you have the autovacuum daemon enabled? Are your system catalogs bloated with dead tuples? Could you approach this by using a template database that was pre-set-up for you, so you could just "create database test with template test_template" or something like that?

Also, if you're recreating a database you might need to analyze it before you start firing queries.

PostgreSQL buffers / caches what it can in shared_buffers. The OS also caches data in the kernel cache. Having more memory makes it much faster. But if it writes, it needs to write. If the database server where this is happening doesn't have any important data in it, you might see a boost from disabling fsync, but keep in mind that a server that loses power or crashes can corrupt all the databases on that db server.

If you can't fix the problem with any of those suggestions, you might need to throw hardware at it.
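A quick way to see whether the system catalogs are carrying a lot of dead rows is something like the following (assuming a version new enough to have n_dead_tup in the stats views):

    SELECT relname, n_dead_tup
    FROM pg_stat_sys_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;

    -- and after rebuilding the test database, before firing queries:
    ANALYZE;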
Steve Crawford wrote:
> You cannot keep the data in memory (that is, you can't disable writing
> to the disk). But since you don't care about data loss, you could turn
> off fsync in postgresql.conf. From a test perspective you should be fine
> - it will only be an issue in the event of a crash, and then you can just
> restart with a fresh load. Remember, however, that any performance
> benchmarks won't translate to production use (of course they don't
> translate if you are using a ramdisk anyway).

Another thing that may really help, if you're not already doing it, is to do all your schema creation inside a single transaction - at least assuming you can't use the template database approach.

--
Craig Ringer
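In practice that just means wrapping the existing reset script in one transaction, roughly like this (the CREATE TABLE statement stands in for the real ones):

    BEGIN;
    DROP SCHEMA public CASCADE;
    CREATE SCHEMA public;
    CREATE TABLE widget (id serial PRIMARY KEY, name text);
    -- ...the rest of the roughly 30 CREATE TABLE statements...
    COMMIT;

Since PostgreSQL DDL is transactional, everything is committed (and synced) once at the end instead of once per statement.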