Re: PostgreSQL DB in prod, test, debug - Mailing list pgsql-general

From Jay Stanley
Subject Re: PostgreSQL DB in prod, test, debug
Date
Msg-id 29aada4e3b72a39db3f50bc299ce9880@cruzio.com
Whole thread Raw
In response to Re: PostgreSQL DB in prod, test, debug  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
List pgsql-general

On 2024-02-15 23:51, Peter J. Holzer wrote:

On 2024-02-14 11:25:48 +0100, Daniel Gustafsson wrote:
On 14 Feb 2024, at 10:59, Simon Connah <simon.n.connah@protonmail.com> wrote:

This is probably a stupid question, so I apologise in advance.

There is no such thing.

What I think is the best way to do this is to do a pg_dump of the
database (using the --schema-only flag) and then load it into a test
only database that gets created at the start of the unit tests and
destroyed at the end. The automated tests will insert, update,
delete and select data to test if it all still works.

If the source of truth for your schema is the database, then sure.  If the
source of truth is a .sql file in your source code repository then you should
use that.

I sort of do both for one of my projects:

I originally created the SQL script by running pg_dump on a manually
constructed test database (I do have code to create the empty schema,
but I had to insert the test data manually). That script went into the
git repo.

The test files all contain a fixture which drops and recreates the test
database using that sql file.

When I add a new test case I try to make do with the existing test data.

When I need additional data for a new test case, I create a new pristine
test database using the sql file, add the new data, and then create a
new sql file using pg_dump which is then committed with the test cases.

Same for migrations: If I need to migrate the schema, I run the
migration on the test database, then dump and commit it.

This project is small enough (86 tests in 10 files) that all test cases
can use the same test data. However, I could easily use different test
data for different tests.

        hp

I had a very similar issue a few years ago; our large-ish codebase needs a lot of test suites (about 100) to run daily against specific database conditions -- each suite involved specific data in around 50 tables.  At first we had 'source of truth' SQL scripts checked into git, and Jenkins would run each one to create a database for the suite, load it with the exact data that suite needed, run the tests and record the results, then drop the database.

This worked fine for a few tests but became unmanageable as more tests were added and the data volume increased. Instead, I created a 'master' script that builds one database template per suite of tests: it creates a blank database, runs those same build scripts from git against it, then disconnects from the target database and switches it to become a template.  This is re-done very infrequently -- only when we want to use a different testing database, need to refresh test content, or want to drop or add tests.  That's right - we have about 100 templates; I've found copying a template is FAR faster than reloading it from SQL.
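In rough outline, the template-building step looks something like this (the database and script names here are illustrative placeholders, not our actual ones):

```sql
-- Build one template per suite: create a blank database, run the
-- suite's checked-in build script against it, then freeze it.
CREATE DATABASE suite_orders_tmpl;
-- (run the build script from git against it, e.g. with
--    psql -d suite_orders_tmpl -f suite_orders_build.sql )
-- Then disconnect all sessions from suite_orders_tmpl and, from
-- another database such as postgres, mark it as a template:
ALTER DATABASE suite_orders_tmpl IS_TEMPLATE true;
```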

When each test suite is run, it creates a new database from the template appropriate for that suite.  When the suite completes, the new database is dropped.  This sped up our tests by at least two orders of magnitude, and it was in a way more reliable because each test gets a binary copy of the test database, including exactly how vacuumed each table is, the exact state of the indexes, every block in the same place, etc.   Run times were definitely more consistent.  Note that while a database is in template mode, it cannot be changed (unless a DBA switches it back to non-template mode).
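The per-run flow is then just a clone and a drop (again with illustrative names):

```sql
-- Clone the frozen template for one test run. CREATE DATABASE with
-- TEMPLATE does a file-level copy, so the clone inherits the
-- template's exact physical state (vacuum status, indexes, blocks).
CREATE DATABASE suite_orders_run42 TEMPLATE suite_orders_tmpl;
-- ... point the test suite at suite_orders_run42 and run it ...
DROP DATABASE suite_orders_run42;
```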

This has been in production for a few years, across multiple projects, and hasn't really hit any issues; the one caveat is that you can't switch a database to 'template mode' if anything's logged into the target. I don't see an upper limit on the number of template databases in the docs, but there must be one - I haven't tested it with 1000's of templates.

See the doc for 'CREATE DATABASE dbname TEMPLATE templatename' as well as 'ALTER DATABASE dbname IS_TEMPLATE true|false' (which sets the datistemplate flag in pg_database).
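Refreshing a template follows the same pattern: take it out of template mode, change it, and freeze it again (a sketch, with placeholder names):

```sql
-- DROP DATABASE is refused while datistemplate is true, so flip
-- it off before refreshing or retiring a suite's template.
ALTER DATABASE suite_orders_tmpl IS_TEMPLATE false;
-- ... run migrations / reload test data against it ...
ALTER DATABASE suite_orders_tmpl IS_TEMPLATE true;
```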

- jay stanley
- https://cyc.com; The Next Generation of Enterprise AI

 
