Thread: PostgreSQL DB in prod, test, debug

PostgreSQL DB in prod, test, debug

From
Simon Connah
Date:
Hi,

This project uses Python 3.12, Flask, psycopg3 and PostgreSQL 15.

This is probably a stupid question so I apologies in advance.

I'm building a website using PostgreSQL and since I've just been doing some dev work on it I've just manually played
aroundwith the database if I needed new tables or functions for example but I want to start doing automated testing and
needto import a clean snapshot of the database with no data and then use the automated tests to test if things work
withthe tests. 

What I think is the best way to do this is to do a pg_dump of the database (using the --schema-only flag) and then load
itinto a test only database that gets created at the start of the unit tests and destroyed at the end. The automated
testswill insert, update, delete and select data to test if it all still works. 

My main question is does this sound OK? And if so is there a nice way to automate the dump / restore in Python?

Simon.
Attachment

Re: PostgreSQL DB in prod, test, debug

From
Daniel Gustafsson
Date:
> On 14 Feb 2024, at 10:59, Simon Connah <simon.n.connah@protonmail.com> wrote:

> This is probably a stupid question so I apologies in advance.

There is no such thing.

> What I think is the best way to do this is to do a pg_dump of the database (using the --schema-only flag) and then
loadit into a test only database that gets created at the start of the unit tests and destroyed at the end. The
automatedtests will insert, update, delete and select data to test if it all still works. 

If the source of truth for your schema is the database, then sure.  If the
source of truth is a .sql file in your source code repository then you should
use that.  In essence, create the test database identically to how you create
the production database to ensure that you are testing what you will run in
production.

--
Daniel Gustafsson




Re: PostgreSQL DB in prod, test, debug

From
Erik Wienhold
Date:
On 2024-02-14 10:59 +0100, Simon Connah wrote:
> This project uses Python 3.12, Flask, psycopg3 and PostgreSQL 15.
> 
> This is probably a stupid question so I apologies in advance.
> 
> I'm building a website using PostgreSQL and since I've just been doing
> some dev work on it I've just manually played around with the database
> if I needed new tables or functions for example but I want to start
> doing automated testing and need to import a clean snapshot of the
> database with no data and then use the automated tests to test if
> things work with the tests.
> 
> What I think is the best way to do this is to do a pg_dump of the
> database (using the --schema-only flag)

You create a dump from the prod database each time?  Yikes.

> and then load it into a test only database that gets created at the
> start of the unit tests and destroyed at the end. The automated tests
> will insert, update, delete and select data to test if it all still
> works.
> 
> My main question is does this sound OK? And if so is there a nice way
> to automate the dump / restore in Python?

The database schema should be defined by migration scripts that you also
check into version control with the rest of your application sources.
Some people also prefer a separate repository just for the database
schema, depending on how tightly coupled database and application are,
or if there are other applications relying on the database schema.

I use Sqitch[1] which works well if you want to track an existing
database schema.  Alembic is popular in Python but I don't know how it
works with an existing schema because Alembic migrations are usually not
written in plain SQL.

This is also useful for automated deployment because it allows you to
migrate the database to a specific schema version that is necessary for
your application.

For testing, I have a Bash script that starts a Docker container with
Postgres and then runs sqitch-deploy before running pytest.  That can of
course be adapted to spin up a local Postgres instance instead.  I also
use pgTAP[2] with that to just test the database schema.

You can also use testcontainers[3] to start/stop a Docker container in
pytest and run sqitch with subprocess before running all tests.

[1] https://sqitch.org/
[2] https://pgtap.org/
[3] https://testcontainers-python.readthedocs.io/en/latest/postgres/README.html

-- 
Erik



Re: PostgreSQL DB in prod, test, debug

From
Simon Connah
Date:
On Wednesday, 14 February 2024 at 10:25, Daniel Gustafsson <daniel@yesql.se> wrote:

>

>

> > On 14 Feb 2024, at 10:59, Simon Connah simon.n.connah@protonmail.com wrote:
>

> > This is probably a stupid question so I apologies in advance.
>

>

> There is no such thing.
>

> > What I think is the best way to do this is to do a pg_dump of the database (using the --schema-only flag) and then
loadit into a test only database that gets created at the start of the unit tests and destroyed at the end. The
automatedtests will insert, update, delete and select data to test if it all still works. 
>

>

> If the source of truth for your schema is the database, then sure. If the
> source of truth is a .sql file in your source code repository then you should
> use that. In essence, create the test database identically to how you create
> the production database to ensure that you are testing what you will run in
> production.
>

> --
> Daniel Gustafsson

Thank you. I'm a bit disorganised with things at the moment. I'm kinda developing this site in an ad hoc manner which
isprobably a bad idea so I'm trying to fix up a few mistakes I made early on. 

In terms of the database I've just added new functions as needed which I use instead of direct SQL in my Python code.
MaybeI should sit down with Visio and try and do a diagram before I go any further? 

I'll just write a simple script for backup and restore and call it before each test run but have the schema only dump
comefrom production. At this point it doesn't really matter as the website is so small. 

Simon.
Attachment

Re: PostgreSQL DB in prod, test, debug

From
Simon Connah
Date:
On Wednesday, 14 February 2024 at 10:38, Erik Wienhold <ewie@ewie.name> wrote:

>

>

> On 2024-02-14 10:59 +0100, Simon Connah wrote:
>

> > This project uses Python 3.12, Flask, psycopg3 and PostgreSQL 15.
> >

> > This is probably a stupid question so I apologies in advance.
> >

> > I'm building a website using PostgreSQL and since I've just been doing
> > some dev work on it I've just manually played around with the database
> > if I needed new tables or functions for example but I want to start
> > doing automated testing and need to import a clean snapshot of the
> > database with no data and then use the automated tests to test if
> > things work with the tests.
> >

> > What I think is the best way to do this is to do a pg_dump of the
> > database (using the --schema-only flag)
>

>

> You create a dump from the prod database each time? Yikes.

Sorry. That came out wrong. I don't take a dump each time I run tests but I will generally take a full dump when
workingon something specific to the database. 

>

> > and then load it into a test only database that gets created at the
> > start of the unit tests and destroyed at the end. The automated tests
> > will insert, update, delete and select data to test if it all still
> > works.
> >

> > My main question is does this sound OK? And if so is there a nice way
> > to automate the dump / restore in Python?
>

>

> The database schema should be defined by migration scripts that you also
> check into version control with the rest of your application sources.
> Some people also prefer a separate repository just for the database
> schema, depending on how tightly coupled database and application are,
> or if there are other applications relying on the database schema.
>

> I use Sqitch[1] which works well if you want to track an existing
> database schema. Alembic is popular in Python but I don't know how it
> works with an existing schema because Alembic migrations are usually not
> written in plain SQL.
>

> This is also useful for automated deployment because it allows you to
> migrate the database to a specific schema version that is necessary for
> your application.
>

> For testing, I have a Bash script that starts a Docker container with
> Postgres and then runs sqitch-deploy before running pytest. That can of
> course be adapted to spin up a local Postgres instance instead. I also
> use pgTAP[2] with that to just test the database schema.
>

> You can also use testcontainers[3] to start/stop a Docker container in
> pytest and run sqitch with subprocess before running all tests.
>

> [1] https://sqitch.org/
> [2] https://pgtap.org/
> [3] https://testcontainers-python.readthedocs.io/en/latest/postgres/README.html
>

> --
> Erik

Thank you. Squitch looks like a useful tool. I'll certainly look into all of that.

I've never really used Docker before. Instead I just spin up Vultr instances but I can read about Docker as well.

Simon.
Attachment

Re: PostgreSQL DB in prod, test, debug

From
Adrian Klaver
Date:
On 2/14/24 02:38, Erik Wienhold wrote:
> On 2024-02-14 10:59 +0100, Simon Connah wrote:

> 
> The database schema should be defined by migration scripts that you also
> check into version control with the rest of your application sources.
> Some people also prefer a separate repository just for the database
> schema, depending on how tightly coupled database and application are,
> or if there are other applications relying on the database schema.
> 
> I use Sqitch[1] which works well if you want to track an existing
> database schema.  Alembic is popular in Python but I don't know how it
> works with an existing schema because Alembic migrations are usually not
> written in plain SQL.

Just to be clear you can start from scratch with Sqitch. The only thing 
that needs to be done outside of it is the original CREATE DATABASE 
statement. After that you can build the contents of the database from 
the beginning. At any point you can move forward(deploy) or 
backward(revert). For your use case the important feature is target, 
where each target is a different instance of the database. In your case 
prod, test and debug. This allow you to make changes in test/debug and 
when they are verified good deploy them to prod.



-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: PostgreSQL DB in prod, test, debug

From
"Peter J. Holzer"
Date:
On 2024-02-14 11:25:48 +0100, Daniel Gustafsson wrote:
> > On 14 Feb 2024, at 10:59, Simon Connah <simon.n.connah@protonmail.com> wrote:
>
> > This is probably a stupid question so I apologies in advance.
>
> There is no such thing.
>
> > What I think is the best way to do this is to do a pg_dump of the
> > database (using the --schema-only flag) and then load it into a test
> > only database that gets created at the start of the unit tests and
> > destroyed at the end. The automated tests will insert, update,
> > delete and select data to test if it all still works.
>
> If the source of truth for your schema is the database, then sure.  If the
> source of truth is a .sql file in your source code repository then you should
> use that.

I sort of do both for one of my projects:

I originally created the SQL script by running pg_dump on a manually
constructed test database (I do have code to create the empty schema,
but I had to insert the test data manually). That script went into the
git repo.

The test files all contain a fixture which drops and recreates the test
database using that sql file.

When I add a new test case I try to make do with the existing test data.

When I need additional data for a new test case, I create a new pristine
test database using the sql file, add the new data, and then create a
new sql file using sql_dump which is then committed with the test cases.

Same for migrations: If I need to migrate the schema, I run the
migration on the test database, then dump and commit it.

This project is small enough (86 tests in 10 files) that all test cases
can use the same test data. However, I could easily use different test
data for different tests.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

Re: PostgreSQL DB in prod, test, debug

From
Jay Stanley
Date:

On 2024-02-15 23:51, Peter J. Holzer wrote:

On 2024-02-14 11:25:48 +0100, Daniel Gustafsson wrote:
On 14 Feb 2024, at 10:59, Simon Connah <simon.n.connah@protonmail.com> wrote:

This is probably a stupid question so I apologies in advance.

There is no such thing.

What I think is the best way to do this is to do a pg_dump of the
database (using the --schema-only flag) and then load it into a test
only database that gets created at the start of the unit tests and
destroyed at the end. The automated tests will insert, update,
delete and select data to test if it all still works.

If the source of truth for your schema is the database, then sure.  If the
source of truth is a .sql file in your source code repository then you should
use that.

I sort of do both for one of my projects:

I originally created the SQL script by running pg_dump on a manually
constructed test database (I do have code to create the empty schema,
but I had to insert the test data manually). That script went into the
git repo.

The test files all contain a fixture which drops and recreates the test
database using that sql file.

When I add a new test case I try to make do with the existing test data.

When I need additional data for a new test case, I create a new pristine
test database using the sql file, add the new data, and then create a
new sql file using sql_dump which is then committed with the test cases.

Same for migrations: If I need to migrate the schema, I run the
migration on the test database, then dump and commit it.

This project is small enough (86 tests in 10 files) that all test cases
can use the same test data. However, I could easily use different test
data for different tests.

        hp

I had a very similar issue a few years ago; our large-ish codebase needs a lot of suites tests (about 100) to run daily against specific database conditions -- each test involved specific data in around 50 tables.  At first we had 'source of truth' sql scripts checked into git, and with jenkins would run each to create a database for the test, load it with the exact data needed for that suite of tests, run the tests and record them, then drop the database.

This worked fine for a few tests but became unmanageable as more tests were added and the data volume increased. Instead, I created a 'master' script that creates one database template for each suite of tests by creating a blank database and running those same build scripts in git, then disconnecting from that target database and switching it to become a template.  This was re-done very infrequently -- only when we wanted to use a different testing database, or needed to refresh test content, drop tests or add more tests.  That's right - we have about 100 templates; I've found copying a template is FAR faster than reloading it from SQL.

When each test is run, it creates a new database from the template appropriate for that suite of tests.  When the test suite completes, the new database is dropped.  This sped up our tests by at least 2 orders of magnitude, and it was in a way more reliable because each test gets a binary copy of the test database including exactly how vacuumed each table is, exact state of the indexes, every block in the same place, etc.   Results were definitely more consistent in how long they ran.  Note that while a database is in template mode, it cannot be changed (unless a DBA switches it back to non-template-mode).

This has been in production for a few years, in multiple projects, and hasn't really hit any issues; the one cavaet is that you can't switch a database to 'template mode' if anything's logged into the target.  According to the doc, I don't see an upper limit for the number of template databases but there must be one - I haven't tested it with 1000's of templates.

See doc for 'create database DBNAME template TEMPLATENAME' as well as 'alter database DBNAME set datistemplate = true|false'.

- jay stanley
- https://cyc.com; The Next Generation of Enterprise AI