Re: Thoughts on "Love Your Database" - Mailing list pgsql-general

From Chris Travers
Subject Re: Thoughts on "Love Your Database"
Date
Msg-id CAKt_ZftYDp+FHyKn-QTcgHD6M2KkPQjvUKFyo_LmHRXxKBSb8Q@mail.gmail.com
In response to Re: Thoughts on "Love Your Database"  (Szymon Lipiński <mabewlun@gmail.com>)
List pgsql-general


On Sat, May 21, 2016 at 12:49 PM, Szymon Lipiński <mabewlun@gmail.com> wrote:


Strangely, I have never had a problem testing stored procedures.  You have to create a data set for the tests, of course, and that is the hardest part, but there are some really nice things:

1.  If your test scripts always roll back, you can run them on a production database as a troubleshooting step.
2.  It is easy to hook things up to a TAP harness (whether using PgTAP or some hand-rolled solution).  I think it would be harder to connect to xunit though.  So use TAP ;-)
3.  I usually create a test results table (created in my test case and rolled back afterwards!) which stores the test description and pass status.  That makes it easy to check using other tools.

Usually I set aside a range of values (negative IDs, for example) for testing purposes, as in the sketch below.
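
For illustration, a minimal sketch of that pattern (the accounts table and account_balance function are made-up names):

    BEGIN;

    -- IDs below zero are reserved for tests, so this cannot collide
    -- with real rows.
    INSERT INTO accounts (id, name, balance)
    VALUES (-1, 'test account', 100.00);

    -- Assert against the function under test.
    DO $$
    BEGIN
        IF account_balance(-1) <> 100.00 THEN
            RAISE EXCEPTION 'account_balance(-1) returned %, expected 100',
                account_balance(-1);
        END IF;
    END;
    $$;

    -- Nothing persists, which is what makes this safe to run even
    -- against a production database.
    ROLLBACK;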


I had problems, and I'm really interested in making it work for me. I have a couple of questions:
How do you manage versioning of the stored procedures? In particular, do you have any problems with upgrades?

We reload all stored procedures on every upgrade.  I have seen a lot of projects do this when the stored procedures carry significant logic.  Versioning then happens via an external tool (git, Mercurial, svn...).

Transactional DDL makes this pretty seamless.
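
Sketched out, an upgrade script can look something like this (the file names are invented):

    BEGIN;

    -- Reload every function; CREATE OR REPLACE FUNCTION inside the
    -- included files makes the reload idempotent.
    \i sql/modules/accounts.sql
    \i sql/modules/invoices.sql

    -- If any statement failed, the transaction is aborted and this
    -- COMMIT effectively rolls back, so the upgrade is all-or-nothing.
    COMMIT;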
 
What about testing logic which is outside the database? Do you use pgTAP for testing the schema only, or to test some of the external logic as well?

I actually use a TAP harness with a home-brewed interface for attaching to the stored procedures.  I threw it together in a couple of hours (the translation layer is in Perl).  It isn't perfect, but it generally works.  We could use pgTAP and avoid the translation layer, but we didn't want to add it as a build/test dependency there.  We are considering using pgTAP for schema tests that are not intended to be run on other systems.
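
If you do go the pgTAP route, a schema test is only a few lines of SQL (table and function names invented), and pg_prove runs it like any other TAP test:

    BEGIN;
    SELECT plan(2);

    -- Structural assertions about the schema itself.
    SELECT has_table('accounts');
    SELECT has_function('account_balance');

    SELECT * FROM finish();
    ROLLBACK;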
 
Do you use logic inside and outside the database at the same time?

Most of the projects I have worked on that have done stored procedure testing have been Perl projects, and the typical approach there is to use a TAP harness.  The stored procedure tests live in their own test script.  We have other tests (including integration tests) in other test scripts.  For LedgerSMB, our db tests can be run against a live production db, as can some of our integration tests, but others (those that rely on actual HTTP requests) cannot be.

But these are all integrated in the test framework so when you do:

make test

your environment variables determine what is tested (and it may include database logic).

I don't know how it would work with xunit, but the same basic approach we use could be used elsewhere.

Basically, our test adaptor runs scripts, each of which:

1.  Loads test data and creates a schema for storing test results
2.  Runs tests, storing the results in the test_results table
3.  Outputs a report of test results
4.  Rolls back

That output is then parsed and converted into test output for our test framework.
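
A stripped-down sketch of one such script (all names invented) might be:

    BEGIN;

    -- 1. Load test data and create a schema for storing results.
    CREATE SCHEMA test;
    CREATE TABLE test.test_results (description text, success boolean);
    INSERT INTO accounts (id, name, balance)
    VALUES (-1, 'test account', 100.00);

    -- 2. Run tests, storing the outcome of each.
    INSERT INTO test.test_results
    SELECT 'test account was saved',
           EXISTS (SELECT 1 FROM accounts WHERE id = -1);
    INSERT INTO test.test_results
    SELECT 'balance round-trips', account_balance(-1) = 100.00;

    -- 3. Output a report for the adaptor to parse.
    SELECT description, success FROM test.test_results;

    -- 4. Roll back: fixtures and results both disappear.
    ROLLBACK;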
 
How does this scale to a couple of servers when the load is so huge you need to have e.g. ten physical web servers in front?

If you are asking about scalability testing and load testing, you aren't going to use the same approaches you do for unit tests.  When I have had to do this, I have typically written tools that run, for example, a stored procedure with varying arguments many times in parallel.  Obviously you don't do this on a production system under load.
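
One low-effort way to build such a tool is a custom pgbench script (this needs pgbench 9.6 or later for the random() syntax; the function and ID range are invented):

    -- bench_balance.sql: call the procedure with randomized arguments.
    \set id random(1, 100000)
    SELECT account_balance(:id);

Then something like "pgbench -n -f bench_balance.sql -c 16 -j 4 -T 60 mydb" hammers it from 16 concurrent connections for a minute; -n skips vacuuming pgbench's own tables, which don't exist here.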

As for other environments, I have never seen one that didn't have staging and testing environments, and that's where you would look first for this kind of testing.

That being said, load affects database query performance in ways that aren't always what one would expect (for example, what is in cache can matter quite a lot).
 
It seems to me that it would be easier to spread the CPU overhead of the logic across plenty of servers instead of having just one machine which needs to do all the things. But maybe I'm wrong.

But that is a different question.  If you are dealing with large databases under load (for example, an 11TB database serving a web front-end plus an HPC cluster of many nodes, with a commitment to handle up to 500 CPU cores' worth of client applications running 24/7), performance in your database is something you will have to pay very close attention to.  Some things you will want to handle outside the database, to be sure.  For other things you will want to make sure the logic is well contained in the database.

You can't assume that pulling the data to the client and processing it there always improves scalability.  Concerns such as effective use of the buffer cache can make it very important to get the logic in the right place.  Additionally, some things in PostgreSQL, like deTOASTing data, can add a lot of overhead that you might be able to avoid with the right functions.  If you need solid scalability, you need people who understand the performance costs of the various options and can weigh them in a particular context.
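
A toy illustration of getting the logic in the right place (documents is an invented table):

    -- Shipping every row to the client forces the server to read and
    -- deTOAST each wide column before sending it over the wire:
    --     SELECT * FROM documents;    -- then count/filter in app code
    --
    -- Doing the work in the database touches only what it needs:
    SELECT count(*) FROM documents WHERE status = 'open';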
 


--
    regards Szymon Lipiński



--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.
