Re: SQL workflow for crash testing correctness - Mailing list pgsql-admin

From Jeff Janes
Subject Re: SQL workflow for crash testing correctness
Msg-id CAMkU=1yQ5WbsO2EZB3yyOi-G2cXCVe9xG31=JX42Gbm0=LnEyA@mail.gmail.com
In response to SQL workflow for crash testing correctness  (Joseph Hammerman <jhammerman@squarespace.com>)
Responses Re: SQL workflow for crash testing correctness
List pgsql-admin
On Tue, Sep 17, 2019 at 9:27 PM Joseph Hammerman <jhammerman@squarespace.com> wrote:
Good evening PGSQL admin email distribution list,

I have built an HA cluster setup. I would like to instrument a workflow to test for lost or duplicated writes.

Does anyone know of prior art that does this?


I have a testing framework which injects faults under high load and then checks that automatic recovery happens correctly.  I have used it to find several bugs, but it hasn't turned up any in the last couple of releases (likely because improved regression tests are now catching them before I get a chance to).  I've always just tested this as crash recovery within a single instance, but I see no reason the technique couldn't be used for multiple instances as well.  You can search for my name and "count.pl" on the hackers list to find multiple examples of the testing harness.  The nature of the fault injected (torn page writes) is just a function of what I was working on at the time I wrote it; most of the bugs uncovered had nothing to do with the exact thing which caused the crash.
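
To make the "lost or duplicated writes" check concrete, here is a minimal sketch of the client-side bookkeeping such a test could use. It is not the count.pl harness itself; it assumes a workload where each transaction increments a per-key counter, and the names KeyLedger, classify and check are made up for illustration. The client records how many increments were acknowledged as committed and how many were in doubt (the connection died while the COMMIT was outstanding), then compares that ledger with the counters read back after recovery or failover.

```python
from dataclasses import dataclass


@dataclass
class KeyLedger:
    """Client-side record for one counter key (hypothetical structure)."""
    acked: int = 0     # increments whose COMMIT was acknowledged to the client
    in_doubt: int = 0  # increments whose COMMIT outcome is unknown after a crash


def classify(ledger: KeyLedger, observed: int) -> str:
    """Classify one counter value read back after recovery."""
    if observed < ledger.acked:
        # An acknowledged commit is missing: a lost write.
        return "lost"
    if observed > ledger.acked + ledger.in_doubt:
        # More increments than the client ever attempted: a duplicated write.
        return "extra"
    # Anything in [acked, acked + in_doubt] is consistent with what we know.
    return "ok"


def check(ledgers: dict, observed_counts: dict) -> dict:
    """Return {key: verdict} for every key whose verdict is not 'ok'."""
    problems = {}
    for key in ledgers.keys() | observed_counts.keys():
        ledger = ledgers.get(key, KeyLedger())
        verdict = classify(ledger, observed_counts.get(key, 0))
        if verdict != "ok":
            problems[key] = verdict
    return problems


if __name__ == "__main__":
    ledgers = {
        1: KeyLedger(acked=5, in_doubt=1),
        2: KeyLedger(acked=3),
        3: KeyLedger(acked=2),
    }
    # Counter values as read from the database after recovery (made-up data).
    observed = {1: 6, 2: 2, 3: 4}
    print(check(ledgers, observed))
```

The in-doubt allowance matters because a crash can land between the server committing and the client hearing about it, so a counter may legitimately be anywhere from acked to acked + in_doubt. In a real run the observed values would come from querying whichever instance is primary after the fault.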

 
Does anyone have thoughts on how to model this? My initial thoughts were to find the serialization tests in the Postgres project core.

Looking at the core regression tests may also be a good idea.  Of course, you would then have to ask yourself: if you test the same way they do, will you find any bugs that they don't?  So I would view them more as inspiration than as instructions.
 
Cheers,

Jeff
