Thread: Re: [PATCHES] Testing pg_terminate_backend()

Re: [PATCHES] Testing pg_terminate_backend()

From
Bruce Momjian
Date:
Magnus, others, how is the SIGTERM testing going?

---------------------------------------------------------------------------

Bruce Momjian wrote:
> bruce wrote:
> > Tom Lane wrote:
> > > Bruce Momjian <bruce@momjian.us> writes:
> > > > Tom Lane wrote:
> > > >> The closest thing I can think of to an automated test is to run repeated
> > > >> sets of the parallel regression tests, and each time SIGTERM a randomly
> > > >> chosen backend at a randomly chosen time.  Then see if anything "funny"
> > > 
> > > > Yep, that was my plan, plus running the parallel regression tests you
> > > > get the possibility of >2 backends.
> > > 
> > > I was intentionally suggesting only one kill per test cycle.  Multiple
> > > kills will probably create an O(N^2) explosion in the set of possible
> > > downstream-failure deltas.  I doubt you'd really get any improvement
> > > in testing coverage to justify the much larger amount of hand validation
> > > needed.
> > > 
> > > It also strikes me that you could make some simple alterations to the
> > > regression tests to reduce the set of observable downstream deltas.
> > > For example, anyplace where a test loads a table with successive INSERTs
> > > and that table is used by later tests, wrap the INSERT sequence with
> > > BEGIN/END.  Then there is only one possible downstream delta (empty
> > > table) and not N different possibilities for an N-row table.
> > 
> > I have added pg_terminate_backend() to use SIGTERM and will start
> > running tests as discussed with Tom.  I will post my scripts too.
> 
> Attached is my test script.   I ran it for 14 hours (asserts on),
> running 450 regression tests, with up to seven backends killed per
> regression test.
> 
> I have processed the combined regression.diffs files by picking out
> all the new error messages.  I don't see anything unusual in there.
> 
> Should I run it differently?
> 
> -- 
>   Bruce Momjian  <bruce@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
> 
>   + If your life is a hard drive, Christ can be your backup. +

> #!/bin/bash
> 
> REGRESSION_DURATION=80    # average duration of regression test in seconds
> OUTFILE=/rtmp/regression.sigterm
> 
> # To analyze output, use:
> # grep '^\+ *[A-Z][A-Z]*:' /rtmp/regression.sigterm | sort | uniq | less
> 
> 
> cd /pg/test/regress
> 
> while :
> do
>     (
>         SLEEP=`expr $RANDOM \* $REGRESSION_DURATION / 32767`
>         echo "Sleeping $SLEEP seconds"
>         sleep "$SLEEP"
>         echo "Trying kill"
>         # send up to 7 kill signals
>         for X in 1 2 3 4 5 6 7
>         do
>             psql -p 55432 -qt -c "
>                 SELECT pg_terminate_backend(stat.procpid)
>                 FROM (SELECT procpid FROM pg_stat_activity
>                 ORDER BY random() LIMIT 1) AS stat
>                 " template1 2> /dev/null
>             if [ "$?" -eq 0 ]
>             then    echo "Kill sent"
>             fi
>             sleep 5
>         done
>     ) &
>     gmake check
>     wait
>     [ -s regression.diffs ] && cat regression.diffs >> "$OUTFILE"
> done


> 
> -- 
> Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-patches

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
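
For comparison, here is a minimal sketch of the one-kill-per-cycle variant Tom
suggested. It assumes the same port (55432), the same template1 database, and
the procpid column used in the script above. The function names
(random_delay, kill_one_backend, run_cycle) are illustrative and not part of
the original script. Excluding the psql session's own backend via
pg_backend_pid() is an added assumption, so that the single kill is not spent
on the killer's own connection.

```shell
#!/bin/bash
# Sketch only: one randomly timed SIGTERM per regression cycle.
# Assumes the temporary "gmake check" postmaster listens on port 55432.

REGRESSION_DURATION=80    # average duration of a regression run, in seconds

# Print a random delay between 0 and REGRESSION_DURATION seconds.
# In bash, $RANDOM yields an integer in the range 0..32767.
random_delay() {
    echo $(( RANDOM * REGRESSION_DURATION / 32767 ))
}

# Ask the server to terminate one randomly chosen backend other than our own.
# Returns psql's exit status.
kill_one_backend() {
    psql -p 55432 -qt -c "
        SELECT pg_terminate_backend(stat.procpid)
        FROM (SELECT procpid FROM pg_stat_activity
              WHERE procpid <> pg_backend_pid()
              ORDER BY random() LIMIT 1) AS stat
        " template1 2> /dev/null
}

# One test cycle: start the killer in the background, run the tests,
# wait for the killer to finish, then return.
run_cycle() {
    ( sleep "$(random_delay)"; kill_one_backend && echo "Kill sent" ) &
    gmake check
    wait
}
```

From the regress directory this would be driven by something like
"while :; do run_cycle; done", collecting regression.diffs after each cycle
as the original script does.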


Re: [PATCHES] Testing pg_terminate_backend()

From
Magnus Hagander
Date:
It looks pretty good from here. I have an output of about 50 million
lines, and the only FATAL stuff is the "terminating due to admin
command". All other errors look consistent with things like the backend
that creates a table gets killed, so anybody trying to access that
table later will fail with a does not exist error.
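
A check along these lines could be scripted roughly as follows. This is a
sketch, assuming the combined diffs sit in a file such as the
/rtmp/regression.sigterm used by the test script, that new output lines are
marked with a leading "+", and that the expected message reads "terminating
connection due to administrator command". The function name
unexpected_fatals is hypothetical.

```shell
#!/bin/bash
# Sketch only: list FATAL lines in the combined diffs that are NOT the
# expected termination message, with a count for each distinct line.
# $1: path to the combined regression.diffs output
unexpected_fatals() {
    grep '^+ *FATAL:' "$1" |
        grep -v 'terminating connection due to administrator command' |
        sort | uniq -c
}
```

Empty output would mean every FATAL line in the file is the expected
termination message, for example after running
"unexpected_fatals /rtmp/regression.sigterm".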


//Magnus


Bruce Momjian wrote:
> 
> Magnus, others, how is the SIGTERM testing going?
> 
> ---------------------------------------------------------------------------
> 
> Bruce Momjian wrote:
> > bruce wrote:
> > > Tom Lane wrote:
> > > > Bruce Momjian <bruce@momjian.us> writes:
> > > > > Tom Lane wrote:
> > > > >> The closest thing I can think of to an automated test is to
> > > > >> run repeated sets of the parallel regression tests, and each
> > > > >> time SIGTERM a randomly chosen backend at a randomly chosen
> > > > >> time.  Then see if anything "funny"
> > > > 
> > > > > Yep, that was my plan, plus running the parallel regression
> > > > > tests you get the possibility of >2 backends.
> > > > 
> > > > I was intentionally suggesting only one kill per test cycle.
> > > > Multiple kills will probably create an O(N^2) explosion in the
> > > > set of possible downstream-failure deltas.  I doubt you'd
> > > > really get any improvement in testing coverage to justify the
> > > > much larger amount of hand validation needed.
> > > > 
> > > > It also strikes me that you could make some simple alterations
> > > > to the regression tests to reduce the set of observable
> > > > downstream deltas. For example, anyplace where a test loads a
> > > > table with successive INSERTs and that table is used by later
> > > > tests, wrap the INSERT sequence with BEGIN/END.  Then there is
> > > > only one possible downstream delta (empty table) and not N
> > > > different possibilities for an N-row table.
> > > 
> > > I have added pg_terminate_backend() to use SIGTERM and will start
> > > running tests as discussed with Tom.  I will post my scripts too.
> > 
> > Attached is my test script.   I ran it for 14 hours (asserts on),
> > running 450 regression tests, with up to seven backends killed per
> > regression test.
> > 
> > I have processed the combined regression.diffs files by picking
> > out all the new error messages.  I don't see anything unusual in
> > there.
> > 
> > Should I run it differently?
> > 
> > -- 
> >   Bruce Momjian  <bruce@momjian.us>        http://momjian.us
> >   EnterpriseDB                             http://enterprisedb.com
> > 
> >   + If your life is a hard drive, Christ can be your backup. +
> 
> > #!/bin/bash
> > 
> > REGRESSION_DURATION=80    # average duration of regression test in seconds
> > OUTFILE=/rtmp/regression.sigterm
> > 
> > # To analyze output, use:
> > # grep '^\+ *[A-Z][A-Z]*:' /rtmp/regression.sigterm | sort | uniq | less
> > 
> > 
> > cd /pg/test/regress
> > 
> > while :
> > do
> >     (
> >         SLEEP=`expr $RANDOM \* $REGRESSION_DURATION / 32767`
> >         echo "Sleeping $SLEEP seconds"
> >         sleep "$SLEEP"
> >         echo "Trying kill"
> >         # send up to 7 kill signals
> >         for X in 1 2 3 4 5 6 7
> >         do
> >             psql -p 55432 -qt -c "
> >                 SELECT pg_terminate_backend(stat.procpid)
> >                 FROM (SELECT procpid FROM pg_stat_activity
> >                 ORDER BY random() LIMIT 1) AS stat
> >                 " template1 2> /dev/null
> >             if [ "$?" -eq 0 ]
> >             then    echo "Kill sent"
> >             fi
> >             sleep 5
> >         done
> >     ) &
> >     gmake check
> >     wait
> >     [ -s regression.diffs ] && cat regression.diffs >> "$OUTFILE"
> > done
> 
> 
> 
> -- 
>   Bruce Momjian  <bruce@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
> 
>   + If your life is a hard drive, Christ can be your backup. +



Re: [PATCHES] Testing pg_terminate_backend()

From
Bruce Momjian
Date:
Magnus Hagander wrote:
> It looks pretty good from here. I have an output of about 50 million
> lines, and the only FATAL stuff is the "terminating due to admin
> command". All other errors look consistent with things like the backend
> that creates a table gets killed, so anybody trying to access that
> table later will fail with a does not exist error.

OK, how long does a regression test take to run, and how long did you
run the script?  Then please compute the number of regression runs.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] Testing pg_terminate_backend()

From
Magnus Hagander
Date:
Bruce Momjian wrote:
> Magnus Hagander wrote:
> > It looks pretty good from here. I have an output of about 50 million
> > lines, and the only FATAL stuff is the "terminating due to admin
> > command". All other errors look consistent with things like the
> > backend that creates a table gets killed, so anybody trying to
> > access that table later will fail with a does not exist error.
> 
> OK, how long does a regression test take to run, and how long did you
> run the script?  Then please compute the number of regression runs.

Hmm. This looks like somewhere between 10,000 and 20,000 runs.
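
An estimate like this is elapsed time divided by the time per cycle. As an
illustration, using the only complete figures given in the thread (the
earlier 14-hour run of 450 cycles), the arithmetic can be sketched in the
shell. The function name estimate_runs is hypothetical.

```shell
#!/bin/bash
# Sketch only: estimate the number of regression cycles.
# $1: hours the script ran, $2: seconds per cycle
estimate_runs() {
    echo $(( $1 * 3600 / $2 ))
}

# The earlier run: 14 hours for 450 cycles is 14 * 3600 / 450 = 112
# seconds per cycle.
seconds_per_cycle=$(( 14 * 3600 / 450 ))
echo "seconds per cycle: $seconds_per_cycle"

# Plugging that cycle length back in recovers the 450 cycles.
estimate_runs 14 "$seconds_per_cycle"
```

The 112 seconds per cycle is in line with the script itself: the killer
subshell sleeps up to 80 seconds, then spends 7 x 5 = 35 seconds in its kill
loop, and each cycle waits for it to finish.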

//Magnus


Re: [PATCHES] Testing pg_terminate_backend()

From
Bruce Momjian
Date:
Can we conclude this has been tested enough for 8.4?

---------------------------------------------------------------------------

Magnus Hagander wrote:
> Bruce Momjian wrote:
> > Magnus Hagander wrote:
> > > It looks pretty good from here. I have an output of about 50 million
> > > lines, and the only FATAL stuff is the "terminating due to admin
> > > command". All other errors look consistent with things like the
> > > backend that creates a table gets killed, so anybody trying to
> > > access that table later will fail with a does not exist error.
> > 
> > OK, how long does a regression test take to run, and how long did you
> > run the script?  Then please compute the number of regression runs.
> 
> Hmm. This looks like somewhere between 10,000 and 20,000 runs.
> 
> //Magnus

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +