Thread: Testing pg_terminate_backend()

Testing pg_terminate_backend()

From
Bruce Momjian
Date:
bruce wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > Tom Lane wrote:
> > >> The closest thing I can think of to an automated test is to run repeated
> > >> sets of the parallel regression tests, and each time SIGTERM a randomly
> > >> chosen backend at a randomly chosen time.  Then see if anything "funny"
> >
> > > Yep, that was my plan, plus running the parallel regression tests you
> > > get the possibility of >2 backends.
> >
> > I was intentionally suggesting only one kill per test cycle.  Multiple
> > kills will probably create an O(N^2) explosion in the set of possible
> > downstream-failure deltas.  I doubt you'd really get any improvement
> > in testing coverage to justify the much larger amount of hand validation
> > needed.
> >
> > It also strikes me that you could make some simple alterations to the
> > regression tests to reduce the set of observable downstream deltas.
> > For example, anyplace where a test loads a table with successive INSERTs
> > and that table is used by later tests, wrap the INSERT sequence with
> > BEGIN/END.  Then there is only one possible downstream delta (empty
> > table) and not N different possibilities for an N-row table.
>
> I have added pg_terminate_backend() to use SIGTERM and will start
> running tests as discussed with Tom.  I will post my scripts too.

Attached is my test script.   I ran it for 14 hours (asserts on),
running 450 regression tests, with up to seven backends killed per
regression test.

I have processed the combined regression.diffs files by picking out
all the new error messages.  I don't see anything unusual in there.

Should I run it differently?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
#!/bin/bash

REGRESSION_DURATION=80    # average duration of regression test in seconds
OUTFILE=/rtmp/regression.sigterm

# To analyze output, use:
# grep '^\+ *[A-Z][A-Z]*:' /rtmp/regression.sigterm | sort | uniq | less


cd /pg/test/regress

while :
do
    (
        SLEEP=`expr $RANDOM \* $REGRESSION_DURATION / 32767`
        echo "Sleeping $SLEEP seconds"
        sleep "$SLEEP"
        echo "Trying kill"
        # send up to 7 kill signals
        for X in 1 2 3 4 5 6 7
        do
            psql -p 55432 -qt -c "
                SELECT pg_terminate_backend(stat.procpid)
                FROM (SELECT procpid FROM pg_stat_activity
                ORDER BY random() LIMIT 1) AS stat
                " template1 2> /dev/null
            if [ "$?" -eq 0 ]
            then    echo "Kill sent"
            fi
            sleep 5
        done
    ) &
    gmake check
    wait
    [ -s regression.diffs ] && cat regression.diffs >> "$OUTFILE"
done
+ CONTEXT:  COPY bool_test, line 1: "TRUE    null    FALSE    null"
+ CONTEXT:  COPY create_table_test, line 1: "5    10"
+ CONTEXT:  COPY x, line 0: ""
+ CONTEXT:  COPY x, line 1: "1    test_1"
+ CONTEXT:  COPY x, line 1: "4000:\X:C:\X:\X"
+ CONTEXT:  SQL function "declares_cursor"
+ CONTEXT:  SQL function "max_xacttest"
+ ERROR:  INSERT has more expressions than target columns
+ ERROR:  cannot drop table test1 because other objects depend on it
+ ERROR:  column "a" of relation "xacttest" does not exist
+ ERROR:  column "f1" does not exist
+ ERROR:  column "fooid" does not exist
+ ERROR:  current transaction is aborted, commands ignored until end of transaction block
+ ERROR:  cursor "foo25" does not exist
+ ERROR:  function getfoo(integer) does not exist
+ ERROR:  index "onek_nulltest" does not exist
+ ERROR:  index "wowidx" does not exist
+ ERROR:  no such savepoint
+ ERROR:  prepared statement "q5" does not exist
+ ERROR:  relation "a_star" does not exist
+ ERROR:  relation "aggtest" does not exist
+ ERROR:  relation "array_index_op_test" does not exist
+ ERROR:  relation "array_op_test" does not exist
+ ERROR:  relation "b_star" does not exist
+ ERROR:  relation "bt_f8_heap" does not exist
+ ERROR:  relation "bt_i4_heap" does not exist
+ ERROR:  relation "bt_name_heap" does not exist
+ ERROR:  relation "bt_txt_heap" does not exist
+ ERROR:  relation "c_star" does not exist
+ ERROR:  relation "d_star" does not exist
+ ERROR:  relation "e_star" does not exist
+ ERROR:  relation "emp" does not exist
+ ERROR:  relation "equipment_r" does not exist
+ ERROR:  relation "f_star" does not exist
+ ERROR:  relation "fast_emp4000" does not exist
+ ERROR:  relation "foo" already exists
+ ERROR:  relation "gcircle_tbl" does not exist
+ ERROR:  relation "gpolygon_tbl" does not exist
+ ERROR:  relation "hash_f8_heap" does not exist
+ ERROR:  relation "hash_i4_heap" does not exist
+ ERROR:  relation "hash_name_heap" does not exist
+ ERROR:  relation "hash_txt_heap" does not exist
+ ERROR:  relation "hobbies_r" does not exist
+ ERROR:  relation "ihighway" does not exist
+ ERROR:  relation "int2_tbl" does not exist
+ ERROR:  relation "int8_tbl" does not exist
+ ERROR:  relation "onek" does not exist
+ ERROR:  relation "onek2" does not exist
+ ERROR:  relation "onek_unique1" does not exist
+ ERROR:  relation "onek_with_null" does not exist
+ ERROR:  relation "pcachetest" does not exist
+ ERROR:  relation "pcacheview" does not exist
+ ERROR:  relation "person" does not exist
+ ERROR:  relation "pg_toast_stud_emp" does not exist
+ ERROR:  relation "polygon_tbl" does not exist
+ ERROR:  relation "ramp" does not exist
+ ERROR:  relation "random_tbl" does not exist
+ ERROR:  relation "real_city" does not exist
+ ERROR:  relation "road" does not exist
+ ERROR:  relation "shighway" does not exist
+ ERROR:  relation "six" does not exist
+ ERROR:  relation "slow_emp4000" does not exist
+ ERROR:  relation "stud_emp" does not exist
+ ERROR:  relation "student" does not exist
+ ERROR:  relation "tenk1" does not exist
+ ERROR:  relation "tenk2" does not exist
+ ERROR:  relation "test1" already exists
+ ERROR:  relation "test2" already exists
+ ERROR:  relation "test_tsvector" does not exist
+ ERROR:  relation "tmp" already exists
+ ERROR:  relation "tmp" does not exist
+ ERROR:  relation "tmp_onek_unique1" does not exist
+ ERROR:  relation "tmp_view" does not exist
+ ERROR:  relation "toyemp" does not exist
+ ERROR:  relation "xacttest" does not exist
+ ERROR:  rule "foorule" for relation "foo" does not exist
+ ERROR:  table "onek_with_null" does not exist
+ ERROR:  table "pcachetest" does not exist
+ ERROR:  table "tmp" does not exist
+ ERROR:  table "tmp1" does not exist
+ ERROR:  type "city_budget" is only a shell
+ ERROR:  type "hobbies_r" does not exist
+ ERROR:  type "widget" is only a shell
+ ERROR:  type emp does not exist
+ ERROR:  type hobbies_r does not exist
+ ERROR:  type person does not exist
+ ERROR:  view "tmp_view_new" does not exist
+ ERROR:  view "vw_getfoo" does not exist
+ FATAL:  terminating connection due to administrator command
+ HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
+ HINT:  Use DROP ... CASCADE to drop the dependent objects too.
+ NOTICE:  argument type widget is only a shell
+ NOTICE:  rule _RETURN on view v_test1 depends on table test1
+ NOTICE:  view v_test1 depends on rule _RETURN on view v_test1
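
A minimal sketch of the log analysis step, in the spirit of the grep command in the script's comment. The helper name summarize_diffs is hypothetical, and the use of `uniq -c` to count occurrences is an addition for illustration, not part of the posted script. The bracket expression `[+]` is used so the leading plus sign is matched literally regardless of the grep implementation.

```shell
# summarize_diffs FILE: list each distinct added message line in an
# accumulated regression.diffs log, most frequent first.
# (Hypothetical helper; the posted script only suggests grep | sort | uniq.)
summarize_diffs() {
    # Keep only added lines ("+ ") that start with an uppercase tag
    # such as ERROR:, FATAL:, HINT:, NOTICE: or CONTEXT:.
    grep '^[+] *[A-Z][A-Z]*:' "$1" | sort | uniq -c | sort -rn
}

# Usage, assuming the log path from the first script:
#   summarize_diffs /rtmp/regression.sigterm | less
```

Counting the occurrences makes it easier to spot a message that shows up only once or twice among many routine "does not exist" failures.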

Re: Testing pg_terminate_backend()

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Attached is my test script.   I ran it for 14 hours (asserts on),
> running 450 regression tests, with up to seven backends killed per
> regression test.

Hmm, there are something on the order of 10000 SQL commands in our
regression tests, so even assuming perfect randomness you've exercised
SIGTERM on maybe 10% of them --- and of course there's multiple places
in a complex DDL command where SIGTERM might conceivably be a problem.

Who was volunteering to run this 24x7 for awhile?

>         SLEEP=`expr $RANDOM \* $REGRESSION_DURATION / 32767`

Uh, where's the randomness coming from?

            regards, tom lane
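
The coverage estimate above can be sketched with a simple model: if each of K kills lands on one of N commands uniformly at random, the expected fraction of commands hit at least once is 1 - (1 - 1/N)^K. The function name coverage and the two kill counts below are assumptions for illustration: 450 corresponds to one successful kill per run, and 3150 to all seven kills succeeding in each of the 450 runs.

```shell
# coverage N K: expected percentage of N commands hit by at least one of
# K uniformly random kills (a rough model, not a measurement).
coverage() {
    awk -v n="$1" -v k="$2" \
        'BEGIN { printf "%.1f\n", 100 * (1 - exp(k * log(1 - 1 / n))) }'
}

coverage 10000 450     # one kill per run over 450 runs
coverage 10000 3150    # seven kills per run over 450 runs
```

Under these assumptions the two calls print 4.4 and 27.0, which brackets the rough 10% figure. The model ignores that a command may have several distinct places where SIGTERM matters, so the real coverage of interesting interruption points is lower.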

Re: Testing pg_terminate_backend()

From
Magnus Hagander
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Attached is my test script.   I ran it for 14 hours (asserts on),
> > running 450 regression tests, with up to seven backends killed per
> > regression test.
>
> Hmm, there are something on the order of 10000 SQL commands in our
> regression tests, so even assuming perfect randomness you've exercised
> SIGTERM on maybe 10% of them --- and of course there's multiple places
> in a complex DDL command where SIGTERM might conceivably be a problem.
>
> Who was volunteering to run this 24x7 for awhile?

That was me. As long as the script runs properly on linux, I can get
that started as soon as I'm fed instructions on how to do it :-) Do I
just fix the paths and set it running, or do I need to prepare
something else?


> >         SLEEP=`expr $RANDOM \* $REGRESSION_DURATION / 32767`
>
> Uh, where's the randomness coming from?

... but I should probably wait until that one is answered or fixed, I
guess :-)

//Magnus

Re: Testing pg_terminate_backend()

From
Alvaro Herrera
Date:
Magnus Hagander wrote:
> Tom Lane wrote:

> > >         SLEEP=`expr $RANDOM \* $REGRESSION_DURATION / 32767`
> >
> > Uh, where's the randomness coming from?
>
> ... but I should probably wait until that one is answered or fixed, I
> guess :-)

bash.

       RANDOM Each time this parameter is referenced, a random integer between
              0 and 32767 is generated.  The sequence of random numbers may be
              initialized by assigning a value to RANDOM.  If RANDOM is unset,
              it loses its special properties,  even  if  it  is  subsequently
              reset.


--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Testing pg_terminate_backend()

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Attached is my test script.   I ran it for 14 hours (asserts on),
> > running 450 regression tests, with up to seven backends killed per
> > regression test.
>
> Hmm, there are something on the order of 10000 SQL commands in our
> regression tests, so even assuming perfect randomness you've exercised
> SIGTERM on maybe 10% of them --- and of course there's multiple places
> in a complex DDL command where SIGTERM might conceivably be a problem.
>
> Who was volunteering to run this 24x7 for awhile?

Yes, that is what it needs.

> >         SLEEP=`expr $RANDOM \* $REGRESSION_DURATION / 32767`
>
> Uh, where's the randomness coming from?

In bash $RANDOM returns a random number from 0-32k every time;
#!/bin/bash is specified in the top line.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
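
A small sketch of how the script's expr line maps a $RANDOM value onto a sleep interval. The function name scale_delay is hypothetical; it takes the random value as an argument so the scaling can be checked with fixed inputs.

```shell
# scale_delay R D: scale R (0..32767, as produced by bash's $RANDOM)
# to an integer delay between 0 and D seconds, using the same integer
# arithmetic as `expr $RANDOM \* $REGRESSION_DURATION / 32767`.
scale_delay() {
    echo $(( $1 * $2 / 32767 ))
}

# In bash the script's line is equivalent to:
#   sleep "$(scale_delay "$RANDOM" "$REGRESSION_DURATION")"
scale_delay 0 80        # smallest possible value
scale_delay 16384 80    # roughly the midpoint
scale_delay 32767 80    # largest possible value
```

With a duration of 80 the three calls print 0, 40 and 80. Because the division truncates, the full duration is reached only when $RANDOM is exactly 32767.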

Re: Testing pg_terminate_backend()

From
Bruce Momjian
Date:
Magnus Hagander wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > Attached is my test script.   I ran it for 14 hours (asserts on),
> > > running 450 regression tests, with up to seven backends killed per
> > > regression test.
> >
> > Hmm, there are something on the order of 10000 SQL commands in our
> > regression tests, so even assuming perfect randomness you've exercised
> > SIGTERM on maybe 10% of them --- and of course there's multiple places
> > in a complex DDL command where SIGTERM might conceivably be a problem.
> >
> > Who was volunteering to run this 24x7 for awhile?
>
> That was me. As long as the script runs properly on linux, I can get
> that started as soon as I'm fed instructions on how to do it :-) Do I
> just fix the paths and set it running, or do I need to prepare
> something else?

Nothing special to prepare.  Compile with asserts enabled, and run the
script.  The comment at the top explains how to analyze the log for
interesting error messages.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Testing pg_terminate_backend()

From
Bruce Momjian
Date:
bruce wrote:
> Magnus Hagander wrote:
> > Tom Lane wrote:
> > > Bruce Momjian <bruce@momjian.us> writes:
> > > > Attached is my test script.   I ran it for 14 hours (asserts on),
> > > > running 450 regression tests, with up to seven backends killed per
> > > > regression test.
> > >
> > > Hmm, there are something on the order of 10000 SQL commands in our
> > > regression tests, so even assuming perfect randomness you've exercised
> > > SIGTERM on maybe 10% of them --- and of course there's multiple places
> > > in a complex DDL command where SIGTERM might conceivably be a problem.
> > >
> > > Who was volunteering to run this 24x7 for awhile?
> >
> > That was me. As long as the script runs properly on linux, I can get
> > that started as soon as I'm fed instructions on how to do it :-) Do I
> > just fix the paths and set it running, or do I need to prepare
> > something else?
>
> Nothing special to prepare.  Compile with asserts enabled, and run the
> script.  The comment at the top explains how to analyze the log for
> interesting error messages.

Oh, you need to set a variable in the script indicating the average
number of seconds it takes to run the regression test on your system.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Testing pg_terminate_backend()

From
Bruce Momjian
Date:
bruce wrote:
> bruce wrote:
> > Magnus Hagander wrote:
> > > Tom Lane wrote:
> > > > Bruce Momjian <bruce@momjian.us> writes:
> > > > > Attached is my test script.   I ran it for 14 hours (asserts on),
> > > > > running 450 regression tests, with up to seven backends killed per
> > > > > regression test.
> > > >
> > > > Hmm, there are something on the order of 10000 SQL commands in our
> > > > regression tests, so even assuming perfect randomness you've exercised
> > > > SIGTERM on maybe 10% of them --- and of course there's multiple places
> > > > in a complex DDL command where SIGTERM might conceivably be a problem.
> > > >
> > > > Who was volunteering to run this 24x7 for awhile?
> > >
> > > That was me. As long as the script runs properly on linux, I can get
> > > that started as soon as I'm fed instructions on how to do it :-) Do I
> > > just fix the paths and set it running, or do I need to prepare
> > > something else?
> >
> > Nothing special to prepare.  Compile with asserts enabled, and run the
> > script.  The comment at the top explains how to analyze the log for
> > interesting error messages.
>
> Oh, you need to set a variable in the script indicating the average
> number of seconds it takes to run the regression test on your system.

And you have to set the location of the output file and where your
regression test directory is located.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Testing pg_terminate_backend()

From
Magnus Hagander
Date:
Bruce Momjian wrote:
> bruce wrote:
> > Magnus Hagander wrote:
> > > Tom Lane wrote:
> > > > Bruce Momjian <bruce@momjian.us> writes:
> > > > > Attached is my test script.   I ran it for 14 hours (asserts
> > > > > on), running 450 regression tests, with up to seven backends
> > > > > killed per regression test.
> > > >
> > > > Hmm, there are something on the order of 10000 SQL commands in
> > > > our regression tests, so even assuming perfect randomness
> > > > you've exercised SIGTERM on maybe 10% of them --- and of course
> > > > there's multiple places in a complex DDL command where SIGTERM
> > > > might conceivably be a problem.
> > > >
> > > > Who was volunteering to run this 24x7 for awhile?
> > >
> > > That was me. As long as the script runs properly on linux, I can
> > > get that started as soon as I'm fed instructions on how to do
> > > it :-) Do I just fix the paths and set it running, or do I need
> > > to prepare something else?
> >
> > Nothing special to prepare.  Compile with asserts enabled, and run
> > the script.  The comment at the top explains how to analyze the log
> > for interesting error messages.
>
> Oh, you need to set a variable in the script indicating the average
> number of seconds it takes to run the regression test on your system.

Done that. Also, I needed to replace "gmake" with "make", since I'm on
linux...

Anyway. It's been running for about 12 hours now, and I have *nothing*
in the output file. That tells me that the script doesn't appear to be
working - correct? It should output *something* there, right? (It's
obviously running, because I've got about 400,000 lines in nohup.out..)

Argh. So here I am looking at it now for details, and it seems the
script should be run from src/test/regress, and I ran it from the root
directory.. Oops. Also, I needed to place psql in the path - it failed
to find it, but hid the error message.

Just a hint if someone else is running it ;-)

//Magnus
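
The two problems described above (wrong working directory, psql missing from the path with the error hidden by the redirect to /dev/null) could be caught before the loop starts. This is only a sketch: the function name preflight and its arguments are hypothetical, and the check for parallel_schedule is assumed to be a reasonable marker for the src/test/regress directory.

```shell
# preflight DIR CMD: report problems that would otherwise fail silently.
# Returns 0 if DIR looks like the regression directory and CMD is in PATH.
preflight() {
    dir=$1
    cmd=$2
    status=0
    if [ ! -f "$dir/parallel_schedule" ]
    then
        echo "no parallel_schedule in $dir; run from src/test/regress" 1>&2
        status=1
    fi
    if ! command -v "$cmd" > /dev/null 2>&1
    then
        echo "$cmd not found in PATH" 1>&2
        status=1
    fi
    return "$status"
}

# Usage at the top of the test script:
#   preflight . psql || exit 1
```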

Re: Testing pg_terminate_backend()

From
Bruce Momjian
Date:
Magnus Hagander wrote:
> Done that. Also, I needed to replace "gmake" with "make", since I'm on
> linux...
>
> Anyway. It's been running for about 12 hours now, and I have *nothing*
> in the output file. That tells me that the script doesn't appear to be
> working - correct? It should output *something* there, right? (It's
> obviously running, because I've got about 400,000 lines in nohup.out..)
>
> Argh. So here I am looking at it now for details, and it seems the
> script should be run from src/test/regress, and I ran it from the root
> directory.. Oops. Also, I needed to place psql in the path - it failed
> to find it, but hid the error message.
>
> Just a hint if someone else is running it ;-)

Yea, basically you need to rewrite my script.  :-(

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Testing pg_terminate_backend()

From
Magnus Hagander
Date:
Bruce Momjian wrote:
> Magnus Hagander wrote:
> > Done that. Also, I needed to replace "gmake" with "make", since I'm
> > on linux...
> >
> > Anyway. It's been running for about 12 hours now, and I have
> > *nothing* in the output file. That tells me that the script doesn't
> > appear to be working - correct? It should output *something* there,
> > right? (It's obviously running, because I've got about 400,000
> > lines in nohup.out..)
> >
> > Argh. So here I am looking at it now for details, and it seems the
> > script should be run from src/test/regress, and I ran it from the
> > root directory.. Oops. Also, I needed to place psql in the path -
> > it failed to find it, but hid the error message.
> >
> > Just a hint if someone else is running it ;-)
>
> Yea, basically you need to rewrite my script.  :-(

Not really, but it did need a couple of adjustments :-)

It's been running fine now for a number of hours, with output that
looks similar to the stuff you posted. I'll leave it running..

//Magnus

Re: Testing pg_terminate_backend()

From
Alvaro Herrera
Date:
Magnus Hagander wrote:

> It's been running fine now for a number of hours, with output that
> looks similar to the stuff you posted. I'll leave it running..

Perhaps it would be a good idea to leave it running on code with some
bugs in it, just to check if the problems show up.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Testing pg_terminate_backend()

From
"Guillaume Smet"
Date:
On Mon, Apr 21, 2008 at 7:25 PM, Magnus Hagander <magnus@hagander.net> wrote:
>  Not really, but it did need a couple of adjustments :-)
>
>  It's been running fine now for a number of hours, with output that
>  looks similar to the stuff you posted. I'll leave it running..

If you can come up with an easily installable tarball, I can dedicate
1 or 2 boxes to run it 24/7.

--
Guillaume

Re: Testing pg_terminate_backend()

From
Bruce Momjian
Date:
Guillaume Smet wrote:
> On Mon, Apr 21, 2008 at 7:25 PM, Magnus Hagander <magnus@hagander.net> wrote:
> >  Not really, but it did need a couple of adjustments :-)
> >
> >  It's been running fine now for a number of hours, with output that
> >  looks similar to the stuff you posted. I'll leave it running..
>
> If you can come up with an easily installable tarball, I can dedicate
> 1 or 2 boxes to run it 24/7.

Sure.  I updated the script so it will be clearer what you have to
modify.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
#!/bin/bash

REGRESSION_DURATION=    # average duration of regression test, in seconds
OUTFILE=

[ ! "$REGRESSION_DURATION" -o ! "$OUTFILE" ] &&
    echo "Must set REGRESSION_DURATION and OUTFILE in the script" 1>&2 &&
    exit 1

[ ! -f parallel_schedule ] &&
    echo "This must be run from the Postgres src/test/regress directory" 1>&2 &&
    exit 1

if gmake -h > /dev/null 2>&1
then    MAKE=gmake
else    MAKE=make
fi

echo "Running ..."
echo "To analyze the output log file, use:"
echo "grep '^\+ *[A-Z][A-Z]*:' $OUTFILE | sort | uniq | less"


while :
do
    (
        SLEEP=`expr $RANDOM \* $REGRESSION_DURATION / 32767`
        echo "Sleeping $SLEEP seconds"
        sleep "$SLEEP"
        echo "Trying kill"
        # send up to 7 kill signals
        for X in 1 2 3 4 5 6 7
        do
            psql -p 55432 -qt -c "
                SELECT pg_terminate_backend(stat.procpid)
                FROM (SELECT procpid FROM pg_stat_activity
                ORDER BY random() LIMIT 1) AS stat
                " template1 2> /dev/null
            if [ "$?" -eq 0 ]
            then    echo "Kill sent"
            fi
            sleep 5
        done
    ) &
    $MAKE check
    wait
    [ -s regression.diffs ] && cat regression.diffs >> "$OUTFILE"
done
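
Tom's earlier suggestion was a single kill per test cycle, while the script above hard-codes seven attempts. One way to support both is to make the number of attempts a parameter. The sketch below is an assumption-laden illustration: the function name send_kills is hypothetical, and the kill command is passed in as arguments so the loop can be exercised without a running server.

```shell
# send_kills COUNT PAUSE COMMAND...: run COMMAND up to COUNT times,
# pausing PAUSE seconds between attempts, and report each success.
send_kills() {
    kills=$1
    pause=$2
    shift 2
    i=0
    while [ "$i" -lt "$kills" ]
    do
        if "$@"
        then    echo "Kill sent"
        fi
        i=$((i + 1))
        if [ "$i" -lt "$kills" ]
        then    sleep "$pause"
        fi
    done
    return 0
}

# Usage inside the background subshell, assuming a wrapper function
# (here called kill_one_backend) around the psql command shown above:
#   send_kills 1 5 kill_one_backend    # one kill per cycle
#   send_kills 7 5 kill_one_backend    # behavior of the posted script
```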