Re: conchuela timeouts since 2021-10-09 system upgrade - Mailing list pgsql-bugs

From Tom Lane
Subject Re: conchuela timeouts since 2021-10-09 system upgrade
Msg-id 4170532.1635195582@sss.pgh.pa.us
In response to Re: conchuela timeouts since 2021-10-09 system upgrade  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: conchuela timeouts since 2021-10-09 system upgrade  (Noah Misch <noah@leadboat.com>)
List pgsql-bugs
Andrey Borodin <x4mmm@yandex-team.ru> writes:
> FWIW it's easy to make the issue reproduce faster with following diff
> -       '--no-vacuum --client=1 --transactions=100',
> +       '--no-vacuum --client=1 --transactions=1',

Hmm, didn't help here.  It seems that even though prairiedog managed to
fail on its first attempt, it's not terribly reproducible there; I've
seen only one failure in about 30 manual attempts.  In the one failure,
the non-background pgbench completed fine (as determined by counting
statements in the postmaster's log); but the background one had only
finished about 90 transactions before seemingly getting stuck.  No new
SQL commands had been issued after about 10 seconds.

Nonetheless, I have a theory and a proposal.  This coding pattern
seems pretty silly:

    $pgbench_h->pump_nb;
    $pgbench_h->finish();

ISTM that if you need to call pump at all, you need a loop not just
one call.  So I'm guessing that when it fails, it's for lack of
pumping.
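
For illustration only, here is a minimal sketch of what "a loop, not just one call" could look like with IPC::Run. It is not the code from the test in question: the child command is a stand-in (the running Perl printing three lines) and the variable names are assumptions. It uses the blocking pump together with pumpable, so the harness keeps servicing the child's pipes until nothing is left to do, and only then calls finish.

```perl
use strict;
use warnings;
use IPC::Run qw(start);

# Stand-in child process (an assumption for this sketch): the running Perl
# prints three lines.  In the tests under discussion this would be pgbench.
my @cmd = ($^X, '-e', 'print "line $_\n" for 1 .. 3');

# No stdin for the child; capture stdout and stderr into scalars.
my ($out, $err) = ('', '');
my $h = start(\@cmd, \undef, \$out, \$err);

# Keep pumping while the harness still has open pipes or live children,
# instead of issuing a single pump_nb and hoping that was enough.
while ($h->pumpable)
{
    $h->pump;
}

# Reap the child and pick up its exit code.
$h->finish;
my $exit_code = $h->result;

print $out;
```

Whether the missing loop is really what makes the background pgbench stall is still only a guess; the sketch just shows the shape of the pattern being suggested.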

The other thing I noticed is that at least on prairiedog's host, the
number of invocations of the DROP/CREATE/bt_index_check transaction
is ridiculously out of proportion to the number of invocations of the
other transactions.  It can only get through seven or eight iterations
of the index transaction before the other transactions are all done,
which means the last 190 iterations of that transaction are a complete
waste of cycles.

What I think we should do in these two tests is nuke the use of
background_pgbench entirely; that looks like a solution in search
of a problem, and it seems unnecessary here.  Why not run
the DROP/CREATE/bt_index_check transaction as one of three script
options in the main pgbench run?  Aside from dodging this
maybe-it's-a-bug-or-maybe-not behavior in IPC::Run, this would make the
test automatically scale the number of iterations of the different
transactions to appropriate values, so that we'd not be wasting cycles.
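
As a rough sketch of that idea, the snippet below builds a single pgbench command line that offers three scripts, using pgbench's --file=FILE@WEIGHT syntax. The script file names, the weights, the client count, and the database name are all hypothetical placeholders; the actual tests would supply their own scripts through the TAP framework rather than build a command by hand. The snippet only assembles and prints the command, so it runs without a server.

```perl
use strict;
use warnings;

# Hypothetical script files with relative weights.  pgbench picks one script
# per transaction, with probability proportional to its weight; a script
# given without "@weight" counts as weight 1.
my @scripts = (
    [ 'insert_plain.sql', 1 ],
    [ 'insert_2pc.sql',   1 ],
    [ 'cic_check.sql',    1 ],    # the DROP/CREATE/bt_index_check transaction
);

# Several clients (an assumed value) so the scripts actually run concurrently.
my @cmd = ('pgbench', '--no-vacuum', '--client=5', '--transactions=100');
push @cmd, map { "--file=$_->[0]\@$_->[1]" } @scripts;
push @cmd, 'postgres';            # hypothetical database name

print join(' ', @cmd), "\n";
```

Because every client draws from the same set of scripts, the index transaction stops running as soon as the overall run is done, which is the scaling effect described above; the weights could be adjusted if that transaction turns out to be picked too often or too rarely.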

            regards, tom lane


