Thread: Avoid stuck of pbgench due to skipped transactions

Avoid stuck of pbgench due to skipped transactions

From

Yugo NAGATA

Date:

12 June 2021, 19:01:51

Hi,

I found that pgbench can get stuck when every transaction ends up
being skipped and the number of transactions is not limited by the
-t option.

For example, when I used a large rate (-R) for throttling and a
small latency limit (-L) together with a duration (-T), pgbench
got stuck.

 $ pgbench -T 5 -R 100000000 -L 1;

When we specify the number of transactions with -t, it doesn't get
stuck because the number of skipped transactions is counted and
checked during the loop. However, the timer expiration is not
checked in the loop, although it is checked before and after a
sleep for throttling.

I think it is better to check the timer expiration even in the
transaction-skipping loop and to let pgbench finish successfully,
because we should correctly report how many transactions were
processed and skipped in this case too, and getting stuck would not
be good anyway.
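
For readers of the archive, the behavior described above can be sketched with a simulated clock. This is a simplified model, not the pgbench source: the struct, the function name, the fixed per-iteration cost, and the iteration cap are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one throttled client. */
typedef struct
{
    int64_t now_ns;        /* simulated current time */
    int64_t next_start_ns; /* scheduled start of the next transaction */
    int64_t skipped;       /* transactions skipped so far */
} SimClient;

/*
 * Skip transactions whose lag already exceeds latency_limit_ns.
 * Each iteration moves the schedule forward by interval_ns and costs
 * iter_cost_ns of simulated time. Returns true if the loop ended on
 * its own (caught up, or timer expired when check_timer is set) and
 * false if it was still running after max_iters iterations ("stuck").
 */
bool
skip_late_transactions(SimClient *c, int64_t interval_ns,
                       int64_t iter_cost_ns, int64_t latency_limit_ns,
                       int64_t end_time_ns, bool check_timer,
                       int64_t max_iters)
{
    for (int64_t i = 0; i < max_iters; i++)
    {
        /* the proposed extra check: stop once the -T duration is over */
        if (check_timer && c->now_ns >= end_time_ns)
            return true;

        /* caught up: the next transaction is no longer too late */
        if (c->now_ns - c->next_start_ns <= latency_limit_ns)
            return true;

        c->skipped++;
        c->next_start_ns += interval_ns;
        c->now_ns += iter_cost_ns;
    }
    return false;
}
```

With a 10 ns interval and an assumed 50 ns cost per iteration, the lag grows on every pass, so the loop only ends when the timer check is enabled; the skipped count at that point is what gets reported.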

I attached a patch for this fix.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Attachment

pgbench_avoiding_stuck.patch

Re: Avoid stuck of pbgench due to skipped transactions

From

Fabien COELHO

Date:

13 June 2021, 06:56:59

Hello Yugo-san,

> For example, when I used a large rate (-R) for throttling and a
> small latency limit (-L) together with a duration (-T), pgbench
> got stuck.
>
> $ pgbench -T 5 -R 100000000 -L 1;

Indeed, it does not get out of the catchup loop for a long time because 
even scheduling takes more time than the expected transaction time!
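
The arithmetic behind this remark can be made explicit. The sketch below is only illustrative: the helper names are hypothetical, and the cost of one scheduling iteration is an assumed input, not a measured value.

```c
/* Average interval between scheduled transactions, in nanoseconds,
 * for a target rate given in transactions per second. */
double
schedule_interval_ns(double rate_tps)
{
    return 1e9 / rate_tps;
}

/*
 * Net change in lag per catch-up iteration, in nanoseconds.
 * iter_cost_ns is an assumed cost of one scheduling step.
 * A positive result means the client falls further behind on every
 * iteration, so the catch-up loop cannot terminate by catching up.
 */
double
lag_growth_per_iteration_ns(double rate_tps, double iter_cost_ns)
{
    return iter_cost_ns - schedule_interval_ns(rate_tps);
}
```

For -R 100000000 the average interval is 10 ns, so any scheduling step that costs more than 10 ns makes the lag grow rather than shrink.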

> I think it is better to check the timer expiration even in the
> transaction-skipping loop and to let pgbench finish successfully,
> because we should correctly report how many transactions were
> processed and skipped in this case too, and getting stuck would not
> be good anyway.
>
> I attached a patch for this fix.

The patch mostly works for me, and I agree that the bench should not be in 
a loop on any parameters, even when "crazy" parameters are given…

However I'm not sure this is the right way to handle this issue.

The catch-up loop can be dropped and the automaton can loop over itself to
reschedule. Doing that, as in the attached patch, fixes this issue and also
makes progress reporting work properly in more cases, and reduces the number
of lines of code. I did not add a test case because time-sensitive tests have
been removed (which is too bad, IMHO).

-- 
Fabien.

Attachment

pgbench-stuck-2.patch

Re: Avoid stuck of pbgench due to skipped transactions

From

Yugo NAGATA

Date:

14 June 2021, 02:20:37

Hello Fabien,

On Sun, 13 Jun 2021 08:56:59 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> > I attached a patch for this fix.
> 
> The patch mostly works for me, and I agree that the bench should not be in 
> a loop on any parameters, even when "crazy" parameters are given…
> 
> However I'm not sure this is the right way to handle this issue.
> 
> The catch-up loop can be dropped and the automaton can loop over itself to
> reschedule. Doing that, as in the attached patch, fixes this issue and also
> makes progress reporting work properly in more cases, and reduces the number
> of lines of code. I did not add a test case because time-sensitive tests have
> been removed (which is too bad, IMHO).

I agree with your approach to the fix. However, the progress reporting didn't
work, because with break we just loop again and never return from
advanceConnectionState to threadRun.

+                        /* otherwise loop over PREPARE_THROTTLE */
                         break;

I attached the fixed patch that uses return instead of break, and I confirmed
that this made the progress reporting work properly.
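
The difference between break and return described here can be illustrated with a toy automaton. This is not the pgbench code: the state names are shortened, and the struct, both functions, and the yield_on_skip flag are hypothetical. It only assumes that the caller gets a chance to report progress each time the state function returns.

```c
#include <stdbool.h>

/* Toy states, loosely named after the ones discussed in this thread. */
typedef enum { PREPARE_THROTTLE, THROTTLE, DONE } ToyState;

typedef struct
{
    ToyState state;
    int      late_remaining; /* transactions still too late to run */
    int      skipped;
} ToyClient;

/*
 * Advance the toy automaton. After skipping one transaction it either
 * returns to the caller (yield_on_skip, the "return" variant) or stays
 * in the for loop and re-enters the same state (the "break" variant).
 */
void
advance_state(ToyClient *c, bool yield_on_skip)
{
    for (;;)
    {
        switch (c->state)
        {
            case PREPARE_THROTTLE:
                if (c->late_remaining > 0)
                {
                    c->late_remaining--;
                    c->skipped++;
                    if (yield_on_skip)
                        return; /* caller regains control */
                    break;      /* loop over PREPARE_THROTTLE */
                }
                c->state = THROTTLE;
                break;
            case THROTTLE:
                c->state = DONE;
                break;
            case DONE:
                return;
        }
    }
}

/* Toy caller: counts how often control comes back to it. */
int
run_client(ToyClient *c, bool yield_on_skip)
{
    int returns = 0;

    while (c->state != DONE)
    {
        advance_state(c, yield_on_skip);
        returns++; /* a progress report could be emitted here */
    }
    return returns;
}
```

In this toy model both variants skip the same number of transactions; only the number of times the caller regains control differs.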

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Attachment

pgbench-stuck-3.patch

Re: Avoid stuck of pbgench due to skipped transactions

From

Fabien COELHO

Date:

14 June 2021, 06:47:40

>>> I attached a patch for this fix.
>>
>> The patch mostly works for me, and I agree that the bench should not be in
>> a loop on any parameters, even when "crazy" parameters are given…
>>
>> However I'm not sure this is the right way to handle this issue.
>>
>> The catch-up loop can be dropped and the automaton can loop over itself to
>> reschedule. Doing that, as in the attached patch, fixes this issue and also
>> makes progress reporting work properly in more cases, and reduces the number
>> of lines of code. I did not add a test case because time-sensitive tests have
>> been removed (which is too bad, IMHO).
>
> I agree with your approach to the fix. However, the progress reporting didn't
> work, because with break we just loop again and never return from
> advanceConnectionState to threadRun.
>
> +                        /* otherwise loop over PREPARE_THROTTLE */
>                         break;
>
> I attached the fixed patch that uses return instead of break, and I confirmed
> that this made the progress reporting work properly.

I'm hesitant to make such a structural change for a degenerate case linked
to "insane" parameters, as pg is unlikely to reach 100 million tps, ever.
It seems to me enough that the command is not blocked in such cases.

-- 
Fabien.

Re: Avoid stuck of pbgench due to skipped transactions

From

Yugo NAGATA

Date:

14 June 2021, 07:06:10

On Mon, 14 Jun 2021 08:47:40 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> 
> >>> I attached a patch for this fix.
> >>
> >> The patch mostly works for me, and I agree that the bench should not be in
> >> a loop on any parameters, even when "crazy" parameters are given…
> >>
> >> However I'm not sure this is the right way to handle this issue.
> >>
> >> The catch-up loop can be dropped and the automaton can loop over itself to
> >> reschedule. Doing that, as in the attached patch, fixes this issue and also
> >> makes progress reporting work properly in more cases, and reduces the number
> >> of lines of code. I did not add a test case because time-sensitive tests have
> >> been removed (which is too bad, IMHO).
> >
> > I agree with your approach to the fix. However, the progress reporting didn't
> > work, because with break we just loop again and never return from
> > advanceConnectionState to threadRun.
> >
> > +                        /* otherwise loop over PREPARE_THROTTLE */
> >                         break;
> >
> > I attached the fixed patch that uses return instead of break, and I confirmed
> > that this made the progress reporting work properly.
> 
> I'm hesitant to make such a structural change for a degenerate case linked
> to "insane" parameters, as pg is unlikely to reach 100 million tps, ever.
> It seems to me enough that the command is not blocked in such cases.

Sure. The change from "break" to "return" is just for making the progress
reporting work in the loop, as you mentioned. However, my original intention
was to avoid getting stuck in a corner case where an unrealistic parameter is
used, and I agree with you that this change is not really necessary for
handling such a special situation.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Avoid stuck of pbgench due to skipped transactions

From

Yugo NAGATA

Date:

16 June 2021, 16:23:49

On Mon, 14 Jun 2021 16:06:10 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> On Mon, 14 Jun 2021 08:47:40 +0200 (CEST)
> Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> > > I attached the fixed patch that uses return instead of break, and I confirmed
> > > that this made the progress reporting work properly.
> > 
> > I'm hesitant to make such a structural change for a degenerate case linked
> > to "insane" parameters, as pg is unlikely to reach 100 million tps, ever.
> > It seems to me enough that the command is not blocked in such cases.
> 
> Sure. The change from "break" to "return" is just for making the progress
> reporting work in the loop, as you mentioned. However, my original intention
> was to avoid getting stuck in a corner case where an unrealistic parameter is
> used, and I agree with you that this change is not really necessary for
> handling such a special situation.

I attached the v2 patch to clarify that I withdrew the v3 patch.

Regards
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Attachment

pgbench-stuck-2.patch

Re: Avoid stuck of pbgench due to skipped transactions

From

Greg Sabino Mullane

Date:

22 June 2021, 19:22:38

The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       tested, failed
Spec compliant:           not tested
Documentation:            not tested

Looks fine to me, as a way of catching this edge case.

Re: Avoid stuck of pbgench due to skipped transactions

From

Yugo NAGATA

Date:

23 June 2021, 00:36:58

Hello Greg,

On Tue, 22 Jun 2021 19:22:38 +0000
Greg Sabino Mullane <htamfids@gmail.com> wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:       tested, failed
> Spec compliant:           not tested
> Documentation:            not tested
> 
> Looks fine to me, as a way of catching this edge case.

Thank you for looking into this!

'make installcheck-world' and 'Implements feature' are marked "failed",
but did you find any problem on this patch?

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Avoid stuck of pbgench due to skipped transactions

From

Greg Sabino Mullane

Date:

10 August 2021, 14:50:20

Apologies, just saw this. I found no problems, those "failures" were just me missing checkboxes on the commitfest interface. +1 on the patch.

Cheers,

Greg

Re: Avoid stuck of pbgench due to skipped transactions

From

Yugo NAGATA

Date:

12 August 2021, 16:01:44

On Tue, 10 Aug 2021 10:50:20 -0400
Greg Sabino Mullane <htamfids@gmail.com> wrote:

> Apologies, just saw this. I found no problems, those "failures" were just
> me missing checkboxes on the commitfest interface. +1 on the patch.

Thank you!


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Avoid stuck of pbgench due to skipped transactions

From

Fujii Masao

Date:

03 September 2021, 12:56:12


On 2021/06/17 1:23, Yugo NAGATA wrote:
> I attached the v2 patch to clarify that I withdrew the v3 patch.

Thanks for the patch!

+                             * For very unrealistic rates under -T, some skipped
+                             * transactions are not counted because the catchup
+                             * loop is not fast enough just to do the scheduling
+                             * and counting at the expected speed.
+                             *
+                             * We do not bother with such a degenerate case.
+                             */

ISTM that the patch changes pgbench so that it can skip counting
some skipped transactions here even for realistic rates under -T.
Of course, this would happen very rarely. Is this understanding right?

On the other hand, even without the patch, in the first place, there seems
to be no guarantee that all the skipped transactions are counted under -T.
When the timer is exceeded in CSTATE_END_TX, a client ends without
checking outstanding skipped transactions. Therefore the "issue" that
some skipped transactions are not counted is not one the patch newly introduces.
So that behavior change by the patch would be acceptable.
Is this understanding right?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Avoid stuck of pbgench due to skipped transactions

From

Fabien COELHO

Date:

04 September 2021, 06:27:00

Hello Fujii-san,

> ISTM that the patch changes pgbench so that it can skip counting
> some skipped transactions here even for realistic rates under -T.
> Of course, this would happen very rarely. Is this understanding right?

Yes. The point is to get out of the scheduling loop when time has expired,
as soon as it is known, instead of looping there for some possibly long time.

> On the other hand, even without the patch, in the first place, there seems
> to be no guarantee that all the skipped transactions are counted under -T.
> When the timer is exceeded in CSTATE_END_TX, a client ends without
> checking outstanding skipped transactions.

Indeed. But that should be very few transactions under latency limit.

> Therefore the "issue" that some skipped transactions are not counted is
> not one the patch newly introduces.

Yep. The patch counts fewer of them though, because of the early exit
introduced in the patch in the scheduling state. Before, it could be stuck
in the "while (late) { count; schedule; }" loop.

> So that behavior change by the patch would be acceptable. Is this 
> understanding right?

I think so.

-- 
Fabien.

Re: Avoid stuck of pbgench due to skipped transactions

From

Fujii Masao

Date:

06 September 2021, 16:10:44


On 2021/09/04 15:27, Fabien COELHO wrote:
> 
> Hello Fujii-san,
> 
>> ISTM that the patch changes pgbench so that it can skip counting
>> some skipped transactions here even for realistic rates under -T.
>> Of course, this would happen very rarely. Is this understanding right?
> 
> Yes. The point is to get out of the scheduling loop when time has expired,
> as soon as it is known, instead of looping there for some possibly long time.

Thanks for checking my understanding!

+                             * For very unrealistic rates under -T, some skipped
+                             * transactions are not counted because the catchup
+                             * loop is not fast enough just to do the scheduling
+                             * and counting at the expected speed.
+                             *
+                             * We do not bother with such a degenerate case.

So this comment is a bit misleading? What about updating this as follows?

------------------------------
Stop counting skipped transactions under -T as soon as the timer is exceeded.
Because otherwise it can take a very long time to count all of them especially
when quite a lot of them happen with unrealistically high rate setting in -R,
which would prevent pgbench from ending immediately. Because of this behavior,
note that there is no guarantee that all skipped transactions are counted
under -T though there is under -t. This is OK in practice because it's very
unlikely to happen with realistic setting.
------------------------------


>> So that behavior change by the patch would be acceptable. Is this understanding right?
> 
> I think so.

+1

One question is; which version do we want to back-patch to?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Avoid stuck of pbgench due to skipped transactions

From

Fabien COELHO

Date:

07 September 2021, 09:24:39

Hello Fujii-san,

> Stop counting skipped transactions under -T as soon as the timer is 
> exceeded. Because otherwise it can take a very long time to count all of 
> them especially when quite a lot of them happen with unrealistically 
> high rate setting in -R, which would prevent pgbench from ending 
> immediately. Because of this behavior, note that there is no guarantee 
> that all skipped transactions are counted under -T though there is under 
> -t. This is OK in practice because it's very unlikely to happen with 
> realistic setting.

Ok, I find this text quite clear.

> One question is; which version do we want to back-patch to?

If we consider it a "very minor bug fix" which is triggered by somewhat
unrealistic options, I'd say 14 & dev, or possibly only dev.

-- 
Fabien.

Re: Avoid stuck of pbgench due to skipped transactions

From

Fujii Masao

Date:

08 September 2021, 14:40:35


On 2021/09/07 18:24, Fabien COELHO wrote:
> 
> Hello Fujii-san,
> 
>> Stop counting skipped transactions under -T as soon as the timer is
>> exceeded. Because otherwise it can take a very long time to count all of
>> them especially when quite a lot of them happen with unrealistically
>> high rate setting in -R, which would prevent pgbench from ending
>> immediately. Because of this behavior, note that there is no guarantee
>> that all skipped transactions are counted under -T though there is under
>> -t. This is OK in practice because it's very unlikely to happen with
>> realistic setting.
> 
> Ok, I find this text quite clear.

Thanks for the check! So attached is the updated version of the patch.


>> One question is; which version do we want to back-patch to?
> 
> If we consider it a "very minor bug fix" which is triggered by somewhat
> unrealistic options, I'd say 14 & dev, or possibly only dev.

Agreed. Since it's hard to imagine the issue happening in practice,
we don't need to bother back-patching to the stable branches.
So I'm thinking of committing the patch to 15dev and 14.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

pgbench-stuck-3.patch

Re: Avoid stuck of pbgench due to skipped transactions

From

Fujii Masao

Date:

09 September 2021, 16:30:34


On 2021/09/08 23:40, Fujii Masao wrote:
> Agreed. Since it's hard to imagine the issue happening in practice,
> we don't need to bother back-patching to the stable branches.
> So I'm thinking of committing the patch to 15dev and 14.

Pushed. Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION