Re: Improving the isolationtester: fewer failures, less delay - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Improving the isolationtester: fewer failures, less delay
Date
Msg-id 20210616011820.2ktysyfozfhqdtha@alap3.anarazel.de
Whole thread Raw
In response to Re: Improving the isolationtester: fewer failures, less delay  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Improving the isolationtester: fewer failures, less delay
Re: Improving the isolationtester: fewer failures, less delay
List pgsql-hackers
Hi,

Only halfway related: I wonder if we should remove the automatic
permutation stuff - it's practically never useful. Probably not worth
changing...


On 2021-06-15 17:09:00 -0400, Tom Lane wrote:
> +The general form of a permutation entry is
> +
> +    "step_name" [ ( marker [ , marker ... ] ) ]
> +
> +where each marker defines a "blocking condition".  The step will not be
> +reported as completed before all the blocking conditions are satisfied.

Minor suggestion: I think the folliwing would be a bit easier to read if
there first were a list of markers, and then separately the longer
descriptions. Right now it's a bit hard to see which paragraph
introduces a new type of marker, and which just adds further commentary.


> +                /*
> +                 * Check for other steps that have finished.  We must do this
> +                 * if oldstep completed; while if it did not, we need to poll
> +                 * all the active steps in hopes of unblocking oldstep.
> +                 */

Somehow I found the second sentence a bit hard to parse - I think it's
the "while ..." sounding a bit odd to me.


> +                /*
> +                 * If the target session is still busy, apply a timeout to
> +                 * keep from hanging indefinitely, which could happen with
> +                 * incorrect blocker annotations.  Use the same 2 *
> +                 * max_step_wait limit as try_complete_step does for deciding
> +                 * to die.  (We don't bother with trying to cancel anything,
> +                 * since it's unclear what to cancel in this case.)
> +                 */
> +                if (iconn->active_step != NULL)
> +                {
> +                    struct timeval current_time;
> +                    int64        td;
> +
> +                    gettimeofday(¤t_time, NULL);
> +                    td = (int64) current_time.tv_sec - (int64) start_time.tv_sec;
> +                    td *= USECS_PER_SEC;
> +                    td += (int64) current_time.tv_usec - (int64) start_time.tv_usec;
>+                    if (td > 2 * max_step_wait)
> +                    {
> +                        fprintf(stderr, "step %s timed out after %d seconds\n",
> +                                iconn->active_step->name,
> +                                (int) (td / USECS_PER_SEC));
> +                        exit(1);
> +                    }
> +                }
> +            }
>          }

It might be worth printing out the list of steps the being waited for
when reaching the timeout - it seems like it'd now be easier to end up
with a bit hard to debug specs. And one cannot necessarily look at
pg_locks or such anymore to debug em.


> @@ -833,6 +946,19 @@ try_complete_step(TestSpec *testspec, Step *step, int flags)
>          }
>      }
>  
> +    /*
> +     * The step is done, but we won't report it as complete so long as there
> +     * are blockers.
> +     */
> +    if (step_has_blocker(pstep))
> +    {
> +        if (!(flags & STEP_RETRY))
> +            printf("step %s: %s <waiting ...>\n",
> +                   step->name, step->sql);
> +        return true;
> +    }

Might be a bug in my mental state machine: Will this work correctly for
PSB_ONCE, where we'll already returned before?


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: disfavoring unparameterized nested loops
Next
From: Tom Lane
Date:
Subject: Re: Improving the isolationtester: fewer failures, less delay