Re: Improving the isolationtester: fewer failures, less delay - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Improving the isolationtester: fewer failures, less delay |
Date | |
Msg-id | 20210616011820.2ktysyfozfhqdtha@alap3.anarazel.de Whole thread Raw |
In response to | Re: Improving the isolationtester: fewer failures, less delay (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Improving the isolationtester: fewer failures, less delay
Re: Improving the isolationtester: fewer failures, less delay |
List | pgsql-hackers |
Hi, Only halfway related: I wonder if we should remove the automatic permutation stuff - it's practically never useful. Probably not worth changing... On 2021-06-15 17:09:00 -0400, Tom Lane wrote: > +The general form of a permutation entry is > + > + "step_name" [ ( marker [ , marker ... ] ) ] > + > +where each marker defines a "blocking condition". The step will not be > +reported as completed before all the blocking conditions are satisfied. Minor suggestion: I think the folliwing would be a bit easier to read if there first were a list of markers, and then separately the longer descriptions. Right now it's a bit hard to see which paragraph introduces a new type of marker, and which just adds further commentary. > + /* > + * Check for other steps that have finished. We must do this > + * if oldstep completed; while if it did not, we need to poll > + * all the active steps in hopes of unblocking oldstep. > + */ Somehow I found the second sentence a bit hard to parse - I think it's the "while ..." sounding a bit odd to me. > + /* > + * If the target session is still busy, apply a timeout to > + * keep from hanging indefinitely, which could happen with > + * incorrect blocker annotations. Use the same 2 * > + * max_step_wait limit as try_complete_step does for deciding > + * to die. (We don't bother with trying to cancel anything, > + * since it's unclear what to cancel in this case.) > + */ > + if (iconn->active_step != NULL) > + { > + struct timeval current_time; > + int64 td; > + > + gettimeofday(¤t_time, NULL); > + td = (int64) current_time.tv_sec - (int64) start_time.tv_sec; > + td *= USECS_PER_SEC; > + td += (int64) current_time.tv_usec - (int64) start_time.tv_usec; >+ if (td > 2 * max_step_wait) > + { > + fprintf(stderr, "step %s timed out after %d seconds\n", > + iconn->active_step->name, > + (int) (td / USECS_PER_SEC)); > + exit(1); > + } > + } > + } > } It might be worth printing out the list of steps the being waited for when reaching the timeout - it seems like it'd now be easier to end up with a bit hard to debug specs. And one cannot necessarily look at pg_locks or such anymore to debug em. > @@ -833,6 +946,19 @@ try_complete_step(TestSpec *testspec, Step *step, int flags) > } > } > > + /* > + * The step is done, but we won't report it as complete so long as there > + * are blockers. > + */ > + if (step_has_blocker(pstep)) > + { > + if (!(flags & STEP_RETRY)) > + printf("step %s: %s <waiting ...>\n", > + step->name, step->sql); > + return true; > + } Might be a bug in my mental state machine: Will this work correctly for PSB_ONCE, where we'll already returned before? Greetings, Andres Freund
pgsql-hackers by date: