Re: pgbench - minor fix for meta command only scripts - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: pgbench - minor fix for meta command only scripts
Date
Msg-id alpine.DEB.2.20.1609241112330.6473@lancre
Whole thread Raw
In response to Re: pgbench - minor fix for meta command only scripts  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: pgbench - minor fix for meta command only scripts
List pgsql-hackers
Hello Heikki,

> Yeah, it really is quite a mess. I tried to review your patch, and I think
> it's correct, but I couldn't totally convince myself, because of the existing
> messiness of the logic. So I bit the bullet and started refactoring.
>
> I came up with the attached. It refactors the logic in doCustom() into a
> state machine. I think this is much clearer, what do you think?

The patch did not apply to master because of you committed the sleep fix
in between. I updated the patch so that the fix is included as well.

I think that this is really needed. The code is much clearer and simple to
understand with the state machines & additional functions. This is a
definite improvement to the code base.

I've done quite some testing with various options (-r, --rate,
--latency-limit, -C...) and got pretty reasonnable results.

Although I cannot be absolutely sure that the refactoring does not
introduce any new bug, I'm convinced that it will be much easier to find
them:-)


Attached are some small changes to your version:

I have added the sleep_until fix.

I have fixed a bug introduced in the patch by changing && by || in the
(min_sec > 0 && maxsock != -1) condition which was inducing errors with
multi-threads & clients...

I have factored out several error messages in "commandFailed", in place of
the "metaCommandFailed", and added the script number as well in the error
messages. All messages are now specific to the failed command.

I have added two states to the machine:

  - CSTATE_CHOOSE_SCRIPT which simplifies threadRun, there is now one call
    to chooseScript instead of two before.

  - CSTATE_END_COMMAND which manages is_latencies and proceeding to the
    next command, thus merging the three instances of updating the stats
    that were in the first version.

The later state means that processing query results is included in the per
statement latency, which is an improvement because before I was getting
some transaction latency significantly larger that the apparent sum of the
per-statement latencies, which did not make much sense...

I have added & updated a few comments. There are some places where the
break could be a pass through instead, not sure how desirable it is, I'm
fine with break.


> Well, the comment right there says "note this is not included in the
> statement latency numbers", so apparently it's intentional. Whether it's a
> good idea or not, I don't know :-). It does seem a bit surprising.

Indeed, it also results in apparently inconsistent numbers, and it creates
a mess for recording the statement latency because it meant that in some
case the latency was collected before the actual end of the command, see
the discussion about CSTATE_END_COMMAND above.

> But what seems more bogus to me is that we do that after recording the
> *transaction* stats, if this was the last command. So the PQgetResult() of
> the last command in the transaction is not included in the transaction stats,
> even though the PQgetResult() calls for any previous commands are. (Perhaps
> that's what you meant too?)
>
> I changed that in my patch, it would've been inconvenient to keep that old
> behavior, and it doesn't make any sense to me anyway.

Fine with me.

--
Fabien.
Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Refactor StartupXLOG?
Next
From: Peter Geoghegan
Date:
Subject: Re: ICU integration