Thread: pgsql: Refactor script execution state machine in pgbench.

pgsql: Refactor script execution state machine in pgbench.

From
Heikki Linnakangas
Date:
Refactor script execution state machine in pgbench.

The doCustom() function had grown into quite a mess. Rewrite it, in a more
explicit state machine style, for readability.

This also fixes one minor bug: if a script consisted entirely of meta
commands, doCustom() never returned to the caller, so progress reports
with the -P option were not printed. I don't want to backpatch this
refactoring, and the bug is quite insignificant, so only commit this to
master, and leave the bug unfixed in back-branches.

Review and original bug report by Fabien Coelho.

Discussion: <alpine.DEB.2.20.1607090850120.3412@sto>

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/12788ae49e1933f463bc59a6efe46c4a01701b76

Modified Files
--------------
src/bin/pgbench/pgbench.c | 1106 +++++++++++++++++++++++++++------------------
1 file changed, 661 insertions(+), 445 deletions(-)


Re: pgsql: Refactor script execution state machine in pgbench.

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@iki.fi> writes:
> Refactor script execution state machine in pgbench.

There's at least one bug in this patch.  It's caused pgbench to go into an
infinite loop during the TAP tests on my buildfarm machine longfin.

$ ps auxww | grep buildfarm | grep pgbench
buildfarm        9033 100.0  0.1  2462896   5936   ??  R     4:06AM 719:14.18
/Users/buildfarm/bf-data/HEAD/inst/bin/pgbench--no-vacuum --client=5 --protocol=prepared --transactions=25 --file
/Users/buildfarm/bf-data/HEAD/pgsql.build/src/bin/pgbench/tmp_check/data_main_JNh2/pgbench_script
buildfarm        9023   0.0  0.2  2603480  18696   ??  S     4:06AM   0:00.34
/Users/buildfarm/bf-data/HEAD/inst/bin/postgres-D
/Users/buildfarm/bf-data/HEAD/pgsql.build/src/bin/pgbench/tmp_check/data_main_JNh2/pgdata
buildfarm        9009   0.0  0.1  2458948  12224   ??  S     4:06AM   0:04.56 /usr/bin/perl t/001_pgbench.pl
buildfarm        9007   0.0  0.0  2433396   2220   ??  S     4:06AM   0:00.00 /bin/sh -c cd . &&
TESTDIR='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/bin/pgbench'PATH="/Users/buildfarm/bf-data/HEAD/inst/bin:$PATH"
PGPORT='65678'top_builddir='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/bin/pgbench/../../..'
PG_REGRESS='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/bin/pgbench/../../../src/test/regress/pg_regress'prove -I
../../../src/test/perl/--verbose t/*.pl 
buildfarm        9002   0.0  0.0  2432936   2108   ??  S     4:06AM   0:00.01
/Applications/Xcode.app/Contents/Developer/usr/bin/make-C pgbench installcheck 

Some other buildfarm critters seem to have got through that successfully,
so it's probably a portability issue not a hard failure.

            regards, tom lane


Re: pgsql: Refactor script execution state machine in pgbench.

From
Fabien COELHO
Date:
Hello Tom,

> There's at least one bug in this patch.

Indeed.

> It's caused pgbench to go into an infinite loop during the TAP tests on
> my buildfarm machine longfin.
>
> buildfarm  719:14.18 /Users/buildfarm/bf-data/HEAD/inst/bin/pgbench --no-vacuum --client=5 --protocol=prepared
--transactions=25--file
/Users/buildfarm/bf-data/HEAD/pgsql.build/src/bin/pgbench/tmp_check/data_main_JNh2/pgbench_script
>
> Some other buildfarm critters seem to have got through that successfully,
> so it's probably a portability issue not a hard failure.

Hmmm, not easy to identify, especially as I do not have a clang on
MacOS...

Could you catch the process to identify the infinite loop?
What are the states of the five clients?

I think that it may be stuck in doCustom or on threadRun.

I'm not sure how it could be stuck in doCustom. Re-reading the code, I
think that the "end_tx_processed" stuff could be removed in favor of a
systematic return at the end of the transaction, but that does not explain
anything wrt to an infinite loop: after 25 transactions it should have got
to FINISHED and have exited.

Maybe it can be stuck in threadRun in the "while (remains>0)" loop, which
suggests that a remains-- has been forgotten... but I do not see how this
is possible to forget a client by reading the code.

So although I hastily declared that bugs would be easier to detect,
without any additional clue it is hard to help.

--
Fabien.


Re: pgsql: Refactor script execution state machine in pgbench.

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> Hello Tom,
>> It's caused pgbench to go into an infinite loop during the TAP tests on
>> my buildfarm machine longfin.

> Hmmm, not easy to identify, especially as I do not have a clang on
> MacOS...
> Could you catch the process to identify the infinite loop?
> What are the states of the five clients?

Yeah, I thought you'd ask for more info ;-).  I am busy with wrapping
the 9.6.0 release, but will go digging as soon as that's done.

            regards, tom lane


Re: pgsql: Refactor script execution state machine in pgbench.

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
> Could you catch the process to identify the infinite loop?

I think the problem is here (pgbench.c lines 4550ff):

            bool        ready;
            ...
            ready = FD_ISSET(sock, &input_mask);

The result of FD_ISSET isn't a bool, it's an int.  I can see in the
looping process that the backend connection has socket FD 8, and
I think what is happening is that FD_ISSET is returning 1<<8 and
that's getting truncated to zero to fit in a bool (char), so that
the code just below doesn't call doCustom and we never consume the
waiting result.

I couldn't reproduce this in a manual build, but perhaps the socket
number assignments are a bit different in the buildfarm environment.

Rather than s/bool/int/, I'm inclined to remove the "ready" variable
altogether.  We have exactly zero other places that make any assumptions
about the width of FD_ISSET results, and I don't see a good reason
why this needs to be the first one.

I see a number of other things that look pretty infelicitous in this
code --- for example, why is the loop at lines 4440ff break'ing after
it comes across the first socket to wait for?  That sure looks like it
will fail to deal with more than one wait-able socket, rather destroying
the point of having multiple threads.

Will fix.

            regards, tom lane


Re: pgsql: Refactor script execution state machine in pgbench.

From
Fabien COELHO
Date:
Hello Tom,

> I think the problem is here (pgbench.c lines 4550):

Hmmm... Indeed "typedef char bool;". I thought postgresql bool was an
int, I usually do "typedef enum { false, true } bool;"

> I see a number of other things that look pretty infelicitous in this
> code --- for example, why is the loop at lines 4440 break'ing after
> it comes across the first socket to wait for?

Indeed this seems strange. I will have a look at that as well.

I'll try to look at these tonight, or if I can't by the end of the week.

--
Fabien.


Re: pgsql: Refactor script execution state machine in pgbench.

From
Heikki Linnakangas
Date:
On 09/27/2016 02:25 AM, Tom Lane wrote:
> Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Could you catch the process to identify the infinite loop?
>
> I think the problem is here (pgbench.c lines 4550ff):
>
>             bool        ready;
>             ...
>             ready = FD_ISSET(sock, &input_mask);
>
> The result of FD_ISSET isn't a bool, it's an int.  I can see in the
> looping process that the backend connection has socket FD 8, and
> I think what is happening is that FD_ISSET is returning 1<<8 and
> that's getting truncated to zero to fit in a bool (char), so that
> the code just below doesn't call doCustom and we never consume the
> waiting result.

Hah, good catch. It's quite a landmine that a macro named "is-something"
doesn't return a boolean. I remember we had issues like this in some
Postgres macros too, and added a "(<macro>) != 0)" into the macro
definition to fix. Of course, FD_ISSET is not in our control, and "bool"
is a Postgres thing anyway, so can't do that here.

> Will fix.

Thanks!

- Heikki



Re: pgsql: Refactor script execution state machine in pgbench.

From
Fabien COELHO
Date:
> Will fix.

Oops, I missed this line... Thanks!

--
Fabien.