Re: pgbench's expression parsing & negative numbers - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pgbench's expression parsing & negative numbers
Date
Msg-id 20171212184521.nqnf5iypgps66ltj@alap3.anarazel.de
Whole thread Raw
In response to Re: pgbench's expression parsing & negative numbers  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: pgbench's expression parsing & negative numbers  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On 2017-12-12 07:21:21 +0100, Fabien COELHO wrote:
> 
> Hello Andres,
> 
> > working on overflow correctness in pg I noticed that pgbench isn't quite
> > there.  I assume it won't matter terribly often, but the way it parses
> > integers makes it incorrect for, at least, the negativemost number.
> > 
> > It directly parses the integer input using:
> > {digit}+        {
> >                     yylval->ival = strtoint64(yytext);
> >                     return INTEGER_CONST;
> >                 }
> > 
> > but that unfortunately means that the sign is no included in the call to
> > strtoint64. Which in turn means you can't correctly parse PG_INT64_MIN,
> 
> Indeed. For a benchmarking script which generates a pseudo random load I do
> not see as a big issue, but maybe I'm missing some use case.

IDK, behaving correctly seems like a good idea. Also far from impossible
that that code gets propagated further.  Doesn't seem like an entirely
crazy idea that somebody actually wants to benchmark boundary cases,
which might be slower and such.

> > because that's not representable as a positive number on a two's
> > complement machine (i.e. everywhere).  Preliminary testing shows that
> > that can relatively easily fixed by just prefixing [-+]? to the relevant
> > exprscan.l rules.
> 
> Beware that it must handle the unary/binary plus/minus case consistently:
> 
>   "-123" vs "a -123" vs "a + -123"

Which is a lot easier to solve on the parser side of things...


> > But it might also not be a bad idea to simply defer parsing of the
> > number until exprparse.y has its hand on the number?
> 
> It is usually simpler to let the lexer handle constants.


> > There's plenty other things wrong with overflow handling too, especially
> > evalFunc() basically just doesn't care.
> 
> Sure.
> 
> There are some overflow checking with div and double to int cast, which were
> added because of previous complaints, but which are not very useful to me.

I think handling it inconsistently is the worst of all worlds.


> ISTM that it is a voluntary feature, which handles int64 operations with a
> modulo overflow like C. Generating exceptions on such overflows does not
> look very attractive to me.

No, it's not really voluntary. Signed overflow isn't actually defined
behaviour in C. We kinda make it so on some platforms, but that's not
really a good idea.


> >  I'll come up with a patch for that sometime soon.
> 
> I wish you could provide some motivation: why does it matter much to a
> benchmarking script to handle overflows with more case?

a) I want to be able to compile without -fwrapv in the near
   future. Which means you can't have signed overflows, because they're
   undefined behaviour in C.
b) I want to be able to compile with -ftrapv /
   -fsanitize=signed-integer-overflow, to be sure code is actually
   correct. Currently pgbench crashes with that.
c) Randomly handling things in some but not all places is a bad idea.


> > A third complaint I have is that the tap tests are pretty hard to parse
> > for humans.
> 
> Probably.
> 
> Perl is pretty hard to humans to start with:-)

Meh.


> There is a lot of code factorization to cram many tests together, so that
> one test is more or less one line and there are hundreds of them. Not sure
> it would help if it was expanded.

I don't think expanding it is really a problem, I think they're just
largely not well documented, formatted. With some effort the tests could
be a lot easier to read.


Greetings,

Andres Freund


pgsql-hackers by date:

Previous
From: Chapman Flack
Date:
Subject: Re: proposal: alternative psql commands quit and exit
Next
From: Andres Freund
Date:
Subject: Re: Using ProcSignal to get memory context stats from a runningbackend