Re: [HACKERS] pgbench regression test failure - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: [HACKERS] pgbench regression test failure
Date
Msg-id alpine.DEB.2.20.1711200703240.11926@lancre
Whole thread Raw
In response to Re: [HACKERS] pgbench regression test failure  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] pgbench regression test failure  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hello Tom,

Thanks for having a look at this bug fix.

> So we fixed the reported TPS rate, which was nowhere near reality,
> and the per-script stats are sane now.  Good so far, but this
> still has two problems IMO:
>
> 1. The per-script stats shouldn't be printed at all if there's
> only one script.  They're redundant with the overall stats.

Indeed.

I think the output should tend to be the same for possible automatic 
processing, whether there is one script or more, even at the price of some 
redundancy.

Obviously this is highly debatable.

> 2. ISTM that we should report that 100% of the transactions were
> above the latency limit, not 33%; that is, the appropriate base
> for the "number of transactions above the latency limit" percentage
> is the number of actual transactions not the number of scheduled
> transactions.

Hmmm. Allow me to disagree.

If the user asked for something, then this is the reference to take and 
the whole report should be relative to that. Also, changing this makes the 
report figures not adding up:

before:  number of transactions skipped: 417 (90.260 %)  number of transactions above the 1.0 ms latency limit: 45
(9.740%)
 

Both above percent are on the same reference, fine. Can be added, 100% of 
transactions we required had issues.

after:  number of transactions skipped: 457 (90.855 %)  number of transactions above the 1.0 ms latency limit: 46
(100.000%)
 

Percents do not add up anymore. I do not think that this is an 
improvement.

> I also noticed that if I specify "-f sleep-100.sql" more than once,
> the per-script TPS reports are out of line.  This is evidently because
> that calculation isn't excluding skipped xacts; but if we're going to
> define tps as excluding skipped xacts, surely we should do so there too.

I do not think that we should exclude skipped xacts.

> I'm also still exceedingly unhappy about the NaN business.
> You have this comment in printSimpleStats:
>     /* print NaN if no transactions where executed */
> but I find that unduly optimistic.  It should really read more like
> "if no transactions were executed, at best we'll get some platform-
> dependent spelling of NaN.  At worst we'll get a SIGFPE."

Hmmm. Alas you must be right about spelling. There has been no report of 
SIGFPE issue, so I would not bother with that.

> I do not think that a pedantic argument about NaN being the "correct" 
> answer justifies taking such portability risks, even if I bought that 
> argument, which I don't particularly.

ISTM that the portability risk, i.e. about SIGFPE issue, would require a 
non IEEE 754 conforming box, and there has been no complaint yet about 
that.

> It's also inconsistent with this basic decision in printResults:
>
>     /* Remaining stats are nonsensical if we failed to execute any xacts */
>     if (total->cnt <= 0)
>         return;

According to praise, this decision was by Tom Lane. It is about the whole 
execution not executing anything, which may be seen as a special error 
case, whereas a progress report without xact execution within some period 
can be perfectly normal.

> If we thought NaN was the "correct" answer for no xacts then we'd just
> bull ahead and print NaNs all over the place.

Why not, but for me the issue wrt to the final report is distinct.

> I think we would be best advised to deal with this by printing zeroes in 
> the progress-report case, and by skipping the output altogether in 
> printSimpleStats.  (Or we could make printSimpleStats print zeroes; I'm 
> indifferent to that choice.)

I disagree with printing a zero if the measure is undefined. Zero does not 
mean undefined. "Zero means zero":-)

Maybe we could replace platform-specific printing for "NaN" with something 
else, either a deterministic "NaN" or some other string, eg "-" or "?" or 
whatever. The benefit with sticking to some "NaN" or some platform 
specific NaN is that processing the output can retrieve it as a float.

So basically I would have stayed with the current version which seems 
consistent to me, but I agree that it is debatable. I'm not sure it is 
worth being "exceedingly unhappy" about it, because these are special 
corner cases anyway, not likely to be seen without somehow looking for it 
or having pretty strange pgbench runs.

> Attached is an updated patch addressing these points.  Comments?

I somehow disagree with your points above, for various reasons.

However, you are a committer, and I think best that the bug is fixed even 
if I happen to disagree with unrelated changes. As an academic, I probably 
have a number of silly ideas about a lot of things. This is part of the 
job description:-)

-- 
Fabien.


pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Proposal: Local indexes for partitioned table
Next
From: Ildar Musin
Date:
Subject: Re: [HACKERS] Repetitive code in RI triggers