Re: [HACKERS] pgbench regression test failure - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: [HACKERS] pgbench regression test failure |
Date | |
Msg-id | alpine.DEB.2.20.1711200703240.11926@lancre Whole thread Raw |
In response to | Re: [HACKERS] pgbench regression test failure (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [HACKERS] pgbench regression test failure
|
List | pgsql-hackers |
Hello Tom, Thanks for having a look at this bug fix. > So we fixed the reported TPS rate, which was nowhere near reality, > and the per-script stats are sane now. Good so far, but this > still has two problems IMO: > > 1. The per-script stats shouldn't be printed at all if there's > only one script. They're redundant with the overall stats. Indeed. I think the output should tend to be the same for possible automatic processing, whether there is one script or more, even at the price of some redundancy. Obviously this is highly debatable. > 2. ISTM that we should report that 100% of the transactions were > above the latency limit, not 33%; that is, the appropriate base > for the "number of transactions above the latency limit" percentage > is the number of actual transactions not the number of scheduled > transactions. Hmmm. Allow me to disagree. If the user asked for something, then this is the reference to take and the whole report should be relative to that. Also, changing this makes the report figures not adding up: before: number of transactions skipped: 417 (90.260 %) number of transactions above the 1.0 ms latency limit: 45 (9.740%) Both above percent are on the same reference, fine. Can be added, 100% of transactions we required had issues. after: number of transactions skipped: 457 (90.855 %) number of transactions above the 1.0 ms latency limit: 46 (100.000%) Percents do not add up anymore. I do not think that this is an improvement. > I also noticed that if I specify "-f sleep-100.sql" more than once, > the per-script TPS reports are out of line. This is evidently because > that calculation isn't excluding skipped xacts; but if we're going to > define tps as excluding skipped xacts, surely we should do so there too. I do not think that we should exclude skipped xacts. > I'm also still exceedingly unhappy about the NaN business. > You have this comment in printSimpleStats: > /* print NaN if no transactions where executed */ > but I find that unduly optimistic. It should really read more like > "if no transactions were executed, at best we'll get some platform- > dependent spelling of NaN. At worst we'll get a SIGFPE." Hmmm. Alas you must be right about spelling. There has been no report of SIGFPE issue, so I would not bother with that. > I do not think that a pedantic argument about NaN being the "correct" > answer justifies taking such portability risks, even if I bought that > argument, which I don't particularly. ISTM that the portability risk, i.e. about SIGFPE issue, would require a non IEEE 754 conforming box, and there has been no complaint yet about that. > It's also inconsistent with this basic decision in printResults: > > /* Remaining stats are nonsensical if we failed to execute any xacts */ > if (total->cnt <= 0) > return; According to praise, this decision was by Tom Lane. It is about the whole execution not executing anything, which may be seen as a special error case, whereas a progress report without xact execution within some period can be perfectly normal. > If we thought NaN was the "correct" answer for no xacts then we'd just > bull ahead and print NaNs all over the place. Why not, but for me the issue wrt to the final report is distinct. > I think we would be best advised to deal with this by printing zeroes in > the progress-report case, and by skipping the output altogether in > printSimpleStats. (Or we could make printSimpleStats print zeroes; I'm > indifferent to that choice.) I disagree with printing a zero if the measure is undefined. Zero does not mean undefined. "Zero means zero":-) Maybe we could replace platform-specific printing for "NaN" with something else, either a deterministic "NaN" or some other string, eg "-" or "?" or whatever. The benefit with sticking to some "NaN" or some platform specific NaN is that processing the output can retrieve it as a float. So basically I would have stayed with the current version which seems consistent to me, but I agree that it is debatable. I'm not sure it is worth being "exceedingly unhappy" about it, because these are special corner cases anyway, not likely to be seen without somehow looking for it or having pretty strange pgbench runs. > Attached is an updated patch addressing these points. Comments? I somehow disagree with your points above, for various reasons. However, you are a committer, and I think best that the bug is fixed even if I happen to disagree with unrelated changes. As an academic, I probably have a number of silly ideas about a lot of things. This is part of the job description:-) -- Fabien.
pgsql-hackers by date: