Re: [HACKERS] pgbench - allow to store select results into variables - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: [HACKERS] pgbench - allow to store select results into variables
Date
Msg-id alpine.DEB.2.20.1701290846130.13068@lancre
In response to Re: [HACKERS] pgbench - allow to store select results into variables (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers

<APOLOGY>  Please pardon the redundancy: this is a slightly edited repost from another thread where the motivation for
this patch was discussed, so that it appears in the relevant thread.
 
</APOLOGY>


Tom> [...] there was immediately objection as to whether his idea of TPC-B 
Tom> compliance was actually right.

From my point of view TPC-* are simply objective examples of typical 
benchmark requirements to show which features are needed in a tool for 
doing this activity. Once features are available, I think that pgbench 
should also be a show-case for their usage. Currently a few functions (for 
implementing the bench as specified) and actually extracting results into 
variables (for suspicious auditors and bench relevance, see below) are 
missing.

Tom> I remember complaining that he had a totally artificial idea of what 
Tom> "fetching a data value" requires.

Yep.

I think that the key misunderstanding is that you are honest and assume 
that other people are honest too. This is naïve: there is a long history 
of vendors creatively "cheating" to get better-than-deserved benchmark 
results. Benchmark specifications try to prevent such behaviors by laying 
out careful requirements and procedures.

In this instance, you "know" that when pg has returned the result of the 
query the data is actually on the client side, so you consider it 
fetched. That is fine for you, but from a benchmarking perspective with 
external auditors your belief/knowledge is not good enough.

For instance, the vendor could implement a new version of the protocol 
where the data is only transferred on demand, and the result just tells 
that the data is indeed somewhere on the server (e.g., on "SELECT abalance" 
it could just check that the key exists, no need to actually fetch the 
data from the table, so no need to read the table, the index is 
enough...). That would be pretty stupid for real application performance, 
but the benchmark would get better tps by doing so.
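
To make the argument concrete, here is a toy model in Python. It is only a sketch under stated assumptions: HonestServer, LazyServer, Result and run are hypothetical names invented for illustration, and nothing here corresponds to the real PostgreSQL protocol, libpq or pgbench. It counts table reads to show that a driver which discards results cannot tell an honest server from one that defers the fetch.

```python
# Toy model only: all classes and functions are hypothetical illustrations,
# not PostgreSQL, libpq or pgbench code.

class Result:
    """Handle returned for a query; get_value() hands the data to the driver."""

    def __init__(self, fetch):
        self._fetch = fetch

    def get_value(self):
        return self._fetch()


class HonestServer:
    """Reads the row from the table before answering the query."""

    def __init__(self, accounts):
        self.accounts = accounts
        self.table_reads = 0

    def select_abalance(self, aid):
        self.table_reads += 1
        value = self.accounts[aid]
        return Result(lambda: value)


class LazyServer:
    """Only checks that the key exists; reads the row if the value is asked for."""

    def __init__(self, accounts):
        self.accounts = accounts
        self.table_reads = 0

    def _read(self, aid):
        self.table_reads += 1
        return self.accounts[aid]

    def select_abalance(self, aid):
        if aid not in self.accounts:  # stands in for an index-only check
            raise KeyError(aid)
        return Result(lambda: self._read(aid))


def run(server, aids, retrieve):
    """Benchmark driver: issue one query per aid, optionally retrieve the value."""
    for aid in aids:
        result = server.select_abalance(aid)
        if retrieve:
            result.get_value()
    return server.table_reads


if __name__ == "__main__":
    accounts = {1: 100, 2: 250, 3: -40}
    aids = [1, 2, 3]
    print(run(HonestServer(accounts), aids, retrieve=False))  # 3 table reads
    print(run(LazyServer(accounts), aids, retrieve=False))    # 0 table reads
    print(run(LazyServer(accounts), aids, retrieve=True))     # 3 table reads
```

With a discarding driver the lazy server does no table work at all and would post better numbers. Once the driver actually retrieves the value, both servers have to do the same reads.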

Even without intentional cheating, this could be part of a useful 
"streaming mode" protocol option which makes sense for very large results 
but would be activated for a small result.

Another point is that decoding the message may be a little expensive, so 
that by not actually extracting the data into the client but just keeping 
it in the connection/OS one gets better performance.

Thus, the TPC-B 2.0.0 benchmark specification says:

"1.3.2 Each transaction shall return to the driver the Account_Balance 
resulting from successful commit of the transaction.

Comment: It is the intent of this clause that the account balance in the 
database be returned to the driver, i.e., that the application retrieve 
the account balance."

For me the correct interpretation of "the APPLICATION retrieve the account 
balance" is that the client application code, pgbench in this context, did 
indeed get the value from the vendor code, here "libpq" which is handling 
the connection.

Having the value discarded from libpq by calling PQclear instead of 
PQntuples/PQgetvalue/... skips a key part of the client code that no real 
application would skip. This looks strange and is not representative of 
real client code: as a potential auditor, given the doubt about the 
performance impact and the lack of relevance, I would not check the 
corresponding item in the audit check list:
  "11.3.1.2 Verify that transaction inputs and outputs satisfy Clause 1.3."

So the benchmark implementation would not be validated.


Another trivial reason to be able to actually retrieve data is that, for 
benchmarking purposes, it is quite natural to want to test a scenario where 
you do different things based on the data received, which implies that the 
data can be manipulated somehow on the benchmarking client side. This is 
currently not possible.

-- 
Fabien.
