Re: pgbench throttling latency limit - Mailing list pgsql-hackers

From Gregory Smith
Subject Re: pgbench throttling latency limit
Msg-id 54133B15.8060800@gmail.com
In response to Re: pgbench throttling latency limit  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: pgbench throttling latency limit
List pgsql-hackers
On 9/10/14, 10:57 AM, Fabien COELHO wrote:
> Indeed. I think that people do not like it to change. I remember that 
> I suggested to change timestamps to "xxxx.yyyyyy" instead of the 
> unreadable "xxxx yyy", and be told not to, because some people have 
> tool which process the output so the format MUST NOT CHANGE. So my 
> behavior is not to avoid touching anything in this area.

That somewhat hysterical version of events isn't what I said. Heikki has 
the right idea for backpatching, so let me expand on that rationale, 
with an eye toward whether 9.5 is the right time to deal with this.

Not all software out there will process epoch timestamps that have
milliseconds added as a fraction at the end.  Reading an epoch time in
seconds as an integer is a well-defined standard; the fractional part
is not.

Here's an example of the problem, from a Mac OS X system:

$ date -j -f "%a %b %d %T %Z %Y" "`date`" "+%s"
1410544903
$ date -r 1410544903
Fri Sep 12 14:01:43 EDT 2014
$ date -r 1410544903.532
usage: date [-jnu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
            [-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]

The current file format allows any random shell script to use a tool
like cut to pull out the second-resolution timestamp column as an
integer epoch field, then pass it through even a utility as simple as
date to reformat it.  And for a lot of people, second resolution is
perfectly fine anyway.
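
For readers who would rather do this in Python than with cut and date, here is a minimal sketch of the same second-resolution extraction. It assumes the per-transaction log layout `client_id transaction_no time file_no time_epoch time_us`, so the integer epoch seconds sit in the fifth whitespace-separated field; the helper name `epoch_seconds` and the sample line are hypothetical, for illustration only.

```python
from datetime import datetime, timezone

def epoch_seconds(log_line: str) -> int:
    """Return the integer epoch-seconds column of a pgbench latency log line.

    Hypothetical helper. Assumes the layout:
    client_id transaction_no time file_no time_epoch time_us
    """
    fields = log_line.split()  # same job as cut on a space delimiter
    return int(fields[4])      # fifth field: whole seconds since the epoch

# Made-up sample line in the assumed layout.
line = "0 199 2241 0 1410544903 532471"
secs = epoch_seconds(line)
print(secs)  # 1410544903
print(datetime.fromtimestamp(secs, tz=timezone.utc).isoformat())
# 2014-09-12T18:01:43+00:00
```

Converted in UTC, this is the same instant that `date -r 1410544903` printed as 14:01:43 EDT.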

The change you propose will make that job harder for some people, in
order to make the job you're interested in easier.  I picked the
simplest possible example, but there are more.  Whether epoch timestamps
can have millisecond parts depends on your time library in Java.  In
Python, some behavior depends on whether you have 2.6 or earlier.  I
don't think gnuplot handles millisecond ones at all yet.  The list goes
on and on.  Some people will have to split the timestamp string pgbench
outputs a second time, at the period, and keep the left side, where
right now they can just split the whole thing on a space.
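
A minimal Python sketch of that extra step, assuming a hypothetical fractional `time_epoch` value such as `1410544903.532471`: a parser that calls `int()` on the field, which works on today's integer column, raises `ValueError` on the fractional form, so it has to split at the period and keep the left side. The helper name `epoch_seconds_any` is hypothetical.

```python
def epoch_seconds_any(field: str) -> int:
    """Integer epoch seconds from either '1410544903' or '1410544903.532471'.

    Hypothetical helper. The partition at the period (keeping the left side)
    is the second split a fractional format would force on existing parsers.
    """
    whole, _sep, _fraction = field.partition(".")
    return int(whole)

# A parser written for today's integer column breaks on a fractional value.
try:
    int("1410544903.532")
except ValueError as exc:
    print("int() rejects it:", exc)

print(epoch_seconds_any("1410544903"))         # 1410544903
print(epoch_seconds_any("1410544903.532471"))  # 1410544903
```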

What you want to do is actually fine with me (and as far as I know, I'm
the author of the most popular pgbench latency parsing script around),
but it will be a new sort of headache.  I just wanted the benefit to
outweigh that.  Breaking existing scripts and burning compatibility
with simple utilities like date was not worth the tiny improvement you
wanted in your personal workflow.  That's just not how we do things in
PostgreSQL.

If there's a good case that the whole format needs to change anyway,
for example by adding a new field, then we might as well switch to
fractional epoch timestamps at the same time.  When I added timestamps
to the latency log in 8.3, parsers that handled milliseconds were even
rarer.  Today support is still inconsistent, but the workarounds are
good enough for me now.  There are a lot more people using things like
Python instead of bash pipelines here in 2014, too.
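
As a sketch of why the Python case is less painful, assuming the current format's separate `time_epoch` and `time_us` columns: the two can be merged into one timestamp after the fact, and `datetime.fromtimestamp` accepts a fractional epoch value directly. Both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def combined_timestamp(epoch_sec: str, epoch_usec: str) -> datetime:
    """Merge the seconds and microseconds columns into one aware datetime.

    Hypothetical helper. Adding a timedelta keeps the arithmetic in integers.
    """
    base = datetime.fromtimestamp(int(epoch_sec), tz=timezone.utc)
    return base + timedelta(microseconds=int(epoch_usec))

def fractional_epoch(epoch_sec: str, epoch_usec: str) -> str:
    """Render the two columns as a single 'xxxx.yyyyyy' string.

    Hypothetical helper. The :06d zero-padding matters: 5000 microseconds
    must become .005000, not .5000.
    """
    return f"{int(epoch_sec)}.{int(epoch_usec):06d}"

print(combined_timestamp("1410544903", "532471").isoformat())
# 2014-09-12T18:01:43.532471+00:00
print(fractional_epoch("1410544903", "5000"))
# 1410544903.005000

# datetime also accepts the fractional form directly.
ts = datetime.fromtimestamp(float(fractional_epoch("1410544903", "532471")),
                            tz=timezone.utc)
print(ts.isoformat())
```

Building the fractional string from integers, rather than computing `sec + usec / 1e6`, keeps the printed digits exact; only the final `float()` parse involves rounding, and at this magnitude it is well under a microsecond.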

-- 
Greg Smith greg.smith@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/


