Re: How can we make beta testing better? - Mailing list pgsql-hackers

From: Jehan-Guillaume de Rorthais
Subject: Re: How can we make beta testing better?
Date:
Msg-id: 20140423075514.2ae7ee84@erg
In response to: Re: How can we make beta testing better? (Josh Berkus <josh@agliodbs.com>)
List: pgsql-hackers
On Thu, 17 Apr 2014 16:42:21 -0700
Josh Berkus <josh@agliodbs.com> wrote:

> On 04/15/2014 09:53 PM, Rod Taylor wrote:
> > A documented beta test process/toolset which does the following would help:
> > 1) Enables full query logging
> > 2) Creates a replica of a production DB, record $TIME when it stops.
> > 3) Allow user to make changes (upgrade to 9.4, change hardware, change
> > kernel settings, ...)
> > 4) Plays queries from the CSV logs starting from $TIME mimicking actual
> > timing and transaction boundaries
> > 
> > If Pg can make it easy to duplicate activities currently going on in
> > production inside another environment, I would be pleased to fire a couple
> > billion queries through it over the next few weeks.
> > 
> > #4 should include reporting useful to the project, such as a sampling of
> > queries which performed significantly worse and a few relative performance
> > stats for overall execution time.
> 
> So we have some software we've been procrastinating on OSS'ing, which does:
> 
> 1) Takes full query CSV logs from a running postgres instance
> 2) Runs them against a target instance in parallel
> 3) Records response times for all queries
> 
> tsung and pgreplay also do this, but have some limitations which make
> them impractical for a general set of logs to replay.

I've been working on another tool that can replay a scenario recorded directly
from a network dump (see [pgshark]). It works, it is totally transparent from
the application's point of view, the tcpdump can run anywhere, and **ALL** the
real traffic can be replayed... but it still needs some work on reporting and on
handling parallel sessions. The drawback of using libpcap is that packets can be
lost while capturing, and even a very large capture buffer cannot keep you safe
over hours of a high-speed scenario. So capturing 100% of the traffic over the
required period may take several capture runs and some tuning of the buffer size.
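
For illustration, here is a minimal sketch of how such a capture could be
driven (interface, port, buffer size and output file are assumptions to adapt,
not anything pgshark ships). It just wraps tcpdump with an enlarged kernel
buffer and a filter on the PostgreSQL port:

    #!/usr/bin/env python
    # Capture sketch: run tcpdump with a large kernel buffer and keep only
    # traffic to/from the PostgreSQL port. Interface, port, buffer size and
    # output file are hypothetical values to adapt.
    import subprocess

    IFACE = "eth0"          # interface carrying the client traffic (assumption)
    PG_PORT = 5432          # PostgreSQL listen port (assumption)
    BUFFER_KIB = 262144     # 256 MiB capture buffer; -B takes KiB
    OUTFILE = "pg_traffic.pcap"

    subprocess.run([
        "tcpdump",
        "-i", IFACE,
        "-B", str(BUFFER_KIB),   # a bigger buffer reduces, but does not remove, drops
        "-w", OUTFILE,           # raw pcap file, to be fed to the replay tool
        "tcp", "port", str(PG_PORT),
    ], check=True)

tcpdump reports how many packets were dropped by the kernel when it exits,
which is how you can tell whether the buffer was large enough for that run.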

I also tried to quickly write a simple proxy using Perl POE to capture ALL the
traffic safely. The POC did nothing but forward packets, and IIRC a 30 s pgbench
stress test with 10 or 20 sessions showed a performance drop of about 60%. But
it was a very quick, single-process/single-threaded POC.
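
To illustrate the idea (this is not the POE code, just a hypothetical
single-process equivalent), such a pass-through proxy only has to accept a
client connection, open one to the real server, and pump bytes both ways while
writing them to a dump file:

    #!/usr/bin/env python
    # Hypothetical pass-through proxy sketch: clients connect to LISTEN_PORT,
    # every byte is forwarded verbatim to the real PostgreSQL server, and the
    # raw stream is appended to a dump file. Single process, event driven.
    import asyncio

    PG_HOST, PG_PORT = "127.0.0.1", 5432   # real server (assumption)
    LISTEN_PORT = 6432                     # where the clients are pointed (assumption)
    DUMP = open("pg_proxy.dump", "ab")

    async def pump(reader, writer, tag):
        # Copy one direction of the stream, logging every chunk.
        while True:
            data = await reader.read(65536)
            if not data:
                break
            DUMP.write(tag + data)
            writer.write(data)
            await writer.drain()
        writer.close()

    async def handle_client(client_reader, client_writer):
        server_reader, server_writer = await asyncio.open_connection(PG_HOST, PG_PORT)
        await asyncio.gather(
            pump(client_reader, server_writer, b"C>"),   # frontend messages
            pump(server_reader, client_writer, b"S>"),   # backend messages
        )

    async def main():
        server = await asyncio.start_server(handle_client, "0.0.0.0", LISTEN_PORT)
        async with server:
            await server.serve_forever()

    asyncio.run(main())

Even doing nothing smart, sitting on the data path like this is expensive,
which is precisely the cost a passive libpcap capture avoids.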

Maybe another path would be to generate such a traffic dump from PostgreSQL
itself (which only has the application level to deal with), in a format we can
feed to pgbench.
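
As a very rough sketch of that idea (the csvlog message column position and the
"statement: " marker are assumptions, and the original timing and transaction
boundaries are simply dropped), one could pull the statements out of the CSV
logs and write them as a custom script for pgbench -f:

    #!/usr/bin/env python
    # Rough sketch: extract logged statements from a PostgreSQL csvlog file
    # and write them as a custom script for "pgbench -f replay.sql".
    # The message column index and the "statement: " marker are assumptions
    # about the csvlog layout; timing and transaction boundaries are lost.
    import csv

    MESSAGE_COL = 13          # position of the message field in csvlog rows (assumption)
    MARKER = "statement: "    # present with log_statement / log_min_duration_statement

    with open("postgresql.csv", newline="") as log, open("replay.sql", "w") as out:
        for row in csv.reader(log):
            if len(row) <= MESSAGE_COL:
                continue
            msg = row[MESSAGE_COL]
            pos = msg.find(MARKER)
            if pos == -1:
                continue
            # Collapse the statement onto one line, as pgbench custom scripts expect.
            sql = " ".join(msg[pos + len(MARKER):].split())
            out.write(sql.rstrip(";") + ";\n")

Recreating the original timing and transaction boundaries would still need
extra work on top of something like this.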

> What it would need is:
> 
> A) scripting around coordinated backups
> B) Scripting for single-command runs, including changing pg.conf to
> record data.

Changing the pg.conf is pretty easy with ALTER SYSTEM now, but I'm sure we all
have some scripts out there already doing this (at least I do).
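
For example (just an illustration; logging_collector still needs a restart
rather than a reload, and the connection string is an assumption), switching an
instance to full CSV query logging could look like this:

    #!/usr/bin/env python
    # Example only: flip an instance to full CSV query logging with ALTER SYSTEM.
    # logging_collector needs a server restart; the other settings apply on reload.
    import psycopg2

    conn = psycopg2.connect("dbname=postgres")   # superuser connection (assumption)
    conn.autocommit = True                       # ALTER SYSTEM refuses to run in a transaction block
    cur = conn.cursor()
    cur.execute("ALTER SYSTEM SET logging_collector = on")
    cur.execute("ALTER SYSTEM SET log_destination = 'csvlog'")
    cur.execute("ALTER SYSTEM SET log_min_duration_statement = 0")  # log every statement with its duration
    cur.execute("SELECT pg_reload_conf()")
    cur.close()
    conn.close()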

> C) tools to *analyze* the output data, including error messages.

That's what pgshark lacks so far.

[pgshark] https://github.com/dalibo/pgshark

Cheers,
-- 
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com


