Re: How can we make beta testing better? - Mailing list pgsql-hackers
From: Jehan-Guillaume de Rorthais
Subject: Re: How can we make beta testing better?
Msg-id: 20140423075514.2ae7ee84@erg
In response to: Re: How can we make beta testing better? (Josh Berkus <josh@agliodbs.com>)
List: pgsql-hackers
On Thu, 17 Apr 2014 16:42:21 -0700 Josh Berkus <josh@agliodbs.com> wrote:

> On 04/15/2014 09:53 PM, Rod Taylor wrote:
> > A documented beta test process/toolset which does the following would help:
> > 1) Enables full query logging
> > 2) Creates a replica of a production DB, record $TIME when it stops.
> > 3) Allow user to make changes (upgrade to 9.4, change hardware, change
> >    kernel settings, ...)
> > 4) Plays queries from the CSV logs starting from $TIME mimicking actual
> >    timing and transaction boundaries
> >
> > If Pg can make it easy to duplicate activities currently going on in
> > production inside another environment, I would be pleased to fire a couple
> > billion queries through it over the next few weeks.
> >
> > #4 should include reporting useful to the project, such as a sampling of
> > queries which performed significantly worse and a few relative performance
> > stats for overall execution time.
>
> So we have some software we've been procrastinating on OSS'ing, which does:
>
> 1) Takes full query CSV logs from a running postgres instance
> 2) Runs them against a target instance in parallel
> 3) Records response times for all queries
>
> tsung and pgreplay also do this, but have some limitations which make
> them impractical for a general set of logs to replay.

I've been working on another tool able to replay scenarios recorded directly
from a network dump (see [pgshark]). It works, it can be totally transparent
from the application's point of view, the tcpdump can run anywhere, and
**ALL** the real traffic can be replayed... but it needs some more work on
reporting and on handling parallel sessions.

The drawback of using libpcap is that you can lose packets while capturing,
and even a very large capture buffer cannot keep you safe over hours of a
high-speed scenario. So it might require multiple captures and tuning the
buffer size to catch 100% of the traffic for the required period.

I tried to quickly write a simple proxy using Perl POE to capture ALL the
traffic safely. My POC did nothing but forward packets, and IIRC a 30s stress
test with 10 or 20 sessions using pgbench showed a drop of ~60% in
performance. But it was a very quick, mono-process/mono-thread POC.

Maybe another path would be to have PostgreSQL itself (which only has the
application level to deal with) generate this traffic dump, in a format we
can feed to pgbench.

> What it would need is:
>
> A) scripting around coordinated backups
> B) Scripting for single-command runs, including changing pg.conf to
>    record data.

Changing the pg.conf is pretty easy with ALTER SYSTEM now (see the sketch in
the P.S. below). But I'm sure we all have some scripts out there doing this
(at least I do).

> C) tools to *analyze* the output data, including error messages.

That's what I lack in pgshark so far.

[pgshark] https://github.com/dalibo/pgshark

Cheers,
--
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com
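P.S. For reference, a minimal, hypothetical sketch of what B) could look like
with ALTER SYSTEM on 9.4, assuming the goal is full CSV query logging that a
replay tool can consume (the exact settings a given tool needs may differ):

    -- turn on full CSV query logging for the capture window
    ALTER SYSTEM SET log_destination = 'csvlog';
    ALTER SYSTEM SET logging_collector = on;          -- needs a server restart to take effect
    ALTER SYSTEM SET log_min_duration_statement = 0;  -- log every statement with its duration
    ALTER SYSTEM SET log_statement = 'none';          -- avoid double-logging statements

    -- apply the settings that do not require a restart
    SELECT pg_reload_conf();

    -- once the capture window is over
    ALTER SYSTEM RESET log_min_duration_statement;
    SELECT pg_reload_conf();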