Thread: How can we make beta testing better?
Hackers,

I think 9.3 has given us evidence that our users aren't giving new versions of PostgreSQL substantial beta testing, or if they are, they aren't sharing the results with us.

How can we make beta testing better and more effective? How can we get more users to actually throw serious workloads at new versions and share the results?

I've tried a couple of things over the last two years and they haven't worked all that well. Since we're about to go into another beta testing period, we need something new. Ideas?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Tue, Apr 15, 2014 at 5:47 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Hackers,
>
> I think 9.3 has given us evidence that our users aren't giving new
> versions of PostgreSQL substantial beta testing, or if they are, they
> aren't sharing the results with us.
>
> How can we make beta testing better and more effective? How can we get
> more users to actually throw serious workloads at new versions and share
> the results?
>
> I've tried a couple of things over the last two years and they haven't
> worked all that well. Since we're about to go into another beta testing
> period, we need something new. Ideas?
I think it boils down to making it really easy to create a workload generator. Most companies have simple single-threaded regression tests for functionality but very few companies have good parallel workload generators which reflect activities in their production environment.
A documented beta test process/toolset which does the following would help:
1) Enables full query logging
2) Creates a replica of a production DB, record $TIME when it stops.
3) Allows the user to make changes (upgrade to 9.4, change hardware, change kernel settings, ...)
4) Plays queries from the CSV logs starting from $TIME mimicking actual timing and transaction boundaries
If Pg can make it easy to duplicate activities currently going on in production inside another environment, I would be pleased to fire a couple billion queries through it over the next few weeks.
#4 should include reporting useful to the project, such as a sampling of queries which performed significantly worse and a few relative performance stats for overall execution time.
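Step 4 of this proposal hinges on extracting per-session statement streams, with their original timing, from csvlog output. A minimal sketch of that extraction step in Python (a sketch only: the column positions assume the 9.x csvlog layout, only `log_statement = 'all'` entries are considered, and the function name is mine):

```python
import csv
import io
from datetime import datetime

# Column positions in the 9.x csvlog format (assumption: the server
# producing the log uses the documented 9.x column layout).
LOG_TIME, SESSION_ID, MESSAGE = 0, 5, 13

def parse_replay_events(csv_text):
    """Extract (delay_seconds, session_id, sql) tuples from a csvlog,
    preserving inter-statement timing relative to the first statement."""
    events = []
    t0 = None
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= MESSAGE:
            continue  # malformed / truncated line
        msg = row[MESSAGE]
        if not msg.startswith("statement: "):
            continue  # skip non-statement entries (errors, durations, ...)
        # csvlog timestamps look like '2014-04-15 21:53:00.123 UTC';
        # keep only the fractional-second part, drop the timezone name
        ts = datetime.strptime(row[LOG_TIME][:23], "%Y-%m-%d %H:%M:%S.%f")
        if t0 is None:
            t0 = ts
        events.append(((ts - t0).total_seconds(),
                       row[SESSION_ID],
                       msg[len("statement: "):]))
    return events
```

A replay driver could then group these events by session id and sleep out the recorded delays before issuing each statement, which is what gives the mimicked timing and transaction boundaries.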
On Wed, Apr 16, 2014 at 12:53 AM, Rod Taylor <rod.taylor@gmail.com> wrote:
> 4) Plays queries from the CSV logs starting from $TIME mimicking actual
> timing and transaction boundaries

This ^^

But I recall a number of previous attempts, including plugins for general load testing systems; what happened to them?

Honestly, though, if you really want to load test properly, what you want to do is deploy a copy of your entire application and feed it requests simulating user traffic. That results in a more accurate representation and gives you data that's easier to act on.

--
greg
On 04/17/2014 05:39 AM, Greg Stark wrote:
> On Wed, Apr 16, 2014 at 12:53 AM, Rod Taylor <rod.taylor@gmail.com> wrote:
>> 4) Plays queries from the CSV logs starting from $TIME mimicking actual
>> timing and transaction boundaries
>
> This ^^
>
> But I recall a number of previous attempts including plugins for
> general load testing systems, what happened to them?
>
> Honestly if you really want to load test properly though what you
> really want to do is deploy a copy of your entire application and feed
> it requests simulating user traffic. That results in more accurate
> representation and gives you data that's easier to act on.

Software is available which can do this. The problem is getting the workload in the first place.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Tue, Apr 15, 2014 at 4:47 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Hackers,
>
> I think 9.3 has given us evidence that our users aren't giving new
> versions of PostgreSQL substantial beta testing, or if they are, they
> aren't sharing the results with us.
>
> How can we make beta testing better and more effective? How can we get
> more users to actually throw serious workloads at new versions and share
> the results?
>
> I've tried a couple of things over the last two years and they haven't
> worked all that well. Since we're about to go into another beta testing
> period, we need something new. Ideas?

I've seen lots of bugs reported and fixed in the beta period over the years. My take is that it's basically unrealistic to expect volunteer beta testers to replace bona fide regression testing.

I think it's a pretty fair statement that we've had some QC issues in the general area of replication technologies. What this indicates to me is that replication needs substantially more coverage in 'make check'. Since I'm wishing for things, it would be nice to see an expansion of the buildfarm so that we could [optionally] run various performance tests as well as various replication scenarios. Then we could go back to users and say, please donate 'repeatable tests and machines to run them on', and reap the long-term value.

Not at all making light of any of this...it's a huge project.

merlin
On Tue, Apr 15, 2014 at 2:47 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Hackers,
>
> I think 9.3 has given us evidence that our users aren't giving new
> versions of PostgreSQL substantial beta testing, or if they are, they
> aren't sharing the results with us.
A lot of the bugs that turned up are not the kind I would expect to have been found in most beta testing done by non-hacking users. Weren't they mostly around rare race conditions, crash recovery, and freezing?
> How can we make beta testing better and more effective? How can we get
> more users to actually throw serious workloads at new versions and share
> the results?
If we are interested in positive results as well as negative, we should change https://wiki.postgresql.org/wiki/HowToBetaTest
"pgsql-hackers: bugs, questions, and successful test reports are welcome here if you are already subscribed to pgsql-hackers. Note that pgsql-hackers is a high-traffic mailing list with a lot of development discussion."
So successful reports are welcome, provided that you are willing to subscribe to a list that generates tons of noise you won't understand. That doesn't sound all that welcoming. (I already am subscribed, but I still usually don't report successful tests, because "yeah, I did a bunch of stuff, and nothing failed in an obvious way" just doesn't sound very useful, and it is hard to get motivated to write up an exhaustive description of a test that doesn't prove anything anyway--maybe if I did for a few more hours, it would have found a problem.)
If we want to know how much beta testing is really going on, perhaps we could do a survey asking people whether they did any beta testing, and if so whether they reported the results. Otherwise it would be hard to distinguish "We aren't doing enough testing" from "We do lots of testing, but it isn't strenuous enough to find the problems, or is testing the wrong aspects of the system".
Cheers,
Jeff
On 04/17/14 15:16, Merlin Moncure wrote:
> On Tue, Apr 15, 2014 at 4:47 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Hackers,
>>
>> I think 9.3 has given us evidence that our users aren't giving new
>> versions of PostgreSQL substantial beta testing, or if they are, they
>> aren't sharing the results with us.
>>
>> How can we make beta testing better and more effective? How can we get
>> more users to actually throw serious workloads at new versions and share
>> the results?
>>
>> I've tried a couple of things over the last two years and they haven't
>> worked all that well. Since we're about to go into another beta testing
>> period, we need something new. Ideas?
>
> I've seen lots of bugs reported and fixed in the beta period over the
> years. My take is that it's basically unrealistic to expect volunteer
> beta testers to replace bona fide regression testing.
>
> I think it's a pretty fair statement that we've had some QC issues in
> the general area of replication technologies. What this indicates
> to me is that replication needs substantially more coverage in 'make
> check'. Since I'm wishing for things, it would be nice to see an
> expansion of the buildfarm so that we could [optionally] run various
> performance tests as well as various replication scenarios. Then we
> could go back to users and say, please donate 'repeatable tests and
> machines to run them on', and reap the long-term value.
>
> Not at all making light of any of this...it's a huge project.

The problem with testing replication is that it doesn't fit well into our standard regression testing. There are way too many moving parts, as well as dependencies on the underlying OS and network topology.

You will discover a ton of race conditions once you move from testing with multiple postmasters on the same box (so you can kill one) to using multiple virtual machines, and, instead of completely severing a network connection, use packet shaping/filtering to introduce packet loss, limited bandwidth, async routing and so on. At least that is my experience from throwing that sort of sh*t at Slony at full speed.

Not trying to discourage anyone from trying. Just saying that it doesn't fit into our existing regression test framework.

Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info
On 04/15/2014 09:53 PM, Rod Taylor wrote:
> A documented beta test process/toolset which does the following would help:
> 1) Enables full query logging
> 2) Creates a replica of a production DB, record $TIME when it stops.
> 3) Allow user to make changes (upgrade to 9.4, change hardware, change
> kernel settings, ...)
> 4) Plays queries from the CSV logs starting from $TIME mimicking actual
> timing and transaction boundaries
>
> If Pg can make it easy to duplicate activities currently going on in
> production inside another environment, I would be pleased to fire a couple
> billion queries through it over the next few weeks.
>
> #4 should include reporting useful to the project, such as a sampling of
> queries which performed significantly worse and a few relative performance
> stats for overall execution time.

So we have some software we've been procrastinating on OSS'ing, which does:

1) Takes full query CSV logs from a running postgres instance
2) Runs them against a target instance in parallel
3) Records response times for all queries

tsung and pgreplay also do this, but have some limitations which make them impractical for a general set of logs to replay.

What it would need is:

A) Scripting around coordinated backups
B) Scripting for single-command runs, including changing pg.conf to record data
C) Tools to *analyze* the output data, including error messages

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
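As a purely illustrative sketch of a replay harness of this shape (not the actual unreleased tool), the parallel-execution-with-timing part might look like the following; threads stand in for the multiprocess workers described, and the `execute` callback is a stand-in for a real database connection:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def replay_session(session_id, statements, execute):
    """Replay one session's statements in order, timing each one.
    `execute` is whatever actually runs SQL (e.g. a cursor's execute);
    it is injected so the timing harness is testable without a server."""
    timings = []
    for sql in statements:
        start = time.perf_counter()
        execute(sql)
        timings.append((session_id, sql, time.perf_counter() - start))
    return timings

def replay_parallel(sessions, execute, max_workers=8):
    """Run each logged session concurrently and merge the per-query
    response-time records.  `sessions` maps session id -> statement list."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(replay_session, sid, stmts, execute)
                   for sid, stmts in sessions.items()]
        return [rec for f in futures for rec in f.result()]
```

The key property, as with pgreplay, is that statement order is preserved within a session while sessions run concurrently against the target instance; the collected `(session, sql, seconds)` records are the raw material for the "queries which performed significantly worse" report.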
On Thu, Apr 17, 2014 at 5:26 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> A lot of the bugs that turned up are not the kind I would expect to have
> been found in most beta testing done by non-hacking users. Weren't they
> mostly around rare race conditions, crash recovery, and freezing?

Actually I was struck by how the bugs in 9.3 were the kinds of bugs that should have turned up pretty quickly in user testing. They certainly turned up pretty quickly after users put their production applications on it. They *didn't* require rare race conditions, just certain patterns of workloads run for long enough to reliably reproduce.

They were specifically *not* the kinds of bugs that regression testing would have found. Regression testing only finds bugs you anticipate and think to put in the specification of correct behaviour. If you had thought of these problems you would have tested them manually, and in any case you would have seen the omissions immediately on inspecting the code.

Crash recovery and freezing aren't rare things once you have hot standbys everywhere and run 24x7 applications (or load tests) on your systems. We could make freezing more frequent by having a mode that bumps the xid by a few million randomly. That would still be pretty hit and miss whether it happens to wrap around in any particular state.

--
greg
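The hit-and-miss nature of wraparound testing comes from xids living on a 2^32 circle, with "older than" defined modulo that circle. A small sketch of the comparison rule (mirroring the idea behind the server's TransactionIdPrecedes, but ignoring the reserved special xids the server handles separately) plus the proposed random bump:

```python
XID_MODULUS = 1 << 32  # transaction ids are 32-bit and wrap around

def xid_precedes(a, b):
    """True if xid `a` is logically older than xid `b`: i.e. `a` lies in
    the 2^31 ids behind `b` on the circle.  Equivalent to computing
    (int32)(a - b) < 0 in C."""
    return ((a - b) % XID_MODULUS) >= (XID_MODULUS // 2)

def bump_xid(xid, amount):
    """Advance the xid counter by `amount`, wrapping at 2^32, as a mode
    that randomly bumps the xid by a few million would do."""
    return (xid + amount) % XID_MODULUS
```

This shows why a random multi-million bump only sometimes exercises the interesting case: freezing-related bugs surface when the counter actually crosses the wrap point while the cluster is in a particular state, and whether a given bump lands there is luck.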
From: "Josh Berkus" <josh@agliodbs.com>
> How can we make beta testing better and more effective? How can we get
> more users to actually throw serious workloads at new versions and share
> the results?
>
> I've tried a couple of things over the last two years and they haven't
> worked all that well. Since we're about to go into another beta testing
> period, we need something new. Ideas?

I've had a look at the mail for 9.3 beta 1:

http://www.postgresql.org/message-id/517C64F4.6000006@agliodbs.com

Please excuse me for my misunderstanding and cheekiness. I feel the following may need to be addressed:

* Existing and (more importantly) new users don't know about the beta release. I remember seeing news about new features of MySQL in the daily mail news of some famous IT media, such as IDG's Computerworld, Infoworld, and Japanese IT industry news media. On the other hand, I'm afraid I hadn't seen news about new features of PostgreSQL until its final release was out. Is subscribing to pgsql-announce the only way to know about the beta release?

* Existing users are satisfied with the version they are using for their current use cases, so they have no motivation to try a new release. To increase the use cases of existing users and get more users, more attractive features may be necessary, such as in-memory database, columnar database, MPP for scaleout, database compression, integration with Hadoop, multi-tenant database, Oracle compatibility (this should be very important to get many users in practice), etc. I think eye-catching features like streaming replication and materialized views are necessary to widen PostgreSQL use.

* Explain new features by associating them with new trends like cloud, mobile, social, and big data.

Regards
MauMau
MauMau wrote:
> From: "Josh Berkus" <josh@agliodbs.com>
>> How can we make beta testing better and more effective? How can we get
>> more users to actually throw serious workloads at new versions and share
>> the results?
>>
>> I've tried a couple of things over the last two years and they haven't
>> worked all that well. Since we're about to go into another beta testing
>> period, we need something new. Ideas?
>
> I've had a look at the mail for 9.3 beta 1:
>
> http://www.postgresql.org/message-id/517C64F4.6000006@agliodbs.com

This call for testing is interesting: note there was not a single mention of foreign key locking getting a huge change, with a "please test it and see whether it still works at all for you." I asked Josh specifically to mention it in a followup to this message, which you can see in that thread. There was no reply, which I took as "this isn't a new feature and isn't user visible anyway, so what would be the point?"

But hey, what the hell do I know about advocacy anyway, huh? So I shut up.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Apr 18, 2014 at 4:15 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> There was no reply, which I took as "this isn't a
> new feature and isn't user visible anyway, so what would be the point?"

To be fair, the list was pretty long already. And as with regression testing, coming up with a list of things to test is kind of beside the point. If we knew which things had to be tested, we would have tested them already. The point of beta testing is to have people run their applications on it in case those applications are doing stuff we *didn't* anticipate needing to test.

I think the take-away here is that what's important is testing for long periods of time, under load, while simultaneously using the full suite of complex configurations people use in production (such as hot standby replicas to spread load).

--
greg
On 04/18/2014 08:15 AM, Alvaro Herrera wrote:
> and see whether it still works at all for you. I asked Josh
> specifically to mention it in a followup to this message which you can
> see in that thread. There was no reply, which I took as "this isn't a
> new feature and isn't user visible anyway, so what would be the point?"

It's not possible for a single document to be both a press release (which that was) and a technical list of things to test. That needs to be two different documents ... and the technical one needs to somehow evolve, so that we can add things to it as we discover potential problem areas.

This does show how we're not really doing anything to publicize betas as "please test this", though. For that matter, our /beta page doesn't really inspire someone to jump up and test things.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Thu, 17 Apr 2014 16:42:21 -0700
Josh Berkus <josh@agliodbs.com> wrote:

> On 04/15/2014 09:53 PM, Rod Taylor wrote:
> > A documented beta test process/toolset which does the following would help:
> > 1) Enables full query logging
> > 2) Creates a replica of a production DB, record $TIME when it stops.
> > 3) Allow user to make changes (upgrade to 9.4, change hardware, change
> > kernel settings, ...)
> > 4) Plays queries from the CSV logs starting from $TIME mimicking actual
> > timing and transaction boundaries
> >
> > If Pg can make it easy to duplicate activities currently going on in
> > production inside another environment, I would be pleased to fire a couple
> > billion queries through it over the next few weeks.
> >
> > #4 should include reporting useful to the project, such as a sampling of
> > queries which performed significantly worse and a few relative performance
> > stats for overall execution time.
>
> So we have some software we've been procrastinating on OSS'ing, which does:
>
> 1) Takes full query CSV logs from a running postgres instance
> 2) Runs them against a target instance in parallel
> 3) Records response times for all queries
>
> tsung and pgreplay also do this, but have some limitations which make
> them impractical for a general set of logs to replay.

I've been working on another tool able to replay a scenario recorded directly from a network dump (see [pgshark]). It works, can be totally transparent from the application's point of view, the tcpdump can run anywhere, and **ALL** the real traffic can be replayed...but it needs some more work on reporting and handling parallel sessions.

The drawback of using libpcap is that you can lose packets while capturing, and even a very large capture buffer cannot keep you safe over hours of a high-speed scenario. So it might require multiple captures, and adjusting the buffer size, to capture 100% of the traffic in the required period.

I tried to quickly write a simple proxy using Perl POE to capture ALL the traffic safely. My POC did nothing but forward packets, and IIRC a 30s stress test with 10 or 20 sessions using pgbench showed a ~60% drop in performance. But it was a very quick single-process, single-threaded POC.

Maybe another path would be to generate this traffic dump from PostgreSQL itself (which only has the application level to deal with), in a format we can feed to pgbench.

> What it would need is:
>
> A) scripting around coordinated backups
> B) Scripting for single-command runs, including changing pg.conf to
> record data.

Changing pg.conf is pretty easy with ALTER SYSTEM now. But I'm sure we all have some scripts out there doing this (at least I do).

> C) tools to *analyze* the output data, including error messages.

That's what I lack in pgshark so far.

[pgshark] https://github.com/dalibo/pgshark

Cheers,
--
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com
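Feeding recorded traffic to pgbench, as suggested above, could be as simple as emitting a custom script with `\sleep` meta-commands to approximate the recorded pacing. A sketch (it assumes one complete statement per line; note that pgbench runs a script serially per client, so session interleaving would be lost unless sessions are split into separate script files):

```python
def to_pgbench_script(events):
    """Turn (delay_seconds, sql) pairs, ordered by delay, into a pgbench
    custom script, inserting \\sleep meta-commands between statements to
    approximate the original pacing."""
    lines = []
    prev = 0.0
    for delay, sql in events:
        gap_ms = int((delay - prev) * 1000)
        if gap_ms > 0:
            lines.append("\\sleep %d ms" % gap_ms)
        lines.append(sql.rstrip(";") + ";")  # normalize the terminator
        prev = delay
    return "\n".join(lines) + "\n"
```

The resulting file can be handed to `pgbench -f` with one client per recorded session, which stays entirely at the application level and avoids the libpcap packet-loss problem.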
On 4/17/14, 6:42 PM, Josh Berkus wrote:
> So we have some software we've been procrastinating on OSS'ing, which does:
>
> 1) Takes full query CSV logs from a running postgres instance
> 2) Runs them against a target instance in parallel
> 3) Records response times for all queries

Is that the stuff you'd worked on for us forever ago? I thought that was just pgreplay-based, but now I don't remember.

--
Jim C. Nasby, Data Architect
jim@nasby.net
512.569.9461 (cell)
http://jim.nasby.net
On 04/26/2014 12:39 PM, Jim Nasby wrote:
> On 4/17/14, 6:42 PM, Josh Berkus wrote:
>> So we have some software we've been procrastinating on OSS'ing, which
>> does:
>>
>> 1) Takes full query CSV logs from a running postgres instance
>> 2) Runs them against a target instance in parallel
>> 3) Records response times for all queries
>
> Is that the stuff you'd worked on for us forever ago? I thought that was
> just pgreplay based, but now I don't remember.

Yes. It's based around the same principles as pgreplay -- a replay from logs -- but uses Python and multiprocessing to allow for much higher levels of concurrency, as well as not halting completely on deadlock errors.

However, this is all a bit of a tangent. While we want to give users better tools for testing, we also need some reasonable way to get them *involved* in testing. That's the main thing I was looking for feedback on.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com