Thread: How can we make beta testing better?

How can we make beta testing better?

From
Josh Berkus
Date:
Hackers,

I think 9.3 has given us evidence that our users aren't giving new
versions of PostgreSQL substantial beta testing, or if they are, they
aren't sharing the results with us.

How can we make beta testing better and more effective?  How can we get
more users to actually throw serious workloads at new versions and share
the results?

I've tried a couple of things over the last two years and they haven't
worked all that well.  Since we're about to go into another beta testing
period, we need something new.  Ideas?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: How can we make beta testing better?

From
Rod Taylor
Date:
On Tue, Apr 15, 2014 at 5:47 PM, Josh Berkus <josh@agliodbs.com> wrote:
Hackers,

I think 9.3 has given us evidence that our users aren't giving new
versions of PostgreSQL substantial beta testing, or if they are, they
aren't sharing the results with us.

How can we make beta testing better and more effective?  How can we get
more users to actually throw serious workloads at new versions and share
the results?

I've tried a couple of things over the last two years and they haven't
worked all that well.  Since we're about to go into another beta testing
period, we need something new.  Ideas?

I think it boils down to making it really easy to create a workload generator. Most companies have simple single-threaded regression tests for functionality but very few companies have good parallel workload generators which reflect activities in their production environment.

A documented beta test process/toolset which does the following would help:
1) Enables full query logging
2) Creates a replica of a production DB, recording $TIME when it stops.
3) Allows the user to make changes (upgrade to 9.4, change hardware, change kernel settings, ...)
4) Plays queries from the CSV logs starting from $TIME mimicking actual timing and transaction boundaries

If Pg can make it easy to duplicate activities currently going on in production inside another environment, I would be pleased to fire a couple billion queries through it over the next few weeks.

#4 should include reporting useful to the project, such as a sampling of queries which performed significantly worse and a few relative performance stats for overall execution time.
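Step 4's timing-faithful replay boils down to recovering the gaps between statements from the csvlog timestamps. A rough sketch of that piece in Python (the column positions follow PostgreSQL's csvlog layout as I understand it; the parsing details are illustrative, not a finished tool):

```python
import csv
from datetime import datetime

# Column positions in PostgreSQL's csvlog format: log_time is the first
# field, and the message text ("statement: SELECT ...") is the 14th.
LOG_TIME, MESSAGE = 0, 13

def extract_timed_statements(csv_lines):
    """Yield (delay_seconds, sql) pairs from csvlog rows, where delay is
    the gap since the previous statement -- the schedule a replay tool
    would sleep on to mimic the original timing."""
    prev = None
    for row in csv.reader(csv_lines):
        msg = row[MESSAGE]
        if not msg.startswith("statement: "):
            continue  # skip duration lines, errors, autovacuum chatter
        # csvlog timestamps look like "2014-04-15 17:47:01.123 EDT";
        # drop the timezone abbreviation before parsing.
        ts = datetime.strptime(" ".join(row[LOG_TIME].split()[:2]),
                               "%Y-%m-%d %H:%M:%S.%f")
        delay = (ts - prev).total_seconds() if prev else 0.0
        prev = ts
        yield delay, msg[len("statement: "):]
```

A real tool would additionally group rows by session_id so each session replays on its own connection with its own clock.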




Re: How can we make beta testing better?

From
Greg Stark
Date:
On Wed, Apr 16, 2014 at 12:53 AM, Rod Taylor <rod.taylor@gmail.com> wrote:
> 4) Plays queries from the CSV logs starting from $TIME mimicking actual
> timing and transaction boundaries

This ^^

But I recall a number of previous attempts including plugins for
general load testing systems, what happened to them?

Honestly, if you really want to load test properly, what you want to do
is deploy a copy of your entire application and feed it requests
simulating user traffic. That gives a more accurate representation and
data that's easier to act on.


-- 
greg



Re: How can we make beta testing better?

From
Josh Berkus
Date:
On 04/17/2014 05:39 AM, Greg Stark wrote:
> On Wed, Apr 16, 2014 at 12:53 AM, Rod Taylor <rod.taylor@gmail.com> wrote:
>> 4) Plays queries from the CSV logs starting from $TIME mimicking actual
>> timing and transaction boundaries
> 
> This ^^
> 
> But I recall a number of previous attempts including plugins for
> general load testing systems, what happened to them?
> 
> Honestly if you really want to load test properly though what you
> really want to do is deploy a copy of your entire application and feed
> it requests simulating user traffic. That results in more accurate
> representation and gives you data that's easier to act on.

Software is available which can do this.  The problem is getting the
workload in the first place.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: How can we make beta testing better?

From
Merlin Moncure
Date:
On Tue, Apr 15, 2014 at 4:47 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Hackers,
>
> I think 9.3 has given us evidence that our users aren't giving new
> versions of PostgreSQL substantial beta testing, or if they are, they
> aren't sharing the results with us.
>
> How can we make beta testing better and more effective?  How can we get
> more users to actually throw serious workloads at new versions and share
> the results?
>
> I've tried a couple of things over the last two years and they haven't
> worked all that well.  Since we're about to go into another beta testing
> period, we need something new.  Ideas?

I've seen lots of bugs reported and fixed in the beta period over the
years.  My take is that it's basically unrealistic to expect volunteer
beta testers to replace bona fide regression testing.

I think it's a pretty fair statement that we've had some QC issues in
the general area of replication technologies.  What this is indicating
to me is that replication needs substantially more coverage in 'make
check'.  Since I'm wishing for things, it would be nice to see an
expansion of the buildfarm so that we could [optionally] run various
performance tests as well as various replication scenarios.  Then we
could go back to users and say, please donate 'repeatable tests and
machines to run them on' and reap the long term value.

Not making light of any of this at all...it's a huge project.

merlin



Re: How can we make beta testing better?

From
Jeff Janes
Date:
On Tue, Apr 15, 2014 at 2:47 PM, Josh Berkus <josh@agliodbs.com> wrote:
Hackers,

I think 9.3 has given us evidence that our users aren't giving new
versions of PostgreSQL substantial beta testing, or if they are, they
aren't sharing the results with us.

A lot of the bugs that turned up are not the kind I would expect to have been found in most beta testing done by non-hacking users.  Weren't they mostly around rare race conditions, crash recovery, and freezing?

 

How can we make beta testing better and more effective?  How can we get
more users to actually throw serious workloads at new versions and share
the results?

If we are interested in positive results as well as negative, we should change https://wiki.postgresql.org/wiki/HowToBetaTest

"pgsql-hackers: bugs, questions, and successful test reports are welcome here if you are already subscribed to pgsql-hackers. Note that pgsql-hackers is a high-traffic mailing list with a lot of development discussion."

So successful reports are welcome, provided that you are willing to subscribe to a list that generates tons of noise you won't understand.  That doesn't sound all that welcoming.  (I already am subscribed, but I still usually don't report successful tests, because "yeah, I did a bunch of stuff, and nothing failed in an obvious way" just doesn't sound very useful, and it is hard to get motivated to write up an exhaustive description of a test that doesn't prove anything anyway--maybe if I did for a few more hours, it would have found a problem.)

If we want to know how much beta testing is really going on, perhaps we could do a survey asking people whether they did any beta testing, and if so whether they reported the results.  Otherwise it would be hard to distinguish "We aren't doing enough testing" from "We do lots of testing, but it isn't strenuous enough to find the problems, or is testing the wrong aspects of the system".

Cheers,

Jeff

Re: How can we make beta testing better?

From
Jan Wieck
Date:
On 04/17/14 15:16, Merlin Moncure wrote:
> On Tue, Apr 15, 2014 at 4:47 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Hackers,
>>
>> I think 9.3 has given us evidence that our users aren't giving new
>> versions of PostgreSQL substantial beta testing, or if they are, they
>> aren't sharing the results with us.
>>
>> How can we make beta testing better and more effective?  How can we get
>> more users to actually throw serious workloads at new versions and share
>> the results?
>>
>> I've tried a couple of things over the last two years and they haven't
>> worked all that well.  Since we're about to go into another beta testing
>> period, we need something new.  Ideas?
>
> I've seen lots of bugs reported and fixed in the beta period over the
> years.  My take is that it's basically unrealistic to expect volunteer
> beta testers to replace bona fide regression testing.
>
> I think it's a pretty fair statement that we've had some QC issues in
> the general area of replication technologies.  What this is indicating
> to me is that replication needs substantially more coverage in 'make
> check'.  Since I'm wishing for things, it would be nice to see an
> expansion of the buildfarm so that we could [optionally] run various
> performance tests as well as various replication scenarios.  Then we
> could go back to users and say, please donate 'repeatable tests and
> machines to run them on' and reap the long term value.
>
> Not making light of any of this at all...it's a huge project.

The problem with testing replication is that it doesn't fit well into 
our standard regression testing. There are way too many moving parts as 
well as dependencies on the underlying OS and network topology.

You will discover a ton of race conditions once you actually move from 
testing with multiple postmasters (so you can kill one) on the same box 
to using multiple virtual machines and, instead of completely severing 
a network connection, using some packet shaping/filtering to introduce 
packet loss, limited bandwidth, async routing and so on. At least that 
is my experience from throwing that sort of sh*t at Slony at full speed.
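The packet loss/limited bandwidth part of that can be scripted against Linux's tc/netem. A sketch that only builds the commands rather than running them (the interface name and the build-don't-run design are illustrative choices; actually applying them requires root):

```python
def netem_commands(iface, loss_pct=None, delay_ms=None, rate_kbit=None):
    """Build the tc invocations that impose packet loss, latency, and a
    bandwidth cap on one interface, plus the command that clears them.
    Returns (apply_cmd, clear_cmd) as argument lists for subprocess."""
    opts = []
    if delay_ms is not None:
        opts += ["delay", "%dms" % delay_ms]
    if loss_pct is not None:
        opts += ["loss", "%g%%" % loss_pct]
    if rate_kbit is not None:
        opts += ["rate", "%dkbit" % rate_kbit]  # netem rate needs a recent kernel
    apply_cmd = ["tc", "qdisc", "add", "dev", iface, "root", "netem"] + opts
    clear_cmd = ["tc", "qdisc", "del", "dev", iface, "root"]
    return apply_cmd, clear_cmd
```

A test harness could run the apply command via subprocess before a replication scenario, then the clear command afterwards, cycling through loss/delay/bandwidth combinations.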

Not trying to discourage anyone from trying. Just saying that it doesn't 
fit into our existing regression test framework.


Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info



Re: How can we make beta testing better?

From
Josh Berkus
Date:
On 04/15/2014 09:53 PM, Rod Taylor wrote:
> A documented beta test process/toolset which does the following would help:
> 1) Enables full query logging
> 2) Creates a replica of a production DB, record $TIME when it stops.
> 3) Allow user to make changes (upgrade to 9.4, change hardware, change
> kernel settings, ...)
> 4) Plays queries from the CSV logs starting from $TIME mimicking actual
> timing and transaction boundaries
> 
> If Pg can make it easy to duplicate activities currently going on in
> production inside another environment, I would be pleased to fire a couple
> billion queries through it over the next few weeks.
> 
> #4 should include reporting useful to the project, such as a sampling of
> queries which performed significantly worse and a few relative performance
> stats for overall execution time.

So we have some software we've been procrastinating on OSS'ing, which does:

1) Takes full query CSV logs from a running postgres instance
2) Runs them against a target instance in parallel
3) Records response times for all queries

tsung and pgreplay also do this, but have some limitations which make
them impractical for a general set of logs to replay.

What it would need is:

A) scripting around coordinated backups
B) Scripting for single-command runs, including changing pg.conf to
record data.
C) tools to *analyze* the output data, including error messages.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: How can we make beta testing better?

From
Greg Stark
Date:
On Thu, Apr 17, 2014 at 5:26 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> A lot of the bugs that turned up are not the kind I would expect to have
> been found in most beta testing done by non-hacking users.  Weren't they
> mostly around rare race conditions, crash recovery, and freezing?

Actually I was struck by how the bugs in 9.3 were the kinds of bugs
that should have turned up pretty quickly by user testing. They
certainly turned up pretty quickly after users put their production
applications on it. They *didn't* require rare race conditions, just
certain patterns of workloads for long enough to reliably reproduce.

They were specifically *not* the kinds of bugs that regression testing
would have found. Regression testing only finds bugs you anticipate
and think to put in the specification of correct behaviour. If you had
thought of these problems you would have tested them manually and in
any case you would have seen the omissions immediately upon inspecting
the code.

Crash recovery and freezing aren't rare things once you have hot
standbys everywhere and run 24x7 applications (or load tests) on your
systems. We could make freezing more frequent by having a mode that
bumps the xid by a few million randomly. That would still be pretty
hit and miss whether it happens to wrap around in any particular
state.
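Part of why wraparound testing is hit-and-miss is that xid comparison is circular. A sketch of the modular rule PostgreSQL uses (mirroring the idea behind TransactionIdPrecedes in the source; the Python rendering is illustrative):

```python
def xid_precedes(a, b):
    """True if xid a is logically older than xid b.  32-bit xids wrap
    around, so "older" means less than 2^31 steps behind on the circle,
    not numerically smaller -- equivalent to (int32)(a - b) < 0 in C."""
    diff = (a - b) & 0xFFFFFFFF           # wrap the difference to 32 bits
    return diff != 0 and diff >= 0x80000000
```

Randomly bumping the xid by a few million shifts where on this circle a test run sits, but whether any particular state actually straddles the wraparound point remains luck of the draw.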

-- 
greg



Re: How can we make beta testing better?

From
"MauMau"
Date:
From: "Josh Berkus" <josh@agliodbs.com>
> How can we make beta testing better and more effective?  How can we get
> more users to actually throw serious workloads at new versions and share
> the results?
>
> I've tried a couple of things over the last two years and they haven't
> worked all that well.  Since we're about to go into another beta testing
> period, we need something new.  Ideas?

I've had a look at the mail for 9.3 beta 1:

http://www.postgresql.org/message-id/517C64F4.6000006@agliodbs.com

Please excuse me for my misunderstanding and cheekiness.  I feel the 
following may need to be addressed:

* Existing and (more importantly) new users don't know about the beta 
release.
I remember seeing news about new features of MySQL in the daily news 
mails of some famous IT media, such as IDG's Computerworld, Infoworld, 
and Japanese IT industry news media.  On the other hand, I'm afraid I 
hadn't seen news about new features of PostgreSQL until its final release 
was out.  Is subscribing to pgsql-announce the only way to know about the 
beta release?

* Existing users are satisfied with the version they are using for their 
current use cases, so they don't have motivation to try a new release.
To increase the use cases of existing users and get more users, more 
attractive features may be necessary such as in-memory database, columnar 
database, MPP for scaleout, database compression, integration with Hadoop, 
multi-tenant database, Oracle compatibility (this should be very important 
to get many users in practice), etc.  I think eye-catching features like 
streaming replication and materialized views are necessary to widen 
PostgreSQL use.

* Explain new features by associating them with new trends like cloud, 
mobile, social, and big data.

Regards
MauMau






Re: How can we make beta testing better?

From
Alvaro Herrera
Date:
MauMau wrote:
> From: "Josh Berkus" <josh@agliodbs.com>
> >How can we make beta testing better and more effective?  How can we get
> >more users to actually throw serious workloads at new versions and share
> >the results?
> >
> >I've tried a couple of things over the last two years and they haven't
> >worked all that well.  Since we're about to go into another beta testing
> >period, we need something new.  Ideas?
> 
> I've had a look at the mail for 9.3 beta 1:
> 
> http://www.postgresql.org/message-id/517C64F4.6000006@agliodbs.com

This call for testing is interesting: note there was not a single
mention that foreign key locking had gotten a huge change, with a plea to
please test it and see whether it still works at all for you.  I asked Josh
specifically to mention it in a followup to this message which you can
see in that thread.  There was no reply, which I took as "this isn't a
new feature and isn't user visible anyway, so what would be the point?"

But hey, what the hell do I know about advocacy anyway, huh?  So I shut
up.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: How can we make beta testing better?

From
Greg Stark
Date:
On Fri, Apr 18, 2014 at 4:15 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>  There was no reply, which I took as "this isn't a
> new feature and isn't user visible anyway, so what would be the point?"

To be fair the list was pretty long already. And like regression
testing, coming up with a list of things to test is kind of beside the
point. If we knew which things had to be tested then we would have
tested them already. The point of beta testing is to have people run
their applications on it in case their applications are doing stuff we
*didn't* anticipate needing to test.

I think the take-away here is that what's important is testing for long
periods of time, under load, while simultaneously using the full suite
of complex configurations they use in production (such as hot standby
replicas to spread load).

-- 
greg



Re: How can we make beta testing better?

From
Josh Berkus
Date:
On 04/18/2014 08:15 AM, Alvaro Herrera wrote:
> and see whether it still works at all for you.  I asked Josh
> specifically to mention it in a followup to this message which you can
> see in that thread.  There was no reply, which I took as "this isn't a
> new feature and isn't user visible anyway, so what would be the point?"

It's not possible for a single document to be both a press release
(which that was) and a technical list of things to test.  That needs to
be two different documents ... and the technical one needs to somehow
evolve, so that we can add things to it as we discover potential problem
areas.

This does show how we're not really doing anything to publicize betas as
"please test this" though.  For that matter, our /beta page doesn't
really inspire someone to jump up and test things.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: How can we make beta testing better?

From
Jehan-Guillaume de Rorthais
Date:
On Thu, 17 Apr 2014 16:42:21 -0700
Josh Berkus <josh@agliodbs.com> wrote:

> On 04/15/2014 09:53 PM, Rod Taylor wrote:
> > A documented beta test process/toolset which does the following would help:
> > 1) Enables full query logging
> > 2) Creates a replica of a production DB, record $TIME when it stops.
> > 3) Allow user to make changes (upgrade to 9.4, change hardware, change
> > kernel settings, ...)
> > 4) Plays queries from the CSV logs starting from $TIME mimicking actual
> > timing and transaction boundaries
> > 
> > If Pg can make it easy to duplicate activities currently going on in
> > production inside another environment, I would be pleased to fire a couple
> > billion queries through it over the next few weeks.
> > 
> > #4 should include reporting useful to the project, such as a sampling of
> > queries which performed significantly worse and a few relative performance
> > stats for overall execution time.
> 
> So we have some software we've been procrastinating on OSS'ing, which does:
> 
> 1) Takes full query CSV logs from a running postgres instance
> 2) Runs them against a target instance in parallel
> 3) Records response times for all queries
> 
> tsung and pgreplay also do this, but have some limitations which make
> them impractical for a general set of logs to replay.

I've been working on another tool able to replay a scenario recorded directly
from a network dump (see [pgshark]). It works, can be totally transparent from
the application's point of view, the tcpdump can run anywhere, and **ALL** the
real traffic can be replayed...but it needs some more work for reporting and
handling parallel sessions. The drawback of using libpcap is that you can lose
packets while capturing, and even a very large capture buffer cannot keep you
safe for hours of a high-speed scenario. So it might require multiple captures
and adjusting the buffer size to capture 100% of the traffic in the required
period.

I tried to quickly write a simple proxy using Perl POE to capture ALL the
traffic safely. My POC was doing nothing but forwarding packets, and IIRC a
30s stress test with 10 or 20 sessions using pgbench showed a drop of ~60% in
performance. But it was a very quick, single-process/single-thread POC.

Maybe another path would be to generate this traffic dump from PostgreSQL
itself (which only has the application level to deal with), in a format we
can feed to pgbench.

> What it would need is:
> 
> A) scripting around coordinated backups
> B) Scripting for single-command runs, including changing pg.conf to
> record data.

Changing pg.conf is pretty easy with ALTER SYSTEM now. But I'm sure we all
have some scripts out there doing this (at least I do).

> C) tools to *analyze* the output data, including error messages.

That's what I lack in pgshark so far.

[pgshark] https://github.com/dalibo/pgshark

Cheers,
-- 
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com



Re: How can we make beta testing better?

From
Jim Nasby
Date:
On 4/17/14, 6:42 PM, Josh Berkus wrote:
> So we have some software we've been procrastinating on OSS'ing, which does:
>
> 1) Takes full query CSV logs from a running postgres instance
> 2) Runs them against a target instance in parallel
> 3) Records response times for all queries

Is that the stuff you'd worked on for us forever ago? I thought that was just pgreplay based, but now I don't
remember.
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: How can we make beta testing better?

From
Josh Berkus
Date:
On 04/26/2014 12:39 PM, Jim Nasby wrote:
> On 4/17/14, 6:42 PM, Josh Berkus wrote:
>> So we have some software we've been procrastinating on OSS'ing, which
>> does:
>>
>> 1) Takes full query CSV logs from a running postgres instance
>> 2) Runs them against a target instance in parallel
>> 3) Records response times for all queries
> 
> Is that the stuff you'd worked on for us forever ago? I thought that was
> just pgreplay based, but now I don't remember.

Yes.  It's based on the same principles as pgreplay -- a replay from
logs -- but uses Python and multiprocessing to allow much higher levels
of concurrency, as well as not halting completely on deadlock errors.
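The shape of such a replayer can be sketched as a process pool with per-session error capture. The execute() stub below stands in for a real driver call (psycopg2 etc.) and is purely illustrative:

```python
from multiprocessing import get_context

def execute(sql):
    """Stand-in for running sql on the target instance; a real version
    would use a database driver and raise on deadlock (SQLSTATE 40P01)."""
    if "DEADLOCK" in sql:  # simulated failure for the sketch
        raise RuntimeError("deadlock detected")
    return "ok"

def replay_session(statements):
    """Run one session's statements in order; record errors and keep
    going instead of halting the whole run."""
    results = []
    for sql in statements:
        try:
            execute(sql)
            results.append((sql, None))
        except Exception as e:
            results.append((sql, str(e)))  # log the error, continue
    return results

def replay_all(sessions, workers=4):
    """Replay many sessions concurrently, one worker process each batch.
    The fork context keeps the sketch runnable from a plain Unix script."""
    with get_context("fork").Pool(workers) as pool:
        return pool.map(replay_session, sessions)
```

Collecting the per-statement error/success tuples is what makes step C (analyzing output data, including error messages) possible afterwards.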

However, this is all a bit of a tangent.  While we want to give users
better tools for testing, we also need some reasonable way to get them
*involved* in testing.  That's the main thing I was looking for feedback on.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com