Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend] - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]
Date
Msg-id 199809220447.AAA09217@candle.pha.pa.us
In response to Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]  ("J. Michael Roberts" <mirobert@cs.indiana.edu>)
List pgsql-hackers
>
> Yeek.  Some hackles seem to be getting raised.
>
> I am one of the people who support a more stringent regression test.
> Bruce, I don't think anyone in their right mind could possibly accuse you
> of doing less than a superhuman job here.  So I think there's no need for
> you to react defensively.

I'll accept the 'superhuman' compliment, though I really think it
belongs to Vadim.  Vadim, here it is.

> But the fact remains that I, for one, am not going to recommend PG for any
> app that I'm not going to check myself on a daily basis.  Not when normal
> queries like the one that started this mess can cause crashes that will
> never be detected, even if they always did do that.

OK, this has me confused.  This is BETA period.  You are checking on a
daily basis?  I assume you are referring to the beta code as it is
patched, right?  Is this something you did not expect?  Are we fixing
the bugs too quickly?  I don't understand what your expectations are.

> And yes, there has been support from the peanut gallery, as I think Tom
> pointed out, and no, nobody's asked for money.  And yes, the "big guys"
> can be far more cavalier about saying "Oh, yes, we knew about that
> problem, it'll be fixed in the next release hopefully."  But what we're
> really proposing is better documentation of known bugs, and the
> construction of a test suite that will not only check basic functionality,
> but everything anyone can think of that could be considered sort of normal
> usage, and we certainly all have different ideas about what is "normal."
> This way, no matter what changes are made, we know where we stand.  That's all
> that has been said.
>
> The idea of separating a more complete "stability test" from the present
> development-time "regression" test, I think, is a valid one.  By the way,
> can anyone tell me why it's called a regression test?  What are we
> regressing from, or are we regretting having tested?  OK, OK, just a
> little humor.
>
> I am perfectly willing to organize a stability test, and I am also more
> than willing to start improving the documentation because I've got to
> anyway to get this beast working well under Windows -- but I'm not ready
> yet, because of that damnable requirement of keeping the family fed and
> the bank from repossessing the house. Towards the end of the year, I hope
> that the curve will take me back towards free time, and then we'll see
> where we stand.
>
> In the meantime, I would hope that all the people doing this incredible
> work don't take all this amiss.  You really are doing a bang-up job.

OK, I have no problem with expanding the regression test to test more
things.  However, I want to clearly outline my expectations.  Of the
past few bugs we have fixed in the beta:

    multi-key system table indexing bug
    bad pfree system table indexing bug
    pg_user access failure
    AND/OR crash

Three of these showed up only on certain platforms, and not on the
platform of the coder (me).  The top three also showed up in the
regression tests, again only on some platforms.  The other one (the
AND/OR crash) requires two tables to be joined on indexed columns, and
one of the indexed columns has to be used in an OR.
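
For illustration only, a query of the shape described above might look
like the sketch below.  The table, column, and index names are
hypothetical, and SQLite (through Python's standard sqlite3 module)
stands in for PostgreSQL purely to keep the example self-contained.  It
shows the form of the query, not the original failing case, and it does
not reproduce the crash.

```python
import sqlite3


def run_or_join_query():
    """Join two tables on indexed columns, with one indexed column in an OR."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical tables; both join columns are indexed.
    cur.execute("CREATE TABLE a (id INTEGER, val TEXT)")
    cur.execute("CREATE TABLE b (id INTEGER, val TEXT)")
    cur.execute("CREATE INDEX a_id_idx ON a (id)")
    cur.execute("CREATE INDEX b_id_idx ON b (id)")

    cur.executemany("INSERT INTO a VALUES (?, ?)",
                    [(1, "a1"), (2, "a2"), (3, "a3")])
    cur.executemany("INSERT INTO b VALUES (?, ?)",
                    [(1, "b1"), (2, "b2"), (4, "b4")])

    # The join is on the indexed columns, and the indexed column a.id
    # is also used in an OR clause.
    cur.execute(
        "SELECT a.id, a.val, b.val "
        "FROM a, b "
        "WHERE a.id = b.id AND (a.id = 1 OR a.id = 2) "
        "ORDER BY a.id"
    )
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_or_join_query():
        print(row)
```

A regression test for this case would have to combine the join, the
indexes, and the OR in one query; testing each on its own would not
exercise the same path.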

So three of the four were already caught by regression, but
unfortunately, only on certain platforms, and the last one is clearly
something related to new OR indexing code.  You could add a regression
test, but I doubt it is going to catch future bugs any more than the
current regression tests.

Thomas maintains the regression tests, and I am sure he would LOVE some
help, or even give the whole area to someone else.  But basically, I
don't see how additional SQL queries in the regression suite are going to make
PostgreSQL _that_ much more stable.  Sure, it may catch a few more items
than the current code, but only a few.  Because the query input domain
is nearly infinite, we will still have a vast number of queries that
could potentially fail.

So basically, I am saying, let's beef up the regression suite, if you
think it is going to help, and it is going to help, but it is not going to
make _major_ improvements in stability.  You are still going to have to
test the beta at certain intervals in the beta cycle to be sure the
final is going to work 100%.  You could also wait for the final, then
test that and submit bug reports.  We usually have patches or minor
releases after the final to fix those bugs.

Basically, I have a problem with the comment that we need to focus more
on stability.  We focus a _ton_ on stability, because we are not a word
processor that can be restarted if it crashes.  We hold people's data,
and they expect that data to remain stable.  We have had very few
reports of data loss or corruption.

We have been focusing on performance and features, but I don't think we
have sacrificed stability.  In fact, all the bugs reported above are
related to new features added (multi-key system indexes, rewrite system
overhaul, OR indexing).  We get bugs in new features, and they have to
be ironed out.  Many times, the bugs are in things people never had
before.  There was no test for the OR indexing code because we never had
it.  Likewise, as we add new features like SERIAL, there is going to be
NO regression test for them, because they did not exist before the
developer added them.

Regression test additions are not a silver bullet to fix stability
problems.  Having people involved in real-world testing, like we have
now, is what we need.  Yes, it takes time to test things, but we can't
possibly test all the things people are going to do, and taking
developers' time away from improving the system to add regression tests
to try to approach that infinite query input domain is not really going
to help.

Having clean code that is explained/documented, having developers who
can understand the code, and having people who can test those new
features and changes is the way to go.  I can see this giving far more
benefit to stability than adding queries to the regression suite.

I guess I have seen too many bug reports where someone sends in a query
that I would never have thought to try in 100 years.  It is that type
of testing that really improves stability.

And the beauty of the system is that once we cut a final, like 6.2.1 or
6.3.2, we have _very_ few bug reports.

I can't see even a 10x increase in the regression tests eliminating the
need for a rigorous beta test cycle.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
http://www.op.net/~candle              |  (610) 353-9879(w)
  +  If your life is a hard drive,     |  (610) 853-3000(h)
  +  Christ can be your backup.        |
