Thread: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

[sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
Christopher Oliver
Date:
Sferacarta Software <sferac@bo.nettuno.it> writes:
> ZV> create table t1 ( b bool );
> ZV> insert into t1 values ( 'T' );
>
> ZV> select not b from t1;
>
> Wrong syntax.

Not if he is trying to display the complement of a logical field
rather than restrict a selection.  And regardless, is backend
failure an acceptable out?  How about the following in 6.4beta?

  create table mine (a bool);
  insert into mine values (true);
  select * from mine as t1,mine as t2 where t1.a or not t2.a;

Look folk.  We're falling over on variations of expressions in
both our result attributes and our selection criteria, and smug-
ness won't remove bugs.  A session with gdb showed that at least
in the case of:

  create table t0 (a_id int4 not null, a varchar, a_t1_id int4);
  insert into t0 values (1, 'at0', 0);
  insert into t0 values (2, 'at0', 0);
  create index a_id_t0 on t0 (a_id);
  create index a_t1_id_t0 on t0 (a_t1_id);
  select * from t0 where (a_id = 1 or a_id = 2) and a_t1_id  < 1;

we are dereferencing NULL in the 6.4beta1 query optimizer, and I
suspect the same in my first example.

My I, a humble newcomer, make a suggestion?  Should we place any
legitimate query set we've discovered to cause crashes into our
regression suite?

--
Christopher Oliver                     Traverse Internet
Systems Coordinator                    223 Grandview Pkwy, Suite 108
oliver@traverse.net                    Traverse City, Michigan, 49684
  "What good is a can of worms if you never open it?"  -Bob Arning

Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
"J. Michael Roberts"
Date:
On Fri, 18 Sep 1998, Christopher Oliver wrote:
> My I, a humble newcomer, make a suggestion?  Should we place any
> legitimate query set we've discovered to cause crashes into our
> regression suite?

As another newcomer, let me say that I fervently support this notion.  I
don't want to discover these bugs when I build them into production
software!

Michael


Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
Bruce Momjian
Date:
> On Fri, 18 Sep 1998, Christopher Oliver wrote:
> > My I, a humble newcomer, make a suggestion?  Should we place any
> > legitimate query set we've discovered to cause crashes into our
> > regression suite?
>
> As another newcomer, let me say that I fervently support this notion.  I
> don't want to discover these bugs when I build them into production
> software!

We keep the current crash items until they are fixed during beta.  They
go on to the current items list.  If they don't get fixed, they are
moved to the TODO list, where you will see the unfixed ones mentioned.


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
http://www.op.net/~candle              |  (610) 353-9879(w)
  +  If your life is a hard drive,     |  (610) 853-3000(h)
  +  Christ can be your backup.        |

Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
"Thomas G. Lockhart"
Date:
> > > My I, a humble newcomer, make a suggestion?  Should we place any
> > > legitimate query set we've discovered to cause crashes into our
> > > regression suite?
> We keep the current crash items until they are fixed during beta.
> They go on to the current items list.  If they don't get fixed, they
> are moved to the TODO list, where you will see the unfixed ones
> mentioned.

The regression suite should reasonably be expected to pass on a system
which has an "as good as it gets" installation. We haven't been very
good about moving new features and fixed breakage _into_ the regression
suite once it is fixed, but we shouldn't move known breakage into the
regression suite.

Bruce keeps the ToDo list, and does a good job of filtering problem
statements down to the one line per problem allowed in that list.
Occasionally it would be helpful to carry along more verbiage to
describe a problem, but that would open a can of worms in making the
ToDo list unreadable because it is too verbose.

I'm not sure of the best way to document these kinds of problem
statements. If we generate a "catchall file" of queries which crash, it
runs the great risk of becoming stale and useless. To undertake this we
probably need a volunteer from the developer's list who would be willing
to babysit this file of problem queries, and even better to make sure
that new features move from this file to the regression test.

Hmm, we could have a "breakage" regression suite which could illustrate
broken features and which would fail if a broken feature is fixed. Would
be most effective if we had a volunteer maintainer for it...

btw, for many kinds of problem statements (such as the one which started
this thread), the problem was not known until it was mentioned. It's
likely to be picked up by one of the active developers/maintainers,
though since it has been broken forever it doesn't count as a "must-fix"
bug and may not make it into the v6.4 release. But it does count as a
"should-fix" bug, and it's possible it could be fixed for v6.4...

We were all newcomers to Postgres at one time or another, and ended up
contributing in areas in which we had the most interest. There is always
room for more contributors, especially in areas which aren't at the top
of someone else's list since that area is probably not getting covered
at all.

                      - Tom

Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
Christopher Oliver
Date:
> The regression suite should reasonably be expected to pass on a system
> which has an "as good as it gets" installation. We haven't been very
> good about moving new features and fixed breakage _into_ the regression
> suite once it is fixed, but we shouldn't move known breakage into the
> regression suite.

If the queries which produce the problem are not exotic, then why
should they not go there?  I would prefer a regression that took half
an hour or longer on modern hardware yet screened the code to a degree
that I could have reasonable faith in an installation if it passed the
regression.  As it stands, I had to back off of a recommendation even
with a promise of my maintenance.  I'll reiterate that quite common
queries are causing NULL pointers to get chased even in non-beta code.
When someone snoozes, our tests should alert us to that.  I think to
ensure reliability, the regression must be as complete as possible.
It should work as an acceptance criterion.  As it stands, it doesn't
test severely enough for much if any confidence in deployment.

> I'm not sure of the best way to document these kinds of problem
> statements. If we generate a "catchall file" of queries which crash, it
> runs the great risk of becoming stale and useless.

Even vaccinations require periodic renewal.  The only degree to which
a test becomes stale is the degree that a feature is completely removed.

> though since it has been broken forever it doesn't count as a "must-fix"
> bug and may not make it into the v6.4 release. But it does count as a
> "should-fix" bug, and it's possible it could be fixed for v6.4...

Seg faults in the backend for standard queries would count as must-fix
from my view.  The only justification for failure in the non-exotic
queries is that we are ignorant of the bug's existence.  Once we are
aware, fixing these should become priority one.

\begin{soapbox}

At the very time ESR is lecturing the software community that open-
source is the way things should work, we have the opportunity to
support or refute his claims based on how seriously we take repair
and maintenance where non-exotic queries induce damage and failure.

Think about the free UNIX-like kernels.  They are now gaining accept-
ance mostly because they keep running even when folk beat on them.
We have glass jaws, and it shows.  Let's take the required steps to
firm them up.

\end{soapbox}

--
Christopher Oliver                     Traverse Internet
Systems Coordinator                    223 Grandview Pkwy, Suite 108
oliver@traverse.net                    Traverse City, Michigan, 49684
  "What good is a can of worms if you never open it?"  -Bob Arning

Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
Bruce Momjian
Date:
> > The regression suite should reasonably be expected to pass on a system
> > which has an "as good as it gets" installation. We haven't been very
> > good about moving new features and fixed breakage _into_ the regression
> > suite once it is fixed, but we shouldn't move known breakage into the
> > regression suite.
>
> If the queries which produce the problem are not exotic, then why
> should they not go there?  I would prefer a regression that took half
> an hour or longer on modern hardware yet screened the code to a degree
> that I could have reasonable faith in an installation if it passed the
> regression.  As it stands, I had to back off of a recommendation even
> with a promise of my maintenance.  I'll reiterate that quite common
> queries are causing NULL pointers to get chased even in non-beta code.
> When someone snoozes, our tests should alert us to that.  I think to
> ensure reliability, the regression must be as complete as possible.
> It should work as an acceptance criterion.  As it stands, it doesn't
> test severely enough for much if any confidence in deployment.
>
> > I'm not sure of the best way to document these kinds of problem
> > statements. If we generate a "catchall file" of queries which crash, it
> > runs the great risk of becoming stale and useless.
>
> Even vaccinations require periodic renewal.  The only degree to which
> a test becomes stale is the degree that a feature is completely removed.
>
> > though since it has been broken forever it doesn't count as a "must-fix"
> > bug and may not make it into the v6.4 release. But it does count as a
> > "should-fix" bug, and it's possible it could be fixed for v6.4...
>
> Seg faults in the backend for standard queries would count as must-fix
> from my view.  The only justification for failure in the non-exotic
> queries is that we are ignorant of the bug's existence.  Once we are
> aware, fixing these should become priority one.
>
> \begin{soapbox}
>
> At the very time ESR is lecturing the software community that open-
> source is the way things should work, we have the opportunity to
> support or refute his claims based on how seriously we take repair
> and maintenance where non-exotic queries induce damage and failure.
>
> Think about the free UNIX-like kernels.  They are now gaining accept-
> ance mostly because they keep running even when folk beat on them.
> We have glass jaws, and it shows.  Let's take the required steps to
> firm them up.
>
> \end{soapbox}

OK, what do you suggest we do?

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
http://www.op.net/~candle              |  (610) 353-9879(w)
  +  If your life is a hard drive,     |  (610) 853-3000(h)
  +  Christ can be your backup.        |

Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
"Thomas G. Lockhart"
Date:
> If the queries which produce the problem are not exotic, then why
> should they not go there?  I would prefer a regression that took half
> an hour or longer on modern hardware yet screened the code to a degree
> that I could have reasonable faith in an installation if it passed the
> regression.

Well, we shouldn't get caught up in semantics, but in this case it gets
to the heart of the intent: the regression test is designed to test if
known-good features are still good and behavior on one installation
matches behavior on another, not to ensure or document that known-bad
features are still bad.

Another test (or other docs, or ??) might be a good idea, and we could
certainly benefit from it I'm sure, but the regression test is not where
that should go.

As you point out above, "I could have reasonable faith in an
installation if it passed the regression", and that is true; if the
regression test passes you can have reasonable faith that your
installation is a good Postgres installation. It won't ensure that your
Postgres installation does everything you want it to, and no test can do
that :)

>  As it stands, I had to back off of a recommendation even
> with a promise of my maintenance.  I'll reiterate that quite common
> queries are causing NULL pointers to get chased even in non-beta code.

Sure. And thanks for pointing it out. And we'll work on it. And we'll be
glad to accept patches from any source which would fix it immediately.

> When someone snoozes, our tests should alert us to that.  I think to
> ensure reliability, the regression must be as complete as possible.
> It should work as an acceptance criterion.  As it stands, it doesn't
> test severely enough for much if any confidence in deployment.

?? See above. The regression test does exactly what is intended for it.
If there is another kind of test you would like to propose or
contribute, we'd certainly consider it.

> > though since it has been broken forever it doesn't count as a
> > "must-fix" bug and may not make it into the v6.4 release. But it
> > does count as a "should-fix" bug, and it's possible it could be
> fixed for v6.4...
> Seg faults in the backend for standard queries would count as must-fix
> from my view.  The only justification for failure in the non-exotic
> queries is that we are ignorant of the bug's existence.  Once we are
> aware, fixing these should become priority one.
>
> \begin{soapbox}
>
> At the very time ESR is lecturing the software community that open-
> source is the way things should work, we have the opportunity to
> support or refute his claims based on how seriously we take repair
> and maintenance where non-exotic queries induce damage and failure.
>
> Think about the free UNIX-like kernels.  They are now gaining accept-
> ance mostly because they keep running even when folk beat on them.
> We have glass jaws, and it shows.  Let's take the required steps to
> firm them up.
>
> \end{soapbox}

Well, that's all well and good. The way Open Source works is that
_everyone_ has the tools available to them to make the product better,
and some folks will use those tools to contribute back. Not everyone has
to or can, and there is a role for those offering encouragement as well
as those offering actual code. But Open Source doesn't happen without
user participation, and the next version is never the last version.

Postgres has undergone a tremendous evolutionary improvement over the
last two years, and it's actually encouraging that we get darn few
reports of problems with simple (though uncommon) cases such as yours.

Don't panic: yours is the first problem report of this failure, and if
you're disappointed in the response please remember that you brought it
up only three days ago. And that you've gotten two or three responses
from the "support team" since then. And we haven't asked for money. All
in all, a great bargain :)

Regards.

                       - Tom

Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
Christopher Oliver
Date:
> OK, what do you suggest we do?

What I'm basically suggesting is to increase the stringency of the
regression suite.  Several posters supported my idea of keeping crash-
ing tests around.  I think they should stick around indefinitely.
I.e. 6.4 BETA1 has some pointer errors that turn up as crashes in the
query optimizer.  They seem to be of the form CLAUSE AND (CLAUSE OR
CLAUSE) or the same with the disjunction and conjunction reversed.
This sort of construct or the crashing negation construct someone
complained of earlier are likely to arise in real life.  It's not
unreasonable therefore that tests for such patterns be added to the
regression suite.  I.e. we collect the past queries from the real
world that crash our system with a mind to validate the software
should they recur.  You mentioned that your old code somehow crept
into oid8types(), so it isn't simply a waste of testing time to
test against "old friends."  Also, this defends against someone down
the road getting some "bright" idea that was discarded in the past
on grounds of workability or soundness.  Fundamentally, I'm advocating
defensive coding as base practice, and I think perhaps a strong valid-
ation suite might be a step toward enforcing this.  I might also advo-
cate that authors show the discipline of running a full regression
before submitting to the code base.  This is mostly a cultural thing.
I think it was Jolly Chen who said that while he hadn't lost any
data, he was glad the system wasn't managing his payroll.  Imagine it
IS your bank balance riding on the lines of code you write.  I don't
think I'm being as draconian as some software engineering folk, but
I think these sort of steps might help PGSQL move from strength to
strength.

--
Christopher Oliver                     Traverse Internet
Systems Coordinator                    223 Grandview Pkwy, Suite 108
oliver@traverse.net                    Traverse City, Michigan, 49684
  "What good is a can of worms if you never open it?"  -Bob Arning

Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
Oleg Bartunov
Date:
Hi,

sorry for interference,
this is a very good topic for discussing and in spite of some stress
it indicates Postgres are really coming into production and most
important question becomes a stability.

On Mon, 21 Sep 1998, Christopher Oliver wrote:

> Date: Mon, 21 Sep 1998 01:34:08 -0400
> From: Christopher Oliver <oliver@fritz.traverse.net>
> To: Bruce Momjian <maillist@candle.pha.pa.us>
> Cc: pgsql-hackers@postgreSQL.org
> Subject: Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]
>
> > OK, what do you suggest we do?
>
> What I'm basically suggesting is to increase the stringency of the
> regression suite.  Several posters supported my idea of keeping crash-
> ing tests around.  I think they should stick around indefinitely.
> I.e. 6.4 BETA1 has some pointer errors that turn up as crashes in the
> query optimizer.  They seem to be of the form CLAUSE AND (CLAUSE OR
> CLAUSE) or the same with the disjunction and conjunction reversed.
> This sort of construct or the crashing negation construct someone
> complained of earlier are likely to arise in real life.  It's not
> unreasonable therefore that tests for such patterns be added to the
> regression suite.  I.e. we collect the past queries from the real
> world that crash our system with a mind to validate the software

I'm using postgres in commecrcial applications  since beginning of 1995 and
certainly without home-made test-suite I couldn't do that.
And I put new version into production only if my test-sute passes
for me on specific machine and works at least month.
Sometimes it works but I didnt' satisfied by speed and I'm looking
for workaround ( robust workaround ). I would be happy if someone
could works on general test suite which cover my specific data,
operational system and environment and I would prefer to run regression tests
for a 30 minutes, one hour or more if it would guarantee my applications will
run ok, but I understand how different may be data, OSes and especially
appl. environment. i.e. fast developing Linux, compilers, locale, libc etc...

> should they recur.  You mentioned that your old code somehow crept
> into oid8types(), so it isn't simply a waste of testing time to
> test against "old friends."  Also, this defends against someone down
> the road getting some "bright" idea that was discarded in the past
> on grounds of workability or soundness.  Fundamentally, I'm advocating
> defensive coding as base practice, and I think perhaps a strong valid-
> ation suite might be a step toward enforcing this.  I might also advo-
> cate that authors show the discipline of running a full regression
> before submitting to the code base.  This is mostly a cultural thing.
> I think it was Jolly Chen who said that while he hadn't lost any
> data, he was glad the system wasn't managing his payroll.  Imagine it
> IS your bank balance riding on the lines of code you write.  I don't
> think I'm being as draconian as some software engineering folk, but
> I think these sort of steps might help PGSQL move from strength to
> strength.
>
> --
> Christopher Oliver                     Traverse Internet
> Systems Coordinator                    223 Grandview Pkwy, Suite 108
> oliver@traverse.net                    Traverse City, Michigan, 49684
>   "What good is a can of worms if you never open it?"  -Bob Arning
>

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
Bruce Momjian
Date:
> Hi,
>
> sorry for interference,
> this is a very good topic for discussing and in spite of some stress
> it indicates Postgres are really coming into production and most
> important question becomes a stability.
>
> On Mon, 21 Sep 1998, Christopher Oliver wrote:
>
> > Date: Mon, 21 Sep 1998 01:34:08 -0400
> > From: Christopher Oliver <oliver@fritz.traverse.net>
> > To: Bruce Momjian <maillist@candle.pha.pa.us>
> > Cc: pgsql-hackers@postgreSQL.org
> > Subject: Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]
> >
> > > OK, what do you suggest we do?
> >
> > What I'm basically suggesting is to increase the stringency of the
> > regression suite.  Several posters supported my idea of keeping crash-
> > ing tests around.  I think they should stick around indefinitely.
> > I.e. 6.4 BETA1 has some pointer errors that turn up as crashes in the
> > query optimizer.  They seem to be of the form CLAUSE AND (CLAUSE OR
> > CLAUSE) or the same with the disjunction and conjunction reversed.
> > This sort of construct or the crashing negation construct someone
> > complained of earlier are likely to arise in real life.  It's not
> > unreasonable therefore that tests for such patterns be added to the
> > regression suite.  I.e. we collect the past queries from the real
> > world that crash our system with a mind to validate the software
>
> I'm using postgres in commecrcial applications  since beginning of 1995 and
> certainly without home-made test-suite I couldn't do that.
> And I put new version into production only if my test-sute passes
> for me on specific machine and works at least month.
> Sometimes it works but I didnt' satisfied by speed and I'm looking
> for workaround ( robust workaround ). I would be happy if someone
> could works on general test suite which cover my specific data,
> operational system and environment and I would prefer to run regression tests
> for a 30 minutes, one hour or more if it would guarantee my applications will
> run ok, but I understand how different may be data, OSes and especially
> appl. environment. i.e. fast developing Linux, compilers, locale, libc etc...
>

This may sound obvious, but isn't that what we are doing in the beta
test cycle.  Everyone is testing their applications/platforms, and
reporting problems back to us.

I don't see how we can ever duplicate all the tests everyone runs with
their applications.  I suppose it is possible, but as applications
change to use new features, we would still be running the old
application tests.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
http://www.op.net/~candle              |  (610) 353-9879(w)
  +  If your life is a hard drive,     |  (610) 853-3000(h)
  +  Christ can be your backup.        |

Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
Bruce Momjian
Date:
> > OK, what do you suggest we do?
>
> What I'm basically suggesting is to increase the stringency of the
> regression suite.  Several posters supported my idea of keeping crash-
> ing tests around.  I think they should stick around indefinitely.
> I.e. 6.4 BETA1 has some pointer errors that turn up as crashes in the
> query optimizer.  They seem to be of the form CLAUSE AND (CLAUSE OR
> CLAUSE) or the same with the disjunction and conjunction reversed.
> This sort of construct or the crashing negation construct someone
> complained of earlier are likely to arise in real life.  It's not
> unreasonable therefore that tests for such patterns be added to the
> regression suite.  I.e. we collect the past queries from the real
> world that crash our system with a mind to validate the software
> should they recur.  You mentioned that your old code somehow crept

Basically, there are an infinite number of queries that can be sent to
the backend.  Even if we test 10 time the current number of queries, we
don't approach the total possible number of queries.

The AND/OR thing you mentioned is a new problem only because we now
use indexes on OR's, which we never did before.  It has been found, and
will be fixed very soon.

There may be other problems, but I doubt that one will come return as a
bug again once it is fixed.  Some NEW one will.

> into oid8types(), so it isn't simply a waste of testing time to
> test against "old friends."  Also, this defends against someone down
> the road getting some "bright" idea that was discarded in the past
> on grounds of workability or soundness.  Fundamentally, I'm advocating
> defensive coding as base practice, and I think perhaps a strong valid-
> ation suite might be a step toward enforcing this.  I might also advo-

The old code was put back because for some reason, the new code did not
work.  No one has told me the diffence between:

    int **var

and

    int (*var)[]

but that was the fix.  The problem is that people want features added
AND more stable code, and we just can't do both.

The old versions like 6.3.2 are around for people who want them, and no
one is suggesting they have to upgrade, but if they want OR indexing,
they will  have to take some instability while I get things working
100%.

There are no shortcuts, and I don't see a regression test finding them
that much faster considering the odds that the regression test may not
have the test that shows the failure.

We also have to consider the cost of trying to approach an infinite
number of query possibilities vs. benefit.

> cate that authors show the discipline of running a full regression
> before submitting to the code base.  This is mostly a cultural thing.
> I think it was Jolly Chen who said that while he hadn't lost any
> data, he was glad the system wasn't managing his payroll.  Imagine it
> IS your bank balance riding on the lines of code you write.  I don't
> think I'm being as draconian as some software engineering folk, but
> I think these sort of steps might help PGSQL move from strength to
> strength.

I don't think we can be accused of throwing code into the backend
without considering the consequences.  In fact, I think we are more
strict about it than most/any open software project.  It is just the
amount of code and complexity, and the infinite query possibilities that
can bite us.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
http://www.op.net/~candle              |  (610) 353-9879(w)
  +  If your life is a hard drive,     |  (610) 853-3000(h)
  +  Christ can be your backup.        |

Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
Bruce Momjian
Date:
> > No one has told me the diffence between:
> >
> >     int **var
> >
> > and
> >
> >     int (*var)[]
> >
> > but that was the fix.
>
> If anyone does - please let me know! I thought the two were
> equivalent.

Somehow different.  It may be the difference between a pointer to a list
of int pointers, and a pointer to a pointer to a list of ints.  Does
that make sense?

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
http://www.op.net/~candle              |  (610) 353-9879(w)
  +  If your life is a hard drive,     |  (610) 853-3000(h)
  +  Christ can be your backup.        |

Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
"J. Michael Roberts"
Date:
Yeek.  Some hackles seem to getting raised.

I am one of the people who support a more stringent regression test.
Bruce, I don't think anyone in their right mind could possibly accuse you
of doing less than a superhuman job here.  So I think there's no need for
you to react defensively.

But the fact remains that I, for one, am not going to recommend PG for any
app that I'm not going to check myself on a daily basis.  Not when normal
queries like the one that started this mess can cause crashes that will
never be detected, even if they always did do that.

And yes, there has been support from the peanut gallery, as I think Tom
pointed out, and no, nobody's asked for money.  And yes, the "big guys"
can be far more cavalier about saying "Oh, yes, we knew about that
problem, it'll be fixed in the next release hopefully."  But what we're
really proposing is better documentation of known bugs, and the
construction of a test suite that will not only check basic functionality,
but everything anyone can think of that could be considered sort of normal
usage, and we certainly all have different ideas about what is "normal."
This, no matter what changes are made, we know where we stand.  That's all
that has been said.

The idea of separating a more complete "stability test" from the present
development-time "regression" test, I think, is a valid one.  By the way,
can anyone tell me why it's called a regression test?  What are we
regressing from, or are we regretting having tested?  OK, OK, just a
little humor.

I am perfectly willing to organize a stability test, and I am also more
than willing to start improving the documentation because I've got to
anyway to get this beast working well under Windows -- but I'm not ready
yet, because of that damnable requirement of keeping the family fed and
the bank from repossessing the house. Towards the end of the year, I hope
that the curve will take me back towards free time, and then we'll see
where we stand.

In the meantime, I would hope that all the people doing this incredible
work don't take all this amiss.  You really are doing a bang-up job.

Michael


Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
"Thomas G. Lockhart"
Date:
> I'm using postgres in commecrcial applications  since beginning of
> 1995 and certainly without home-made test-suite I couldn't do that.
> And I put new version into production only if my test-sute passes
> for me on specific machine and works at least month.
> Sometimes it works but I didnt' satisfied by speed and I'm looking
> for workaround ( robust workaround ). I would be happy if someone
> could works on general test suite which cover my specific data,
> operational system and environment and I would prefer to run
> regression tests for a 30 minutes, one hour or more if it would
> guarantee my applications will run ok, but I understand how different
> may be data, OSes and especially appl. environment. i.e. fast
> developing Linux, compilers, locale, libc etc...

Well, it sounds like you just volunteered :)

That is how we build Postgres: people contribute features and support
which addresses their needs or requirements.

I agree that a large/long db regression test would be interesting and
helpful, and you are already providing that, and contributing back the
results, for your specific platform. How large is the dataset for your
test? If it is relatively small it could be packaged into the Postgres
distribution. If it is a megabyte or larger then it could be packaged
into a tar file and placed on the Postgres ftp or web site. That would
let others run the same large tests, and would encourage them to
contribute other large test suites or to augment yours.

A complication is that you are compiling with locale features and using
the Russian/Cyrillic character set. I don't know if that combination is
testable on other machines; it probably is but others would need to be
able to interpret the results too.

Tatsuo has also been very active with the locale features, and you might
want to coordinate with him on ideas for a test suite which addresses
those features specifically.

It helps to have a test coordinator, which you might be willing to do.
For example, during alpha or early beta, you can be running the test and
reporting results, then during mid- to late-beta you can be soliciting
others to run the same test on other platforms.

Another interesting regression test would involve multiple clients
running on a shared database, with transactions, etc. Should be possible
to do from a shell script so we could run it ad infinitum.

Cheers.

                      - Tom

Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
"Thomas G. Lockhart"
Date:
> But what we're
> really proposing is better documentation of known bugs, and the
> construction of a test suite that will not only check basic
> functionality, but everything anyone can think of that could be
> considered sort of normal usage, and we certainly all have different
> ideas about what is "normal."
> This, no matter what changes are made, we know where we stand.  That's
> all that has been said.

And that is exactly what a regression test is for.

I think the issues here are what the proper response to a _new_ problem
report should be, and how that new problem should be documented if it is
not going to be fixed right away.

I should mention that Jose' De Silva, who is somehow connected with this
thread since his address shows up in the subject line, did a great job
of writing reference documentation which will appear in the next
Postgres release. He included several examples of deficient Postgres
features, and as I transcribed that documentation was able to fix
several of them. He went to the trouble to write it down, and that
resulted in fixes.

We shouldn't blow this problem out of proportion; although in hindsight
a query as simple as "select not b from t" should be an obvious one, it
has not been reported as a problem in the entire history of Postgres.
That's either because no one had ever tried it, or they did but didn't
bother reporting it (imho letting all of us down in the process). So
there is no way to have guessed that it should have been in the current
regression test. If it had been, it would have never been broken, which
is sort of circular but true...

Now that it's been reported, it will likely be fixed (though don't get
the wrong impression; this long thread has actually gotten in the way of
that rather than helped :). This thread is helpful though because it may
result in a better way of tracking features and problems. What you see
with the Postgres product is what is possible with the current set of
active developers and maintainers. We are _always_ open to others
contributing, but we can't just say "Oh, sure we'll do that" since we
are already putting in as much time as our real lives allow.

There are several people who have expressed interest in this, and Bruce
and I have been responding as best we can, but we don't mean to stifle
innovation or good ideas. I've tried to help the discussion by moving it
away from "regression test", which has a specific goal, to some "bad
features test" or "bad features list" which would document the things
that don't work right. It would be helpful if folks are interested in
maintaining it, and not so helpful if they aren't.

Cheers.

                      - Tom

Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
"Thomas G. Lockhart"
Date:
> > ZV> create table t1 ( b bool );
> > ZV> insert into t1 values ( 'T' );
> > ZV> select not b from t1;
> > Wrong syntax.
> Not if he is trying to display the complement of a logical field
> rather than restrict a selection.

(I've lost who started this thread, so if it isn't "techguys" sorry for
the misdirection...)

It looks like "NOT anything" is pretty broken at the moment. However,
try

  select b is false from t1;

to get what you want. The only problem with this is that it would mask
NULL fields if you have any.

                  - Tom

Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
The Hermit Hacker
Date:
On Mon, 21 Sep 1998, Bruce Momjian wrote:

> I don't see how we can ever duplicate all the tests everyone runs with
> their applications.  I suppose it is possible, but as applications
> change to use new features, we would still be running the old
> application tests.

    Just a side note on this thread...nothing prevents *add* to the
regression test, and, in fact, I see this all the time, especially as new
features are added.  If someone comes up with a test that they feel
stresses an aspect of the server that we haven't thought of (so long as it
doesn't include adding 80meg of data to our backend), by all means, let us
know and throw patches our way that we can include.

    If you want to 'volume' test it, regression tests aren't for
that...they are only meant to catch changes between versions or snapshots
as a result of a change somewhere else in the system...

 Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
The Hermit Hacker
Date:
On Mon, 21 Sep 1998, Thomas G. Lockhart wrote:

> A complication is that you are compiling with locale features and using
> the Russian/Cyrillic character set. I don't know if that combination is
> testable on other machines; it probably is but others would need to be
> able to interpret the results too.

    What about 'language dictionaries'?  On FreeBSD, we have:

  234946 web2

    That would give a pretty good basis for a table, no?  Why not have
it so that when the person installs, they can stipulate a simple file of
'words' to load in for performing various tests?

    That would also test out the various 'character sets', since I
don't imagine the only dictionary file out there is just English, no?

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


Re: NOT boolfield kills backend

From
"Thomas G. Lockhart"
Date:
Hey Bruce! I've got some questions for the original topic, which was the
failure of

  select not b from t;

where b is a boolean type in table t.

It looks to me that the NOT_EXPR expression node is being constructed
correctly in the parser, but somewhere between the parser and the
executor (at ExecSeqScan()) it is being transformed into a deficient
OP_EXPR expression node. Any hints on other places where this query tree
might be getting transformed or manipulated? If I brute force substitute
the expression type to be NOT_EXPR in ExecEvalExpr() then the query
completes successfully:

postgres=> select not b from t1;
?column?
--------
f
t
(2 rows)

Hints are appreciated, or if this is touching on areas you are already
familiar with I can send my debugging patches...

                       - Tom

Re: [sferac@bo.nettuno.it: Re: [HACKERS] BUG: NOT boolfield kills backend]

From
Bruce Momjian
Date:
>
> Yeek.  Some hackles seem to getting raised.
>
> I am one of the people who support a more stringent regression test.
> Bruce, I don't think anyone in their right mind could possibly accuse you
> of doing less than a superhuman job here.  So I think there's no need for
> you to react defensively.

I'll accept the 'superhuman' compliment, though I really think it
belongs to Vadim.  Vadim, here it is.

> But the fact remains that I, for one, am not going to recommend PG for any
> app that I'm not going to check myself on a daily basis.  Not when normal
> queries like the one that started this mess can cause crashes that will
> never be detected, even if they always did do that.

OK, this has me confused.  This is BETA period.  You are checking on a
daily basis?  I assume you are referring to the beta code as it is
patched, right?  Is this something you did not expect?  Are we fixing
the bugs too quickly?  I don't understand what your expectations are.

> And yes, there has been support from the peanut gallery, as I think Tom
> pointed out, and no, nobody's asked for money.  And yes, the "big guys"
> can be far more cavalier about saying "Oh, yes, we knew about that
> problem, it'll be fixed in the next release hopefully."  But what we're
> really proposing is better documentation of known bugs, and the
> construction of a test suite that will not only check basic functionality,
> but everything anyone can think of that could be considered sort of normal
> usage, and we certainly all have different ideas about what is "normal."
> This, no matter what changes are made, we know where we stand.  That's all
> that has been said.
>
> The idea of separating a more complete "stability test" from the present
> development-time "regression" test, I think, is a valid one.  By the way,
> can anyone tell me why it's called a regression test?  What are we
> regressing from, or are we regretting having tested?  OK, OK, just a
> little humor.
>
> I am perfectly willing to organize a stability test, and I am also more
> than willing to start improving the documentation because I've got to
> anyway to get this beast working well under Windows -- but I'm not ready
> yet, because of that damnable requirement of keeping the family fed and
> the bank from repossessing the house. Towards the end of the year, I hope
> that the curve will take me back towards free time, and then we'll see
> where we stand.
>
> In the meantime, I would hope that all the people doing this incredible
> work don't take all this amiss.  You really are doing a bang-up job.

OK, I have no problem with expanding the regression test to test more
things.  However, I want to clearly outline my expectations.  Of the
past few bugs we have fixed in the beta:

    multi-key system table indexing bug
    bad pfree system table indexing bug
    pg_user access failure
    AND/OR crash

Three of these showed up only on certain platforms, and not the platform
of the coder(me).  Second, the top three did show up in the regression
tests, again only on some platforms.  The other one(the AND/OR) requires
two tables to be joined by index columns, and one of the indexed columns
has to be used in an OR.

So three of the four were already caught by regression, but
unfortunately, only on certain platforms, and the last one is clearly
something related to new OR indexing code.  You could add a regression
test, but I doubt it is going to catch future bugs any more than the
current regression tests.

Thomas maintains the regression tests, and I am sure he would LOVE some
help, or even give the whole area to someone else.   But basically, I
don't see how additional SQL's in the regression suite are going to make
PostgreSQL _that_ much more stable.  Sure, it may catch a few more items
than the current code, but only a few.  Because the query input domain
is nearly infinite, we will still have a vast number of queries that
could potentially fail.

So basically, I am saying, let's beef up the regression suite, if you
think is going to help, and it is going to help, but it is not going to
make _major_ improvements in stability.  You are still going to have to
test the beta at certain intervals in the beta cycle to be sure the
final is going to work 100%.  You could also wait for the final, then
test that and submit bug reports.  We usually have patches or minor
releases after the final to fix those bugs.

Basically, I have a problem with the comment that we need to focus more
on stability.  We focus a _ton_ on stability, because we are not a word
processor that can be restarted if it crashes.  We hold people's data,
and they expect that data to remain stable.  We have had very few
reports of data loss or corruption.

We have been focusing on performance and features, but I don't think we
have sacrificed stability.  In fact, all the bugs reported above are
related to new features added (multi-key system indexes, rewrite system
overhaul, OR indexing).  We get bugs in new features, and they have to
be ironed out.  Many times, the bugs are related to things people had
never had before, i.e. why test the OR indexing code, since we never had
it, so as we add new features like SERIAL, there is going to be NO
regression test for it, because it did not exist before the developer
added it.

regression test additions are not a silver bullet to fix stability
problems.  Having people involved in real-world testing, like we have
now, is what we need.  Yes, it takes time to test things, but we can't
possibly test all the things people are going to do, and taking
developers time away from improving the system to add regression test to
try and approach that infinite input query domain is not really going to
help.

Having clean code that is explained/documented and having developers who
can understand the code, and having people who can test those new
features and changes it the way to go.  I can see this giving far more
benefit to stability than adding queries to the regression suite.

I guess I have seen too many bug reports where someone sends in a query
that I would never have thought to try in a 100 years.  It is that type
of testing that really improves stability.

And the beauty of the system is that once we cut a final, like 6.2.1 or
6.3.2, we have _very_ few bug reports.

I can't see even a 10x increase in a regression test eliminating the
need for a rigirous beta test cycle.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
http://www.op.net/~candle              |  (610) 353-9879(w)
  +  If your life is a hard drive,     |  (610) 853-3000(h)
  +  Christ can be your backup.        |

Re: NOT boolfield kills backend

From
Bruce Momjian
Date:
> Hey Bruce! I've got some questions for the original topic, which was the
> failure of
>
>   select not b from t;
>
> where b is a boolean type in table t.
>
> It looks to me that the NOT_EXPR expression node is being constructed
> correctly in the parser, but somewhere between the parser and the
> executor (at ExecSeqScan()) it is being transformed into a deficient
> OP_EXPR expression node. Any hints on other places where this query tree
> might be getting transformed or manipulated? If I brute force substitute
> the expression type to be NOT_EXPR in ExecEvalExpr() then the query
> completes successfully:
>
> postgres=> select not b from t1;
> ?column?
> --------
> f
> t
> (2 rows)
>
> Hints are appreciated, or if this is touching on areas you are already
> familiar with I can send my debugging patches...

I would guess something in the optimizer.  Let me take a look at it.

Debugging patches, ah.  No gdb yet?

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
http://www.op.net/~candle              |  (610) 353-9879(w)
  +  If your life is a hard drive,     |  (610) 853-3000(h)
  +  Christ can be your backup.        |

Re: NOT boolfield kills backend

From
Bruce Momjian
Date:
> Hey Bruce! I've got some questions for the original topic, which was the
> failure of
>
>   select not b from t;
>
> where b is a boolean type in table t.
>
> It looks to me that the NOT_EXPR expression node is being constructed
> correctly in the parser, but somewhere between the parser and the
> executor (at ExecSeqScan()) it is being transformed into a deficient
> OP_EXPR expression node. Any hints on other places where this query tree
> might be getting transformed or manipulated? If I brute force substitute
> the expression type to be NOT_EXPR in ExecEvalExpr() then the query
> completes successfully:
>
> postgres=> select not b from t1;
> ?column?
> --------
> f
> t
> (2 rows)
>
> Hints are appreciated, or if this is touching on areas you are already
> familiar with I can send my debugging patches...
>
>                        - Tom
>

Fixed.  The last part of the 'if' statement in flatten_tlistentry() was
creating an expr that was assumed to be an OP_EXPR, while it was not in
this case.

---------------------------------------------------------------------------


    test=> select x from test;
    x
    -
    f
    (1 row)

    test=> select not x from test;
    ?column?
    --------
    t
    (1 row)

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
http://www.op.net/~candle              |  (610) 353-9879(w)
  +  If your life is a hard drive,     |  (610) 853-3000(h)
  +  Christ can be your backup.        |

Re: NOT boolfield kills backend

From
"Thomas G. Lockhart"
Date:
> Debugging patches, ah.  No gdb yet?

(I see that you fixed it. Great!)

I just cleaned up a little code; enabling some of the debugging
statements already in this area caused some breakage. Yeah, I'm now
gdb-enabled with Postgres, and it is very nice, but not if you don't
know where the code is headed. I could see the parse tree at the back of
the parser, and I could see the tree at the executor, but had no idea
where it went in between. Stepping through code didn't seem like a good
bet since afaik there are hundreds of calls in between...

                   - Tom

Re: NOT boolfield kills backend

From
Bruce Momjian
Date:
> > Debugging patches, ah.  No gdb yet?
>
> (I see that you fixed it. Great!)
>
> I just cleaned up a little code; enabling some of the debugging
> statements already in this area caused some breakage. Yeah, I'm now
> gdb-enabled with Postgres, and it is very nice, but not if you don't
> know where the code is headed. I could see the parse tree at the back of
> the parser, and I could see the tree at the executor, but had no idea
> where it went in between. Stepping through code didn't seem like a good
> bet since afaik there are hundreds of calls in between...
>
>                    - Tom
>

I suspected the optimizer, and the postmaster -d output showed the parse
and rewrite were ok, but the plan was wrong, so it had to be the
optimizer.  (New debugging levels cause -d 99 to have to be used to see
the query trees.)

I then went to the optimizer and looked for a reference to OP_EXPR,
which is the new value the expression was getting.  I saw it in
make_opclause().   I set a breakpoint on that function and executed the
query inside gdb.  It showed only one call to the function for that
query, and a backtrace on the breakpoint showed flatten_tlistentry()
making the call, and it was clear after that.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
http://www.op.net/~candle              |  (610) 353-9879(w)
  +  If your life is a hard drive,     |  (610) 853-3000(h)
  +  Christ can be your backup.        |