Re: Testbed for predtest.c ... and some arguable bugs therein - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Testbed for predtest.c ... and some arguable bugs therein
Date
Msg-id 2629.1520551499@sss.pgh.pa.us
Whole thread Raw
In response to Re: Testbed for predtest.c ... and some arguable bugs therein  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Testbed for predtest.c ... and some arguable bugs therein  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Mar 8, 2018 at 12:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A bit of hacking later, I have the attached.  The set of test cases it
>> includes at the moment were mostly developed with an eye to getting to
>> full code coverage of predtest.c, but we could add more later.  What's
>> really interesting is that it proves that the "weak refutation" logic,
>> i.e. predicate_refuted_by() with clause_is_check = true, is not
>> self-consistent.

> Oops.

After further navel-gazing, I've concluded that my initial opinion of the
"correct" semantics for weak refutation was wrong.  When I revise the
test logic to follow my revised understanding, it no longer claims that
any proofs are mistaken, although there are still things that could be
proven and aren't being.  Let me summarize my thoughts here for the
record.

For predicate_implied_by, we have two plausible definitions of
implication:

* "strong" form: A implies B if truth of A implies truth of B.
We need this to prove that a row satisfying one WHERE clause or
index predicate must satisfy another one.

* "weak" form: A implies B if non-falsity of A implies non-falsity of B.
We need this to prove that a row satisfying one CHECK constraint
must satisfy another one.

We could conceivably use a third proof rule that tries to prove
non-falsity of B from truth of A, but AFAICS there are no places in
the current code that need exactly that behavior, and the "strong" rule
seems like an adequate substitute unless such a need becomes mainstream.
(There aren't that many cases where we could prove something per this rule
but not per the strong rule, anyway.)

The fourth possible proof rule is to prove truth of B from non-falsity
of A, but the number of cases where you can make such a proof is so
small that this option isn't actually interesting in practice.
(For instance, you can't even conclude "X implies X" with such a rule.)

On the predicate_refuted_by side, I was confused about how to choose
among some similar options:

* R1: A refutes B if truth of A implies falsity of B.

* R2: A refutes B if truth of A implies non-truth of B.

* R3: A refutes B if non-falsity of A implies non-truth of B.

* R4: A refutes B if non-falsity of A implies falsity of B.

We can discard R4, again because there are too few cases where we
could prove anything per that rule, eg NOT X wouldn't refute X
or vice versa.

R1 is the traditional behavior of predicate_refuted_by, and as its
comment says, is needed because we want to disprove a CHECK
constraint (ie prove it would be violated) given a WHERE clause.

R2 is of use for disproving one WHERE clause given another one.
It turns out that of the two extant calls to predicate_refuted_by,
one of them would actually like to have that semantics, because
it's looking for mutually contradictory WHERE clauses.  We've been
getting by with R1 for that, but R2 would let us optimize some
more cases.

R3 could be of use for disproving a WHERE clause given a CHECK constraint,
and is the one I'd supposed should be the alternate behavior of
predicate_refuted_by.  However, we have no actual use cases for that;
relation_excluded_by_constraints goes the other way.

Moreover, although R3 seems like it'd be the more tractable rule on
grounds of symmetry, it's looking to me like R2 is about equivalent
for proof purposes.  For instance see the code near predtest.c:700,

             * If A is a strong NOT-clause, A R=> B if B equals A's arg
             *
             * We cannot make the stronger conclusion that B is refuted if B
             * implies A's arg; that would only prove that B is not-TRUE, not
             * that it's not NULL either.  Hence use equal() rather than
             * predicate_implied_by_recurse().  We could do the latter if we
             * ever had a need for the weak form of refutation.

The previous patch to add weak implication rules missed the opportunity
for improvement here.  If we are using R1 or R2, then we have the
assumption "NOT A is true", allowing us to conclude A is false.  Then,
if we can prove B implies A under weak implication (not-false B implies
not-false A), we have proven B must be false, satisfying the proof rule for
R1.  But also, if we can prove B implies A under strong implication (true
B implies true A), we have proven B is not true, satisfying the proof rule
for R2.  So in either case there's potential to recurse to
predicate_implied_by rather than only being able to handle the tautology
"NOT A refutes A".

Now, the same logic would go through if we were using R3.  We'd be
starting with non-falsity of "NOT A", only allowing us to conclude
non-truth of A, but a proof of strong B => A would still let us conclude
B is not true, satisfying R3.

Alternatively, consider if we were starting with A and trying to refute
NOT B.  Intuitively you'd figure that if you can prove A => B, that oughta
be enough to refute NOT B.  That certainly works for R1 with strong
implication: truth of A implies truth of B implies falsity of NOT B.
And it works for R2 with either type of implication: truth of A implies
(at least) non-falsity of B which implies non-truth of NOT B.  For R3,
you'd need weak implication: non-falsity of A implies non-falsity of B
implies non-truth of NOT B.  Now generally, strong implication is easier
to prove than weak implication (cf. existing handling of clause_is_check
in predtest.c), so R2 is marginally better in this case.

So my initial feeling that we ought to define predicate_refuted_by's
alternate behavior as R3 seems wrong: we're better off with R2, which
matches present needs better and is roughly equally tractable as far
as the actual proof logic goes.  

I propose to change test_predtest so that what it checks for as
"w_r_holds" is R2 not R3, and to improve the documentation in predtest.c
about this, and maybe to fix some of the cases where we can do better
by adding mutual recursion between the "implied" and "refuted" cases.
I feel a little guilty about taking time out of the commitfest to deal
with this, but I'd rather get it done while this reasoning is still
swapped in.

            regards, tom lane


pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Next
From: Tomas Vondra
Date:
Subject: Re: [HACKERS] MERGE SQL Statement for PG11