Thread: Limiting the number of parameterized indexpaths created
I looked into the complaint of unreasonable planner runtime in bug #7626,
http://archives.postgresql.org/pgsql-bugs/2012-10/msg00232.php

In the given example, the indexed relation "foo" has join clauses with 30 other relations.  The code that I added in commit 3b8968f25232ad09001bf35ab4cc59f5a501193e will try all 2^30 combinations of those rels as possible outer relations for a parameterized indexscan :-(.  So clearly, the idea that we can just try everything and not have some kind of heuristic restriction isn't going to work.

This particular example, and probably a lot of similar examples, does have a principled non-heuristic fix, which comes from the fact that we recognize all the join clauses as belonging to the same equivalence class.  Therefore, using more than one of them in an indexscan is useless.  (The code already does know that and discards the extra clauses, but that happens too far downstream to prevent the exponential search time here.)  The extra function eclass_already_used() added in the attached patch attacks the problem this way, and is sufficient to resolve the bug for the case presented.

However, we still have an issue for cases where the join clauses aren't equivalence clauses.  (For instance, if you just change all the "=" signs to "<" in Brian's example, it still takes forever.)  So we also need some heuristic rule to limit the number of cases considered.

I spent a good deal of time thinking about how the indexscan code might make use of the joinlist data structure (which prevents exponential planning time growth overall by limiting which joins we'll consider) to fix this; the idea being to not consider outer-relation sets that couldn't actually be presented for use because of joinlist restrictions.  But this looked messy, and probably expensive in itself.  Moreover it seemed like practical implementations of the idea would carry the restriction that we could never generate parameterized paths for joinlist subproblems, which would be a real shame.  And we'd be doing a lot of work for what is probably a very unusual corner case, given that I think eclass_already_used() will fix the problem in most real cases.

So what I'm proposing instead, which is implemented in the other half of the attached patch, is that we simply put an arbitrary limit on how many outer-relation sets we'll consider while generating parameterized indexscans.  As written, the patch limits the number of relid sets to 10 times the number of join clauses it's working with, which is small enough to keep the runtime of this test case to something reasonable.  I think that in practice this will still allow useful combination-outer-rel cases to be found.  I wonder though if anyone has a better idea?

			regards, tom lane

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 444ec2a40e4ac2b0ed8435017149c7664869e34c..78aaca4f279d6ba580643c75d82d82bfc88fba38 100644
*** a/src/backend/optimizer/path/indxpath.c
--- b/src/backend/optimizer/path/indxpath.c
*************** static void consider_index_join_outer_re
*** 92,97 ****
--- 92,98 ----
  					   IndexClauseSet *eclauseset,
  					   List **bitindexpaths,
  					   List *indexjoinclauses,
+ 					   int considered_clauses,
  					   List **considered_relids);
  static void get_join_index_paths(PlannerInfo *root, RelOptInfo *rel,
  					 IndexOptInfo *index,
*************** static void get_join_index_paths(Planner
*** 101,106 ****
--- 102,109 ----
  					 List **bitindexpaths,
  					 Relids relids,
  					 List **considered_relids);
+ static bool eclass_already_used(EquivalenceClass *parent_ec, Relids oldrelids,
+ 					List *indexjoinclauses);
  static bool bms_equal_any(Relids relids, List *relids_list);
  static void get_index_paths(PlannerInfo *root, RelOptInfo *rel,
  				IndexOptInfo *index, IndexClauseSet *clauses,
*************** consider_index_join_clauses(PlannerInfo
*** 447,452 ****
--- 450,456 ----
  					   IndexClauseSet *eclauseset,
  					   List **bitindexpaths)
  {
+ 	int			considered_clauses = 0;
  	List	   *considered_relids = NIL;
  	int			indexcol;
  
*************** consider_index_join_clauses(PlannerInfo
*** 460,467 ****
  	 * filter (qpqual); which is where an available clause would end up being
  	 * applied if we omit it from the indexquals.
  	 *
! 	 * This looks expensive, but in practical cases there won't be very many
! 	 * distinct sets of outer rels to consider.
  	 *
  	 * For simplicity in selecting relevant clauses, we represent each set of
  	 * outer rels as a maximum set of clause_relids --- that is, the indexed
--- 464,474 ----
  	 * filter (qpqual); which is where an available clause would end up being
  	 * applied if we omit it from the indexquals.
  	 *
! 	 * This looks expensive, but in most practical cases there won't be very
! 	 * many distinct sets of outer rels to consider.  As a safety valve when
! 	 * that's not true, we use a heuristic: limit the number of outer rel sets
! 	 * considered to a multiple of the number of clauses considered.  (We'll
! 	 * always consider using each individual join clause, though.)
  	 *
  	 * For simplicity in selecting relevant clauses, we represent each set of
  	 * outer rels as a maximum set of clause_relids --- that is, the indexed
*************** consider_index_join_clauses(PlannerInfo
*** 471,486 ****
--- 478,497 ----
  	for (indexcol = 0; indexcol < index->ncolumns; indexcol++)
  	{
  		/* Consider each applicable simple join clause */
+ 		considered_clauses += list_length(jclauseset->indexclauses[indexcol]);
  		consider_index_join_outer_rels(root, rel, index,
  									   rclauseset, jclauseset, eclauseset,
  									   bitindexpaths,
  									   jclauseset->indexclauses[indexcol],
+ 									   considered_clauses,
  									   &considered_relids);
  		/* Consider each applicable eclass join clause */
+ 		considered_clauses += list_length(eclauseset->indexclauses[indexcol]);
  		consider_index_join_outer_rels(root, rel, index,
  									   rclauseset, jclauseset, eclauseset,
  									   bitindexpaths,
  									   eclauseset->indexclauses[indexcol],
+ 									   considered_clauses,
  									   &considered_relids);
  	}
  }
*************** consider_index_join_clauses(PlannerInfo
*** 494,499 ****
--- 505,511 ----
   * 'rel', 'index', 'rclauseset', 'jclauseset', 'eclauseset', and
   *		'bitindexpaths' as above
   * 'indexjoinclauses' is a list of RestrictInfos for join clauses
+  * considered_clauses is the total number of clauses considered so far
   * '*considered_relids' is a list of all relids sets already considered
   */
  static void
*************** consider_index_join_outer_rels(PlannerIn
*** 504,509 ****
--- 516,522 ----
  							   IndexClauseSet *eclauseset,
  							   List **bitindexpaths,
  							   List *indexjoinclauses,
+ 							   int considered_clauses,
  							   List **considered_relids)
  {
  	ListCell   *lc;
*************** consider_index_join_outer_rels(PlannerIn
*** 522,528 ****
  		/*
  		 * Generate the union of this clause's relids set with each
  		 * previously-tried set.  This ensures we try this clause along with
! 		 * every interesting subset of previous clauses.
  		 *
  		 * Note: get_join_index_paths adds entries to *considered_relids, but
  		 * it prepends them to the list, so that we won't visit new entries
--- 535,543 ----
  		/*
  		 * Generate the union of this clause's relids set with each
  		 * previously-tried set.  This ensures we try this clause along with
! 		 * every interesting subset of previous clauses.  However, to avoid
! 		 * exponential growth of planning time when there are many clauses,
! 		 * limit the number of relid sets accepted to 10 * considered_clauses.
  		 *
  		 * Note: get_join_index_paths adds entries to *considered_relids, but
  		 * it prepends them to the list, so that we won't visit new entries
*************** consider_index_join_outer_rels(PlannerIn
*** 543,548 ****
--- 558,584 ----
  			if (bms_subset_compare(clause_relids, oldrelids) != BMS_DIFFERENT)
  				continue;
  
+ 			/*
+ 			 * If this clause was derived from an equivalence class, the
+ 			 * clause list may contain other clauses derived from the same
+ 			 * eclass.  We should not consider that combining this clause with
+ 			 * one of those clauses generates a usefully different
+ 			 * parameterization; so skip if any clause derived from the same
+ 			 * eclass would already have been included by oldrelids.
+ 			 */
+ 			if (rinfo->parent_ec &&
+ 				eclass_already_used(rinfo->parent_ec, oldrelids,
+ 									indexjoinclauses))
+ 				continue;
+ 
+ 			/*
+ 			 * If the number of relid sets considered exceeds our heuristic
+ 			 * limit, stop considering combinations of clauses.  We'll still
+ 			 * consider the current clause alone, though (below this loop).
+ 			 */
+ 			if (list_length(*considered_relids) >= 10 * considered_clauses)
+ 				break;
+ 
  			/* OK, try the union set */
  			get_join_index_paths(root, rel, index,
  								 rclauseset, jclauseset, eclauseset,
*************** get_join_index_paths(PlannerInfo *root,
*** 648,653 ****
--- 684,711 ----
  }
  
  /*
+  * eclass_already_used
+  *		True if any join clause usable with oldrelids was generated from
+  *		the specified equivalence class.
+  */
+ static bool
+ eclass_already_used(EquivalenceClass *parent_ec, Relids oldrelids,
+ 					List *indexjoinclauses)
+ {
+ 	ListCell   *lc;
+ 
+ 	foreach(lc, indexjoinclauses)
+ 	{
+ 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ 
+ 		if (rinfo->parent_ec == parent_ec &&
+ 			bms_is_subset(rinfo->clause_relids, oldrelids))
+ 			return true;
+ 	}
+ 	return false;
+ }
+ 
+ /*
   * bms_equal_any
   *		True if relids is bms_equal to any member of relids_list
   *
On 30 October 2012 21:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So what I'm proposing instead, which is implemented in the other half of
> the attached patch, is that we simply put an arbitrary limit on how many
> outer-relation sets we'll consider while generating parameterized
> indexscans.  As written, the patch limits the number of relid sets to 10
> times the number of join clauses it's working with, which is small
> enough to keep the runtime of this test case to something reasonable.
> I think that in practice this will still allow useful
> combination-outer-rel cases to be found.  I wonder though if anyone has
> a better idea?

That seems like a useful limit, but it is arbitrary and not controllable by the user.  An example like that is actually fairly common when you put back together a deep class hierarchy with one table per class.

I'm a little surprised the whole thing doesn't collapse anyway when using constants as shown, since aid = 2 AND aid = 3 will return no rows.  But I guess that's not really important.

We should be able to do a better job of recognising some other aspect/symmetry of this situation and then simplifying the problem from that.  Calculating all paths treats the problem as a complete unknown, and we must be able to do better than that.  If we look at the problem from the perspective of how we would handle it if one of the tables was very large, how would that change things?  Can we use a greedy algorithm starting with the largest table?

This is hand-waving, but it is useful to discuss these things.  I'd be happy if you could give a fuller explanation of this issue so we can all learn.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes:
> We should be able to do a better job of recognising some other
> aspect/symmetry of this situation and then simplifying the problem
> from that.  Calculating all paths treats the problem as a complete
> unknown and we must be able to do better than that.  If we look at the
> problem from the perspective of how we would handle it if one of the
> tables was very large, how would that change things?  Can we use a
> greedy algorithm starting with largest table?

Doesn't seem particularly relevant.  The issue is which parameterized paths to generate for the current index of the current table.  It could possibly make sense to try harder if the current table is large than if it's small --- but no matter how large it is, we can't afford to enumerate all 2^N possibilities if we have join clauses linking to N other relations and N is more than a handful.

The reason this is a problem is that you might have indexes, or even individual join clauses, that require multiple other rels to be used effectively.  Suppose we have an index t1ab on t1(a, b) and a query

	WHERE t1.a = t2.x AND t1.b = t3.y

If t1 is large while t2 and t3 are both small, the best plan is to join t2 and t3 first (even if that's a cartesian join) and then use both t2.x and t3.y as parameters for an indexscan on t1ab.  The same observation can apply even for single index columns and single join clauses:

	WHERE t1.a = (t2.x + t3.y)

So the context where this is an issue is that we're considering the specific index t1ab and wondering what parameterized indexscan paths to create using it.  To find a good plan in these examples, we have to create a path that's parameterized by both t2 and t3.  It's not so hard to consider that possibility here, but what if there are 30 different join clauses (linking to 30 other rels) that could be used with the same index?  We can't afford to try all the combinations exhaustively, and even if we could, it's just about certain that most of them aren't worth our time.

Before 9.2, we didn't have this problem in this guise because we didn't generate parameterized paths bottom-up; instead, given that we had already decided to join t2 and t3 for some reason, we would look to see what indexpaths we could make for t1 using values from t2 and/or t3.  However, that approach had big problems of its own: it couldn't find plans that involved pushing parameters down through an intervening join.

Another point here is that no such path can get chosen unless the planner decides to consider joining t2 and t3 to begin with --- and if that's a cartesian join, it will reject the idea out of hand.  In the previous coding there was basically no realistic fix for that; if we let the thing try useless cartesian joins, planning time on large queries will go to hell in a handbasket.  With the new code, we could in principle make note of interesting parameterized paths that require multiple outer relations, and then allow those specific combinations of relations to get cartesian-joined.  I've not had time to pursue this yet but I think it should happen sometime.

So I think bottom-up creation of parameterized paths is better than what we were doing pre-9.2, but we do have to stop it from going crazy when there are too many join clauses for the same index.  Now the $64 question is how hard do we have to try to find the "best" alternative when there are too many.  I think queries in which there are a lot of join clauses touching the same index but they are not related by equivalence classes are probably pretty pathological.  It's questionable whether we should spend a lot of skull sweat on such cases, or for that matter that we'd even recognize an especially good solution if we saw it.

One idea that occurred to me while typing this up is to restrict the maximum number of other-rels that we'll consider together, ie, when we have 30 other rels, don't try to form all 2^30 subsets but only those containing at most K rels, for some pretty small K (say, the number of index columns).  But I'm not really sure it's worth the effort.

			regards, tom lane
On 31 October 2012 19:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Before 9.2, we didn't have this problem in this guise because we didn't
> generate parameterized paths bottom-up; instead, given that we had
> already decided to join t2 and t3 for some reason, we would look to see
> what indexpaths we could make for t1 using values from t2 and/or t3.
> However, that approach had big problems of its own: it couldn't find
> plans that involved pushing parameters down through an intervening join.

Given that most join patterns are repeated consistently across multiple SQL statements, is it possible to pre-calculate paths at ANALYZE time, or perhaps cache them (persistently?) when first generated?  That way we just look up the best paths, which would make join planning much faster.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 30, 2012 at 5:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I looked into the complaint of unreasonable planner runtime in bug #7626,
> http://archives.postgresql.org/pgsql-bugs/2012-10/msg00232.php
>
> In the given example, the indexed relation "foo" has join clauses with
> 30 other relations.  The code that I added in commit
> 3b8968f25232ad09001bf35ab4cc59f5a501193e will try all 2^30 combinations
> of those rels as possible outer relations for a parameterized indexscan
> :-(.  So clearly, the idea that we can just try everything and not have
> some kind of heuristic restriction isn't going to work.

You know, when I read this, my first thought was ... why is this an exponential relationship instead of a linear one?  Even now, I'm not sure I quite understand that.  With a parameterized path, we get an index scan (or index-only scan) with a.id taking its value from some outer scan, but it can't take its value from more than one outer scan.  Can it?  So what does it mean to parameterize the scan of foo by both ag1 (aid) and ag2 (aid)?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Oct 30, 2012 at 5:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I looked into the complaint of unreasonable planner runtime in bug #7626,
>> http://archives.postgresql.org/pgsql-bugs/2012-10/msg00232.php

> You know, when I read this, my first thought was ... why is this an
> exponential relationship instead of a linear one?

Because it's considering *combinations* of outer relations for a parameterized scan.  For instance consider an index on t(a,b) and a query

	WHERE t.a = x.c1 AND t.b = y.c2

There are three different parameterized paths we could create: one relying on x only, one relying on y only, one relying on both.  The one relying on y only is probably going to suck, if this is a btree index, but at the level we're working at here that's not yet apparent.  The other two are definitely both worthy of consideration, since it might or might not be worth it to join x and y first in order to use both conditions in scanning t.

So in general, given join clauses that reference N different outer relations, you could have as many as 2^N - 1 sets of outer relations that could possibly generate usefully-different parameterized paths.  In practice, since all these clauses must be usable with the same index, there's probably not nearly that many useful combinations --- but again, it's hard to know exactly which ones are winners in advance of doing any cost calculations.

			regards, tom lane
On Mon, Nov 5, 2012 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Oct 30, 2012 at 5:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I looked into the complaint of unreasonable planner runtime in bug #7626,
>>> http://archives.postgresql.org/pgsql-bugs/2012-10/msg00232.php
>
>> You know, when I read this, my first thought was ... why is this an
>> exponential relationship instead of a linear one?
>
> Because it's considering *combinations* of outer relations for a
> parameterized scan.  For instance consider an index on t(a,b)
> and a query
> 	WHERE t.a = x.c1 AND t.b = y.c2
> There are three different parameterized paths we could create: one
> relying on x only, one relying on y only, one relying on both.

Sure, but that example is different from the test case provided in the bug report.  I agree that here we need to try paths parameterized by a, b, or both a and b.  Things might blow up multiplicatively, because we have join clauses referencing both t.a and t.b.  But they shouldn't blow up exponentially, because each of t.a and t.b can only be parameterized by ONE thing (I think).

And in the example in the bug report, only one column of the table (foo.id) is mentioned.  foo.id can be driven by ag1.aid OR ag2.aid OR ag3.aid OR ..., but not more than one of those at a time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Nov 5, 2012 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There are three different parameterized paths we could create: one
>> relying on x only, one relying on y only, one relying on both.

> Sure, but that example is different from the test case provided in the
> bug report.  I agree that here we need to try paths parameterized by
> a, b, or both a and b.  Things might blow up multiplicatively, because
> we have join clauses referencing both t.a and t.b.  But they shouldn't
> blow up exponentially, because each of t.a and t.b can only be
> parameterized by ONE thing (I think).

Um, no.  This is a useful counterexample:

	WHERE t.a > x.c1 AND t.a < y.c2

With a range constraint like this one, it's possible for the doubly-parameterized path to be quite useful while either singly-parameterized path is basically useless.  And these examples aren't even going into cases you might get with non-btree indexes, where clauses could interact in much more complicated ways.

> And in the example in the bug
> report, only one column of the table (foo.id) is mentioned.  foo.id
> can be driven by ag1.aid OR ag2.aid OR ag3.aid OR ..., but not more
> than one of those at a time.

In the example, we do figure out that the clauses are redundant, but only further downstream.  The code that's got the problem can't really assume such a thing.  As patched, it will indeed limit what it considers to at most one additional clause per index column, once it's hit the heuristic limit --- but it's entirely possible for it to miss useful combinations because of that.

			regards, tom lane
On Mon, Nov 5, 2012 at 2:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Nov 5, 2012 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> There are three different parameterized paths we could create: one
>>> relying on x only, one relying on y only, one relying on both.
>
>> Sure, but that example is different from the test case provided in the
>> bug report.  I agree that here we need to try paths parameterized by
>> a, b, or both a and b.  Things might blow up multiplicatively, because
>> we have join clauses referencing both t.a and t.b.  But they shouldn't
>> blow up exponentially, because each of t.a and t.b can only be
>> parameterized by ONE thing (I think).
>
> Um, no.  This is a useful counterexample:
>
> 	WHERE t.a > x.c1 AND t.a < y.c2
>
> With a range constraint like this one, it's possible for the
> doubly-parameterized path to be quite useful while either
> singly-parameterized path is basically useless.  And these examples
> aren't even going into cases you might get with non-btree indexes,
> where clauses could interact in much more complicated ways.

Well, OK.  So maybe you also need the operator to be the same as well.

>> And in the example in the bug
>> report, only one column of the table (foo.id) is mentioned.  foo.id
>> can be driven by ag1.aid OR ag2.aid OR ag3.aid OR ..., but not more
>> than one of those at a time.
>
> In the example, we do figure out that the clauses are redundant, but
> only further downstream.  The code that's got the problem can't really
> assume such a thing.  As patched, it will indeed limit what it considers
> to at most one additional clause per index column, once it's hit the
> heuristic limit --- but it's entirely possible for it to miss useful
> combinations because of that.

Seems unfortunate, but I don't understand the code well enough to know how to do better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Nov 5, 2012 at 2:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Um, no.  This is a useful counterexample:
>> 	WHERE t.a > x.c1 AND t.a < y.c2

> Well, OK.  So maybe you also need the operator to be the same as well.

Nope.  A counterexample to that claim is a GIN index on an array column:

	WHERE t.arraycol @> array[1,2,3] AND t.arraycol @> array[4,5,6]

This restriction is equivalent to

	WHERE t.arraycol @> array[1,2,3,4,5,6]

which is substantially more selective than either constraint alone.  If the two RHS arrays are not constants, but are coming from different tables x and y, then we have something isomorphic to the previous example (at least from the perspective of indxpath.c), but it would not be good for indxpath.c to assume that these clauses couldn't be useful together.

We *can* make a simplifying assumption of the kind you suggest when we know that the clauses were all generated from the same equivalence class, because then we have very strong assumptions about what the clauses' semantics are.  (And indeed the patch does take care of that case separately.)  But for the general case of non-equijoin clauses we can't assume very much at all about whether clauses are redundant, at least not without knowledge that indxpath.c hasn't got.

>> ....  As patched, it will indeed limit what it considers
>> to at most one additional clause per index column, once it's hit the
>> heuristic limit --- but it's entirely possible for it to miss useful
>> combinations because of that.

> Seems unfortunate, but I don't understand the code well enough to know
> how to do better.

Me either.  What I will say is that as patched, the code will still find all useful clause combinations as long as there aren't too many other relations involved.  I've not been able to think of another way of restricting the search that doesn't reject possibly-useful combinations even in otherwise-very-simple queries.

			regards, tom lane
On Mon, Nov 5, 2012 at 3:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Nov 5, 2012 at 2:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Um, no.  This is a useful counterexample:
>>> 	WHERE t.a > x.c1 AND t.a < y.c2
>
>> Well, OK.  So maybe you also need the operator to be the same as well.
>
> Nope.  A counterexample to that claim is a GIN index on an array column:
>
> 	WHERE t.arraycol @> array[1,2,3] AND t.arraycol @> array[4,5,6]
>
> This restriction is equivalent to
>
> 	WHERE t.arraycol @> array[1,2,3,4,5,6]
>
> which is substantially more selective than either constraint alone.
> If the two RHS arrays are not constants, but are coming from different
> tables x and y, then we have something isomorphic to the previous
> example (at least from the perspective of indxpath.c), but it would
> not be good for indxpath.c to assume that these clauses couldn't be
> useful together.

Neat example.

> We *can* make a simplifying assumption of the kind you suggest when
> we know that the clauses were all generated from the same equivalence
> class, because then we have very strong assumptions about what the
> clauses' semantics are.  (And indeed the patch does take care of that
> case separately.)  But for the general case of non-equijoin clauses
> we can't assume very much at all about whether clauses are redundant,
> at least not without knowledge that indxpath.c hasn't got.

OK.  Fortunately, I don't think we need to care too much about that case, since non-equijoins are pretty rare.  A reasonable heuristic restriction seems fine for that case ... at least until the next problem case shows up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company