Re: Extending constraint exclusion for implied constraints/conditions - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Extending constraint exclusion for implied constraints/conditions
Date
Msg-id CAFjFpRdK27_D9_ahPfLVpLjog1njr+eKix_23sQaVALDXa7G4w@mail.gmail.com
Whole thread Raw
In response to Re: Extending constraint exclusion for implied constraints/conditions  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Extending constraint exclusion for implied constraints/conditions
List pgsql-hackers



On Mon, Jul 7, 2014 at 7:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
> Right now, constraint exclusion code looks only at the provided conditions.
> If we want avoid table scan based on constraints in the above example, it
> will need to look at the implied conditions as well. E.g. val2 < 30 AND val
> = val2 => val < 30. Then the constraint exclusion can see that val < 30 AND
> val > 30 are contradictory and infer that the result is going to be empty.
> We will need to add information about the transitive inferences between
> operators. Can we do that in PostgreSQL? Will the benefits be worth the
> code, that needs to be added?

I doubt it.  The extra code isn't the problem so much, it's the extra
planning cycles spent trying to make proofs that will usually fail.
What you propose will create a combinatorial explosion in the number
of proof paths to be considered.

I can understand that it will create combinatorial explosion in the number of predicates that need to be examined by the constraint exclusion. I do not understand where come the paths gets involved here. The constraint exclusion kicks in before paths are created

 220 static void
 221 set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 222              Index rti, RangeTblEntry *rte)
 223 {
 224     if (rel->reloptkind == RELOPT_BASEREL &&
 225         relation_excluded_by_constraints(root, rel, rte))
 226     {
 227         /*
 228          * We proved we don't need to scan the rel via constraint exclusion,
 229          * so set up a single dummy path for it.  Here we only check this for
 230          * regular baserels; if it's an otherrel, CE was already checked in
 231          * set_append_rel_pathlist().
 232          *
 233          * In this case, we go ahead and set up the relation's path right away
 234          * instead of leaving it for set_rel_pathlist to do.  This is because
 235          * we don't have a convention for marking a rel as dummy except by
 236          * assigning a dummy path to it.
 237          */
 238         set_dummy_rel_pathlist(rel);
 239     }

 and does not create paths for relations excluded by constraints

 295 /*     
 296  * set_rel_pathlist
 297  *    Build access paths for a base relation
 298  */
 299 static void
 300 set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 301                  Index rti, RangeTblEntry *rte)
 302 {
 303     if (IS_DUMMY_REL(rel))
 304     {
 305         /* We already proved the relation empty, so nothing more to do */
 306     }

Same in the case of join in mark_join_rel()

 663      * JOIN_INNER.
 664      */
 665     switch (sjinfo->jointype)
 666     {
 667         case JOIN_INNER:
 668             if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
 669                 restriction_is_constant_false(restrictlist, false))
 670             {
 671                 mark_dummy_rel(joinrel);
 672                 break;
 673             }
 674             add_paths_to_joinrel(root, joinrel, rel1, rel2,
 675                                  JOIN_INNER, sjinfo,
 676                                  restrictlist);
 677             add_paths_to_joinrel(root, joinrel, rel2, rel1,
 678                                  JOIN_INNER, sjinfo,
 679                                  restrictlist);
 680             break;

BTW, on a side note, I noticed, we use mark_dummy_rel() for joinrels and set_dummy_rel_pathlist() for baserel. Shouldn't we be using the same function everywhere for the same functionality (e.g. adding an append path with no child-paths).

For larger relations, the time spent in constraint exclusion might be lesser than the time taken by actual table/index scan and that to when such a scan is not going to produce any rows. So, we might want to apply the technique only when the estimated number of rows/pages are above a certain threshold and may be when the GUC is ON (like we do it today).


> I can see some more benefits. We can use it to eliminate joins where the
> constraints on joining relations may cause the join to be empty e.g.

... and applying constraint exclusion to join relations would make that
problem even worse.


The same case goes with joins, where potentially, the crossproduct of two tables is huge.
 
                        regards, tom lane



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: [PATCH] Improve bytea error messages
Next
From: Ashoke
Date:
Subject: Modifying update_attstats of analyze.c for C Strings