Re: Improve selectivity estimate for range queries - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Improve selectivity estimate for range queries
Date
Msg-id 20190109.115557.33860764.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: Improve selectivity estimate for range queries  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
At Tue, 08 Jan 2019 16:26:38 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20190108.162638.106314087.horiguchi.kyotaro@lab.ntt.co.jp>
> Hello.
> 
> At Fri, 21 Dec 2018 11:50:28 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <28533.1545411028@sss.pgh.pa.us>
> > seem that that's just moving the problem around, but I think it
> > might be possible to show that such a value couldn't be computed
> > by scalarltsel given a histogram with no more than 10000 members.
> > (I haven't tried to actually prove that, but it seems intuitive
> > that the set of possible results would be quantized with no more
> > than about 5 digits precision.)

I think we don't need a perfect proof for that. The fact that
exactly 1/3 is quite natural and common but 1/3 + ε is not would
be enough.

> FWIW, I got the following result on my environment. It seems
> different enough if this holds on all supported platforms, though
> there still is a case where the result of a sequence of
> arithmetics makes false match.

Simple selectivity of a relation theoretically cannot match with
the epsilon. (Of couse on *my* environment.)

(0.333..)
binary format: 3f d5 55 55 55 55 55 55
x = 0.333333333333333315
   231 matches, 79 no_matches

(0.3{13}42..)
binary format: 3f d5 55 55 55 55 55 f1
x = 0.333333333333341975
   0 matches, 310 no_matches

(0.3{15}42..)
binary format: 3f d5 55 55 55 55 55 57
x = 0.333333333333333426
   0 matches, 310 no_matches


It seems that 0.3{13}42 is correctly 0.3{15}42, which makes just
two LSBs difference from 1/3. I believe C is well standardized on
the translation. Other DEFAULT_*_SELs are not compared in this
way.

The attached small patch fixes the case mentioned in this thread,
but I'm not sure where to put a test. Static assertion is not
usable. Assertion in somewhere perhaps in clauselist_selectivity
seems somewhat overdone.. I don't find a suitable place in
regression test..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
#include <stdio.h>
#include <math.h>

int test(double x)
{
  double d = 1.0;
  double d0 = 0;
  unsigned char *c_x = (unsigned char *) &x;
  int nmatches = 0;
  int nnomatches = 0;
  int i;
  
  fprintf(stderr, "binary format: ");
  for (i = 7 ; i >= 0 ; i--)
    fprintf(stderr, "%s%02x", i < 7 ? " " : "", c_x[i]);
  fprintf(stderr, "\n");

  fprintf(stderr, "x = %20.18f\n", x);
  while (d != d0)
  {
    double z = floor(d * 3);
    double z1 = z + 1.0;
    double y = d / z;
    double y1 = d / z1;

    /* Check if both sides of d * 3 doesn't make match */
    if (y == x || y1 == x)
      nmatches++;
    else
      nnomatches++;

    d0 = d;
    d = d * 10;
  }

  fprintf(stderr, "   %d matches, %d no_matches\n", nmatches, nnomatches);

}

int main(void)
{
  test(0.3333333333333333);
  test(0.333333333333342);
  test(0.33333333333333342);

  return 0;
}
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 3739b9817a..cdeaac22c8 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -109,7 +109,7 @@ clauselist_selectivity(PlannerInfo *root,
     ListCell   *l;
     int            listidx;
 
-    /*
+     /*
      * If there's exactly one clause, just go directly to
      * clause_selectivity(). None of what we might do below is relevant.
      */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 5cc4cf15e2..15a8d2402a 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -33,8 +33,12 @@
 /* default selectivity estimate for equalities such as "A = b" */
 #define DEFAULT_EQ_SEL    0.005
 
-/* default selectivity estimate for inequalities such as "A < b" */
-#define DEFAULT_INEQ_SEL  0.3333333333333333
+/*
+ * default selectivity estimate for inequalities such as "A < b"
+ * The last two digits prevent it from making a false match with 1/3 computed
+ * from histogram and/or MCV.
+ */
+#define DEFAULT_INEQ_SEL  0.33333333333333342
 
 /* default selectivity estimate for range inequalities "A > b AND A < c" */
 #define DEFAULT_RANGE_INEQ_SEL    0.005

pgsql-hackers by date:

Previous
From: "Imai, Yoshikazu"
Date:
Subject: RE: speeding up planning with partitions
Next
From: Amit Kapila
Date:
Subject: Re: New function pg_stat_statements_reset_query() to reset statisticsof a specific query