Thread: [HACKERS] 200 = 199 + 1?

[HACKERS] 200 = 199 + 1?

From: Marko Tiikkaja <marko@joh.to>

Hi,

I just came across this very peculiar behavior:

=# create table foo(id int primary key);
CREATE TABLE
=# insert into foo select generate_series(1, 1000000);
INSERT 0 1000000
=# set enable_hashjoin to false; set enable_mergejoin to false;
SET
SET
=# explain select * from foo where id in (select i from generate_series(1, 200) i limit 199);
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.90..1648.52 rows=199 width=4)
   ->  HashAggregate  (cost=4.48..6.47 rows=199 width=4)
         Group Key: i.i
         ->  Limit  (cost=0.00..1.99 rows=199 width=4)
               ->  Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4)
   ->  Index Only Scan using foo_pkey on foo  (cost=0.42..8.24 rows=1 width=4)
         Index Cond: (id = i.i)
(7 rows)

=# explain select * from foo where id in (select i from generate_series(1, 200) i limit 200);
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.93..1653.00 rows=500000 width=4)
   ->  HashAggregate  (cost=4.50..6.50 rows=200 width=4)
         Group Key: i.i
         ->  Limit  (cost=0.00..2.00 rows=200 width=4)
               ->  Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4)
   ->  Index Only Scan using foo_pkey on foo  (cost=0.42..8.22 rows=1 width=4)
         Index Cond: (id = i.i)
(7 rows)

So it seems that once the HashAggregate is estimated to return 200 rows or more, something extremely weird happens and the Nested Loop's estimate goes wild.  I've recently seen numerous instances of this kind of problem, where the row estimates from a nested loop's child nodes are very reasonable but the loop itself goes absolutely nuts.  I can't see how this can possibly be justified.

I wonder if the nested loop shouldn't have some kind of a cap on its own estimate if it's wildly off of what you'd get by multiplying the child nodes' estimates with each other?


.m

Re: [HACKERS] 200 = 199 + 1?

From: Tom Lane <tgl@sss.pgh.pa.us>

Marko Tiikkaja <marko@joh.to> writes:
> I just came across this very peculiar behavior:

I think this is a consequence of the clamping + fallback logic in
eqjoinsel_semi.  The planner has no info about the inner select's result,
except that it can see the LIMIT clause, so it uses that as the rowcount estimate.
get_variable_numdistinct reports DEFAULT_NUM_DISTINCT (200) with isdefault
true; but if the LIMIT is <= 200, eqjoinsel_semi's clamping logic fires:
    * If we clamp, we can treat nd2 as being a non-default estimate; it's not
    * great, maybe, but it didn't come out of nowhere either.  This is most
    * helpful when the inner relation is empty and consequently has no stats.

and that enables the nondefault (nd2 / nd1) estimate to be used,
which happens to be dead on in this case, because the estimated nd2
is in fact exact.  With a larger LIMIT, we end up with the 0.5 default
selectivity estimate, which is way off in this particular example.
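
To put rough numbers on that (a back-of-the-envelope sketch only, not the
actual code path; the real function also folds in MCV matches and null
fractions, and the 1000000/nd1 figures are just the stats for foo in
Marko's example):

/* Illustration of the discontinuity at LIMIT 200, not the real code. */
#include <stdio.h>

int
main(void)
{
    double      outer_rows = 1000000.0; /* rows in foo */
    double      nd1 = 1000000.0;        /* distinct foo.id values, from stats */

    /* LIMIT 199: the clamp fires, so nd2 = 199 is treated as non-default */
    double      selec_199 = 199.0 / nd1;

    /* LIMIT 200: nd2 stays at DEFAULT_NUM_DISTINCT, so we punt to 0.5 */
    double      selec_200 = 0.5;

    printf("LIMIT 199: %.0f rows\n", outer_rows * selec_199);  /* 199 */
    printf("LIMIT 200: %.0f rows\n", outer_rows * selec_200);  /* 500000 */
    return 0;
}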

Now I'd be the first to say that the 0.5 default didn't have all that much
thought put into it.  One idea that occurs to me while staring at this is
why don't we use DEFAULT_EQ_SEL instead?  Or for that matter, maybe just
bulling ahead and using nd2 even if it is default would give superior
results in many cases.  (Not sure whether nd1 ought to get the same
treatment.  At least in this example, we do know nd1 with some precision;
and if we don't know either one, dividing nd2/nd1 to arrive at 1.0 doesn't
sound like a good idea.)  Or maybe take nd2 = size of inner rel if we want
to be conservative but still do something smarter than 0.5.

> I wonder if the nested loop shouldn't have some kind of a cap on its own
> estimate if it's wildly off of what you'd get by multiplying the child
> nodes' estimates with each other?

Nonstarter I'm afraid.  The join relation's size estimate is determined
long before we get to a point where we could multiply the sizes of these
particular child paths to arrive at the conclusion that it should be
something different than what we decided originally.  Adjusting the size
of the nestloop result at that point would merely give it an unfair
advantage over other paths for the same join relation.  (I think it would
also break some assumptions about paths for the same relation all giving
the same number of rows, unless parameterized.)

Looking at it another way, the main thing that the combination of hashagg
outer path + indexscan inner path knows that eqjoinsel_semi didn't account
for is that there's a unique index on foo.id.  But that info is available
to eqjoinsel_semi, in the sense that it's been given a nondefault estimate
that nd1 is equal to the outer relation size.  So the mistake that it's
making is to throw up its hands and use an 0.5 selectivity estimate just
because it has no info about the inner relation.  I think if we'd pushed
through the nd2/nd1 calculation after setting nd2 = size of inner rel,
we'd end up with an estimate matching the product of these path sizes.
(Caution: inadequate caffeine absorbed yet, this might be all wrong.)
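
Concretely, for the LIMIT 200 case that would come out like so (again just
the bare nd2/nd1 step, leaving out the MCV and nullfrac refinements):

/* Sketch: push nd2 = inner relation size through the nd2/nd1 heuristic. */
#include <stdio.h>

int
main(void)
{
    double      outer_rows = 1000000.0; /* rows in foo */
    double      nd1 = 1000000.0;        /* distinct foo.id, known via the pkey */
    double      nd2 = 200.0;            /* taken as the inner relation size */
    double      selec;

    if (nd1 <= nd2 || nd2 < 0)
        selec = 1.0;
    else
        selec = nd2 / nd1;

    /* 1000000 * 0.0002 = 200, i.e. hashagg rows (200) * inner rows (1) */
    printf("semijoin estimate: %.0f rows\n", outer_rows * selec);
    return 0;
}
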
        regards, tom lane



Re: [HACKERS] 200 = 199 + 1?

From: Marko Tiikkaja <marko@joh.to>

On Wed, Sep 27, 2017 at 5:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Tiikkaja <marko@joh.to> writes:
>> I wonder if the nested loop shouldn't have some kind of a cap on its own
>> estimate if it's wildly off of what you'd get by multiplying the child
>> nodes' estimates with each other?
>
> Nonstarter I'm afraid.  The join relation's size estimate is determined
> long before we get to a point where we could multiply the sizes of these
> particular child paths to arrive at the conclusion that it should be
> something different than what we decided originally.

Ah hah.  Thanks for the explanation, that makes sense.
 
> Adjusting the size
> of the nestloop result at that point would merely give it an unfair
> advantage over other paths for the same join relation.  (I think it would
> also break some assumptions about paths for the same relation all giving
> the same number of rows, unless parameterized.)

With the previous paragraph in mind, I would agree; it's not a very good idea.
 
> Looking at it another way, the main thing that the combination of hashagg
> outer path + indexscan inner path knows that eqjoinsel_semi didn't account
> for is that there's a unique index on foo.id.  But that info is available
> to eqjoinsel_semi, in the sense that it's been given a nondefault estimate
> that nd1 is equal to the outer relation size.  So the mistake that it's
> making is to throw up its hands and use an 0.5 selectivity estimate just
> because it has no info about the inner relation.  I think if we'd pushed
> through the nd2/nd1 calculation after setting nd2 = size of inner rel,
> we'd end up with an estimate matching the product of these path sizes.
> (Caution: inadequate caffeine absorbed yet, this might be all wrong.)

This sounds very reasonable to me.


.m

Re: [HACKERS] 200 = 199 + 1?

From: Tom Lane <tgl@sss.pgh.pa.us>

Marko Tiikkaja <marko@joh.to> writes:
> On Wed, Sep 27, 2017 at 5:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Looking at it another way, the main thing that the combination of hashagg
>> outer path + indexscan inner path knows that eqjoinsel_semi didn't account
>> for is that there's a unique index on foo.id.  But that info is available
>> to eqjoinsel_semi, in the sense that it's been given a nondefault estimate
>> that nd1 is equal to the outer relation size.  So the mistake that it's
>> making is to throw up its hands and use an 0.5 selectivity estimate just
>> because it has no info about the inner relation.  I think if we'd pushed
>> through the nd2/nd1 calculation after setting nd2 = size of inner rel,
>> we'd end up with an estimate matching the product of these path sizes.
>> (Caution: inadequate caffeine absorbed yet, this might be all wrong.)

> This sounds very reasonable to me.

I experimented a bit with the attached patch, which modifies
eqjoinsel_semi in two ways.  First, if the number-of-distinct-values
estimate for the inner rel is just a default rather than having any
real basis, it replaces it with inner_rel->rows, effectively assuming
that the inside of the IN or EXISTS is unique.  Second, it drops the
fallback to selectivity 0.5 altogether, just applying the nd1 vs nd2
heuristic all the time.  This gets rid of the discontinuity of behavior
at 200 estimated distinct values.  The behavior in your example is
maybe a bit funny-looking: for LIMITs above 200 you get something like

 Nested Loop  (cost=7.18..869.25 rows=300 width=4)
   ->  HashAggregate  (cost=6.75..8.75 rows=200 width=4)
         Group Key: i.i
         ->  Limit  (cost=0.00..3.00 rows=300 width=4)
               ->  Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4)
   ->  Index Only Scan using foo_pkey on foo  (cost=0.42..4.30 rows=1 width=4)
         Index Cond: (id = i.i)

The reported rowcount for the HashAggregate is still 200, which
is out of sync with what we did at the join level.  I think that's
just a cosmetic thing though, and am disinclined to try to jury-rig
something to make it look different.
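
For the record, the 300-row join estimate is just the same nd2/nd1
arithmetic with nd2 now taken from the inner rel's size (a rough sketch,
ignoring the MCV/nullfrac parts):

/* With the patch: a default nd2 is replaced by inner_rel->rows. */
#include <stdio.h>

int
main(void)
{
    double      outer_rows = 1000000.0; /* rows in foo */
    double      nd1 = 1000000.0;        /* distinct foo.id */
    double      nd2 = 300.0;            /* inner_rel->rows for LIMIT 300 */

    printf("join estimate: %.0f rows\n",
           outer_rows * (nd2 / nd1));   /* 300 */
    return 0;
}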

The change causes one plan change that's visible in the regression test
outputs; that change seems OK so I've just included an expected-output
change below.

This could use somebody trying harder to break it than I have.  It might
also be wise to dig into the list archives and see what prompted the
inclusion of the don't-use-the-equations-with-default-ndistinct logic
in the first place.

            regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index db1792b..c32ff2b 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*************** eqjoinsel_semi(Oid operator,
*** 2570,2575 ****
--- 2570,2585 ----
      memset(&sslot2, 0, sizeof(sslot2));

      /*
+      * If we lack any fact-based estimate for nd2, it seems best to set it
+      * equal to the inner relation size estimate, ie, assume the inner side of
+      * the semijoin is unique.  This may lead to overestimating the size of
+      * the join, but that's usually better than an underestimate.  We don't
+      * make any comparable assumption for the outer side, though.
+      */
+     if (isdefault2)
+         nd2 = inner_rel->rows;
+
+     /*
       * We clamp nd2 to be not more than what we estimate the inner relation's
       * size to be.  This is intuitively somewhat reasonable since obviously
       * there can't be more than that many distinct values coming from the
*************** eqjoinsel_semi(Oid operator,
*** 2583,2606 ****
       * We can apply this clamping both with respect to the base relation from
       * which the join variable comes (if there is just one), and to the
       * immediate inner input relation of the current join.
-      *
-      * If we clamp, we can treat nd2 as being a non-default estimate; it's not
-      * great, maybe, but it didn't come out of nowhere either.  This is most
-      * helpful when the inner relation is empty and consequently has no stats.
       */
      if (vardata2->rel)
      {
!         if (nd2 >= vardata2->rel->rows)
!         {
              nd2 = vardata2->rel->rows;
-             isdefault2 = false;
-         }
      }
!     if (nd2 >= inner_rel->rows)
!     {
          nd2 = inner_rel->rows;
-         isdefault2 = false;
-     }

      if (HeapTupleIsValid(vardata1->statsTuple))
      {
--- 2593,2606 ----
       * We can apply this clamping both with respect to the base relation from
       * which the join variable comes (if there is just one), and to the
       * immediate inner input relation of the current join.
       */
      if (vardata2->rel)
      {
!         if (nd2 > vardata2->rel->rows)
              nd2 = vardata2->rel->rows;
      }
!     if (nd2 > inner_rel->rows)
          nd2 = inner_rel->rows;

      if (HeapTupleIsValid(vardata1->statsTuple))
      {
*************** eqjoinsel_semi(Oid operator,
*** 2701,2723 ****
           * the uncertain rows that a fraction nd2/nd1 have join partners. We
           * can discount the known-matched MCVs from the distinct-values counts
           * before doing the division.
-          *
-          * Crude as the above is, it's completely useless if we don't have
-          * reliable ndistinct values for both sides.  Hence, if either nd1 or
-          * nd2 is default, punt and assume half of the uncertain rows have
-          * join partners.
           */
!         if (!isdefault1 && !isdefault2)
!         {
!             nd1 -= nmatches;
!             nd2 -= nmatches;
!             if (nd1 <= nd2 || nd2 < 0)
!                 uncertainfrac = 1.0;
!             else
!                 uncertainfrac = nd2 / nd1;
!         }
          else
!             uncertainfrac = 0.5;
          uncertain = 1.0 - matchfreq1 - nullfrac1;
          CLAMP_PROBABILITY(uncertain);
          selec = matchfreq1 + uncertainfrac * uncertain;
--- 2701,2713 ----
           * the uncertain rows that a fraction nd2/nd1 have join partners. We
           * can discount the known-matched MCVs from the distinct-values counts
           * before doing the division.
           */
!         nd1 -= nmatches;
!         nd2 -= nmatches;
!         if (nd1 <= nd2 || nd2 < 0)
!             uncertainfrac = 1.0;
          else
!             uncertainfrac = nd2 / nd1;
          uncertain = 1.0 - matchfreq1 - nullfrac1;
          CLAMP_PROBABILITY(uncertain);
          selec = matchfreq1 + uncertainfrac * uncertain;
*************** eqjoinsel_semi(Oid operator,
*** 2730,2744 ****
           */
          double        nullfrac1 = stats1 ? stats1->stanullfrac : 0.0;

!         if (!isdefault1 && !isdefault2)
!         {
!             if (nd1 <= nd2 || nd2 < 0)
!                 selec = 1.0 - nullfrac1;
!             else
!                 selec = (nd2 / nd1) * (1.0 - nullfrac1);
!         }
          else
!             selec = 0.5 * (1.0 - nullfrac1);
      }

      free_attstatsslot(&sslot1);
--- 2720,2729 ----
           */
          double        nullfrac1 = stats1 ? stats1->stanullfrac : 0.0;

!         if (nd1 <= nd2 || nd2 < 0)
!             selec = 1.0 - nullfrac1;
          else
!             selec = (nd2 / nd1) * (1.0 - nullfrac1);
      }

      free_attstatsslot(&sslot1);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index f47449b..eaffbad 100644
*** a/src/test/regress/expected/join.out
--- b/src/test/regress/expected/join.out
*************** where not exists (
*** 2431,2440 ****
  );
                         QUERY PLAN
  ---------------------------------------------------------
!  Hash Anti Join
!    Hash Cond: (t1.c1 = t2.c2)
     ->  Seq Scan on tt4x t1
!    ->  Hash
           ->  Merge Right Join
                 Merge Cond: (t5.c1 = t3.c2)
                 ->  Merge Join
--- 2431,2440 ----
  );
                         QUERY PLAN
  ---------------------------------------------------------
!  Nested Loop Anti Join
!    Join Filter: (t1.c1 = t2.c2)
     ->  Seq Scan on tt4x t1
!    ->  Materialize
           ->  Merge Right Join
                 Merge Cond: (t5.c1 = t3.c2)
                 ->  Merge Join


Re: [HACKERS] 200 = 199 + 1?

From: Tom Lane <tgl@sss.pgh.pa.us>

I wrote:
> I experimented a bit with the attached patch, which modifies
> eqjoinsel_semi in two ways.  First, if the number-of-distinct-values
> estimate for the inner rel is just a default rather than having any
> real basis, it replaces it with inner_rel->rows, effectively assuming
> that the inside of the IN or EXISTS is unique.

That might or might not be a good idea ...

> Second, it drops the
> fallback to selectivity 0.5 altogether, just applying the nd1 vs nd2
> heuristic all the time.

... but this probably isn't.  A review of the history shows me that
this change amounts to reverting 3f5d2fe30, which doesn't seem like
a good plan because that was fixing user-reported misbehavior.
The problem is still that the nd2/nd1 calculation can produce garbage
if nd2 or nd1 is made-up.  It's possible that we can get away with using
nd2 = inner_rel->rows as a suitable substitute for a default nd2, but
I'm much less happy to assume something like that for nd1.
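
To illustrate the nd1 hazard with made-up numbers (these are invented for
the sake of argument, not from any particular report): say the outer column
really has only ten distinct values but has no stats, so we'd be working
with DEFAULT_NUM_DISTINCT for it:

/* Invented example of how a made-up nd1 corrupts the nd2/nd1 fraction. */
#include <stdio.h>

int
main(void)
{
    double      nd2 = 5.0;              /* inner-side distinct values */
    double      true_nd1 = 10.0;        /* what the outer column really has */
    double      default_nd1 = 200.0;    /* DEFAULT_NUM_DISTINCT, used sans stats */

    printf("real nd1:    selectivity = %.3f\n", nd2 / true_nd1);       /* 0.500 */
    printf("default nd1: selectivity = %.3f\n", nd2 / default_nd1);    /* 0.025 */
    return 0;
}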

Now, there's still not much to defend the 0.5 selectivity in particular;
according to the commit log for 3f5d2fe30, I used that because it
reproduced the behavior of previous versions that didn't understand what
a semijoin was at all.  So we could perhaps substitute some other rule
there, but I don't know what.

I also note that the precise behavior of HEAD in this area only dates
back to ca5f88502, which hasn't even shipped in a release yet, so it
surely doesn't have a lot of field experience justifying it.

Other useful-though-not-terribly-recent discussions in this area include

https://www.postgresql.org/message-id/flat/201201112110.40403.andres%40anarazel.de

https://www.postgresql.org/message-id/13290.1335976455@sss.pgh.pa.us

Given that eqjoinsel_semi has been more or less a constant source of
issues, maybe we need to think about a wholesale redesign rather than
just incremental tweaking.  But I have few ideas about what would be
better.
        regards, tom lane

