Weird behaviour in planner (PostgreSQL v 9.2.14)

From: Maxim Boguk

Hi,

I found some very weird behaviour on the planner side, with a row-estimation error of roughly 700,000,000.

Situation (with explain analyze):

EXPLAIN ANALYZE
select * from person2obj
WHERE
    p2o_id IN (SELECT p2o_id::bigint FROM (SELECT * FROM (SELECT column1 AS p2o_id FROM (
            VALUES ('2056892'), ...  -- 199 values total
) AS __CDP_VALUES__) AS __CDP_DATA__) AS __TARGET__ );


                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2.99..16316.34 rows=199 width=58) (actual time=0.196..1.202 rows=198 loops=1)
   ->  HashAggregate  (cost=2.99..4.98 rows=199 width=32) (actual time=0.160..0.205 rows=199 loops=1)
         ->  Values Scan on "*VALUES*"  (cost=0.00..2.49 rows=199 width=32) (actual time=0.003..0.088 rows=199 loops=1)
   ->  Index Scan using pk_person2obj on person2obj  (cost=0.00..81.96 rows=1 width=58) (actual time=0.004..0.004 rows=1 loops=199)
         Index Cond: (p2o_id = ("*VALUES*".column1)::bigint)


The estimate looks pretty reasonable.


However, once the value list reaches 200 entries (or more), the planner switches to a completely different (and very weird) estimate of roughly 700,000,000 rows:

                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=3.00..16398.33 rows=714143698 width=58) (actual time=0.200..1.239 rows=200 loops=1)
   ->  HashAggregate  (cost=3.00..5.00 rows=200 width=32) (actual time=0.165..0.201 rows=200 loops=1)
         ->  Values Scan on "*VALUES*"  (cost=0.00..2.50 rows=200 width=32) (actual time=0.004..0.090 rows=200 loops=1)
   ->  Index Scan using pk_person2obj on person2obj  (cost=0.00..81.96 rows=1 width=58) (actual time=0.004..0.004 rows=1 loops=200)
         Index Cond: (p2o_id = ("*VALUES*".column1)::bigint)


All of the intermediate estimates look fine; only the final nested-loop estimate jumps to roughly 700,000,000 rows (714,143,698).

PS: the person2obj table contains ~1.4 billion tuples; p2o_id is its primary key.

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


Re: Weird behaviour in planner (PostgreSQL v 9.2.14)

From: Tom Lane

Maxim Boguk <maxim.boguk@gmail.com> writes:
> [ planner changes behavior when a VALUES RTE reaches 200 elements ]

The immediate cause of that is that, lacking any real statistics for the
VALUES RTE, eqjoinsel_semi() will fall back to a rather dubious default
estimate if it believes it's looking at a default estimate for the number
of distinct entries on either side of the join clause:

        /*
         * Without MCV lists for both sides, we can only use the heuristic
         * about nd1 vs nd2.
         */
        double        nullfrac1 = stats1 ? stats1->stanullfrac : 0.0;

        if (!isdefault1 && !isdefault2)
        {
            if (nd1 <= nd2 || nd2 < 0)
                selec = 1.0 - nullfrac1;
            else
                selec = (nd2 / nd1) * (1.0 - nullfrac1);
        }
        else
===>        selec = 0.5 * (1.0 - nullfrac1);

And get_variable_numdistinct() changes its mind about whether it's issuing
a default estimate when the VALUES size reaches DEFAULT_NUM_DISTINCT (200):

    /*
     * With no data, estimate ndistinct = ntuples if the table is small, else
     * use default.  We use DEFAULT_NUM_DISTINCT as the cutoff for "small" so
     * that the behavior isn't discontinuous.
     */
    if (ntuples < DEFAULT_NUM_DISTINCT)
        return clamp_row_est(ntuples);

    *isdefault = true;
    return DEFAULT_NUM_DISTINCT;

So basically, although this is alleged to be continuous behavior, the
changeover from isdefault = false to isdefault = true causes a huge
change in the result from eqjoinsel_semi.
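
To tie that back to the numbers in the report: once the 0.5 fallback kicks in, the semi-join output estimate becomes about half of person2obj's row count, and half of ~1.43 billion tuples is the reported 714,143,698.  A quick sanity check against the planner's own table-size estimate (just a sketch; it assumes person2obj is visible in the current database) is:

    -- reltuples is the planner's estimate of the table's row count; applying
    -- the 0.5 default semi-join selectivity reproduces the rows= figure seen
    -- in the 200-value plan.
    SELECT reltuples,
           reltuples * 0.5 AS default_semijoin_rows
    FROM pg_class
    WHERE relname = 'person2obj';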

There are a couple of things we might choose to do about this:

1. get_variable_numdistinct doesn't currently pay any attention to what
kind of RTE it's considering, but it does have access to that.  We could
teach it to assume that if the Var comes from a VALUES RTE, the values
are all distinct regardless of the length of the VALUES list.  That's
effectively what it assumes for all VALUES of < 200 elements today, and
it's not apparent why we shouldn't make the same assumption for longer
lists.  Or we could adopt some sort of nonlinear behavior that gradually
reduces the assumed stadistinct fraction, but I don't know any rule for
that that wouldn't be pretty ad-hoc.

2. We could change the default don't-know-anything selectivity estimate in
eqjoinsel_semi to be something less crude than 0.5.  But again, it's hard
to say what to use instead.

The first of these ideas might be something that would be sane to
back-patch, but I'd be pretty hesitant about back-patching anything
along the lines of #2; the scope of the effects is hard to predict.
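
In the meantime, a common query-side workaround for this class of problem is to load the long value list into a temporary table and ANALYZE it, so the planner has real statistics instead of defaults.  A sketch (the names and values here are illustrative, not taken from the original schema):

    -- Materialize the id list so real statistics exist for the semi-join.
    CREATE TEMP TABLE target_ids (p2o_id bigint PRIMARY KEY);
    INSERT INTO target_ids (p2o_id) VALUES (2056892), (2056893);  -- ... rest of the list
    ANALYZE target_ids;

    EXPLAIN ANALYZE
    SELECT p.*
    FROM person2obj p
    WHERE p.p2o_id IN (SELECT p2o_id FROM target_ids);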

            regards, tom lane