Thread: Weird behaviour in planner (PostgreSQL v 9.2.14)
Hi,
I have found some very weird behaviour on the planner side: an estimation error of roughly 700,000,000 rows.
The situation (with EXPLAIN ANALYZE):
EXPLAIN ANALYZE
select * from person2obj
WHERE
p2o_id IN (SELECT p2o_id::bigint FROM (SELECT * FROM (SELECT column1 AS p2o_id FROM (
VALUES ('2056892'), ...   -- 199 values in total
) AS __CDP_VALUES__) AS __CDP_DATA__) AS __TARGET__ );
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=2.99..16316.34 rows=199 width=58) (actual time=0.196..1.202 rows=198 loops=1)
-> HashAggregate (cost=2.99..4.98 rows=199 width=32) (actual time=0.160..0.205 rows=199 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..2.49 rows=199 width=32) (actual time=0.003..0.088 rows=199 loops=1)
-> Index Scan using pk_person2obj on person2obj (cost=0.00..81.96 rows=1 width=58) (actual time=0.004..0.004 rows=1 loops=199)
Index Cond: (p2o_id = ("*VALUES*".column1)::bigint)
The estimate looks pretty reasonable: 199 distinct values, one row per primary-key probe.
However, once the value list reaches 200 entries (or more), the planner switches to a completely different (and very weird) estimate of about 700,000,000 rows:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=3.00..16398.33 rows=714143698 width=58) (actual time=0.200..1.239 rows=200 loops=1)
-> HashAggregate (cost=3.00..5.00 rows=200 width=32) (actual time=0.165..0.201 rows=200 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..2.50 rows=200 width=32) (actual time=0.004..0.090 rows=200 loops=1)
-> Index Scan using pk_person2obj on person2obj (cost=0.00..81.96 rows=1 width=58) (actual time=0.004..0.004 rows=1 loops=200)
Index Cond: (p2o_id = ("*VALUES*".column1)::bigint)
All of the per-node estimates look fine until the final nested loop, which is estimated at about 700,000,000 rows.
PS: the person2obj table contains ~1.4 billion tuples; p2o_id is the primary key.
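For what it's worth, the bad estimate (714,143,698) is almost exactly half of
the table's ~1.4 billion rows, as if the planner were applying a flat 50%
selectivity to the semi-join once the value list reaches 200 entries.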
--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
Maxim Boguk <maxim.boguk@gmail.com> writes:
> [ planner changes behavior when a VALUES RTE reaches 200 elements ]

The immediate cause of that is that, lacking any real statistics for the
VALUES RTE, eqjoinsel_semi() will fall back to a rather dubious default
estimate if it believes it's looking at a default estimate for the number
of distinct entries on either side of the join clause:

        /*
         * Without MCV lists for both sides, we can only use the heuristic
         * about nd1 vs nd2.
         */
        double      nullfrac1 = stats1 ? stats1->stanullfrac : 0.0;

        if (!isdefault1 && !isdefault2)
        {
            if (nd1 <= nd2 || nd2 < 0)
                selec = 1.0 - nullfrac1;
            else
                selec = (nd2 / nd1) * (1.0 - nullfrac1);
        }
        else
===>        selec = 0.5 * (1.0 - nullfrac1);

And get_variable_numdistinct() changes its mind about whether it's issuing
a default estimate when the VALUES size reaches DEFAULT_NUM_DISTINCT (200):

    /*
     * With no data, estimate ndistinct = ntuples if the table is small, else
     * use default.  We use DEFAULT_NUM_DISTINCT as the cutoff for "small" so
     * that the behavior isn't discontinuous.
     */
    if (ntuples < DEFAULT_NUM_DISTINCT)
        return clamp_row_est(ntuples);

    *isdefault = true;
    return DEFAULT_NUM_DISTINCT;

So basically, although this is alleged to be continuous behavior, the
changeover from isdefault = false to isdefault = true causes a huge change
in the result from eqjoinsel_semi.

There are a couple of things we might choose to do about this:

1. get_variable_numdistinct doesn't currently pay any attention to what
kind of RTE it's considering, but it does have access to that.  We could
teach it to assume that if the Var comes from a VALUES RTE, the values are
all distinct regardless of the length of the VALUES list.  That's
effectively what it assumes for all VALUES of < 200 elements today, and
it's not apparent why we shouldn't make the same assumption for longer
lists.  Or we could adopt some sort of nonlinear behavior that gradually
reduces the assumed stadistinct fraction, but I don't know any rule for
that that wouldn't be pretty ad-hoc.

2. We could change the default don't-know-anything selectivity estimate in
eqjoinsel_semi to be something less crude than 0.5.  But again, it's hard
to say what to use instead.

The first of these ideas might be something that would be sane to
back-patch, but I'd be pretty hesitant about back-patching anything along
the lines of #2; the scope of the effects is hard to predict.

			regards, tom lane
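For concreteness, here is a rough and untested sketch of what idea #1 might
look like inside get_variable_numdistinct(), assuming RelOptInfo's rtekind
field is enough to recognize a Var coming from a VALUES RTE; the check would
slot in just ahead of the DEFAULT_NUM_DISTINCT cutoff quoted above:

    /*
     * Hypothetical sketch, not a committed patch: if the Var comes from a
     * VALUES RTE, assume its rows are all distinct, which is effectively
     * what the ntuples < DEFAULT_NUM_DISTINCT branch already assumes for
     * short lists.
     */
    if (vardata->rel && vardata->rel->rtekind == RTE_VALUES)
        return clamp_row_est(ntuples);

Because this path does not set *isdefault = true, eqjoinsel_semi() could
keep using the nd1 vs nd2 heuristic for the VALUES side no matter how long
the list is, instead of falling through to the 0.5 default.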