Thread: [HACKERS] estimation for join results cardinality is sometimes more than theproduct of the downstream nodes'
[HACKERS] estimation for join results cardinality is sometimes more than theproduct of the downstream nodes'
From
Alexey Bashtanov
Date:
Hello,
Postgres can produce a plan with a nested loop node having rows estimate much more than the product of underlying nodes' estimates, relying only on outer relation size:
alexey=# explain
SELECT oid, relname
FROM (
SELECT m.oid, m.relname
FROM pg_class m
UNION ALL
SELECT m.oid, m.relname
FROM pg_class m
) m
WHERE oid IN (VALUES (162456317), (162456310));
QUERY PLAN
----------------------------------------------------------------------------------------------------
Nested Loop (cost=0.31..33.24 rows=341 width=68)
-> Unique (cost=0.04..0.04 rows=2 width=4)
-> Sort (cost=0.04..0.04 rows=2 width=4)
Sort Key: (("*VALUES*".column1)::oid)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=4)
-> Append (cost=0.27..16.58 rows=2 width=68)
-> Index Scan using pg_class_oid_index on pg_class m (cost=0.27..8.29 rows=1 width=68)
Index Cond: (oid = ("*VALUES*".column1)::oid)
-> Index Scan using pg_class_oid_index on pg_class m_1 (cost=0.27..8.29 rows=1 width=68)
Index Cond: (oid = ("*VALUES*".column1)::oid)
(10 rows)
Why?
Is there a reason that join cardinality estimates are not limited by the product of the joined parts cardinalities like in the join-card-est.patch attached?
An example of a query working faster as a result of this change is in join-card-est.sql, result is in join-card-est.result
Best Regards,
Alexey
Postgres can produce a plan with a nested loop node having rows estimate much more than the product of underlying nodes' estimates, relying only on outer relation size:
alexey=# explain
SELECT oid, relname
FROM (
SELECT m.oid, m.relname
FROM pg_class m
UNION ALL
SELECT m.oid, m.relname
FROM pg_class m
) m
WHERE oid IN (VALUES (162456317), (162456310));
QUERY PLAN
----------------------------------------------------------------------------------------------------
Nested Loop (cost=0.31..33.24 rows=341 width=68)
-> Unique (cost=0.04..0.04 rows=2 width=4)
-> Sort (cost=0.04..0.04 rows=2 width=4)
Sort Key: (("*VALUES*".column1)::oid)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=4)
-> Append (cost=0.27..16.58 rows=2 width=68)
-> Index Scan using pg_class_oid_index on pg_class m (cost=0.27..8.29 rows=1 width=68)
Index Cond: (oid = ("*VALUES*".column1)::oid)
-> Index Scan using pg_class_oid_index on pg_class m_1 (cost=0.27..8.29 rows=1 width=68)
Index Cond: (oid = ("*VALUES*".column1)::oid)
(10 rows)
Why?
Is there a reason that join cardinality estimates are not limited by the product of the joined parts cardinalities like in the join-card-est.patch attached?
An example of a query working faster as a result of this change is in join-card-est.sql, result is in join-card-est.result
Best Regards,
Alexey
Attachment
Re: [HACKERS] estimation for join results cardinality is sometimes more than the product of the downstream nodes'
From
Tom Lane
Date:
Alexey Bashtanov <bashtanov@imap.cc> writes: > Is there a reason that join cardinality estimates are not limited by the > product of the joined parts cardinalities like in the > join-card-est.patch attached? Because that would be giving an unfair advantage to some paths over others based on nothing except estimation errors. I do not think we'd get a net benefit in plan quality. If we could do this earlier and adjust the join relation's overall cardinality estimate, it might be something to consider. regards, tom lane