Hello,
I have encountered problems with a particular query, so I started to dig into the sources. I have two questions/ideas:
1) When the optimizer computes the size of a join, it does so as card(R1)*card(R2)*selectivity. Suppose two relations (R1 & R2), each with 10000 rows. If you (inner) join them using the equality operator, the result is at most 10000 rows (min(card(R1), card(R2))), but pg estimates 1 000 000 (it uses selectivity 0.01 here). When the cost is then computed, this results in a very high cost for the hash and nested-loop joins BUT a low (correct) cost for the merge join. This is because for hash and nested-loop joins the cost is estimated from the row count, while merge join uses a different estimation (as it always knows that a merge join can only be done on an equality op). This then leads to merge joins being chosen for the majority of joins. Unfortunately, I found that in the majority of such cases the hash join is twice as fast. I tested it using SET ENABLE_MERGEJOIN=OFF ... What about changing the cost estimator to use min(card(R1), card(R2)) instead of card(R1)*card(R2)*selectivity in the case where R1 and R2 are joined using equality? It should lead to much faster plans for the majority of SQLs.
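A minimal sketch of the kind of test I did (the table names and contents are only illustrative, and I am assuming the join column is unique in both relations):

  CREATE TABLE r1 (id integer PRIMARY KEY, payload text);
  CREATE TABLE r2 (id integer PRIMARY KEY, payload text);
  -- load ~10000 rows into each relation, then run VACUUM ANALYZE

  -- estimated rows: 10000 * 10000 * 0.01 = 1 000 000, although the
  -- equi-join on a unique column can return at most 10000 rows
  EXPLAIN SELECT * FROM r1, r2 WHERE r1.id = r2.id;

  -- disable merge join to compare against the hash join plan
  SET ENABLE_MERGEJOIN = OFF;
  EXPLAIN SELECT * FROM r1, r2 WHERE r1.id = r2.id;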
2) Suppose we have a relation R1(id, name) and an index ix(id, name) on it. For a query like: select id, name from R1 order by id the planner will prefer to do seqscan+sort (although R1 is rather big). And yes, that really is faster than using an indexscan. But an indexscan always looks up the actual record in the heap, even if all needed attributes are contained in the index. Oracle and even MSSQL read the attributes directly from the index without fetching the actual tuple from the heap. Is there any need to do it in such an inefficient way?
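For example (again just a sketch, following the R1(id, name) layout above):

  CREATE TABLE r1 (id integer, name text);
  CREATE INDEX ix ON r1 (id, name);
  -- every attribute the query needs is present in the index ix,
  -- yet the indexscan still visits the heap for each tuple,
  -- so the planner prefers seqscan + sort
  EXPLAIN SELECT id, name FROM r1 ORDER BY id;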
regards, devik