Re: A wrong index choose issue because of inaccurate statistics - Mailing list pgsql-hackers

From Andy Fan
Subject Re: A wrong index choose issue because of inaccurate statistics
Date
Msg-id CAKU4AWoNaJROJZUakdt6+d-h-yDxAfxtahqpjov0Hz0xdLbQLA@mail.gmail.com
Whole thread Raw
In response to A wrong index choose issue because of inaccurate statistics  (Andy Fan <zhihui.fan1213@gmail.com>)
List pgsql-hackers


Why will the (a, c) be choose?  If planner think a = x has only 1 row ..

I just did more research and found above statement is not accurate,
the root cause of this situation is because IndexSelectivity = 0. Even through I
don't think we can fix anything here since IndexSelectivity is calculated from
statistics and we don't know it is 1% wrong or 10% wrong or more just like What
Tomas said.

The way of fixing it is just add a "small" extra cost for pqquals for index
scan. that should be small enough to not have impacts on others part. I will
discuss how small is small later with details, we can say it just a guc variable
for now.

+++ b/src/backend/optimizer/path/costsize.c
@@ -730,6 +730,13 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,

        cpu_run_cost += cpu_per_tuple * tuples_fetched;

+       /*
+        * To make the planner more robust to handle some inaccurate statistics
+        * issue, we will add a extra cost to qpquals so that the less qpquals
+        * the lower cost it has.
+        */
+       cpu_run_cost += stat_stale_cost * list_length(qpquals);
+

If we want to reduce the impact of this change further, we can only add this if
the IndexSelecivity == 0.  

How to set the value of stat_stale_cost? Since the minimum cost for a query
should be a cpu_tuple_cost which is 0.01 default. Adding an 0.01 cost for each
pqqual in index scan should not make a big difference.  However sometimes we may
set it to 0.13 if we consider index->tree_height was estimated wrongly for 1 (cost is
50 * 0.0025 = 0.125). I don't know how it happened, but looks it do happen in prod
environment. At the same time it is unlikely index->tree_height is estimated
wrongly for 2 or more. so basically we can set this value to 0(totally disable
this feature), 0.01 (should be ok for most case), 0.13 (A bit aggressive).

The wrong estimation of IndexSelectitity = 0 might be common case and if
people just have 2 related index like (A, B) and (A, C). we have 50% chances to
have a wrong decision, so I would say this case worth the troubles. My current
implementation looks not cool, so any suggestion to research further is pretty
welcome.

--
Best Regards
Andy Fan

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Atomic operations within spinlocks
Next
From: Andy Fan
Date:
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey