Thread: Using operators to do query hints
I've been playing with an idea I had a while back. Basically that it would be useful to have some "noop" operators that are used purely to influence the planner. For context I've suggested in the past that there are two categories of hints: 1 Hints that override the planner's decisions with explicit planning decisions. These we don't like for a variety of reasons, notably because users don't have very good knowledge of all the plan types available and new plan types come out in the future. 2 Hints that just influence the estimates that the planner uses to make its decisions. These seem more acceptable as it amounts to admitting the users know the distribution of their data and the behaviour of their functions better than the statistics. Also, there are plenty of cases where the statistics are really just wild guesses. They still allow the planner to make decisions about what join orders or types and so on given the updated data. So I thought adding some noop operators that took a boolean on one side and a float on the other and simply returned the boolean at run-time but used the float (which would have to be a constant) from the lhs as the selectivity would be useful. In practice it's usable but not nearly as widely useful as I expected. 1) The boolean being passed through has to be a real clause with real fields. Otherwise the planner is too clever and turns it into a one-time filter which doesn't affect the plan :/ 2) If it's a clauase that is potentially going to be an index condition then you need to repeat the clause outside the selectivity noop operator. Otherwise the selectivity operator hides it from the planner. 3) It doesn't work on joins at all. Joins are done using the join selectivity function on the join clause's operator. There's no way to pass construct a simple noop wrapper that would still work with that that I can see. Nonetheless what I have does seem to be somewhat handy for simple cases. I added some simple SQL wrapper functions such as pg_unlikely(): postgres=# explain select * from test where i<100; QUERY PLAN --------------------------------------------------------------------- Index Only Scan using i on test (cost=0.28..10.01 rows=99 width=4) Index Cond: (i < 100) (2 rows) postgres=# explain select * from test where pg_unlikely(i<100); QUERY PLAN -------------------------------------------------------------------- Index Only Scan using i on test (cost=0.28..10.50 rows=1 width=4) Index Cond: (i < 100) Filter: ('1e-06'::double precision %%% (i < 100)) (3 rows) However this doesn't really do what you might really be hoping. specifically, it doesn't actually affect the planner's choice of the index scan in that query. It affects the estimate of the result of the scan and that might be useful for subsequent nodes of the plan. But not for the index scan itself: postgres=# explain select * from test where pg_unlikely(i<500); QUERY PLAN --------------------------------------------------------------------- Seq Scan on test (cost=0.00..22.50 rows=1 width=4) Filter: ((i < 500) AND ('1e-06'::double precision %%% (i < 500))) (2 rows) So..... I dunno. This feels like it's something that could be quite handy but it's not as good as I had hoped and I think I'm out of ideas for making it more powerful. -- greg
On Tue, Feb 22, 2022 at 04:12:15PM -0500, Greg Stark wrote: > I've been playing with an idea I had a while back. Basically that it > would be useful to have some "noop" operators that are used purely to > influence the planner. > > For context I've suggested in the past that there are two categories of hints: > > 1 Hints that override the planner's decisions with explicit planning > decisions. These we don't like for a variety of reasons, notably > because users don't have very good knowledge of all the plan types > available and new plan types come out in the future. > > 2 Hints that just influence the estimates that the planner uses to > make its decisions. These seem more acceptable as it amounts to > admitting the users know the distribution of their data and the > behaviour of their functions better than the statistics. Also, there > are plenty of cases where the statistics are really just wild guesses. > They still allow the planner to make decisions about what join orders > or types and so on given the updated data. #2 is an interesting distinction. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.