Hi hackers,
While analyzing planner performance on JOB with default_statistics_target = 1000, I noticed that a significant portion of planning time is spent inside the eqjoinsel() function. According to perf, in most JOB queries at default_statistics_target = 1000, eqjoinsel() is the most expensive function during planning, accounting for approximately 8% of total CPU time. At default_statistics_target = 10000, the planner spend up to 75% of its time inside eqjoinsel(), making it one of the primary bottlenecks.
Еhis overhead is caused by the O(N^2) nested-loop comparison of MCVs in var1 = var2 clauses.
I propose an optimization: when the column datatype supports ordering(i.e., has < and >), we can sort both MCV lists and apply mege-style algorithm to detect matches. This reduces runtime from O(N^2) to O(NlogN), where N is the number of MCV entries. The patch also applies the same optimization to semi-join clauses, which show similar performance behavior.
On JOB, this changes reduce planner time in most queries with complex joins and large MCVs with no observable effect on plan quality. I’ve also attached bar charts showing per-query planner time before and after the patch for default_statistics_target = 100, 1000, 10000 along with query numbers for reference.
Any feedback or suggestions are welcome!
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.