Thread: Adjust Memoize hit_ratio calculation
Yesterday, in 785f70957, I adjusted the Memoize costing code to account for the size of the cache key when estimating how many cache entries can exist at once in the cache. That effectively makes Memoize a less likely choice as fewer entries will be expected to fit in work_mem now. Because that's being changed in v16, I think it might also be a good idea to fix the hit_ratio calculation problem reported by David Johnston in [1]. In the attached, I've adjusted David's calculation slightly so that we divide by Max(ndistinct, est_cache_entries) instead of ndistinct. This saves from overestimating when ndistinct is smaller than est_cache_entries. I'd rather fix this now for v16 than wait until v17 and further adjust the Memoize costing. I've attached a spreadsheet showing the new and old hit_ration calculations. Cells C1 - C3 can be adjusted to show what the hit ratio is for both the old and new method. Any objections? David [1] https://postgr.es/m/CAKFQuwZEmcNk3YQo2Xj4EDUOdY6qakad31rOD1Vc4q1_s68-Ew@mail.gmail.com
Attachment
On Tue, 21 Mar 2023 at 09:41, David Rowley <dgrowleyml@gmail.com> wrote: > Because that's being changed in v16, I think it might also be a good > idea to fix the hit_ratio calculation problem reported by David > Johnston in [1]. In the attached, I've adjusted David's calculation > slightly so that we divide by Max(ndistinct, est_cache_entries) > instead of ndistinct. This saves from overestimating when ndistinct > is smaller than est_cache_entries. I'd rather fix this now for v16 > than wait until v17 and further adjust the Memoize costing. I've now pushed this change. David