Thread: v10 release notes for extended stats
2017-03-24 [7b504eb28] Implement multivariate n-distinct coefficients
2017-04-05 [2686ee1b7] Collect and use multi-column dependency stats
2017-05-12 [bc085205c] Change CREATE STATISTICS syntax

The existing notes say:
|Add multi-column optimizer statistics to compute the correlation ratio and number of distinct values (Tomas Vondra, David Rowley, Álvaro Herrera)
|New commands are CREATE STATISTICS, ALTER STATISTICS, and DROP STATISTICS.
|This feature is helpful in estimating query memory usage and when combining the statistics from individual columns.

"correlation ratio" is referring to stxkind=d (dependencies), right ?  That's
very unclear.

"helpful in estimating query memory usage": I guess it means that this allows
the planner to correctly account for large vs small number of GROUP BY values,
but it sounds more like it's going to help a user to estimate memory use.

"when combining the statistics from individual columns." this is referring to
stxkind=d, handling correlated/redundant clauses, but it'd be hard for a user
to know that.

Also, maybe it should say "combining stats from columns OF THE SAME TABLE".

So I propose:
|Allow creation of multi-column statistics objects, for computing the
|dependencies between columns and number of distinct values of combinations of columns
|(Tomas Vondra, David Rowley, Álvaro Herrera)
|The new commands are CREATE STATISTICS, ALTER STATISTICS, and DROP STATISTICS.
|Improved statistics allow the planner to generate better query plans with more accurate
|estimates of the row count and memory usage when grouping by multiple
|columns, and more accurate estimates of the row count if WHERE clauses apply
|to multiple columns and values of some columns are correlated with values of
|other columns.
On Sat, Dec 19, 2020 at 01:39:27PM -0600, Justin Pryzby wrote:
> 2017-03-24 [7b504eb28] Implement multivariate n-distinct coefficients
> 2017-04-05 [2686ee1b7] Collect and use multi-column dependency stats
> 2017-05-12 [bc085205c] Change CREATE STATISTICS syntax
>
> The existing notes say:
> |Add multi-column optimizer statistics to compute the correlation ratio and number of distinct values (Tomas Vondra, David Rowley, Álvaro Herrera)
> |New commands are CREATE STATISTICS, ALTER STATISTICS, and DROP STATISTICS.
> |This feature is helpful in estimating query memory usage and when combining the statistics from individual columns.
>
> "correlation ratio" is referring to stxkind=d (dependencies), right ?  That's
> very unclear.
>
> "helpful in estimating query memory usage": I guess it means that this allows
> the planner to correctly account for large vs small number of GROUP BY values,
> but it sounds more like it's going to help a user to estimate memory use.
>
> "when combining the statistics from individual columns." this is referring to
> stxkind=d, handling correlated/redundant clauses, but it'd be hard for a user
> to know that.
>
> Also, maybe it should say "combining stats from columns OF THE SAME TABLE".
>
> So I propose:
> |Allow creation of multi-column statistics objects, for computing the
> |dependencies between columns and number of distinct values of combinations of columns
> |(Tomas Vondra, David Rowley, Álvaro Herrera)
> |The new commands are CREATE STATISTICS, ALTER STATISTICS, and DROP STATISTICS.
> |Improved statistics allow the planner to generate better query plans with more accurate
> |estimates of the row count and memory usage when grouping by multiple
> |columns, and more accurate estimates of the row count if WHERE clauses apply
> |to multiple columns and values of some columns are correlated with values of
> |other columns.

Uh, at the time, that was the best text we could come up with.  We don't
usually go back to update them unless there is a very good reason, and I
am not seeing that above.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee
Bruce Momjian <bruce@momjian.us> writes:
> On Sat, Dec 19, 2020 at 01:39:27PM -0600, Justin Pryzby wrote:
>> So I propose:

> Uh, at the time, that was the best text we could come up with.  We don't
> usually go back to update them unless there is a very good reason, and I
> am not seeing that above.

Yeah, it's a couple years too late to be worth spending effort on
improving the v10 notes, I fear.  If there's text in the main
documentation that could be improved, that's a different story.

			regards, tom lane