Re: Multivariate MCV list vs. statistics target - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Multivariate MCV list vs. statistics target |
Date | |
Msg-id | 20190726220552.kafvalbd2o3l7xw6@development Whole thread Raw |
In response to | RE: Multivariate MCV list vs. statistics target ("Jamison, Kirk" <k.jamison@jp.fujitsu.com>) |
Responses |
RE: Multivariate MCV list vs. statistics target
|
List | pgsql-hackers |
On Fri, Jul 26, 2019 at 07:03:41AM +0000, Jamison, Kirk wrote: >On Sat, July 20, 2019 8:12 AM (GMT+9), Tomas Vondra wrote: > >> >+ /* XXX What if the target is set to 0? Reset the statistic? >> */ >> > >> >This also makes me wonder. I haven't looked deeply into the code, but >> >since 0 is a valid value, I believe it should reset the stats. >> >> I agree resetting the stats after setting the target to 0 seems quite >> reasonable. But that's not what we do for attribute stats, because in that >> case we simply skip the attribute during the future ANALYZE runs - we don't >> reset the stats, we keep the existing stats. So I've done the same thing here, >> and I've removed the XXX comment. >> >> If we want to change that, I'd do it in a separate patch for both the regular >> and extended stats. > >Hi, Tomas > >Sorry for my late reply. >You're right. I have no strong opinion whether we'd want to change that behavior. >I've also confirmed the change in the patch where setting statistics target 0 >skips the statistics. > OK, thanks. >Maybe only some minor nitpicks in the source code comments below: >1. "it's" should be "its": >> + * Compute statistic target, based on what's set for the statistic >> + * object itself, and for it's attributes. > >2. Consistency whether you'd use either "statistic " or "statisticS ". >Ex. statistic target vs statisticS target, statistics object vs statistic object, etc. > >> Attached is v4 of the patch, with a couple more improvements: >> >> 1) I've renamed the if_not_exists flag to missing_ok, because that's more >> consistent with the "IF EXISTS" clause in the grammar (the old flag was kinda >> the exact opposite), and I've added a NOTICE about the skip. > >+ bool missing_ok; /* do nothing if statistics does not exist */ > >Confirmed. So we ignore if statistic does not exist, and skip the error. >Maybe to make it consistent with other data structures in parsernodes.h, >you can change the comment of missing_ok to: >/* skip error if statistics object does not exist */ > Thanks, I've fixed all those places in the attached v5. >> 2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows, because that's >> what the function was doing anyway (computing sample size). >> >> 3) I've added a couple of regression tests to stats_ext.sql >> >> Aside from that, I've cleaned up a couple of places and improved a bunch of >> comments. Nothing huge. > >I have a question though regarding ComputeExtStatisticsRows. >I'm just curious with the value 300 when computing sample size. >Where did this value come from? > >+ /* compute sample size based on the statistic target */ >+ return (300 * result); > >Overall, the patch is almost already in good shape for commit. >I'll wait for the next update. > That's how we compute number of rows to sample, based on the statistics target. See std_typanalyze() in analyze.c, which also cites the paper where this comes from. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: