RE: Multivariate MCV list vs. statistics target - Mailing list pgsql-hackers
From | Jamison, Kirk |
---|---|
Subject | RE: Multivariate MCV list vs. statistics target |
Date | |
Msg-id | D09B13F772D2274BB348A310EE3027C6517E5D@g01jpexmbkw24 Whole thread Raw |
In response to | Re: Multivariate MCV list vs. statistics target (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
RE: Multivariate MCV list vs. statistics target
|
List | pgsql-hackers |
On Saturday, July 27, 2019 7:06 AM(GMT+9), Tomas Vondra wrote: > On Fri, Jul 26, 2019 at 07:03:41AM +0000, Jamison, Kirk wrote: > >On Sat, July 20, 2019 8:12 AM (GMT+9), Tomas Vondra wrote: > > > >> >+ /* XXX What if the target is set to 0? Reset the statistic? > >> */ > >> > > >> >This also makes me wonder. I haven't looked deeply into the code, > >> >but since 0 is a valid value, I believe it should reset the stats. > >> > >> I agree resetting the stats after setting the target to 0 seems quite > >> reasonable. But that's not what we do for attribute stats, because in > >> that case we simply skip the attribute during the future ANALYZE runs > >> - we don't reset the stats, we keep the existing stats. So I've done > >> the same thing here, and I've removed the XXX comment. > >> > >> If we want to change that, I'd do it in a separate patch for both the > >> regular and extended stats. > > > >Hi, Tomas > > > >Sorry for my late reply. > >You're right. I have no strong opinion whether we'd want to change that > behavior. > >I've also confirmed the change in the patch where setting statistics > >target 0 skips the statistics. > > > > OK, thanks. > > >Maybe only some minor nitpicks in the source code comments below: > >1. "it's" should be "its": > >> + * Compute statistic target, based on what's set for the > statistic > >> + * object itself, and for it's attributes. > > > >2. Consistency whether you'd use either "statistic " or "statisticS ". > >Ex. statistic target vs statisticS target, statistics object vs statistic > object, etc. > > > >> Attached is v4 of the patch, with a couple more improvements: > >> > >> 1) I've renamed the if_not_exists flag to missing_ok, because that's > >> more consistent with the "IF EXISTS" clause in the grammar (the old > >> flag was kinda the exact opposite), and I've added a NOTICE about the skip. > > > >+ bool missing_ok; /* do nothing if statistics does > not exist */ > > > >Confirmed. So we ignore if statistic does not exist, and skip the error. > >Maybe to make it consistent with other data structures in > >parsernodes.h, you can change the comment of missing_ok to: > >/* skip error if statistics object does not exist */ > > > > Thanks, I've fixed all those places in the attached v5. I've confirmed the fix. > >> 2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows, because > >> that's what the function was doing anyway (computing sample size). > >> > >> 3) I've added a couple of regression tests to stats_ext.sql > >> > >> Aside from that, I've cleaned up a couple of places and improved a > >> bunch of comments. Nothing huge. > > > >I have a question though regarding ComputeExtStatisticsRows. > >I'm just curious with the value 300 when computing sample size. > >Where did this value come from? > > > >+ /* compute sample size based on the statistic target */ > >+ return (300 * result); > > > >Overall, the patch is almost already in good shape for commit. > >I'll wait for the next update. > > > > That's how we compute number of rows to sample, based on the statistics target. > See std_typanalyze() in analyze.c, which also cites the paper where this comes > from. Noted. Found it. Thank you for the reference. There's just a small whitespace (extra space) below after running git diff --check. >src/bin/pg_dump/pg_dump.c:7226: trailing whitespace. >+ It would be better to post an updated patch, but other than that, I've confirmed the fixes. The patch passed the make-world and regression tests as well. I've marked this as "ready for committer". Regards, Kirk Jamison
pgsql-hackers by date: