RE: Multivariate MCV list vs. statistics target - Mailing list pgsql-hackers

From Jamison, Kirk
Subject RE: Multivariate MCV list vs. statistics target
Date
Msg-id D09B13F772D2274BB348A310EE3027C6518F94@g01jpexmbkw24
Whole thread Raw
In response to RE: Multivariate MCV list vs. statistics target  ("Jamison, Kirk" <k.jamison@jp.fujitsu.com>)
Responses Re: Multivariate MCV list vs. statistics target
Re: Multivariate MCV list vs. statistics target
List pgsql-hackers
Hi, 

> From: Jamison, Kirk [mailto:k.jamison@jp.fujitsu.com]
> Sent: Monday, July 29, 2019 10:53 AM
> To: 'Tomas Vondra' <tomas.vondra@2ndquadrant.com>
> Cc: Dean Rasheed <dean.a.rasheed@gmail.com>; PostgreSQL Hackers
> <pgsql-hackers@lists.postgresql.org>
> Subject: RE: Multivariate MCV list vs. statistics target
> 
> On Saturday, July 27, 2019 7:06 AM(GMT+9), Tomas Vondra wrote:
> > On Fri, Jul 26, 2019 at 07:03:41AM +0000, Jamison, Kirk wrote:
> > >On Sat, July 20, 2019 8:12 AM (GMT+9), Tomas Vondra wrote:
> > >
> > >> >+        /* XXX What if the target is set to 0? Reset the statistic?
> > >> */
> > >> >
> > >> >This also makes me wonder. I haven't looked deeply into the code,
> > >> >but since 0 is a valid value, I believe it should reset the stats.
> > >>
> > >> I agree resetting the stats after setting the target to 0 seems
> > >> quite reasonable. But that's not what we do for attribute stats,
> > >> because in that case we simply skip the attribute during the future
> > >> ANALYZE runs
> > >> - we don't reset the stats, we keep the existing stats. So I've
> > >> done the same thing here, and I've removed the XXX comment.
> > >>
> > >> If we want to change that, I'd do it in a separate patch for both
> > >> the regular and extended stats.
> > >
> > >Hi, Tomas
> > >
> > >Sorry for my late reply.
> > >You're right. I have no strong opinion whether we'd want to change
> > >that
> > behavior.
> > >I've also confirmed the change in the patch where setting statistics
> > >target 0 skips the statistics.
> > >
> >
> > OK, thanks.
> >
> > >Maybe only some minor nitpicks in the source code comments below:
> > >1. "it's" should be "its":
> > >> +         * Compute statistic target, based on what's set for the
> > statistic
> > >> +         * object itself, and for it's attributes.
> > >
> > >2. Consistency whether you'd use either "statistic " or "statisticS ".
> > >Ex. statistic target vs statisticS target, statistics object vs
> > >statistic
> > object, etc.
> > >
> > >> Attached is v4 of the patch, with a couple more improvements:
> > >>
> > >> 1) I've renamed the if_not_exists flag to missing_ok, because
> > >> that's more consistent with the "IF EXISTS" clause in the grammar
> > >> (the old flag was kinda the exact opposite), and I've added a NOTICE
> about the skip.
> > >
> > >+    bool        missing_ok;      /* do nothing if statistics does
> > not exist */
> > >
> > >Confirmed. So we ignore if statistic does not exist, and skip the error.
> > >Maybe to make it consistent with other data structures in
> > >parsernodes.h, you can change the comment of missing_ok to:
> > >/* skip error if statistics object does not exist */
> > >
> >
> > Thanks, I've fixed all those places in the attached v5.
> 
> I've confirmed the fix.
> 
> > >> 2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows,
> > >> because that's what the function was doing anyway (computing sample
> size).
> > >>
> > >> 3) I've added a couple of regression tests to stats_ext.sql
> > >>
> > >> Aside from that, I've cleaned up a couple of places and improved a
> > >> bunch of comments. Nothing huge.
> > >
> > >I have a question though regarding ComputeExtStatisticsRows.
> > >I'm just curious with the value 300 when computing sample size.
> > >Where did this value come from?
> > >
> > >+    /* compute sample size based on the statistic target */
> > >+    return (300 * result);
> > >
> > >Overall, the patch is almost already in good shape for commit.
> > >I'll wait for the next update.
> > >
> >
> > That's how we compute number of rows to sample, based on the statistics
> target.
> > See std_typanalyze() in analyze.c, which also cites the paper where
> > this comes from.
> Noted. Found it. Thank you for the reference.
> 
> 
> There's just a small whitespace (extra space) below after running git diff
> --check.
> >src/bin/pg_dump/pg_dump.c:7226: trailing whitespace.
> >+
> It would be better to post an updated patch, but other than that, I've confirmed
> the fixes.
> The patch passed the make-world and regression tests as well.
> I've marked this as "ready for committer".

The patch does not apply anymore.
In addition to the whitespace detected,
kindly rebase the patch as there were changes from recent commits
in the following files:
       src/backend/commands/analyze.c
       src/bin/pg_dump/pg_dump.c
       src/bin/pg_dump/t/002_pg_dump.pl
       src/test/regress/expected/stats_ext.out
       src/test/regress/sql/stats_ext.sql

Regards,
Kirk Jamison



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: POC: converting Lists into arrays
Next
From: Tom Lane
Date:
Subject: Re: POC: converting Lists into arrays