Thread: Multivariate MCV list vs. statistics target
Hi, One slightly inconvenient thing I realized while playing with the address data set is that it's somewhat difficult to set the desired size of the multi-column MCV list. At the moment, we simply use the maximum statistic target for attributes the MCV list is built on. But that does not allow keeping default size for per-column stats, and only increase size of multi-column MCV lists. So I'm thinking we should allow tweaking the statistics for extended stats, and serialize it in the pg_statistic_ext catalog. Any opinions why that would be a bad idea? I suppose it should be part of the CREATE STATISTICS command, but I'm not sure what'd be the best syntax. We might also have something more similar to ALTER COLUMNT, but perhaps ALTER STATISTICS s SET STATISTICS 1000; looks a bit too weird. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 18 Jun 2019 at 22:34, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > One slightly inconvenient thing I realized while playing with the > address data set is that it's somewhat difficult to set the desired size > of the multi-column MCV list. > > At the moment, we simply use the maximum statistic target for attributes > the MCV list is built on. But that does not allow keeping default size > for per-column stats, and only increase size of multi-column MCV lists. > > So I'm thinking we should allow tweaking the statistics for extended > stats, and serialize it in the pg_statistic_ext catalog. Any opinions > why that would be a bad idea? > Seems reasonable to me. This might not be the only option we'll ever want to add though, so perhaps a "stxoptions text[]" column along the lines of a relation's reloptions would be the way to go. > I suppose it should be part of the CREATE STATISTICS command, but I'm > not sure what'd be the best syntax. We might also have something more > similar to ALTER COLUMNT, but perhaps > > ALTER STATISTICS s SET STATISTICS 1000; > > looks a bit too weird. > Yes it does look a bit weird, but that's the natural generalisation of what we have for per-column statistics, so it's probably preferable to do that rather than invent some other syntax that wouldn't be so consistent. Regards, Dean
On Thu, Jun 20, 2019 at 08:08:44AM +0100, Dean Rasheed wrote: >On Tue, 18 Jun 2019 at 22:34, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> One slightly inconvenient thing I realized while playing with the >> address data set is that it's somewhat difficult to set the desired size >> of the multi-column MCV list. >> >> At the moment, we simply use the maximum statistic target for attributes >> the MCV list is built on. But that does not allow keeping default size >> for per-column stats, and only increase size of multi-column MCV lists. >> >> So I'm thinking we should allow tweaking the statistics for extended >> stats, and serialize it in the pg_statistic_ext catalog. Any opinions >> why that would be a bad idea? >> > >Seems reasonable to me. This might not be the only option we'll ever >want to add though, so perhaps a "stxoptions text[]" column along the >lines of a relation's reloptions would be the way to go. > I don't know - I kinda dislike the idea of stashing stuff like this into text[] arrays unless there's a clear need for such flexibility (i.e. vision to have more such options). Which I'm not sure is the case here. And we kinda have a precedent in pg_attribute.attstattarget, so I'd use the same approach here. >> I suppose it should be part of the CREATE STATISTICS command, but I'm >> not sure what'd be the best syntax. We might also have something more >> similar to ALTER COLUMNT, but perhaps >> >> ALTER STATISTICS s SET STATISTICS 1000; >> >> looks a bit too weird. >> > >Yes it does look a bit weird, but that's the natural generalisation of >what we have for per-column statistics, so it's probably preferable to >do that rather than invent some other syntax that wouldn't be so >consistent. > Yeah, I agree. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 20 Jun 2019 at 23:12, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Thu, Jun 20, 2019 at 08:08:44AM +0100, Dean Rasheed wrote: > >On Tue, 18 Jun 2019 at 22:34, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > >> > >> So I'm thinking we should allow tweaking the statistics for extended > >> stats, and serialize it in the pg_statistic_ext catalog. Any opinions > >> why that would be a bad idea? > > > >Seems reasonable to me. This might not be the only option we'll ever > >want to add though, so perhaps a "stxoptions text[]" column along the > >lines of a relation's reloptions would be the way to go. > > I don't know - I kinda dislike the idea of stashing stuff like this into > text[] arrays unless there's a clear need for such flexibility (i.e. > vision to have more such options). Which I'm not sure is the case here. > And we kinda have a precedent in pg_attribute.attstattarget, so I'd use > the same approach here. > Hmm, maybe. I can certainly understand your dislike of using text[]. I'm not sure that we can confidently say that multivariate statistics won't ever need additional tuning knobs, but I have no idea at the moment what they might be, and nothing else has come up so far in all the time spent considering MCV lists and histograms, so maybe this is OK. Regards, Dean
On Fri, Jun 21, 2019 at 08:09:18AM +0100, Dean Rasheed wrote: >On Thu, 20 Jun 2019 at 23:12, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> On Thu, Jun 20, 2019 at 08:08:44AM +0100, Dean Rasheed wrote: >> >On Tue, 18 Jun 2019 at 22:34, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> >> >> So I'm thinking we should allow tweaking the statistics for extended >> >> stats, and serialize it in the pg_statistic_ext catalog. Any opinions >> >> why that would be a bad idea? >> > >> >Seems reasonable to me. This might not be the only option we'll ever >> >want to add though, so perhaps a "stxoptions text[]" column along the >> >lines of a relation's reloptions would be the way to go. >> >> I don't know - I kinda dislike the idea of stashing stuff like this into >> text[] arrays unless there's a clear need for such flexibility (i.e. >> vision to have more such options). Which I'm not sure is the case here. >> And we kinda have a precedent in pg_attribute.attstattarget, so I'd use >> the same approach here. >> > >Hmm, maybe. I can certainly understand your dislike of using text[]. >I'm not sure that we can confidently say that multivariate statistics >won't ever need additional tuning knobs, but I have no idea at the >moment what they might be, and nothing else has come up so far in all >the time spent considering MCV lists and histograms, so maybe this is >OK. > OK, attached is a patch implementing this - it adds ALTER STATISTICS ... SET STATISTICS ... modifying a new stxstattarget column in pg_statistic_ext catalog, following the same logic as pg_attribute.attstattarget. During analyze, the per-ext-statistic value is determined like this: 1) When pg_statistic_ext.stxstattarget != (-1), we just use this value and we're done. 2) Otherwise we inspect per-column attstattarget values, and use the largest value. This is what we do now, so it's backwards-compatible behavior. 3) If the value is still (-1), we use default_statistics_target. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Fri, 21 Jun 2019 at 19:09, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > Hmm, maybe. I can certainly understand your dislike of using text[]. > I'm not sure that we can confidently say that multivariate statistics > won't ever need additional tuning knobs, but I have no idea at the > moment what they might be, and nothing else has come up so far in all > the time spent considering MCV lists and histograms, so maybe this is > OK. I agree with having the stxstattarget column. Even if something did come up in the future, then we could consider merging the stxstattarget column with a new text[] column at that time. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi, apparently v1 of the ALTER STATISTICS patch was a bit confused about length of the VacAttrStats array passed to statext_compute_stattarget, causing segfaults. Attached v2 patch fixes that, and it also makes sure we print warnings about ignored statistics only once. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sun, Jul 07, 2019 at 12:02:38AM +0200, Tomas Vondra wrote: >Hi, > >apparently v1 of the ALTER STATISTICS patch was a bit confused about >length of the VacAttrStats array passed to statext_compute_stattarget, >causing segfaults. Attached v2 patch fixes that, and it also makes sure >we print warnings about ignored statistics only once. > v3 of the patch, adding pg_dump support - it works just like when you tweak statistics target for a column, for example. When the value stored in the catalog is not -1, pg_dump emits a separate ALTER STATISTICS command setting it (for the already created statistics object). I've considered making it part of CREATE STATISTICS itself, but it seems a bit cumbersome and we don't do it for columns either. I'm not against adding it in the future, but at this point I don't see a need. At this point I'm not aware of any missing or broken pieces, so I'd welcome feedback. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tuesday, July 9, 2019, Tomas Vondra wrote: > >apparently v1 of the ALTER STATISTICS patch was a bit confused about > >length of the VacAttrStats array passed to statext_compute_stattarget, > >causing segfaults. Attached v2 patch fixes that, and it also makes sure > >we print warnings about ignored statistics only once. > > > > v3 of the patch, adding pg_dump support - it works just like when you tweak > statistics target for a column, for example. When the value stored in the > catalog is not -1, pg_dump emits a separate ALTER STATISTICS command setting > it (for the already created statistics object). > Hi Tomas, I stumbled upon your patch. According to the CF bot, your patch applies cleanly, builds successfully, and passes make world. Meaning, the pg_dump tap test passed, but there was no test for the new SET STATISTICS yet. So you might want to add a regression test for that and integrate it in the existing alter_generic file. Upon quick read-through, the syntax and docs are correct because it's similar to the format of ALTER TABLE/INDEX... SET STATISTICS... : ALTER [ COLUMN ] column_name SET STATISTICS integer + /* XXX What if the target is set to 0? Reset the statistic? */ This also makes me wonder. I haven't looked deeply into the code, but since 0 is a valid value, I believe it should reset the stats. After lookup though, this is how it's tested in ALTER TABLE: /test/regress/sql/stats_ext.sql:-- Ensure things work sanely with SET STATISTICS 0 > I've considered making it part of CREATE STATISTICS itself, but it seems a > bit cumbersome and we don't do it for columns either. I'm not against adding > it in the future, but at this point I don't see a need. I agree. Perhaps that's for another patch should you decide to add it in the future. Regards, Kirk Jamison
On Fri, Jul 19, 2019 at 06:12:20AM +0000, Jamison, Kirk wrote: >On Tuesday, July 9, 2019, Tomas Vondra wrote: >> >apparently v1 of the ALTER STATISTICS patch was a bit confused about >> >length of the VacAttrStats array passed to statext_compute_stattarget, >> >causing segfaults. Attached v2 patch fixes that, and it also makes sure >> >we print warnings about ignored statistics only once. >> > >> >> v3 of the patch, adding pg_dump support - it works just like when you tweak >> statistics target for a column, for example. When the value stored in the >> catalog is not -1, pg_dump emits a separate ALTER STATISTICS command setting >> it (for the already created statistics object). >> > >Hi Tomas, I stumbled upon your patch. > >According to the CF bot, your patch applies cleanly, builds successfully, and >passes make world. Meaning, the pg_dump tap test passed, but there was no >test for the new SET STATISTICS yet. So you might want to add a regression >test for that and integrate it in the existing alter_generic file. > >Upon quick read-through, the syntax and docs are correct because it's similar >to the format of ALTER TABLE/INDEX... SET STATISTICS... : > ALTER [ COLUMN ] column_name SET STATISTICS integer > >+ /* XXX What if the target is set to 0? Reset the statistic? */ > >This also makes me wonder. I haven't looked deeply into the code, but since 0 is >a valid value, I believe it should reset the stats. I agree resetting the stats after setting the target to 0 seems quite reasonable. But that's not what we do for attribute stats, because in that case we simply skip the attribute during the future ANALYZE runs - we don't reset the stats, we keep the existing stats. So I've done the same thing here, and I've removed the XXX comment. If we want to change that, I'd do it in a separate patch for both the regular and extended stats. >After lookup though, this is how it's tested in ALTER TABLE: >/test/regress/sql/stats_ext.sql:-- Ensure things work sanely with SET STATISTICS 0 > Well, yeah. But that tests we skip building the extended statistic (because we excluded the column from the ANALYZE run). >> I've considered making it part of CREATE STATISTICS itself, but it seems a >> bit cumbersome and we don't do it for columns either. I'm not against adding >> it in the future, but at this point I don't see a need. > >I agree. Perhaps that's for another patch should you decide to add it in the future. > Right. Attached is v4 of the patch, with a couple more improvements: 1) I've renamed the if_not_exists flag to missing_ok, because that's more consistent with the "IF EXISTS" clause in the grammar (the old flag was kinda the exact opposite), and I've added a NOTICE about the skip. 2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows, because that's what the function was doing anyway (computing sample size). 3) I've added a couple of regression tests to stats_ext.sql Aside from that, I've cleaned up a couple of places and improved a bunch of comments. Nothing huge. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sat, July 20, 2019 8:12 AM (GMT+9), Tomas Vondra wrote: > >+ /* XXX What if the target is set to 0? Reset the statistic? > */ > > > >This also makes me wonder. I haven't looked deeply into the code, but > >since 0 is a valid value, I believe it should reset the stats. > > I agree resetting the stats after setting the target to 0 seems quite > reasonable. But that's not what we do for attribute stats, because in that > case we simply skip the attribute during the future ANALYZE runs - we don't > reset the stats, we keep the existing stats. So I've done the same thing here, > and I've removed the XXX comment. > > If we want to change that, I'd do it in a separate patch for both the regular > and extended stats. Hi, Tomas Sorry for my late reply. You're right. I have no strong opinion whether we'd want to change that behavior. I've also confirmed the change in the patch where setting statistics target 0 skips the statistics. Maybe only some minor nitpicks in the source code comments below: 1. "it's" should be "its": > + * Compute statistic target, based on what's set for the statistic > + * object itself, and for it's attributes. 2. Consistency whether you'd use either "statistic " or "statisticS ". Ex. statistic target vs statisticS target, statistics object vs statistic object, etc. > Attached is v4 of the patch, with a couple more improvements: > > 1) I've renamed the if_not_exists flag to missing_ok, because that's more > consistent with the "IF EXISTS" clause in the grammar (the old flag was kinda > the exact opposite), and I've added a NOTICE about the skip. + bool missing_ok; /* do nothing if statistics does not exist */ Confirmed. So we ignore if statistic does not exist, and skip the error. Maybe to make it consistent with other data structures in parsernodes.h, you can change the comment of missing_ok to: /* skip error if statistics object does not exist */ > 2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows, because that's > what the function was doing anyway (computing sample size). > > 3) I've added a couple of regression tests to stats_ext.sql > > Aside from that, I've cleaned up a couple of places and improved a bunch of > comments. Nothing huge. I have a question though regarding ComputeExtStatisticsRows. I'm just curious with the value 300 when computing sample size. Where did this value come from? + /* compute sample size based on the statistic target */ + return (300 * result); Overall, the patch is almost already in good shape for commit. I'll wait for the next update. Regards, Kirk Jamison
On Fri, Jul 26, 2019 at 07:03:41AM +0000, Jamison, Kirk wrote: >On Sat, July 20, 2019 8:12 AM (GMT+9), Tomas Vondra wrote: > >> >+ /* XXX What if the target is set to 0? Reset the statistic? >> */ >> > >> >This also makes me wonder. I haven't looked deeply into the code, but >> >since 0 is a valid value, I believe it should reset the stats. >> >> I agree resetting the stats after setting the target to 0 seems quite >> reasonable. But that's not what we do for attribute stats, because in that >> case we simply skip the attribute during the future ANALYZE runs - we don't >> reset the stats, we keep the existing stats. So I've done the same thing here, >> and I've removed the XXX comment. >> >> If we want to change that, I'd do it in a separate patch for both the regular >> and extended stats. > >Hi, Tomas > >Sorry for my late reply. >You're right. I have no strong opinion whether we'd want to change that behavior. >I've also confirmed the change in the patch where setting statistics target 0 >skips the statistics. > OK, thanks. >Maybe only some minor nitpicks in the source code comments below: >1. "it's" should be "its": >> + * Compute statistic target, based on what's set for the statistic >> + * object itself, and for it's attributes. > >2. Consistency whether you'd use either "statistic " or "statisticS ". >Ex. statistic target vs statisticS target, statistics object vs statistic object, etc. > >> Attached is v4 of the patch, with a couple more improvements: >> >> 1) I've renamed the if_not_exists flag to missing_ok, because that's more >> consistent with the "IF EXISTS" clause in the grammar (the old flag was kinda >> the exact opposite), and I've added a NOTICE about the skip. > >+ bool missing_ok; /* do nothing if statistics does not exist */ > >Confirmed. So we ignore if statistic does not exist, and skip the error. >Maybe to make it consistent with other data structures in parsernodes.h, >you can change the comment of missing_ok to: >/* skip error if statistics object does not exist */ > Thanks, I've fixed all those places in the attached v5. >> 2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows, because that's >> what the function was doing anyway (computing sample size). >> >> 3) I've added a couple of regression tests to stats_ext.sql >> >> Aside from that, I've cleaned up a couple of places and improved a bunch of >> comments. Nothing huge. > >I have a question though regarding ComputeExtStatisticsRows. >I'm just curious with the value 300 when computing sample size. >Where did this value come from? > >+ /* compute sample size based on the statistic target */ >+ return (300 * result); > >Overall, the patch is almost already in good shape for commit. >I'll wait for the next update. > That's how we compute number of rows to sample, based on the statistics target. See std_typanalyze() in analyze.c, which also cites the paper where this comes from. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Saturday, July 27, 2019 7:06 AM(GMT+9), Tomas Vondra wrote: > On Fri, Jul 26, 2019 at 07:03:41AM +0000, Jamison, Kirk wrote: > >On Sat, July 20, 2019 8:12 AM (GMT+9), Tomas Vondra wrote: > > > >> >+ /* XXX What if the target is set to 0? Reset the statistic? > >> */ > >> > > >> >This also makes me wonder. I haven't looked deeply into the code, > >> >but since 0 is a valid value, I believe it should reset the stats. > >> > >> I agree resetting the stats after setting the target to 0 seems quite > >> reasonable. But that's not what we do for attribute stats, because in > >> that case we simply skip the attribute during the future ANALYZE runs > >> - we don't reset the stats, we keep the existing stats. So I've done > >> the same thing here, and I've removed the XXX comment. > >> > >> If we want to change that, I'd do it in a separate patch for both the > >> regular and extended stats. > > > >Hi, Tomas > > > >Sorry for my late reply. > >You're right. I have no strong opinion whether we'd want to change that > behavior. > >I've also confirmed the change in the patch where setting statistics > >target 0 skips the statistics. > > > > OK, thanks. > > >Maybe only some minor nitpicks in the source code comments below: > >1. "it's" should be "its": > >> + * Compute statistic target, based on what's set for the > statistic > >> + * object itself, and for it's attributes. > > > >2. Consistency whether you'd use either "statistic " or "statisticS ". > >Ex. statistic target vs statisticS target, statistics object vs statistic > object, etc. > > > >> Attached is v4 of the patch, with a couple more improvements: > >> > >> 1) I've renamed the if_not_exists flag to missing_ok, because that's > >> more consistent with the "IF EXISTS" clause in the grammar (the old > >> flag was kinda the exact opposite), and I've added a NOTICE about the skip. > > > >+ bool missing_ok; /* do nothing if statistics does > not exist */ > > > >Confirmed. So we ignore if statistic does not exist, and skip the error. > >Maybe to make it consistent with other data structures in > >parsernodes.h, you can change the comment of missing_ok to: > >/* skip error if statistics object does not exist */ > > > > Thanks, I've fixed all those places in the attached v5. I've confirmed the fix. > >> 2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows, because > >> that's what the function was doing anyway (computing sample size). > >> > >> 3) I've added a couple of regression tests to stats_ext.sql > >> > >> Aside from that, I've cleaned up a couple of places and improved a > >> bunch of comments. Nothing huge. > > > >I have a question though regarding ComputeExtStatisticsRows. > >I'm just curious with the value 300 when computing sample size. > >Where did this value come from? > > > >+ /* compute sample size based on the statistic target */ > >+ return (300 * result); > > > >Overall, the patch is almost already in good shape for commit. > >I'll wait for the next update. > > > > That's how we compute number of rows to sample, based on the statistics target. > See std_typanalyze() in analyze.c, which also cites the paper where this comes > from. Noted. Found it. Thank you for the reference. There's just a small whitespace (extra space) below after running git diff --check. >src/bin/pg_dump/pg_dump.c:7226: trailing whitespace. >+ It would be better to post an updated patch, but other than that, I've confirmed the fixes. The patch passed the make-world and regression tests as well. I've marked this as "ready for committer". Regards, Kirk Jamison
Hi, > From: Jamison, Kirk [mailto:k.jamison@jp.fujitsu.com] > Sent: Monday, July 29, 2019 10:53 AM > To: 'Tomas Vondra' <tomas.vondra@2ndquadrant.com> > Cc: Dean Rasheed <dean.a.rasheed@gmail.com>; PostgreSQL Hackers > <pgsql-hackers@lists.postgresql.org> > Subject: RE: Multivariate MCV list vs. statistics target > > On Saturday, July 27, 2019 7:06 AM(GMT+9), Tomas Vondra wrote: > > On Fri, Jul 26, 2019 at 07:03:41AM +0000, Jamison, Kirk wrote: > > >On Sat, July 20, 2019 8:12 AM (GMT+9), Tomas Vondra wrote: > > > > > >> >+ /* XXX What if the target is set to 0? Reset the statistic? > > >> */ > > >> > > > >> >This also makes me wonder. I haven't looked deeply into the code, > > >> >but since 0 is a valid value, I believe it should reset the stats. > > >> > > >> I agree resetting the stats after setting the target to 0 seems > > >> quite reasonable. But that's not what we do for attribute stats, > > >> because in that case we simply skip the attribute during the future > > >> ANALYZE runs > > >> - we don't reset the stats, we keep the existing stats. So I've > > >> done the same thing here, and I've removed the XXX comment. > > >> > > >> If we want to change that, I'd do it in a separate patch for both > > >> the regular and extended stats. > > > > > >Hi, Tomas > > > > > >Sorry for my late reply. > > >You're right. I have no strong opinion whether we'd want to change > > >that > > behavior. > > >I've also confirmed the change in the patch where setting statistics > > >target 0 skips the statistics. > > > > > > > OK, thanks. > > > > >Maybe only some minor nitpicks in the source code comments below: > > >1. "it's" should be "its": > > >> + * Compute statistic target, based on what's set for the > > statistic > > >> + * object itself, and for it's attributes. > > > > > >2. Consistency whether you'd use either "statistic " or "statisticS ". > > >Ex. statistic target vs statisticS target, statistics object vs > > >statistic > > object, etc. > > > > > >> Attached is v4 of the patch, with a couple more improvements: > > >> > > >> 1) I've renamed the if_not_exists flag to missing_ok, because > > >> that's more consistent with the "IF EXISTS" clause in the grammar > > >> (the old flag was kinda the exact opposite), and I've added a NOTICE > about the skip. > > > > > >+ bool missing_ok; /* do nothing if statistics does > > not exist */ > > > > > >Confirmed. So we ignore if statistic does not exist, and skip the error. > > >Maybe to make it consistent with other data structures in > > >parsernodes.h, you can change the comment of missing_ok to: > > >/* skip error if statistics object does not exist */ > > > > > > > Thanks, I've fixed all those places in the attached v5. > > I've confirmed the fix. > > > >> 2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows, > > >> because that's what the function was doing anyway (computing sample > size). > > >> > > >> 3) I've added a couple of regression tests to stats_ext.sql > > >> > > >> Aside from that, I've cleaned up a couple of places and improved a > > >> bunch of comments. Nothing huge. > > > > > >I have a question though regarding ComputeExtStatisticsRows. > > >I'm just curious with the value 300 when computing sample size. > > >Where did this value come from? > > > > > >+ /* compute sample size based on the statistic target */ > > >+ return (300 * result); > > > > > >Overall, the patch is almost already in good shape for commit. > > >I'll wait for the next update. > > > > > > > That's how we compute number of rows to sample, based on the statistics > target. > > See std_typanalyze() in analyze.c, which also cites the paper where > > this comes from. > Noted. Found it. Thank you for the reference. > > > There's just a small whitespace (extra space) below after running git diff > --check. > >src/bin/pg_dump/pg_dump.c:7226: trailing whitespace. > >+ > It would be better to post an updated patch, but other than that, I've confirmed > the fixes. > The patch passed the make-world and regression tests as well. > I've marked this as "ready for committer". The patch does not apply anymore. In addition to the whitespace detected, kindly rebase the patch as there were changes from recent commits in the following files: src/backend/commands/analyze.c src/bin/pg_dump/pg_dump.c src/bin/pg_dump/t/002_pg_dump.pl src/test/regress/expected/stats_ext.out src/test/regress/sql/stats_ext.sql Regards, Kirk Jamison
On Thu, Aug 1, 2019 at 12:16 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote: > > On Saturday, July 27, 2019 7:06 AM(GMT+9), Tomas Vondra wrote: > > > On Fri, Jul 26, 2019 at 07:03:41AM +0000, Jamison, Kirk wrote: > > > >Overall, the patch is almost already in good shape for commit. > > > >I'll wait for the next update. > > The patch passed the make-world and regression tests as well. > > I've marked this as "ready for committer". > > The patch does not apply anymore. Based on the above, it sounds like this patch is super close and the only problem is bitrot, so I've set it back to Ready for Committer. Over to Tomas to rebase and commit, or move to the next CF if that's more appropriate. -- Thomas Munro https://enterprisedb.com
Hello. At Thu, 1 Aug 2019 00:15:48 +0000, "Jamison, Kirk" <k.jamison@jp.fujitsu.com> wrote in <D09B13F772D2274BB348A310EE3027C6518F94@g01jpexmbkw24> > The patch does not apply anymore. > In addition to the whitespace detected, > kindly rebase the patch as there were changes from recent commits > in the following files: > src/backend/commands/analyze.c > src/bin/pg_dump/pg_dump.c > src/bin/pg_dump/t/002_pg_dump.pl > src/test/regress/expected/stats_ext.out > src/test/regress/sql/stats_ext.sql The patch finally failed only for stats_ext.out, where 14ef15a222 is hitting. (for b2a3d706b8) I looked through this patch and have some comments. +++ b/src/backend/commands/statscmds.c +#include "access/heapam.h" .. +#include "utils/fmgroids.h" These don't seem needed. + Assert(stmt->missing_ok); Perhaps we shouldn't Assert on this condition. Isn't it better we just "elog(ERROR" here? + DeconstructQualifiedName(stmt->defnames, &schemaname, &statname); Maybe we don't need detailed analysis that the function emits on error. Couldn't we use NameListToString() instead? That reduces the number of ereport()s and considerably simplify the code around. + oldtup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(stxoid)); + + /* Must be owner of the existing statistics object */ + if (!pg_statistics_object_ownercheck(stxoid, GetUserId())) This repeats the SearchSysCache twice in a quite short duration. I suppose it'd be better that ACL (and validity) checks done directly using oldtup. + /* replace the stxstattarget column */ + repl_repl[Anum_pg_statistic_ext_stxstattarget - 1] = true; + repl_val[Anum_pg_statistic_ext_stxstattarget - 1] = Int32GetDatum(newtarget) We usually do this kind of work using SearchSysCacheCopyN(), which simplifies the code around, too. +++ b/src/backend/statistics/mcv.c > * Maximum number of MCV items to store, based on the attribute with the > * largest stats target (and the number of groups we have available). > */ - nitems = stats[0]->attr->attstattarget; - for (i = 1; i < numattrs; i++) - { - if (stats[i]->attr->attstattarget > nitems) - nitems = stats[i]->attr->attstattarget; - } + nitems = stattarget; Maybe you forgot to modify the comment. check_xact_readonly() returns false for this command. As the result it emits a somewhat pointless error message. =# alter statistics s1 set statistics 0; ERROR: cannot assign TransactionIds during recovery +++ b/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.h > i_stxname = PQfnumber(res, "stxname"); > i_stxnamespace = PQfnumber(res, "stxnamespace"); > i_rolname = PQfnumber(res, "rolname"); + i_stattarget = PQfnumber(res, "stxstattarget"); I'm not sure whether it is the convention here, but variable name is different from column name only for the added line. +++ b/src/bin/psql/tab-complete.c - COMPLETE_WITH("OWNER TO", "RENAME TO", "SET SCHEMA"); + COMPLETE_WITH("OWNER TO", "RENAME TO", "SET SCHEMA", "SET STATISTICS"); ALTER STATISTICS s2 SET STATISTICS<tab> is suggested with ext-stats names, but it's the place for target value. +++ b/src/include/nodes/nodes.h T_CallStmt, + T_AlterStatsStmt, I think it should be immediately below T_CreateStatsStmt. +++ b/src/include/nodes/parsenodes.h + bool missing_ok; /* skip error if statistics object is missing */ Should be very trivial, but many bool members especially missing_ok have a comment having "?" at the end. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Aug 01, 2019 at 05:25:31PM +1200, Thomas Munro wrote: >On Thu, Aug 1, 2019 at 12:16 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote: >> > On Saturday, July 27, 2019 7:06 AM(GMT+9), Tomas Vondra wrote: >> > > On Fri, Jul 26, 2019 at 07:03:41AM +0000, Jamison, Kirk wrote: >> > > >Overall, the patch is almost already in good shape for commit. >> > > >I'll wait for the next update. > >> > The patch passed the make-world and regression tests as well. >> > I've marked this as "ready for committer". >> >> The patch does not apply anymore. > >Based on the above, it sounds like this patch is super close and the >only problem is bitrot, so I've set it back to Ready for Committer. >Over to Tomas to rebase and commit, or move to the next CF if that's >more appropriate. > I'll move it to the next CF. Aside from the issues pointed by Kyotaro-san in his review, I still haven't made my mind about whether to base the use statistics targets set for the attributes. That's what we're doing now, but I'm not sure it's a good idea after adding separate statistics target. I wonder what Dean's opinion on this is, as he added the current logic. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 1 Aug 2019 at 11:30, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > I'll move it to the next CF. Aside from the issues pointed by Kyotaro-san > in his review, I still haven't made my mind about whether to base the use > statistics targets set for the attributes. That's what we're doing now, > but I'm not sure it's a good idea after adding separate statistics target. > I wonder what Dean's opinion on this is, as he added the current logic. > If this were being released in the same version as MCV stats first appeared, I'd say that there's not much point basing the default multivariate stats target on the per-column targets, when it has its own knob to control it. However, since this won't be released for a year, those per-column-based defaults will be in the field for that long, and so I'd say that we shouldn't change the default when adding this, otherwise users who don't use this new feature might be surprised by the change in behaviour. Regards, Dean
On 2019-Aug-01, Tomas Vondra wrote: > I'll move it to the next CF. Aside from the issues pointed by Kyotaro-san > in his review, I still haven't made my mind about whether to base the use > statistics targets set for the attributes. That's what we're doing now, > but I'm not sure it's a good idea after adding separate statistics target. > I wonder what Dean's opinion on this is, as he added the current logic. Latest patch no longer applies. Please update. And since you already seem to have handled all review comments since it was Ready for Committer, and you now know Dean's opinion on the remaining question, please get it pushed. Thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Sep 03, 2019 at 02:38:56PM -0400, Alvaro Herrera wrote: >On 2019-Aug-01, Tomas Vondra wrote: > >> I'll move it to the next CF. Aside from the issues pointed by Kyotaro-san >> in his review, I still haven't made my mind about whether to base the use >> statistics targets set for the attributes. That's what we're doing now, >> but I'm not sure it's a good idea after adding separate statistics target. >> I wonder what Dean's opinion on this is, as he added the current logic. > >Latest patch no longer applies. Please update. And since you already >seem to have handled all review comments since it was Ready for >Committer, and you now know Dean's opinion on the remaining question, >please get it pushed. > OK, I've pushed this the patch, after some minor polishing. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Multivariate MCV list vs. statistics target
From
"Shinoda, Noriyoshi (PN Japan A&PS Delivery)"
Date:
Hello, I found a missing column description in the pg_statistic_ext catalog document for this new feature. The attached patch adds a description of the 'stxstattarget' column to pg_statistic_ext catalog's document. If there is a better explanation, please correct it. Regards, Noriyoshi Shinoda -----Original Message----- From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com] Sent: Wednesday, September 11, 2019 7:28 AM To: Alvaro Herrera <alvherre@2ndquadrant.com> Cc: Thomas Munro <thomas.munro@gmail.com>; Jamison, Kirk <k.jamison@jp.fujitsu.com>; Dean Rasheed <dean.a.rasheed@gmail.com>;PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org> Subject: Re: Multivariate MCV list vs. statistics target On Tue, Sep 03, 2019 at 02:38:56PM -0400, Alvaro Herrera wrote: >On 2019-Aug-01, Tomas Vondra wrote: > >> I'll move it to the next CF. Aside from the issues pointed by >> Kyotaro-san in his review, I still haven't made my mind about whether >> to base the use statistics targets set for the attributes. That's >> what we're doing now, but I'm not sure it's a good idea after adding separate statistics target. >> I wonder what Dean's opinion on this is, as he added the current logic. > >Latest patch no longer applies. Please update. And since you already >seem to have handled all review comments since it was Ready for >Committer, and you now know Dean's opinion on the remaining question, >please get it pushed. > OK, I've pushed this the patch, after some minor polishing. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi, On Wed, Mar 18, 2020 at 04:28:34AM +0000, Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote: >Hello, > >I found a missing column description in the pg_statistic_ext catalog document for this new feature. >The attached patch adds a description of the 'stxstattarget' column to pg_statistic_ext catalog's document. >If there is a better explanation, please correct it. > Thanks for the report. Yes, this is clearly an omission. I think it would be better to use wording similar to attstattarget, per the attached patch. That is, without reference to ALTER STATISTICS and better explaination of what positive/negative values do. Do you agree? regards >Regards, >Noriyoshi Shinoda > >-----Original Message----- >From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com] >Sent: Wednesday, September 11, 2019 7:28 AM >To: Alvaro Herrera <alvherre@2ndquadrant.com> >Cc: Thomas Munro <thomas.munro@gmail.com>; Jamison, Kirk <k.jamison@jp.fujitsu.com>; Dean Rasheed <dean.a.rasheed@gmail.com>;PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org> >Subject: Re: Multivariate MCV list vs. statistics target > >On Tue, Sep 03, 2019 at 02:38:56PM -0400, Alvaro Herrera wrote: >>On 2019-Aug-01, Tomas Vondra wrote: >> >>> I'll move it to the next CF. Aside from the issues pointed by >>> Kyotaro-san in his review, I still haven't made my mind about whether >>> to base the use statistics targets set for the attributes. That's >>> what we're doing now, but I'm not sure it's a good idea after adding separate statistics target. >>> I wonder what Dean's opinion on this is, as he added the current logic. >> >>Latest patch no longer applies. Please update. And since you already >>seem to have handled all review comments since it was Ready for >>Committer, and you now know Dean's opinion on the remaining question, >>please get it pushed. >> > >OK, I've pushed this the patch, after some minor polishing. > > >regards > >-- >Tomas Vondra http://www.2ndQuadrant.com >PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > > > -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
RE: Multivariate MCV list vs. statistics target
From
"Shinoda, Noriyoshi (PN Japan A&PS Delivery)"
Date:
Hi, >Thanks for the report. Yes, this is clearly an omission. I think it would be better to use wording >similar to attstattarget,per the attached patch. That is, without reference to ALTER STATISTICS and >better explaination of what positive/negativevalues do. Do you agree? Thank you for your comment. I agree with the text you suggested. Regards, Noriyoshi Shinoda -----Original Message----- From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com] Sent: Wednesday, March 18, 2020 9:36 PM To: Shinoda, Noriyoshi (PN Japan A&PS Delivery) <noriyoshi.shinoda@hpe.com> Cc: Alvaro Herrera <alvherre@2ndquadrant.com>; Thomas Munro <thomas.munro@gmail.com>; Jamison, Kirk <k.jamison@jp.fujitsu.com>;Dean Rasheed <dean.a.rasheed@gmail.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org> Subject: Re: Multivariate MCV list vs. statistics target Hi, On Wed, Mar 18, 2020 at 04:28:34AM +0000, Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote: >Hello, > >I found a missing column description in the pg_statistic_ext catalog document for this new feature. >The attached patch adds a description of the 'stxstattarget' column to pg_statistic_ext catalog's document. >If there is a better explanation, please correct it. > Thanks for the report. Yes, this is clearly an omission. I think it would be better to use wording similar to attstattarget,per the attached patch. That is, without reference to ALTER STATISTICS and better explaination of what positive/negativevalues do. Do you agree? regards >Regards, >Noriyoshi Shinoda > >-----Original Message----- >From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com] >Sent: Wednesday, September 11, 2019 7:28 AM >To: Alvaro Herrera <alvherre@2ndquadrant.com> >Cc: Thomas Munro <thomas.munro@gmail.com>; Jamison, Kirk ><k.jamison@jp.fujitsu.com>; Dean Rasheed <dean.a.rasheed@gmail.com>; >PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org> >Subject: Re: Multivariate MCV list vs. statistics target > >On Tue, Sep 03, 2019 at 02:38:56PM -0400, Alvaro Herrera wrote: >>On 2019-Aug-01, Tomas Vondra wrote: >> >>> I'll move it to the next CF. Aside from the issues pointed by >>> Kyotaro-san in his review, I still haven't made my mind about >>> whether to base the use statistics targets set for the attributes. >>> That's what we're doing now, but I'm not sure it's a good idea after adding separate statistics target. >>> I wonder what Dean's opinion on this is, as he added the current logic. >> >>Latest patch no longer applies. Please update. And since you already >>seem to have handled all review comments since it was Ready for >>Committer, and you now know Dean's opinion on the remaining question, >>please get it pushed. >> > >OK, I've pushed this the patch, after some minor polishing. > > >regards > >-- >Tomas Vondra http://www.2ndQuadrant.com >PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > > > -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 18, 2020 at 01:32:19PM +0000, Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote: >Hi, > >>Thanks for the report. Yes, this is clearly an omission. I think it >>would be better to use wording >similar to attstattarget, per the >>attached patch. That is, without reference to ALTER STATISTICS and >>>better explaination of what positive/negative values do. Do you >>>agree? > >Thank you for your comment. I agree with the text you suggested. > Thank you for the report, I've pushed the reworded fix. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services