On Tue, 9 Jun 2020 at 14:27, Thomas Munro <thomas.munro@gmail.com> wrote:
> Two recent failures show plan changes in RLS queries on master. Based
> on nearby comments, the choice plan is being used to verify access (or
> lack of access) to row estimates, so I guess that means something
> could be amiss here. (Or it could be due to the dropped UDP flaky
> stats problem, but then why in the same place twice, and why twice in
> a week, only on master, and not for months before that?)
I see 0c882e52a did change the number of statistics targets on that
table. The first failure was on the commit directly after that one.
I'm not sure what instability Tom meant when he wrote "-- results
below depend on having quite accurate stats for atest12".
It does seem plausible, given how slow prion is that autovacuum might
be trigger after the manual vacuum somehow and building stats with
just 1k buckets instead of 10k. 0936d1b6 made some changes to disable
autovacuum because it was sometimes coming in and messing with the
statistics, maybe we need to do the same here, or at least do
something less temporary than changing default_statistics_target.
select attname,array_length(histogram_bounds,1) from pg_stats where
tablename = 'atest12' order by attname;
should mention the array length is 10000 if it's working as intended.
Is it worth sticking that query in there before and after the failures
to ensure we're working with the stats we think we are?
David