Andres Freund <andres@anarazel.de> writes:
> In fact using conceptually like a new snapshot for each sample tuple
> actually seems like it'd be somewhat of an improvement over using a
> single snapshot.

Dunno, that feels like a fairly bad idea to me. It seems like it would
overemphasize the behavior of whatever queries happened to be running
concurrently with the ANALYZE. I do follow the argument that using a
single snapshot for the whole ANALYZE overemphasizes a single instant
in time, but I don't think that leads to the conclusion that we shouldn't
use a snapshot at all.

Another angle that would be worth considering, aside from the issue
of whether the sample used for pg_statistic becomes more or less
representative, is what impact all this would have on the tuple count
estimates that go to the stats collector and pg_class.reltuples.

Right now, we don't have a great story at all on how the stats collector's
count is affected by combining VACUUM/ANALYZE table-wide counts with
the incremental deltas reported by transactions happening concurrently
with VACUUM/ANALYZE. Would changing this behavior make that better,
or worse, or about the same?
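
To make the concern concrete, here is a toy model of the interaction.
It is not PostgreSQL code: ToyCollector and run() are made-up names,
the numbers are arbitrary, and it simply assumes that a table-wide
report overwrites the counter while per-transaction reports add to it.

```python
from dataclasses import dataclass


@dataclass
class ToyCollector:
    """Hypothetical stand-in for a per-table live-tuple counter."""
    live_tuples: int = 0

    def report_delta(self, inserted: int, deleted: int = 0) -> None:
        # Incremental report from an ordinary transaction: adds to the counter.
        self.live_tuples += inserted - deleted

    def report_table_wide(self, estimate: int) -> None:
        # Table-wide report from VACUUM/ANALYZE: overwrites the counter.
        self.live_tuples = estimate


def run(delta_arrives_first: bool, scan_saw_insert: bool) -> int:
    """Return the collector's count after one concurrent insert of 100 rows.

    The table starts with 1000 live tuples, so the true final count is 1100.
    """
    before, inserted = 1000, 100
    collector = ToyCollector(live_tuples=before)
    # What the table-wide scan believes, depending on what it could see.
    estimate = before + (inserted if scan_saw_insert else 0)
    if delta_arrives_first:
        collector.report_delta(inserted)
        collector.report_table_wide(estimate)
    else:
        collector.report_table_wide(estimate)
        collector.report_delta(inserted)
    return collector.live_tuples
```

In this model the count comes out right only when visibility and
arrival order happen to agree: run(True, False) loses the insert and
gives 1000, while run(False, True) counts it twice and gives 1200.
Which rows the scan can see is what a change in snapshot behavior
would alter.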

			regards, tom lane