Thread: Thinking about ANALYZE stats and autovacuum and large non-uniform tables

From: Greg Stark
One problem I've seen in multiple databases is when a table has a
mixture of data sets within it. E.g. A queue table where 99% of the
entries are "done" but most queries are working with the 1% that are
"new" or in other states. Often the statistics are skewed by the
"done" entries and give bad estimates for query planning when the
query is actually looking at the other rows.

We've always talked about this as a "skewed distribution" or
"intercolumn correlation" problem. And we've developed some tools for
dealing with those issues. But I've been thinking that's not the only
problem with these cases.

The problem I'm finding is that the distribution of these small
subsets can swing quickly. And understanding intercolumn correlations
even if we could do it perfectly would be no help at all.

Consider a table with millions of rows that are "done" but none that
are "pending". Inserting just a few hundred or thousand new pending
rows makes any estimates based on the existing statistics entirely
incorrect. Even if we had perfect statistics capable of making perfect
estimates they would be entirely wrong once a few inserts of pending
rows are done.
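To put rough numbers on that (everything here is invented for
illustration, and the estimator is a gross simplification of the
planner's real MCV-based selectivity logic):

```python
# Stale statistics for a 10M-row queue table: at the last ANALYZE every
# sampled row was status = 'done', so 'pending' is missing from the MCV
# list and its estimated frequency falls back to almost nothing.
total_rows = 10_000_000
mcv = {"done": 1.0}     # most-common-values list captured by ANALYZE
n_distinct = 2          # the planner's idea of how many statuses exist

def estimated_rows(value):
    """Crude mimic of MCV selectivity: exact hit, else spread the
    leftover frequency across the remaining distinct values."""
    if value in mcv:
        return total_rows * mcv[value]
    leftover = 1.0 - sum(mcv.values())
    remaining = max(n_distinct - len(mcv), 1)
    return max(total_rows * leftover / remaining, 1)

# A couple of thousand 'pending' rows arrive right after ANALYZE ran.
actual_pending = 2_000
est = estimated_rows("pending")
print(f"estimated {est:.0f} row(s), actually {actual_pending}")
```

However perfect the statistics were at ANALYZE time, the estimate for
'pending' is off by three orders of magnitude minutes later.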

Worse, even n_dead_tup, n_mod_since_analyze, etc. are affected by
this. It's easy (at least on older versions; maybe Peter's work has
fixed this for btree) to get severe index bloat because vacuum doesn't
run for a long time relative to the size of the busy portion of the
table.

To really tackle this, I imagine we should be doing something like
noticing when inserts, updates, and deletes are affecting key values
that are "rare" according to the statistics, and triggering autovacuum
ANALYZE runs that use indexes to update the statistics only for the
relevant key ranges.
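As a very rough sketch of what I mean (the thresholds, names, and
trigger logic here are all invented):

```python
RARE_FREQ = 0.001        # MCV frequency below which a value is "rare"
RARE_MOD_LIMIT = 500     # rare-value modifications before re-analyzing

mcv_freqs = {"done": 0.999}  # frequencies from the last ANALYZE
rare_mods = {}               # value -> modifications seen since then

def note_modification(value):
    """Called once per insert/update/delete of a key value. Returns the
    value when enough rare-value churn has accumulated to justify a
    targeted, index-driven ANALYZE of just that key range."""
    if mcv_freqs.get(value, 0.0) >= RARE_FREQ:
        return None                  # common value: nothing special
    rare_mods[value] = rare_mods.get(value, 0) + 1
    if rare_mods[value] >= RARE_MOD_LIMIT:
        rare_mods[value] = 0         # pretend we re-analyzed this range
        return value
    return None

# 600 modifications of a value the stats consider essentially absent:
due = [v for v in ["pending"] * 600 if note_modification(v)]
print(due)
```

The point is that a few hundred modifications of a rare key would be
enough to act on, long before table-wide thresholds would fire.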

Obviously this could get complex quickly. Perhaps it should be
something users could declare. Some kind of "partitioned statistics"
where you declare a where clause and we generate statistics for the
table where that where clause is true. Then we could fairly easily
also count things like n_mod_since_analyze for that where clause too.
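Something like this, in miniature (the predicate names and bookkeeping
are made up):

```python
# "Partitioned statistics": each user-declared predicate gets its own
# n_mod_since_analyze-style counter, so autovacuum could notice churn
# in a tiny slice of a huge table.
partitions = {
    "pending_rows": lambda row: row["status"] == "pending",
    "done_rows":    lambda row: row["status"] == "done",
}
n_mod_since_analyze = {name: 0 for name in partitions}

def count_modification(row):
    # Bump the counter of every declared predicate the modified row
    # satisfies; ANALYZE could then be fired for just that slice.
    for name, pred in partitions.items():
        if pred(row):
            n_mod_since_analyze[name] += 1

for _ in range(300):
    count_modification({"status": "pending"})
count_modification({"status": "done"})
print(n_mod_since_analyze)
```

300 modifications is nothing against a table-wide scale factor, but
against a declared "pending" partition of near-zero size it's huge.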

And yes, partitioning the table could be a solution to this in some
cases. I think there are reasons why it isn't always going to work for
these issues though, not least that users will likely have other ways
they want to partition the data already.


-- 
greg



Re: Thinking about ANALYZE stats and autovacuum and large non-uniform tables

From: Thomas Munro
On Fri, Oct 22, 2021 at 10:13 AM Greg Stark <stark@mit.edu> wrote:
> Obviously this could get complex quickly. Perhaps it should be
> something users could declare. Some kind of "partitioned statistics"
> where you declare a where clause and we generate statistics for the
> table where that where clause is true. Then we could fairly easily
> also count things like n_mod_since_analyze for that where clause too.

It's a different thing, but somehow related and maybe worth
mentioning, that in DB2 you can declare a table to be VOLATILE.  In
that case, by some unspecified different heuristics, it'll prefer
index scans over table scans, and it's intended to give stable
performance for queue-like tables by defending against automatically
scheduled stats being collected at a bad time.  It's been a while
since I ran busy queue-like workloads on DB2 but I seem to recall it
was more about the dangers of tables that sometimes have, say, 10
rows and sometimes 42 million, rather than the case of 42 million DONE
rows and 0-10 PENDING rows, but it's not a million miles off.



Re: Thinking about ANALYZE stats and autovacuum and large non-uniform tables

From: Peter Geoghegan
On Thu, Oct 21, 2021 at 2:13 PM Greg Stark <stark@mit.edu> wrote:
> The problem I'm finding is that the distribution of these small
> subsets can swing quickly. And understanding intercolumn correlations
> even if we could do it perfectly would be no help at all.
>
> Consider a table with millions of rows that are "done" but none that
> are "pending". Inserting just a few hundred or thousand new pending
> rows makes any estimates based on the existing statistics entirely
> incorrect. Even if we had perfect statistics capable of making perfect
> estimates they would be entirely wrong once a few inserts of pending
> rows are done.

I am very sympathetic to this view of things. Because this asymmetry
obviously exists, and matters. There is no getting around that.

> Worse, even n_dead_tup, n_mod_since_analyze, etc. are affected by
> this. It's easy (at least on older versions; maybe Peter's work has
> fixed this for btree) to get severe index bloat because vacuum doesn't
> run for a long time relative to the size of the busy portion of the
> table.

My work (especially in 14) has definitely helped a great deal with
index bloat, by cleaning it up in a targeted fashion, based on
page-level considerations. This is just the only thing that can work;
we can never expect VACUUM to be able to deal with that, no matter
what. Simply because it's totally normal and expected for index bloat
to grow at an uneven rate over time.

I do still think that there is an unsolved issue here, which leads to
problems with index bloat when there isn't "B-Tree keyspace
concentration" of garbage index tuples. That problem is with the
statistics that drive VACUUM themselves; they just don't work very
well in certain cases [1]. Statistics that drive autovacuum usually
come from ANALYZE, of course. The entire intellectual justification
for database statistics doesn't really carry over to VACUUM. There are
certain "physical database" implementation details that bleed into the
way ANALYZE counts dead rows. For example, most dead tuples are
usually LP_DEAD stub line pointers (not even tuples). They're only 4
bytes, whereas live tuples are about 30 bytes at a minimum (depending
on how you count it). This leads to the ANALYZE block-based sampling
becoming confused.
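To put rough numbers on that (sizes are approximate, alignment is
ignored, and the per-page counts ignore the item cap PostgreSQL
actually enforces, so treat them as upper bounds):

```python
PAGE = 8192 - 24   # usable bytes in a default-sized page after its header
STUB = 4           # an LP_DEAD stub is just the 4-byte line pointer
LIVE = 4 + 24 + 4  # line pointer + minimal tuple header + tiny payload

stubs_per_page = PAGE // STUB   # how many dead "items" one page can hold
live_per_page = PAGE // LIVE    # vs. how many minimal live tuples
print(stubs_per_page, live_per_page, stubs_per_page / live_per_page)
```

So by byte density alone, a page of stubs can carry roughly 8x as many
dead items as a page of live tuples carries live ones, which is exactly
the kind of asymmetry that skews a block-based sample.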

This confusion seems related to the fact that ANALYZE is really a
"logical database" thing. It's slightly amazing that statistics from
ANALYZE work as well as they do for query planning, so we shouldn't be
too surprised when they serve a "physical database" process like
VACUUM rather poorly.

Note that the TPC-C issue I describe in [1] involves a table that's a
little bit like a queue table, but with lots of non-HOT updates (lots
overall, but only one update per logical row, ever). This might tie
things to what Thomas just said about DB2 and queue tables.

> To really tackle this, I imagine we should be doing something like
> noticing when inserts, updates, and deletes are affecting key values
> that are "rare" according to the statistics, and triggering autovacuum
> ANALYZE runs that use indexes to update the statistics only for the
> relevant key ranges.

I'm not sure. I tend to think that the most promising approaches all
involve some kind of execution time smarts about the statistics, and
their inherent unreliability. Somehow query execution itself should
become less gullible, at least in cases where we really can have high
confidence in the statistics being wrong at this exact time, for this
exact key space.
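Purely as an illustration of the shape such a check could take
(nothing like this exists; the function, threshold, and error are all
invented):

```python
def guarded_scan(rows, estimated, slack=10):
    """Yield rows, but complain once reality has already exceeded the
    planner's estimate by more than `slack`x; a less gullible executor
    might use that signal to switch strategies rather than raise."""
    seen = 0
    for row in rows:
        seen += 1
        if seen > estimated * slack:
            raise RuntimeError(
                f"estimate of {estimated} row(s) already off by >{slack}x")
        yield row

# The stale-stats case: the planner expected 1 'pending' row, 2000 exist.
try:
    list(guarded_scan(range(2000), estimated=1))
except RuntimeError as err:
    print(err)
```

The interesting part isn't the mechanism, it's that the misestimate is
detectable after a handful of rows, long before the plan has done most
of its damage.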

[1] https://postgr.es/m/CAH2-Wz=9R83wcwZcPUH4FVPeDM4znzbzMvp3rt21+XhQWMU8+g@mail.gmail.com
--
Peter Geoghegan