On Fri, 12 Feb 2021, Victor Sudakov wrote:
> Most of the stuff I've looked at is pretty old, much seems unsupported.
> What is the current best practice?
None of the estimate queries ever really worked well. They just gave a
bit more information than nothing in the days before PG's internal
functions became reliable and useful for bloat measurement, which
happened during version 9.5 development.
For any modern PG, it's worth the trouble to learn how to directly use
pgstattuple https://www.postgresql.org/docs/current/pgstattuple.html to do
this job. You run and interpret the output from pgstattuple(relation) and
its faster estimate version pgstattuple_approx. It's not hard to run some
simulations with deleted rows to see what bloat looks like when it builds
up.
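Here's a minimal sketch of that kind of simulation, assuming the
pgstattuple extension is installed and using a throwaway table name "t":

```sql
-- Enable the extension (requires appropriate privileges).
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Simulate bloat: build a table, then delete most rows without vacuuming.
CREATE TABLE t AS
  SELECT i, repeat('x', 100) AS padding
  FROM generate_series(1, 100000) AS i;
DELETE FROM t WHERE i % 10 <> 0;

-- Exact scan: dead_tuple_percent and free_percent show the bloat.
SELECT table_len, dead_tuple_percent, free_percent
  FROM pgstattuple('t');

-- Cheaper estimated version that avoids a full table scan:
SELECT table_len, dead_tuple_percent, approx_free_percent
  FROM pgstattuple_approx('t');
```

After the DELETE, the dead tuple percentage will be high; run VACUUM on
the table and the same space shows up as free instead, which is the
distinction you learn to read from this output.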
There's still some need for manual estimates if you want to account for
fillfactor in all cases, but I see that as a niche topic, not where people
should start.
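The rough idea, sketched below with a hypothetical table "t": free space
up to (100 - fillfactor) percent is intentional reservation, not bloat,
so you check the table's reloptions alongside the pgstattuple numbers:

```sql
-- reloptions is NULL or lacks a fillfactor entry when the default (100)
-- applies; compare free_percent against (100 - fillfactor) manually.
SELECT c.relname,
       c.reloptions,
       s.free_percent
  FROM pg_class AS c,
       pgstattuple(c.oid) AS s
 WHERE c.relname = 't';
```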
There are also wrapper scripts built on top of pgstattuple, like the
already mentioned https://github.com/keithf4/pg_bloat_check
A good bit of the work done in that script is around handling multiple
versions of PG and building some long-term picture of bloat state across
all tables. Workloads that have a bloat problem are sometimes fixed, but
in a lot of cases the best you can do is monitor them and rebuild things
when it gets bad. That's one of the contexts Keith's packaging of this
feature aims at.
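A hand-rolled version of that monitoring loop might look like the sketch
below; the public schema restriction and the 30% cutoff are arbitrary
choices for illustration, not recommendations:

```sql
-- Survey approximate bloat across ordinary tables, flagging anything
-- past a threshold as a rebuild candidate.
SELECT c.oid::regclass AS table_name,
       a.table_len,
       a.dead_tuple_percent
  FROM pg_class AS c,
       LATERAL pgstattuple_approx(c.oid) AS a
 WHERE c.relkind = 'r'
   AND c.relnamespace = 'public'::regnamespace
   AND a.dead_tuple_percent > 30
 ORDER BY a.dead_tuple_percent DESC;

-- Rebuilding a flagged table with VACUUM FULL takes an exclusive lock
-- for the duration; pg_repack is the usual online alternative.
-- VACUUM FULL some_flagged_table;
```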
--
Greg Smith greg.smith@crunchydata.com
Director of Open Source Strategy
Crunchy Data https://www.crunchydata.com/