Re: index bloat estimation - Mailing list pgsql-admin

From Keith Fiske
Subject Re: index bloat estimation
Date
Msg-id CAODZiv69O=pTg7RMew=9Q1d6xKaVxriSSHZ661tx-YKhbuvmQw@mail.gmail.com
Whole thread Raw
In response to Re: index bloat estimation  (Victor Sudakov <vas@sibptus.ru>)
List pgsql-admin


On Tue, Feb 16, 2021 at 9:27 PM Victor Sudakov <vas@sibptus.ru> wrote:
Keith Fiske wrote:

[dd]
>
> You cannot run the actual scan on the replica, no. But it can be set to run
> against the replica and just report the statistics so you can have a cron
> set up to always run on the given system in case of failover. The
> "--recovery_mode_norun" can be set so it will only run if the target system
> is actually a primary.
>
> I could possibly see about letting this actually run against the replica,
> however this can be a rather long running transaction depending on the size
> of the tables involved. You can set the "--commit_rate" to avoid some of
> that, but if you have really large tables, it can still run quite long. So
> this sort of check is really best run against the primary to avoid issues
> around having to allow long running queries on the replica (delayed
> replication or even worse bloat buildup).

If you have a dedicated replica for OLAP, even one running from a WAL
archive (not from a replication slot), this is not an issue.

So running pg_bloat_check against a replica would be very useful for
some of us.

--
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
2:5005/49@fidonet http://vas.tomsk.ru/



Actually this could still be an issue, but all really depends on the size of the tables involved. Long running transactions on the replica have the potential to either delay replication entirely (max_standby_archive_delay, max_standby_streaming_delay) or cause more bloat than normal on the primary (hot_standby_feedback). The latter is more often used to avoid the replication delay, but I have frequently seen people push long running transactions that seem to be a "problem" onto the replicas and don't realize that it doesn't solve all the problems of actually running those queries on the primary. You still have a transaction causing autovacuum to not be able to run efficiently.

The approximate/quick mode of pgstattuple could certainly help with this problem, but as the issue you opened up on the github repo pointed out, that skips over scanning toast tables (https://github.com/keithf4/pg_bloat_check/issues/22) and also does not work against indexes which are more often the problem with bloat and query performance. I have seen significant bloat forming in the toast tables (hundreds of GB) when the regular table only reports very minimal bloat. So I don't recommend relying completely on the approximate check.

--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

pgsql-admin by date:

Previous
From: Victor Sudakov
Date:
Subject: Re: index bloat estimation
Next
From: Rakesh T
Date:
Subject: Aurora PostgreSQL Support