On Tue, Mar 13, 2012 at 4:53 PM, Josh Berkus <josh@agliodbs.com> wrote:
> All,
>
> I've discovered a built-in performance issue with replication failover
> at one site, which I couldn't find searching the archives. I don't
> really see what we can do to fix it, so I'm posting it here in case
> others might have clever ideas.
>
> 1. The Free Space Map is not replicated between servers.
>
> 2. Thus, when we fail over to a replica, it starts with a blank FSM.
>
> 3. I believe the replica also starts with zero counters for autovacuum.
>
> 4. On a high-UPDATE workload, this means that the replica assumes tables
> have no free space until it starts to build a new FSM or autovacuum
> kicks in on some of the tables, much later on.
>
> 5. If your hosting is such that you fail over a lot (such as on AWS),
> then this causes cumulative table bloat which can only be cured by a
> VACUUM FULL.
>
> I can't see any way around this which wouldn't also bog down
> replication. Clever ideas, anyone?

Would it bog it down by "much"?

(1 byte per 8kB page) * 2TB = ~250MB. Even if you doubled or tripled
that for pointer-overhead reasons it's pretty minor, whereas VACUUM
traffic is already pretty intense. Still, it's clearly...work.
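
For anyone who wants to redo the back-of-envelope number for their own
cluster, here is a rough sketch. It assumes the default 8kB block size
and one FSM byte per heap page; fsm_bytes and overhead_factor are names
I made up for illustration (the factor stands in for the "doubled or
tripled" guess above), and none of this is measured from a real system.

```python
def fsm_bytes(heap_bytes, block_size=8192, overhead_factor=1):
    """Estimate FSM size: one byte per heap page, times a guessed overhead.

    block_size defaults to 8192 (the stock PostgreSQL block size);
    overhead_factor is a hypothetical multiplier for tree/pointer overhead.
    """
    pages = -(-heap_bytes // block_size)  # ceiling division: heap pages
    return pages * overhead_factor


two_tib = 2 * 1024**4  # 2TB of heap, taken as binary terabytes

for factor in (1, 2, 3):
    mib = fsm_bytes(two_tib, overhead_factor=factor) / 1024**2
    print(f"overhead x{factor}: {mib:.0f} MiB of FSM data")
```

With binary units the base case works out to 256 MiB, which is where
the ~250MB figure comes from; the x3 case is 768 MiB.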
--
fdr