Tom Lane wrote:
> Fujii Masao <masao.fujii@gmail.com> writes:
> > When VACUUM tries to truncate the trailing empty pages, it scans shared_buffers
> > to invalidate the pages-to-truncate during holding an AccessExclusive lock on
> > the relation. So if shared_buffers is huge, other transactions need to wait for
> > a very long time before accessing to the relation. Which would cause the
> > response-time spikes, for example, I observed such spikes several times on
> > the server with shared_buffers = 300GB while running the benchmark.
> > Therefore, I'm thinking to propose $SUBJECT and enable it to avoid such spikes
> > for that relation.
>
> I think that the real problem here is having to do a scan of all of shared
> buffers. VACUUM's not the only thing that has to do that, there's also
> e.g. DROP and TRUNCATE. So rather than a klugy solution that only fixes
> VACUUM (and not very well, requiring user intervention and an unpleasant
> tradeoff), we ought to look at ways to avoid needing a whole-pool scan to
> find the pages belonging to one relation. In the past we've been able to
> skate by without a decent solution for that because shared buffers were
> customarily not all that big. But if we're going to start considering
> huge buffer pools to be a case we want to have good performance for,
> that's got to change.
Andres was working on a radix tree structure to fix this problem, but
that seems to be abandoned now, and it seems a major undertaking. While
I agree that the proposed solution is a wart, it seems much better than
no solution at all. Can we consider Fujii's proposal as a temporary
measure until we fix shared buffers? I'm +1 on it myself.
We've seen this problem also affecting a production workload pretty
severely, though shared_buffers is not as big.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services