Robert Haas <robertmhaas@gmail.com> wrote:
> This particular example shows the above chunk of code taking >13s
> to execute. Within 3s, every other backend piles up behind that,
> leading to the database getting no work at all done for a good ten
> seconds.
>
> My guess is that what's happening here is that one backend needs
> to read a page into CLOG, so it calls SlruSelectLRUPage to evict
> the oldest SLRU page, which is dirty. For some reason, that I/O
> takes a long time. Then, one by one, other backends come along
> and also need to read various SLRU pages, but the oldest SLRU page
> hasn't changed, so SlruSelectLRUPage keeps returning the exact
> same page that it returned before, and everybody queues up waiting
> for that I/O, even though there might be other buffers available
> that aren't even dirty.
>
> I am thinking that SlruSelectLRUPage() should probably do
> SlruRecentlyUsed() on the selected buffer before calling
> SlruInternalWritePage, so that the next backend that comes along
> looking for a buffer doesn't pick the same one.
That, or something else which prevents the same page from being
targeted by all processes, sounds like a good idea.
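Something like this at the tail of SlruSelectLRUPage(), perhaps (a
rough sketch from memory of slru.c, not compiled, and the variable
names are my guesses):

    /* ... a victim slot (bestslot) has already been chosen ... */
    if (!shared->page_dirty[bestslot])
        return bestslot;        /* clean page: caller can just evict it */

    /*
     * The victim is dirty, so it must be written out before reuse.
     * Mark it recently used *before* starting the I/O, so that other
     * backends entering SlruSelectLRUPage() while the write is in
     * progress choose a different victim instead of all queueing up
     * behind this same page.
     */
    SlruRecentlyUsed(shared, bestslot);
    SlruInternalWritePage(ctl, bestslot, NULL);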
> Possibly we should go further and try to avoid replacing dirty
> buffers in the first place, but sometimes there may be no choice,
> so doing SlruRecentlyUsed() is still a good idea.
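For what it's worth, "avoid replacing dirty buffers in the first
place" might look like the scan below: remember both the oldest valid
page and the oldest *clean* valid page, and fall back to the dirty one
only when everything is dirty. (Again just a sketch; the field names
and the LRU-age convention are my assumptions about slru.c, not a
patch.)

    int     slotno;
    int     bestslot = -1;          /* oldest valid page overall */
    int     bestcleanslot = -1;     /* oldest valid page that is clean */
    int     bestage = -1;
    int     bestcleanage = -1;

    for (slotno = 0; slotno < shared->num_slots; slotno++)
    {
        int     age;

        if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
            return slotno;          /* a free slot beats any eviction */
        if (shared->page_status[slotno] != SLRU_PAGE_VALID)
            continue;               /* skip pages with I/O in progress */

        age = shared->cur_lru_count - shared->page_lru_count[slotno];
        if (age > bestage)
        {
            bestslot = slotno;
            bestage = age;
        }
        if (!shared->page_dirty[slotno] && age > bestcleanage)
        {
            bestcleanslot = slotno;
            bestcleanage = age;
        }
    }

    if (bestcleanslot >= 0)
        return bestcleanslot;       /* evict without doing any I/O */
    /* otherwise write-and-evict bestslot, as in the sketch above */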
I can't help thinking that the "background hinter" I had ideas about
writing would prevent many of the reads of old CLOG pages, taking a
lot of pressure off this area. It just occurred to me that the
difference between that idea and having an autovacuum thread which
just did first-pass work on dirty heap pages is slim to none. I
know how much time good benchmarking can take, so I hesitate to
suggest another permutation, but it might be interesting to see what
it does to the throughput if autovacuum is configured to what would
otherwise be considered insanely aggressive values (just for vacuum,
not analyze). To give this a fair shot, the whole database would
need to be vacuumed between initial load and the start of the
benchmark.
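For the record, by "insanely aggressive" I mean something in this
neighborhood in postgresql.conf (the exact numbers are only a starting
guess, and the analyze settings are left at their defaults):

    autovacuum_naptime = 1s                # wake up as often as allowed
    autovacuum_vacuum_threshold = 1        # vacuum after almost any churn...
    autovacuum_vacuum_scale_factor = 0.0   # ...regardless of table size
    autovacuum_vacuum_cost_delay = 0       # no throttling of vacuum I/O
    autovacuum_max_workers = 10            # plenty of workers available

followed by a plain VACUUM of the whole database (e.g., vacuumdb --all)
after the initial load and before the benchmark starts.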
-Kevin