Re: VACUUM FREEZE output more than double input - Mailing list pgsql-performance

From Tom Lane
Subject Re: VACUUM FREEZE output more than double input
Date
Msg-id 15401.1197676791@sss.pgh.pa.us
Whole thread Raw
In response to VACUUM FREEZE output more than double input  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: VACUUM FREEZE output more than double input
List pgsql-performance
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Why double writes per read, plus massive writes at checkpoint?

The double writes aren't surprising: freezing has to be WAL-logged,
and the odds are that each page hasn't been touched since the last
checkpoint, so the WAL log will include a complete page image.
So in the steady state where all shared buffers are dirty, the
per-page cycle is:
    * write out a dirty buffer so it can be reclaimed
    * read in a page
    * modify it to mark tuples frozen
    * write an image of the page to WAL
    * leave the now-dirty page in shared buffers for later writing

The checkpoint spikes would come from trying to flush out all the
dirty buffers at once.

You'd expect a bit of a valley after each peak, since the vacuum
could presumably recycle some buffers without having to flush 'em
first; but I don't see one in your data.  That may just be because
the numbers are too noisy, but I kinda suspect that the vacuum is
dirtying buffers nearly as fast as the bgwriter can clean them,
leaving not a lot of daylight for a valley.

8.3 should pretty well eliminate the checkpoint spike in this scenario,
because vacuum will work in a limited number of shared buffers instead
of dirtying the whole cache.  But you'll still see 2X writes over reads.

If this is data that you could re-generate at need, it might make sense
to turn off full_page_writes during the initial data load and vacuum.

I concur with trying to FREEZE all the data while you do this, else
you'll see the same work done whenever the data happens to slip past
the auto freeze threshold.

            regards, tom lane

pgsql-performance by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: viewing source code
Next
From: Tom Lane
Date:
Subject: Re: explanation for seeks in VACUUM