Re: VACUUM FREEZE output more than double input - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: VACUUM FREEZE output more than double input
Date
Msg-id 47664546.EE98.0025.0@wicourts.gov
Whole thread Raw
In response to Re: VACUUM FREEZE output more than double input  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
>>> On Fri, Dec 14, 2007 at  5:59 PM, in message <15401.1197676791@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Why double writes per read, plus massive writes at checkpoint?
>
> The double writes aren't surprising: freezing has to be WAL-logged,
> and the odds are that each page hasn't been touched since the last
> checkpoint, so the WAL log will include a complete page image.
> So in the steady state where all shared buffers are dirty, the
> per-page cycle is:
>     * write out a dirty buffer so it can be reclaimed
>     * read in a page
>     * modify it to mark tuples frozen
>     * write an image of the page to WAL
>     * leave the now-dirty page in shared buffers for later writing
>
> The checkpoint spikes would come from trying to flush out all the
> dirty buffers at once.

Got it.  Thanks.

> You'd expect a bit of a valley after each peak, since the vacuum
> could presumably recycle some buffers without having to flush 'em
> first; but I don't see one in your data.  That may just be because
> the numbers are too noisy, but I kinda suspect that the vacuum is
> dirtying buffers nearly as fast as the bgwriter can clean them,
> leaving not a lot of daylight for a valley.

Yeah, the pattern was pretty consistent and without valleys.

> 8.3 should pretty well eliminate the checkpoint spike in this scenario,
> because vacuum will work in a limited number of shared buffers instead
> of dirtying the whole cache.  But you'll still see 2X writes over reads.

Testing 8.3beta4 so far has shown both smoother I/O and better
performance in all respects.  The preliminary post I did where I
thought I saw some regression on loading a pg_dump turned out to be
was an "apples to oranges" comparison; comparing the same load on
the same hardware and OS, 8.3 wins.  (Kudos to all who worked on
these improvements!)

> If this is data that you could re-generate at need, it might make sense
> to turn off full_page_writes during the initial data load and vacuum.

Thanks for the suggestions; I'll try that.

> I concur with trying to FREEZE all the data while you do this, else
> you'll see the same work done whenever the data happens to slip past
> the auto freeze threshold.

Thanks.  I thought that made sense, but I'm still trying to get my
head around some of the dynamics of PostgreSQL and MVCC.  I'll
suggest that as policy here.

-Kevin



pgsql-performance by date:

Previous
From: "Merlin Moncure"
Date:
Subject: Re: viewing source code
Next
From: "Joshua D. Drake"
Date:
Subject: Re: viewing source code