Re: [PERFORM] DELETE vs TRUNCATE explanation - Mailing list pgsql-hackers

From Greg Smith
Subject Re: [PERFORM] DELETE vs TRUNCATE explanation
Date
Msg-id 50062BEE.80700@2ndQuadrant.com
Whole thread Raw
In response to Re: [PERFORM] DELETE vs TRUNCATE explanation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 07/16/2012 02:39 PM, Robert Haas wrote:
> Unfortunately, there are lots of important operations (like bulk
> loading, SELECT * FROM bigtable, and VACUUM notverybigtable) that
> inevitably end up writing out their own dirty buffers.  And even when
> the background writer does write something, it's not always clear that
> this is a positive thing.  Here's Greg Smith commenting on the
> more-is-worse phenonmenon:
>
> http://archives.postgresql.org/pgsql-hackers/2012-02/msg00564.php

You can add "crash recovery" to the list of things where the interaction 
with the OS write cache matters a lot too, something I just took a 
beating and learned from recently.  Since the recovery process is 
essentially one giant unified backend, how effectively the background 
writer and/or checkpointer move writes from recovery to themselves is 
really important.  It's a bit easier to characterize than a complicated 
mixed set of clients, which has given me a couple of ideas to chase down.

What I've been doing for much of the last month (instead of my original 
plan of reviewing patches) is moving toward the bottom of characterizing 
that under high pressure.  It provides an even easier way to compare 
multiple write strategies at the OS level than regular pgbench-like 
benchmarks.  Recovery playback with a different tuning becomes as simple 
as rolling back to a simple base backup and replaying all the WAL, 
possibly including some number of bulk operations that showed up.  You 
can measure that speed instead of transaction-level throughput.  I'm 
seeing the same ~100% difference in performance between various Linux 
tunings on recovery as I was getting on VACUUM tests, and it's a whole 
lot easier to setup and (ahem) replicate the results.  I'm putting 
together a playback time benchmark based on this observation.

The fact that I have servers all over the place now with >64GB worth of 
RAM has turned the topic of how much dirty memory should be used for 
write caching into a hot item for me again in general too.  If I live 
through 9.3 development, I expect to have a lot more ideas about how to 
deal with this whole area play out in the upcoming months.  I could 
really use a cool day to sit outside thinking about it right now.

> Jeff Janes and I came up with what I believe to be a plausible
> explanation for the problem:
>
> http://archives.postgresql.org/pgsql-hackers/2012-03/msg00356.php
>
> I kinda think we ought to be looking at fixing that for 9.2, and
> perhaps even back-patching further, but nobody else seemed terribly
> excited about it.

FYI, I never rejected any of that thinking, I just haven't chewed on 
what you two were proposing.  If that's still something you think should 
be revisited for 9.2, I'll take a longer look at it.  My feeling on this 
so far has really been that the write blocking issues are much larger 
than the exact logic used by the background writer during the code you 
were highlighting, which I always saw as more active/important during 
idle periods.  This whole area needs to get a complete overhaul during 
9.3 though, especially since there are plenty of people who want to fit 
checksum writes into that path too.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



pgsql-hackers by date:

Previous
From: "xu2002261"
Date:
Subject: Re: During Xlog replaying, is there maybe emitted xlog?
Next
From: Greg Smith
Date:
Subject: Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)