Performance lossage in checkpoint dumping - Mailing list pgsql-hackers

From Tom Lane
Subject Performance lossage in checkpoint dumping
Msg-id 23621.982377108@sss.pgh.pa.us
Responses Re: Performance lossage in checkpoint dumping  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
While poking at Peter Schmidt's comments about pgbench showing worse
performance than for 7.0 (using -F in both cases), I noticed that given
enough buffer space, FileWrite never seemed to get called at all.  A
little bit of sleuthing revealed the following:

1. Under WAL, we don't write dirty buffers out of the shared memory at
every transaction commit.  Instead, as long as a dirty buffer's slot
isn't needed for something else, it just sits there until the next
checkpoint or shutdown.  CreateCheckpoint calls FlushBufferPool which
writes out all the dirty buffers in one go.  This is a Good Thing; it
lets us consolidate multiple updates of a single datafile page by
successive transactions into one disk write.  We need this to buy back
some of the extra I/O required to write the WAL logfile.

2. However, this means that a lot of the dirty-buffer writes get done by
the periodic checkpoint process, not by the backends that originally
dirtied the buffers.  And that means that every last one gets done by
blind write, because the checkpoint process isn't going to have opened
any relation cache entries --- maybe a couple of system catalog
relations, but for sure it won't have any for user relations.  If you
look at BufferSync, any page for which the current process doesn't have an
already-open relcache entry is sent to smgrblindwrt, not smgrwrite.

3. Blind write is gratuitously inefficient: it does separate open,
seek, write, close kernel calls for every request.  This was the right
thing in 7.0.*, because backends relatively seldom did blind writes and
even less often needed to blind-write multiple pages of a single relation
in succession.  But the typical usage has changed a lot.
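
To make the three points above concrete, here is a toy sketch in C. It is
not the backend code: every name in it (ToyBuffer, toy_update,
toy_blind_write, toy_checkpoint, TOY_BLCKSZ) is made up for illustration,
the 8192-byte page size and the O_CREAT flag are assumptions so that the
example runs on its own, and the real BufferSync and smgrblindwrt are
considerably more involved.  It only shows the shape of the problem:
updates pile up in memory, the checkpoint writes each dirty page once, and
each of those writes costs four kernel calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_BLCKSZ   8192        /* assumed page size for this sketch */
#define TOY_NBUFFERS 4

/* Hypothetical, simplified buffer slot; not the real buffer descriptor. */
typedef struct {
    const char *path;            /* file backing this page */
    long        blkno;           /* block number within that file */
    bool        dirty;
    char        page[TOY_BLCKSZ];
} ToyBuffer;

static ToyBuffer toy_pool[TOY_NBUFFERS];
static int toy_syscalls = 0;     /* kernel calls issued by blind writes */

/* Point 1: an update changes the page in memory and only marks it dirty. */
static void toy_update(int slot, const char *path, long blkno, char fill)
{
    assert(slot >= 0 && slot < TOY_NBUFFERS);
    toy_pool[slot].path = path;
    toy_pool[slot].blkno = blkno;
    memset(toy_pool[slot].page, fill, TOY_BLCKSZ);
    toy_pool[slot].dirty = true;
}

/* Point 3: a separate open, seek, write, close for every single page. */
static int toy_blind_write(const char *path, long blkno, const char *page)
{
    int fd, rc = 0;

    toy_syscalls++;
    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    toy_syscalls++;
    if (lseek(fd, (off_t) blkno * TOY_BLCKSZ, SEEK_SET) < 0)
        rc = -1;

    if (rc == 0) {
        toy_syscalls++;
        if (write(fd, page, TOY_BLCKSZ) != TOY_BLCKSZ)
            rc = -1;
    }

    toy_syscalls++;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* Point 2: the checkpointer has nothing open, so every page goes blind. */
static int toy_checkpoint(void)
{
    int written = 0;

    for (int i = 0; i < TOY_NBUFFERS; i++) {
        if (!toy_pool[i].dirty)
            continue;
        if (toy_blind_write(toy_pool[i].path, toy_pool[i].blkno,
                            toy_pool[i].page) != 0)
            return -1;
        toy_pool[i].dirty = false;
        written++;
    }
    return written;              /* number of pages flushed */
}
```

In this sketch, three calls to toy_update on the same slot followed by one
on a second slot leave two dirty pages, so toy_checkpoint does two page
writes rather than four.  Those two writes still cost eight kernel calls
between them, even though both pages live in the same file.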


I am thinking it'd be a good idea if blind write went through fd.c and
thus was able to re-use open file descriptors, just like normal writes.
This should improve the efficiency of dumping dirty buffers during
checkpoint by a noticeable amount.
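
A rough sketch of the idea, again in C and again with invented names
(ToyFdSlot, toy_get_fd, toy_cached_write, toy_close_all): keep a small
table of already-open descriptors and reuse them across page writes.  This
is not fd.c's actual interface or replacement policy; the round-robin
eviction, the table size, the 8192-byte page size and the O_CREAT flag are
all assumptions made to keep the example short and runnable.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_BLCKSZ   8192        /* assumed page size for this sketch */
#define TOY_CACHE_SZ 4           /* assumed number of cached descriptors */
#define TOY_PATH_MAX 256

/* Hypothetical cache entry: one open descriptor per file path. */
typedef struct {
    char path[TOY_PATH_MAX];
    int  fd;                     /* -1 when the slot is empty */
} ToyFdSlot;

static ToyFdSlot toy_cache[TOY_CACHE_SZ];
static int toy_cache_ready = 0;
static int toy_next_victim = 0;  /* round-robin eviction pointer */
static int toy_open_calls = 0;   /* how many times open() was called */

/* Return a descriptor for path, opening the file only on a cache miss. */
static int toy_get_fd(const char *path)
{
    ToyFdSlot *slot;
    int fd;

    if (!toy_cache_ready) {
        for (int i = 0; i < TOY_CACHE_SZ; i++)
            toy_cache[i].fd = -1;
        toy_cache_ready = 1;
    }
    if (strlen(path) >= TOY_PATH_MAX)
        return -1;

    /* Hit: reuse the descriptor that is already open. */
    for (int i = 0; i < TOY_CACHE_SZ; i++)
        if (toy_cache[i].fd >= 0 && strcmp(toy_cache[i].path, path) == 0)
            return toy_cache[i].fd;

    /* Miss: open the file and replace the next victim slot. */
    toy_open_calls++;
    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    slot = &toy_cache[toy_next_victim];
    toy_next_victim = (toy_next_victim + 1) % TOY_CACHE_SZ;
    if (slot->fd >= 0)
        close(slot->fd);
    strcpy(slot->path, path);    /* safe: length checked above */
    slot->fd = fd;
    return fd;
}

/* Write one page through the cache: seek and write, no open or close. */
static int toy_cached_write(const char *path, long blkno, const char *page)
{
    int fd = toy_get_fd(path);

    if (fd < 0)
        return -1;
    if (lseek(fd, (off_t) blkno * TOY_BLCKSZ, SEEK_SET) < 0)
        return -1;
    return write(fd, page, TOY_BLCKSZ) == TOY_BLCKSZ ? 0 : -1;
}

/* Close every cached descriptor, e.g. once the checkpoint is finished. */
static void toy_close_all(void)
{
    for (int i = 0; i < TOY_CACHE_SZ; i++) {
        if (toy_cache_ready && toy_cache[i].fd >= 0)
            close(toy_cache[i].fd);
        toy_cache[i].fd = -1;
    }
    toy_cache_ready = 1;
}
```

With this shape, writing several pages of one file in succession opens the
file once and then pays only a seek and a write per page, which is the
saving being proposed for the checkpoint case.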

Comments?
        regards, tom lane

