Hello, Itagaki-san
> This is a proposal for load-distributed checkpoint.
> We often encounter a performance drop during checkpoints. The reason
> is write bursts: storage devices are overworked during a checkpoint,
> so they cannot service normal transaction processing.
Good! You are focusing on a very important problem. System designers
don't like unsteady performance -- sudden slowdowns. Commercial
database systems have made efforts to provide steady performance. I've
seen a report somewhere that Oracle provides stable throughput even
during checkpoints. I wonder how it is implemented.
> I'm working on automatically adjusting the progress of a checkpoint
> to the checkpoint timeout and the WAL segments limit, to avoid the
> overlap of two checkpoints.
Have you tried your patch yet? What's your first impression of the
improvement? I'm very interested. On my machine, pgbench shows 210
tps at first, but drops to 70 tps during a checkpoint.
> Checkpoint consists of the following four steps, and the major
> performance problem is the 2nd step; in it, all dirty buffers are
> written without any pause.
> 1. Query information (REDO pointer, next XID, etc.)
> 2. Write dirty pages in the buffer pool
> 3. Flush all modified files
> 4. Update the control file
Hmm. Isn't it possible that step 3 affects the performance greatly?
I'm sorry if you have already identified step 2 as disturbing
backends.
As you know, PostgreSQL does not transfer data to disk when
write()ing. The actual transfer occurs when fsync()ing at checkpoint
time, unless the filesystem cache runs short. So the disk is
overworked during the fsync()s.
Which bgwriter operations (shared locking of buffers, write(),
fsync(), flushing of the WAL log, etc.) do you consider (or did you
detect) to be disturbing which backend operations (exclusive locking
of buffers, putting log records onto WAL buffers, flushing the log at
commit, writing dirty buffers when shared buffers run short)?