Thread: Re: [PATCHES] Load distributed checkpoint patch

Re: [PATCHES] Load distributed checkpoint patch

From
ITAGAKI Takahiro
Date:
Bruce Momjian <bruce@momjian.us> wrote:

> OK, if I understand correctly, instead of doing a buffer scan, write(),
> and fsync(), and recycle the WAL files at checkpoint time, you delay the
> scan/write part with some delay.

Exactly. The patch does not change the actual behavior of a checkpoint.
Compared with existing checkpoints, it just takes longer in the scan/write part.

> Do you use the same delay autovacuum uses?

What do you mean by 'the same delay'? Autovacuum does VACUUM, not CHECKPOINT.
If you mean cost-based delay, I think we cannot use it here. It's hard to
estimate how long cost-based sleeping would delay a checkpoint, but we must
finish an asynchronous checkpoint by the start of the next one. So I gave
priority to punctuality over load smoothing.

> As I remember, often the checkpoint is caused because
> we are using the last WAL file.  Doesn't this delay the creation of new
> WAL files by renaming the old ones to higher numbers (we can't rename
> them until the checkpoint is complete)?

Each checkpoint should finish before the next one starts, so we need WAL
files for two checkpoints. That is the same as now.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
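
The punctuality constraint described above (finish the scan/write phase
before the next checkpoint starts) can be sketched as a simple pacing
calculation. This is a minimal Python illustration, not code from the
patch; the function name, its parameters, and the even-spacing policy are
all assumptions made for the example.

```python
def write_delay(dirty_buffers: int,
                seconds_until_next_checkpoint: float,
                seconds_per_write: float = 0.0) -> float:
    """Pause to insert after each buffer write (hypothetical helper).

    Spreads the remaining dirty buffers evenly over the time left before
    the next checkpoint is due, so the scan/write phase ends on time.
    """
    if dirty_buffers <= 0:
        return 0.0  # nothing left to write, no need to sleep
    # Time budget available for each buffer, including the write itself.
    budget = seconds_until_next_checkpoint / dirty_buffers
    # If the writes alone exceed the budget, stop sleeping entirely:
    # punctuality takes priority over load smoothing.
    return max(0.0, budget - seconds_per_write)


if __name__ == "__main__":
    print(write_delay(1000, 300.0))
```

With 1000 dirty buffers and 300 seconds remaining, this spaces the writes
0.3 seconds apart. When the writes alone would overrun the deadline, the
pause drops to zero, which is the sense in which punctuality wins over
smoothing.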




Re: [PATCHES] Load distributed checkpoint patch

From
"Takayuki Tsunakawa"
Date:
From: "ITAGAKI Takahiro" <itagaki.takahiro@oss.ntt.co.jp>
> Bruce Momjian <bruce@momjian.us> wrote:
>> Do you use the same delay autovacuum uses?
>
> What do you mean 'the same delay'? Autovacuum does VACUUM, not CHECKPOINT.
> If you think cost-based-delay, I think we cannot use it here. It's hard to
> estimate how much checkpoints delay by cost-based sleeping, but we should
> finish asynchronous checkpoints by the start of next checkpoint. So I gave
> priority to punctuality over load smoothing.

I consider that smoothing the load (more meaningfully, response time)
has higher priority than checkpoint punctuality in a practical sense,
because the users of a system benefit from good, steady response and
give the system a good reputation.  If checkpoint processing is not
punctual, crash recovery would take longer.  But which would you give
higher priority: the unlikely event (a crash of the system) or the
likely event (the system's peak hours)?  I believe the latter should
be given more weight.  The system can write dirty buffers after the
peak hours pass.  User experience should be taken great care of.




Re: [PATCHES] Load distributed checkpoint patch

From
Bruce Momjian
Date:
ITAGAKI Takahiro wrote:
> Bruce Momjian <bruce@momjian.us> wrote:
> 
> > OK, if I understand correctly, instead of doing a buffer scan, write(),
> > and fsync(), and recycle the WAL files at checkpoint time, you delay the
> > scan/write part with some delay.
> 
> Exactly. Actual behavior of checkpoint is not changed by the patch. Compared
> with existing checkpoints, it just takes longer time in scan/write part.
> 
> > Do you use the same delay autovacuum uses?

Sorry, I meant bgwriter delay, not autovacuum.

> What do you mean 'the same delay'? Autovacuum does VACUUM, not CHECKPOINT.
> If you think cost-based-delay, I think we cannot use it here. It's hard to
> estimate how much checkpoints delay by cost-based sleeping, but we should
> finish asynchronous checkpoints by the start of next checkpoint. So I gave
> priority to punctuality over load smoothing.

OK.

> > As I remember, often the checkpoint is caused because
> > we are using the last WAL file.  Doesn't this delay the creation of new
> > WAL files by renaming the old ones to higher numbers (we can't rename
> > them until the checkpoint is complete)?
> 
> Checkpoints should be done by the next one, so we need WAL files for two
> checkpoints. It is the same as now.

Ah, OK, so we already reserve a full set of WAL files while we are
waiting for the checkpoint to complete.

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] Load distributed checkpoint patch

From
"Kevin Grittner"
Date:
>>> On Wed, Dec 20, 2006 at  6:05 AM, in message
<03be01c7242f$2b4ce130$19527c0a@OPERAO>, "Takayuki Tsunakawa"
<tsunakawa.takay@jp.fujitsu.com> wrote: 
> 
> I consider that smoothing the load (more meaningfully, response time)
> has higher priority over checkpoint punctuality in a practical sense,
> because the users of a system benefit from good steady response and
> give good reputation to the system.

I agree with that.

> If the checkpoint processing is
> not punctual, crash recovery would take longer time.  But which would
> you give higher priority, the unlikely event (=crash of the system) or
> likely event (=peak hours of the system)?  I believe the latter should
> be regarded.

I'm still with you here.

> The system can write dirty buffers after the peak hours
> pass.

I don't see that in our busiest environment.

We have 3,000 "directly connected" users, various business partner
interfaces, and public web entry doing OLTP in 72 databases distributed
around the state, with real-time replication to central databases which
are considered derived copies.  If all the pages modified on the central
databases were held in buffers or cache until after peak hours, query
performance would suffer -- assuming it would all even fit in cache.  We
must have a way for dirty pages to be written under load while
responding to hundreds of thousands of queries per hour without
disturbing "freezes" during checkpoints.

On top of that, we monitor database requests on the source machines,
and during "idle time" we synchronize the data with all of the targets
to identify, log, and correct "drift".  So even if we could shift all
our disk writes to the end of the day, that would have its own
downside, in extending our synchronization cycle.

I raise this only to be sure that such environments are considered with
these changes, not to discourage improvements in the checkpoint
techniques.  We have effectively eliminated checkpoint problems in our
environment with a combination of battery-backed controller cache and
aggressive background writer configuration.  When you have a patch which
seems to help those who still have problems, I'll try to get time
approved to run a transaction replication stream onto one of our servers
(in "catch up mode") while we do a web "stress test" by playing back
requests from our production log.  That should indicate how the patch
will affect us.

-Kevin



Re: [PATCHES] Load distributed checkpoint patch

From
ITAGAKI Takahiro
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

> > I consider that smoothing the load (more meaningfully, response time)
> > has higher priority over checkpoint punctuality in a practical sense,
>  
> I agree with that.

I agree that checkpoint_timeout is not so important, but we should
respect checkpoint_segments, or else new WAL files would be
created without bound, as Bruce pointed out.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
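
To put a number on "WAL files for two checkpoints": the PostgreSQL
documentation of this era gives 2 * checkpoint_segments + 1 as the normal
upper bound on WAL segment files, each normally 16 MB. The Python sketch
below only restates that arithmetic; the function names are hypothetical,
and the 16 MB figure assumes the default segment size was not changed at
build time.

```python
SEGMENT_SIZE_MB = 16  # assumed default WAL segment size


def max_wal_files(checkpoint_segments: int) -> int:
    """Normal upper bound on WAL segment files kept on disk."""
    # Two checkpoint intervals' worth of segments, plus the one in use.
    return 2 * checkpoint_segments + 1


def max_wal_megabytes(checkpoint_segments: int) -> int:
    """Disk space those files occupy, in megabytes."""
    return max_wal_files(checkpoint_segments) * SEGMENT_SIZE_MB


if __name__ == "__main__":
    # checkpoint_segments = 3 is the shipped default.
    print(max_wal_files(3), max_wal_megabytes(3))
```

If checkpoints were allowed to run arbitrarily late, WAL written in the
meantime could not be recycled and this bound would no longer hold, which
is the concern raised above.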




Re: [PATCHES] Load distributed checkpoint patch

From
"Takayuki Tsunakawa"
Date:
Hello, Mr. Grittner,

From: "Kevin Grittner" <Kevin.Grittner@wicourts.gov>
> We have 3,000 "directly connected" users, various business partner
> interfaces, and public web entry doing OLTP in 72 databases distributed
> around the state, with real-time replication to central databases which
> are considered derived copies.

What a big system you have.

>   If all the pages modified on the central
> databases were held in buffers or cache until after peak hours, query
> performance would suffer -- assuming it would all even fit in cache.  We
> must have a way for dirty pages to be written under load while
> responding to hundreds of thousands of queries per hour without
> disturbing "freezes" during checkpoints.

I agree with you.  My wording was poor.  I consider it necessary to
keep checkpoints advancing even under heavy load, while taking care
of OLTP transactions.

> I raise this only to be sure that such environments are considered with
> these changes, not to discourage improvements in the checkpoint
> techniques.  We have effectively eliminated checkpoint problems in our
> environment with a combination of battery backed controller cache and
> aggressive background writer configuration.  When you have a patch which
> seems to help those who still have problems, I'll try to get time
> approved to run a transaction replication stream onto one of our servers
> (in "catch up mode") while we do a web "stress test" by playing back
> requests from our production log.  That should indicate how the patch
> will affect us.

Thank you very much for your kind offer.