Re: postgresql latency & bgwriter not doing its job - Mailing list pgsql-hackers
From: Fabien COELHO
Subject: Re: postgresql latency & bgwriter not doing its job
Date:
Msg-id: alpine.DEB.2.10.1408261739280.8876@sto
In response to: Re: postgresql latency & bgwriter not doing its job (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
Hello Andres,

> [...] I think you're misunderstanding how spread checkpoints work.

Yep, definitely:-) On the other hand I thought I was seeking something "simple", namely correct latency under a small load, which I would expect out of the box.

What you describe is reasonable, and is more or less what I was hoping for, although I thought that the bgwriter was involved from the start and that the checkpoint would only do what is needed at the end. My mistake.

> When the checkpointer process starts a spread checkpoint it first writes
> all buffers to the kernel in a paced manner. That pace is determined by
> checkpoint_completion_target and checkpoint_timeout.

This pacing does not seem to work, even at a slow pace.

> If you have a stall of roughly the same magnitude (say a factor of two
> different), the smaller once a minute, the larger once an hour. Obviously
> the once-an-hour one will have a better latency in many, many more
> transactions.

I do not believe that delaying writes to disk as much as possible is a viable strategy for handling a small load.

However, to show my good will, I have tried to follow your advice: I have launched a 5000-second test with 50 segments, a 30 min timeout and a 0.9 completion target, at 25 tps, which is less than 1/10 of the maximum throughput. There are only two time-triggered checkpoints:

  LOG: checkpoint starting: time
  LOG: checkpoint complete: wrote 48725 buffers (47.6%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=1619.750 s, sync=27.675 s, total=1647.932 s; sync files=14, longest=27.593 s, average=1.976 s
  LOG: checkpoint starting: time
  LOG: checkpoint complete: wrote 22533 buffers (22.0%); 0 transaction log file(s) added, 0 removed, 23 recycled; write=826.919 s, sync=9.989 s, total=837.023 s; sync files=8, longest=6.742 s, average=1.248 s

For the first one, 48725 buffers is about 380 MB, and 1800 * 0.9 = 1620 seconds to complete, so that means about 30 buffer writes per second... should be ok. However the sync costs 27 seconds nevertheless, and the server was more or less offline for about 30 seconds flat. For the second one, 180 MB to write, 10 seconds offline. For some reason the target time is reduced.

I have also tried the "deadline" IO scheduler, which makes more sense than the default "cfq", but the result was similar. Not sure how software RAID interacts with IO scheduling, though.

Overall result: over the 5000 s test, I have lost (i.e. more than 200 ms behind schedule) more than 2.5% of transactions (about 1/40). Due to the unfinished cycle, the long-term average is probably about 3%. Although it is better than 10%, it is not good. I would expect/hope for something pretty close to 0, even with ext4 on Linux, for a dedicated host which has nothing else to do but handle two dozen transactions per second.

Current conclusion: I have not found any way to improve the situation to "good" with configuration parameters. Currently a small load results in periodic offline time, which can be delayed but not avoided. The delaying tactic results in less frequent but longer downtime; I would prefer frequent, very short downtime instead. I really think that something is amiss. Maybe pg does not handle pacing as it should.

For the record, a 25 tps bench with a "small" config (default 3 segments, 5 min timeout, 0.5 completion target) and with a parallel

  while true ; do echo "CHECKPOINT;"; sleep 0.2s; done | psql

results in "losing" only 0.01% of transactions (12 transactions out of 125893 were behind by more than 200 ms over 5000 seconds).
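For reference, the two configurations compared above, written out as postgresql.conf excerpts. This is only a sketch reconstructed from the figures quoted in this message; anything not listed is assumed to stay at its default value.

  # "spread checkpoint" run: more than 2.5% of transactions over 200 ms late
  checkpoint_segments          = 50
  checkpoint_timeout           = 30min
  checkpoint_completion_target = 0.9

  # "small" run, plus the external CHECKPOINT-every-0.2s loop: about 0.01% late
  checkpoint_segments          = 3      # default
  checkpoint_timeout           = 5min   # default
  checkpoint_completion_target = 0.5    # default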
Although you may think this forced-checkpoint loop stupid, from my point of view it shows that it is possible to coerce pg to behave.

With respect to the current status:

(1) The ability to set checkpoint_timeout to values smaller than 30 s could help, although obviously there would be other consequences. But the ability to avoid periodic offline time looks like a desirable objective.

(2) I still think that a parameter to force the bgwriter to write more stuff could help, but this is not tested (the existing bgwriter knobs are sketched in the PS below).

(3) Any other effective idea for configuring for responsiveness is welcome!

If someone wants to repeat these tests, it is easy and only takes a few minutes:

  sh> createdb test
  sh> pgbench -i -s 100 -F 95 test
  sh> pgbench -M prepared -N -R 25 -L 200 -c 2 -T 5000 -P 1 test > pgb.out

Note: the -L option to limit latency is a submitted patch. Without it, unresponsiveness shows up as increasing lag.

-- 
Fabien.
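PS: about point (2) above, the knobs that already exist for making the bgwriter more aggressive are sketched below, with their default values shown in the comments. Whether pushing them (a shorter delay, a larger maxpages) actually removes these stalls is precisely what I have not tested.

  bgwriter_delay          = 200ms   # default; pause between bgwriter rounds
  bgwriter_lru_maxpages   = 100     # default; max buffers written per round
  bgwriter_lru_multiplier = 2.0     # default; how far ahead of recent buffer demand to clean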