Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: checkpointer continuous flushing
Date
Msg-id alpine.DEB.2.10.1508230812470.29146@sto
In response to Re: checkpointer continuous flushing  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: checkpointer continuous flushing
List pgsql-hackers
Hello Amit,

> I have tried your scripts and found some problem while using avg.py
> script.
> grep 'progress:' test_medium4_FW_off.out | cut -d' ' -f4 | ./avg.py
> --limit=10 --length=300
> : No such file or directory

> I didn't get chance to poke into avg.py script (the command without
> avg.py works fine). Python version on the m/c, I planned to test is
> Python 2.7.5.

Strange... What does "/usr/bin/env python" say? Can the script be started 
on its own at all? I think that the script should work both with python2 
and python3, at least it does on my laptop...

> Today while reading the first patch (checkpoint-continuous-flush-10-a),
> I have given some thought to below part of patch which I would like
> to share with you.
>
> + * Select a tablespace depending on the current overall progress.
> + *
> + * The progress ratio of each unfinished tablespace is compared to
> + * the overall progress ratio to find one with is not in advance
> + * (i.e. overall ratio > tablespace ratio,
> + *  i.e. tablespace written/to_write > overall written/to_write

> Here, I think above calculation can go for toss if backend or bgwriter
> starts writing buffers when checkpoint is in progress.  The tablespace
> written parameter won't be able to consider the one's written by backends
> or bgwriter.

Sure... This is *already* the case with the current checkpointer: the
schedule is computed with respect to the initial number of buffers it
thinks it will have to write, and if someone else writes these buffers then
the schedule is skewed a little bit, or more... I have not changed this
logic, but I extended it to handle several tablespaces.
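
To illustrate the selection rule quoted above, here is a minimal sketch in
Python, not the patch's C code. The names pick_tablespace, run_checkpoint,
ts_a and ts_b are made up for the example. It picks an unfinished tablespace
whose written/to_write ratio is not ahead of the overall ratio, using only
the counts known to the checkpointer itself, which is why writes done by
other processes would not be seen.

```python
def pick_tablespace(written, to_write):
    """Return an unfinished tablespace that is not ahead of overall progress.

    written / to_write map a (hypothetical) tablespace name to buffer counts.
    Returns None once every tablespace is finished.
    """
    total_written = sum(written.values())
    total_to_write = sum(to_write.values())
    for name in to_write:
        if written[name] >= to_write[name]:
            continue  # this tablespace is already finished
        # tablespace ratio <= overall ratio, cross-multiplied to avoid floats
        if written[name] * total_to_write <= total_written * to_write[name]:
            return name
    return None


def run_checkpoint(to_write):
    """Write one buffer at a time and record which tablespace was chosen."""
    written = {name: 0 for name in to_write}
    order = []
    while True:
        name = pick_tablespace(written, to_write)
        if name is None:
            break
        written[name] += 1  # stands in for the actual buffer write
        order.append(name)
    return order


if __name__ == "__main__":
    print(run_checkpoint({"ts_a": 4, "ts_b": 2}))
```

The unfinished tablespace with the lowest ratio can never be ahead of the
overall ratio, so the loop always finds a candidate until everything is
written, and the writes end up interleaved in proportion to each
tablespace's share.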

If this (the checkpointer progress evaluation used for its schedule is
sometimes wrong because of other writes) is proven to be a major
performance issue, then the processes which write the checkpointed
buffers behind its back should tell the checkpointer about it, probably
through some shared data structure, so that the checkpointer can adapt its
schedule.
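
As a rough idea of what such a shared structure could look like, here is a
hypothetical sketch in Python. Nothing like this exists in the patch, and
the class and method names are invented for the illustration. Other writers
report the checkpointed buffers they wrote, and the checkpointer uses the
adjusted figure instead of its own count alone.

```python
import threading


class CheckpointProgress:
    """Hypothetical shared progress counter for one checkpoint."""

    def __init__(self, to_write):
        if to_write <= 0:
            raise ValueError("to_write must be positive")
        self._lock = threading.Lock()
        self.to_write = to_write      # initial estimate made by the checkpointer
        self.by_checkpointer = 0      # buffers written by the checkpointer
        self.by_others = 0            # buffers reported by backends or bgwriter

    def checkpointer_wrote(self, n=1):
        with self._lock:
            self.by_checkpointer += n

    def other_wrote(self, n=1):
        with self._lock:
            self.by_others += n

    def naive_progress(self):
        """Progress as seen when other writers are ignored."""
        with self._lock:
            return self.by_checkpointer / self.to_write

    def adjusted_progress(self):
        """Progress once reported external writes are taken into account."""
        with self._lock:
            done = min(self.by_checkpointer + self.by_others, self.to_write)
            return done / self.to_write
```

With 100 buffers to write, 30 written by the checkpointer and 20 reported
by others, the naive figure is 30% while the adjusted one is 50%, so a
schedule based on the latter would throttle more.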

This is an independent issue, which may be worth addressing some day. My
opinion is that when the bgwriter or backends kick in to write buffers,
they are basically generating random I/Os on HDD and killing tps and
latency, so it is a very bad time anyway, thus I'm not sure that this is
the next problem to address to improve pg performance and responsiveness.

> Now it may not big thing to worry but I find Heikki's version worth 
> considering, he has not changed the overall idea of this patch, but the 
> calculations are somewhat simpler and hence less chance of going wrong.

I do not think that Heikki's version worked wrt balancing writes over
tablespaces, and I'm not sure it worked at all. However, I reused some of
his ideas to simplify and improve the code.

-- 
Fabien.


