Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From: Fabien COELHO
Subject: Re: checkpointer continuous flushing
Msg-id: alpine.DEB.2.10.1509061516560.3687@sto
In response to: Re: checkpointer continuous flushing (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hello Andres,

> Here's a bunch of comments on this (hopefully the latest?)

Who knows?! :-)

> version of the patch:
>
> * I'm not sure I like the FileWrite & FlushBuffer API changes. Do you
>  forsee other callsites needing similar logic?

I foresee that the bgwriter should also do something more sensible than 
generating random I/Os over HDDs, and this is also true for workers... But 
this is for another time, maybe.

> Wouldn't it be just as easy to put this logic into the checkpointing 
> code?

Not sure it would simplify anything, because the checkpointer currently 
knows about buffers but flushing is about files, which are hidden from 
view.

Doing it with this API change means that the code does not have to compute 
twice which file a buffer belongs to: the buffer/file boundary has to be 
broken somewhere anyway so that flushing can be done when needed, and the 
solution I took seems the simplest way to do it, without making the 
checkpointer too file-conscious.
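
To illustrate what I mean, here is a rough sketch (the names and the 
issue_flush() helper are made up for the example, this is not the actual 
FileWrite/FlushBuffer change): the write path reports which file and 
offset it touched, so the checkpointer can accumulate a per-file range 
and flush it when the file changes, without recomputing which file a 
buffer belongs to.

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* hypothetical flush helper: start writeback of a file range */
static void
issue_flush(int fd, off_t offset, off_t nbytes)
{
#if defined(__linux__)
    /* ask the kernel to begin writeback of just this range */
    sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
#else
    (void) offset;
    (void) nbytes;
    fsync(fd);              /* coarse fallback */
#endif
}

/* per-checkpoint accumulator for the file range written so far */
typedef struct FlushAccumulator
{
    int     fd;             /* file being accumulated, -1 if none */
    off_t   start;          /* first byte of the pending range */
    off_t   end;            /* end of the pending range */
} FlushAccumulator;

static void
record_written_range(FlushAccumulator *acc, int fd, off_t offset, size_t len)
{
    if (fd != acc->fd)
    {
        /* crossed a file boundary: flush whatever was accumulated */
        if (acc->fd >= 0)
            issue_flush(acc->fd, acc->start, acc->end - acc->start);
        acc->fd = fd;
        acc->start = offset;
    }
    acc->end = offset + (off_t) len;
}

On Linux the flush would typically be sync_file_range(SYNC_FILE_RANGE_WRITE), 
which starts writeback without waiting for it; other platforms only offer 
coarser tools.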

> * We don't do one-line ifs;

Ok, I'll expand them.

> function parameters are always in the same line as the function name

Ok, I'll try to improve.

> * Wouldn't a binary heap over the tablespaces + progress be nicer?

I'm not sure where it would fit exactly.

Anyway, I think it would complicate the code significantly (compared to 
the straightforward array), so I would not do anything like that without a 
strong incentive, such as an actual failing case.

Moreover, such a data structure would probably require some kind of pointer 
(probably 8 extra bytes per node, maybe more), and the amount of memory is 
already a concern, at least to me. It would also have to reside in shared 
memory, which does not simplify the allocation of tree data structures.
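
For what it is worth, the flat array variant is about as simple as the 
sketch below (all names made up, not the patch's code): one small progress 
entry per tablespace, and the next tablespace to write from is picked by a 
linear scan over at most a handful of entries.

#include "postgres.h"

/* hypothetical per-tablespace progress entry */
typedef struct TablespaceProgress
{
    Oid     tsoid;          /* tablespace oid */
    int     to_write;       /* buffers of this tablespace in the checkpoint */
    int     written;        /* buffers already written */
} TablespaceProgress;

/* pick the tablespace whose checkpoint progress is furthest behind */
static int
pick_least_advanced(const TablespaceProgress *ts, int nts)
{
    int     best = -1;
    double  best_ratio = 2.0;       /* any real progress is <= 1.0 */
    int     i;

    for (i = 0; i < nts; i++)
    {
        double  ratio;

        if (ts[i].written >= ts[i].to_write)
            continue;               /* this tablespace is done */
        ratio = (double) ts[i].written / ts[i].to_write;
        if (ratio < best_ratio)
        {
            best_ratio = ratio;
            best = i;
        }
    }
    return best;                    /* -1 when everything is written */
}

Choosing the smallest written/to_write ratio is what balances the writes 
across tablespaces; with few tablespaces the scan cost is negligible next 
to the writes themselves, which is why I do not see what a heap would buy.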

> If you make the sorting criterion include the tablespace id you wouldn't 
> need the lookahead loop in NextBufferToWrite().

Yep, I thought of it. It would mean 4 more bytes per buffer, plus a bsearch 
to find the boundaries, so significantly less simple code. I think the 
current approach is ok, as the number of tablespaces should be small.

It may be improved upon later if there is a motivation to do so.
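
If someone wants to pursue that direction later, it would look roughly 
like this (purely illustrative, not the patch's sort element): the 
per-buffer item grows an Oid for the tablespace, the comparator sorts on 
it first, and each tablespace's run in the sorted array is then located 
with a lower-bound style binary search.

#include "postgres.h"
#include "storage/block.h"

/* hypothetical sort element; the tablespace oid is the extra 4 bytes */
typedef struct BufSortItem
{
    Oid         tsoid;
    Oid         relnode;
    BlockNumber blocknum;
    int         buf_id;
} BufSortItem;

/* qsort comparator: tablespace first, then relation, then block */
static int
buf_sort_cmp(const void *a, const void *b)
{
    const BufSortItem *x = (const BufSortItem *) a;
    const BufSortItem *y = (const BufSortItem *) b;

    if (x->tsoid != y->tsoid)
        return (x->tsoid < y->tsoid) ? -1 : 1;
    if (x->relnode != y->relnode)
        return (x->relnode < y->relnode) ? -1 : 1;
    if (x->blocknum != y->blocknum)
        return (x->blocknum < y->blocknum) ? -1 : 1;
    return 0;
}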

> Isn't the current approach O(NBuffers^2) in the worst case?

ISTM that the overall lookahead complexity is NBuffers * Ntablespaces: 
buffers are scanned once for each tablespace. I assume that the number of 
tablespaces is kept low, and having simpler code which uses less memory 
seems a good idea.
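
To make the bound concrete, the lookahead amounts to something like the 
loop below (a sketch with assumed names, not the actual 
NextBufferToWrite()): each tablespace keeps its own cursor into the sorted 
buffer list, and cursors only ever move forward.

#include "postgres.h"

/* hypothetical entry of the sorted buffer list */
typedef struct SortedBuf
{
    Oid     tsoid;          /* tablespace of the buffer */
    int     buf_id;         /* index into shared buffers */
} SortedBuf;

/*
 * Advance this tablespace's cursor to its next buffer.  Because each
 * cursor is monotonic, summed over a whole checkpoint every tablespace
 * scans the list at most once, hence NBuffers * Ntablespaces overall
 * rather than NBuffers^2.
 */
static int
next_buffer_of_tablespace(const SortedBuf *sorted, int nbuffers,
                          Oid tsoid, int *cursor)
{
    while (*cursor < nbuffers)
    {
        const SortedBuf *b = &sorted[(*cursor)++];

        if (b->tsoid == tsoid)
            return b->buf_id;
    }
    return -1;              /* this tablespace is finished */
}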

ISTM that including the tablespace in the sort key would reduce the 
complexity to Ntablespaces * ln(NBuffers) for finding the boundaries, and 
then Ntablespaces * (NBuffers / Ntablespaces) = NBuffers for scanning, at 
the expense of more memory and code complexity.

So this is a voluntary design decision.

-- 
Fabien.


