Re: checkpointer continuous flushing - Mailing list pgsql-hackers
From: Fabien COELHO
Subject: Re: checkpointer continuous flushing
Msg-id: alpine.DEB.2.10.1509061516560.3687@sto
In response to: Re: checkpointer continuous flushing (Andres Freund <andres@anarazel.de>)
Responses: Re: checkpointer continuous flushing
List: pgsql-hackers
Hello Andres,

> Here's a bunch of comments on this (hopefully the latest?)

Who knows?! :-)

> version of the patch:
>
> * I'm not sure I like the FileWrite & FlushBuffer API changes. Do you
>   foresee other callsites needing similar logic?

I foresee that the bgwriter should also do something more sensible than generating random I/Os over HDDs, and this is also true for workers... But that is for another time, maybe.

> Wouldn't it be just as easy to put this logic into the checkpointing
> code?

I'm not sure it would simplify anything, because the checkpointer currently knows about buffers, but flushing is about files, which are hidden from its view. Doing it with this API change means that the code does not have to compute twice which file a buffer belongs to: the buffer/file boundary has to be broken somewhere anyway so that flushing can be done when needed, and the solution I took seems the simplest way to do it without making the checkpointer too file-conscious.

> * We don't do one-line ifs;

Ok, I'll expand them.

> function parameters are always in the same line as the function name

Ok, I'll try to improve that.

> * Wouldn't a binary heap over the tablespaces + progress be nicer?

I'm not sure where it would fit exactly. Anyway, I think it would complicate the code significantly (compared to the straightforward array), so I would not do anything like that without a strong incentive, such as an actual failing case. Moreover, such a data structure would probably require some kind of pointer (probably 8 bytes added per node, maybe more), and the amount of memory is already a concern, at least to me; it would also have to reside in shared memory, which does not simplify the allocation of tree data structures.

> If you make the sorting criterion include the tablespace id you wouldn't
> need the lookahead loop in NextBufferToWrite().

Yep, I thought of it. It would mean 4 more bytes per buffer, and a bsearch to find the boundaries, so significantly less simple code.
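As a sketch of what that alternative would look like (a minimal illustration with made-up struct and field names, not the patch's actual code): including the tablespace oid as the leading sort key makes each tablespace's buffers a contiguous run, so a binary search finds the run boundaries without a lookahead loop.

```c
#include <stdlib.h>

/* Hypothetical per-buffer sort entry; names are illustrative only. */
typedef struct SortEntry
{
    unsigned int tsoid;     /* tablespace oid: the 4 extra bytes per buffer */
    unsigned int relfile;
    unsigned int blocknum;
} SortEntry;

/* qsort comparator: tablespace first, then file, then block, so all
 * buffers of one tablespace form one contiguous run in the array. */
static int
entry_cmp(const void *pa, const void *pb)
{
    const SortEntry *a = pa, *b = pb;

    if (a->tsoid != b->tsoid)
        return a->tsoid < b->tsoid ? -1 : 1;
    if (a->relfile != b->relfile)
        return a->relfile < b->relfile ? -1 : 1;
    if (a->blocknum != b->blocknum)
        return a->blocknum < b->blocknum ? -1 : 1;
    return 0;
}

/* Binary search for the first entry with tsoid >= target: gives the
 * start boundary of a tablespace's run in O(log NBuffers). */
static size_t
tablespace_lower_bound(const SortEntry *entries, size_t n, unsigned int target)
{
    size_t lo = 0, hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (entries[mid].tsoid < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

This is the memory/complexity trade being weighed: one extra key per buffer plus boundary bsearches, versus the simpler array with a per-tablespace lookahead scan.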
I think that the current approach is ok, as the number of tablespaces should be small. It may be improved upon later if there is a motivation to do so.

> Isn't the current approach O(NBuffers^2) in the worst case?

ISTM that the overall lookahead complexity is NBuffers * Ntablespaces: buffers are scanned once for each tablespace. I assume that the number of tablespaces is kept low, and having simpler code which uses less memory seems a good idea.

ISTM that including the tablespace in the sorting would reduce the complexity to ln(NBuffers) * Ntablespaces for finding the boundaries, and then NBuffers * (Ntablespaces/Ntablespaces) = NBuffers for scanning, at the expense of more memory and code complexity. So this is a voluntary design decision.

-- Fabien.