Re: checkpointer continuous flushing - Mailing list pgsql-hackers
| From | Fabien COELHO |
| --- | --- |
| Subject | Re: checkpointer continuous flushing |
| Date | |
| Msg-id | alpine.DEB.2.10.1508101942030.13396@sto |
| In response to | Re: checkpointer continuous flushing (Andres Freund <andres@anarazel.de>) |
| Responses | Re: checkpointer continuous flushing, Re: checkpointer continuous flushing |
| List | pgsql-hackers |
Hello Andres,

> You can't allocate 4GB with palloc(), it has a builtin limit against
> allocating more than 1GB.

Argh, too bad, I naively assumed that palloc was just malloc in disguise.

>> [...]
>
> Well, then every time the checkpointer is restarted.

Hmm...

> The point is that it's done at postmaster startup, and we're pretty much
> guaranteed that the memory will be available.

Ok, ok, I stop resisting... I'll have a look. Would it also fix the 1 GB
palloc limit in the same go? I guess so...

>>> That reasoning makes it impossible to move the fsyncing of files into
>>> the loop (whenever we move to a new file). That's not nice.
>>
>> I do not see why.
>
> Because it means that the sorting isn't necessarily correct. I.e. we
> can't rely on it to determine whether a file has already been fsynced.

Ok, I understand your point. Then the file would be fsynced twice: if the
first fsync was done properly (the data have already been flushed to disk),
the second one would not cost much, and fsyncing some file twice now and
then would not be a big issue. The code could also detect such an event and
log a warning, which would give a hint about how often it occurs in
practice.

>>> Hm. Is that actually the case for our qsort implementation?
>>
>> I think that it is hard to write a qsort which would fail that. That
>> would mean that it would compare the same items twice, which would be
>> inefficient.
>
> What? The same two elements aren't frequently compared pairwise with
> each other, but of course an individual element is frequently compared
> with other elements.

Sure.

> Consider what happens when the chosen pivot element changes its identity
> after half the partitioning is already done. The two partitions will not
> be divided in any meaningful way anymore. I don't see how this will
> result in a meaningful sort.

It would be partly meaningful, which is enough for performance and does not
matter for correctness: currently buffers are not sorted at all and it
works, even if it does not perform well.

>>> If the pivot element changes its identity won't the result be pretty
>>> much random?
>>
>> That would be a very unlikely event, given the short time spent in
>> qsort.
>
> Meh, we don't want to rely on "likeliness" on such things.

My main argument is that even if it occurs and the qsort result is partly
wrong, it does not affect correctness; it just means that the actual writes
will be less ordered than intended. If it occurs, one pivot separation
would be quite strange, but the others would still be right, so the buffers
would end up "partly sorted".

Another issue I see is that even if buffers are locked within the
comparison function, their status may change between two comparisons... I
do not think that locking all buffers for the duration of the sort is an
option.

So on the whole, I think that locking buffers for sorting is probably not
possible with the simple (and efficient) lightweight approach used in the
patch. The good news, as I argued before, is that the order is only
advisory, to help performance; correctness only requires that all
checkpoint buffers are written and that fsync is called at the end, and
does not depend on the buffer order. That is how it currently works anyway.

If you block on this then I'll switch to a heavyweight approach, but that
would be a waste of memory in my opinion, hence my argument for the
lightweight approach.

--
Fabien.
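[Editor's note] For context on the 1 GB limit discussed above: plain palloc() rejects single requests above MaxAllocSize, while the "huge" allocator variant does not. A hypothetical sketch, not the patch's actual code (CkptSortItem, CkptSortArray and AllocateCkptSortArray are invented names), of allocating the per-buffer sort array once, at startup, in a long-lived context:

```c
/*
 * Hypothetical sketch only -- illustration, not the patch's code.
 *
 * palloc() refuses single requests above MaxAllocSize (a bit under 1 GB);
 * MemoryContextAllocHuge() does not, and allocating in TopMemoryContext
 * means the array is obtained once rather than at every checkpoint.
 */
#include "postgres.h"

#include "storage/block.h"
#include "utils/memutils.h"

/* one entry per shared buffer, enough to order writes by file and block */
typedef struct CkptSortItem
{
	Oid			tsId;			/* tablespace */
	Oid			relNode;		/* relation file */
	BlockNumber blockNum;		/* block within the file */
	int			buf_id;			/* index into the buffer pool */
} CkptSortItem;

static CkptSortItem *CkptSortArray = NULL;

void
AllocateCkptSortArray(int nbuffers)
{
	Size		sz = (Size) nbuffers * sizeof(CkptSortItem);

	/* bypasses the MaxAllocSize check that plain palloc() enforces */
	CkptSortArray = (CkptSortItem *)
		MemoryContextAllocHuge(TopMemoryContext, sz);
}
```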
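[Editor's note] A standalone POSIX sketch, independent of PostgreSQL (the file names are invented), of the correctness argument made above: durability comes from fsync()ing every touched file after all the writes, so the write order, and an occasional duplicate fsync on an already-flushed file, affect performance rather than correctness.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	const char *files[] = {"seg_b.tmp", "seg_a.tmp"};	/* arbitrary order */
	char		block[8192];

	memset(block, 0x7f, sizeof(block));

	/* phase 1: write all "dirty buffers", in whatever order we ended up with */
	for (int i = 0; i < 2; i++)
	{
		int			fd = open(files[i], O_CREAT | O_WRONLY, 0600);

		if (fd < 0 || write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
		{
			perror(files[i]);
			return 1;
		}
		close(fd);
	}

	/* phase 2: fsync every touched file; doing one of them twice is harmless */
	for (int i = 0; i < 2; i++)
	{
		int			fd = open(files[i], O_WRONLY);

		if (fd < 0 || fsync(fd) != 0)
		{
			perror(files[i]);
			return 1;
		}
		close(fd);
	}
	printf("all blocks written and fsynced\n");
	return 0;
}
```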
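[Editor's note] The "heavyweight" alternative alluded to at the end could look roughly like the following standalone sketch (plain C with invented names, not the patch): copy each buffer's sort key into a private array before calling qsort(), so the comparator only ever reads values that cannot change mid-sort. The price is the extra copy of the keys, which is the memory cost being argued against.

```c
#include <stdio.h>
#include <stdlib.h>

/* stand-in for a shared, concurrently-updated buffer header */
typedef struct SharedBuf
{
	int			file_id;		/* may change while we sort */
	int			block_no;
} SharedBuf;

/* private, immutable snapshot taken before sorting */
typedef struct SortItem
{
	int			file_id;
	int			block_no;
	int			buf_id;
} SortItem;

static int
sort_item_cmp(const void *a, const void *b)
{
	const SortItem *x = a;
	const SortItem *y = b;

	if (x->file_id != y->file_id)
		return (x->file_id > y->file_id) - (x->file_id < y->file_id);
	return (x->block_no > y->block_no) - (x->block_no < y->block_no);
}

int
main(void)
{
	SharedBuf	shared[4] = {{2, 10}, {1, 5}, {2, 3}, {1, 7}};
	SortItem	items[4];

	/* snapshot phase: copy the keys once, under whatever locking is needed */
	for (int i = 0; i < 4; i++)
	{
		items[i].file_id = shared[i].file_id;
		items[i].block_no = shared[i].block_no;
		items[i].buf_id = i;
	}

	/* the comparator now only reads the stable snapshot */
	qsort(items, 4, sizeof(SortItem), sort_item_cmp);

	for (int i = 0; i < 4; i++)
		printf("buf %d -> file %d block %d\n",
			   items[i].buf_id, items[i].file_id, items[i].block_no);
	return 0;
}
```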