Re: checkpointer continuous flushing - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: checkpointer continuous flushing |
Date | |
Msg-id | alpine.DEB.2.10.1508090837300.6511@sto Whole thread Raw |
In response to | Re: checkpointer continuous flushing (Heikki Linnakangas <hlinnaka@iki.fi>) |
List | pgsql-hackers |
Hello Heikki, Thanks for having a look at the patch. > * I think we should drop the "flush" part of this for now. It's not as > clearly beneficial as the sorting part, and adds a great deal more code > complexity. And it's orthogonal to the sorting patch, so we can deal with it > separately. I agree that it is orthogonal and that the two features could be in distinct patches. The flush part is the first patch I really submitted because it has significant effect on latency, and I was told to mix it with sorting... The flushing part really helps to keep "write stalls" under control in many cases, for instance: - 400-tps 1-client (or 4 for medium) max 100-ms latency options | percent of late transactions flush | sort | tiny | small | medium off | off | 12.0 | 64.28 | 68.6 off | on | 11.3 | 22.05 | 22.6 on | off | 1.1 | 67.93 | 67.9 on | on | 0.6 | 3.24 | 3.1 The "percent of late transactions" is really the fraction of time the database is unreachable because of write stalls... So sort without flush is cleary not enough. Another thing suggested by Andres is to fsync as early as possible, but this is not a simple patch because is intermix things which are currently in distinct parts of checkpoint processing, so I already decided that this would be for another submission. > * Is it really necessary to parallelize the I/O among tablespaces? I can see > the point, but I wonder if it makes any difference in practice. I think that if someone bothers with tablespace there is no reason to kill them behind her. Without sorting you may hope that tablespaces will be touched randomly enough, but once buffers are sorted you can probably find cases where it would write on one table space and then on the other. So I think that it really should be kept. > * Is there ever any harm in sorting the buffers? The GUC is useful for > benchmarking, but could we leave it out of the final patch? I think that the performance show that it is basically always beneficial, so the guc may be left out. However on SSD it is unclear to me whether it is just a loss of time or whether it helps, say with wear-leveling. Maybe best to keep it? Anyway it is definitely needed for testing. > * Do we need to worry about exceeding the 1 GB allocation limit in > AllocateCheckpointBufferIds? It's enough got 2 TB of shared_buffers. That's a > lot, but it's not totally crazy these days that someone might do that. At the > very least, we need to lower the maximum of shared_buffers so that you can't > hit that limit. Yep. > I ripped out the "flushing" part, keeping only the sorting. I refactored > the logic in BufferSync() a bit. There's now a separate function, > nextCheckpointBuffer(), that returns the next buffer ID from the sorted > list. The tablespace-parallelization behaviour in encapsulated there, I do not understand the new tablespace-parallelization logic: there is no test about the tablespace of the buffer in the selection process... Note that I did wrote a proof for the one I put, and also did some detailed testing on the side because I'm always wary of proofs, especially mines:-) I notice that you assume that table space numbers are always small and contiguous. Is that a fact? I was feeling more at ease with relying on a hash table to avoid such an assumption. > keeping the code in BufferSync() much simpler. See attached. Needs some > minor cleanup and commenting still before committing, and I haven't done > any testing besides a simple "make check". Hmmm..., just another detail, the patch does not sort: + if (checkpoint_sort && num_to_write > 1 && false) I'll resubmit a patch with only the sorting part, and do the kind of restructuring you suggest which is a good thing. -- Fabien.
pgsql-hackers by date: