Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: checkpointer continuous flushing
Date
Msg-id alpine.DEB.2.10.1508090837300.6511@sto
In response to Re: checkpointer continuous flushing  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
Hello Heikki,

Thanks for having a look at the patch.

> * I think we should drop the "flush" part of this for now. It's not as 
> clearly beneficial as the sorting part, and adds a great deal more code 
> complexity. And it's orthogonal to the sorting patch, so we can deal with it 
> separately.

I agree that it is orthogonal and that the two features could be in 
distinct patches. The flush part is the first patch I actually submitted, 
because it has a significant effect on latency, and I was told to mix it 
with sorting...

The flushing part really helps to keep "write stalls" under control in 
many cases, for instance:

- 400-tps 1-client (or 4 for medium) max 100-ms latency

     options   | percent of late transactions
  flush | sort | tiny | small | medium
    off |  off | 12.0 | 64.28 | 68.6
    off |   on | 11.3 | 22.05 | 22.6
     on |  off |  1.1 | 67.93 | 67.9
     on |   on |  0.6 |  3.24 |  3.1
 

The "percent of late transactions" is really the fraction of time the 
database is unreachable because of write stalls... So sort without flush 
is clearly not enough.

Another thing suggested by Andres is to fsync as early as possible, but 
this is not a simple patch because it intermixes things which are currently 
in distinct parts of checkpoint processing, so I already decided that this 
would be for another submission.

> * Is it really necessary to parallelize the I/O among tablespaces? I can see 
> the point, but I wonder if it makes any difference in practice.

I think that if someone bothers to set up tablespaces, there is no reason 
to defeat them behind her back. Without sorting you may hope that 
tablespaces will be touched randomly enough, but once buffers are sorted 
you can probably find cases where the checkpointer would write everything 
to one tablespace and only then move on to the other.

So I think that it really should be kept.
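
To illustrate the concern, here is a minimal sketch of sorting followed by 
a plain round-robin over tablespaces. It is not the balancing logic of the 
patch (which is not shown in this message); the SketchBuf type, its fields 
and the function names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified description of a dirty buffer. */
typedef struct SketchBuf
{
    unsigned tablespace;    /* tablespace identifier */
    unsigned relation;      /* relation file identifier */
    unsigned block;         /* block number within the file */
} SketchBuf;

/* Order by tablespace, then relation, then block number. */
static int
sketch_cmp(const void *a, const void *b)
{
    const SketchBuf *x = a;
    const SketchBuf *y = b;

    if (x->tablespace != y->tablespace)
        return x->tablespace < y->tablespace ? -1 : 1;
    if (x->relation != y->relation)
        return x->relation < y->relation ? -1 : 1;
    if (x->block != y->block)
        return x->block < y->block ? -1 : 1;
    return 0;
}

/*
 * Sort bufs[0..n-1] in place, then fill out[0..n-1] with a write order
 * that takes one buffer from each tablespace in turn, so that no
 * tablespace sits idle while another one receives all the writes.
 * Returns 0 on success, -1 if memory could not be allocated.
 */
static int
sketch_interleave(SketchBuf *bufs, size_t n, SketchBuf *out)
{
    size_t *next, *end;
    size_t  nspaces = 0, written = 0, i;

    if (n == 0)
        return 0;
    qsort(bufs, n, sizeof(SketchBuf), sketch_cmp);

    next = malloc(n * sizeof(size_t));
    end = malloc(n * sizeof(size_t));
    if (next == NULL || end == NULL)
    {
        free(next);
        free(end);
        return -1;
    }

    /* Record where the sorted run of each tablespace starts and ends. */
    for (i = 0; i < n; i++)
    {
        if (i == 0 || bufs[i].tablespace != bufs[i - 1].tablespace)
        {
            if (nspaces > 0)
                end[nspaces - 1] = i;
            next[nspaces++] = i;
        }
    }
    end[nspaces - 1] = n;

    /* Round-robin: one buffer per tablespace per pass. */
    while (written < n)
        for (i = 0; i < nspaces; i++)
            if (next[i] < end[i])
                out[written++] = bufs[next[i]++];

    free(next);
    free(end);
    return 0;
}
```

Within each tablespace the writes stay in sorted order, which is the point 
of sorting; only the interleaving between tablespaces changes. A strict 
round-robin ignores how many buffers each tablespace has left, so a real 
implementation would probably want to weight the turns.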

> * Is there ever any harm in sorting the buffers? The GUC is useful for 
> benchmarking, but could we leave it out of the final patch?

I think that the performance results show that it is basically always 
beneficial, so the GUC may be left out. However, on SSD it is unclear to me 
whether sorting is just a loss of time or whether it helps, say with 
wear-leveling. Maybe best to keep it? Anyway, it is definitely needed for 
testing.

> * Do we need to worry about exceeding the 1 GB allocation limit in 
> AllocateCheckpointBufferIds? It's enough to have 2 TB of shared_buffers. 
> That's a lot, but it's not totally crazy these days that someone might do 
> that. At the very least, we need to lower the maximum of shared_buffers so 
> that you can't hit that limit.

Yep.
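
The arithmetic behind the 2 TB figure can be checked with a small sketch. 
It assumes one 4-byte buffer ID per shared buffer, the default 8 kB block 
size, and a 1 GB cap on a single allocation; the constant and function 
names are made up for illustration and are not PostgreSQL identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; the names are hypothetical. */
#define SKETCH_BLOCK_SIZE     UINT64_C(8192)        /* default 8 kB page */
#define SKETCH_MAX_ALLOC      (UINT64_C(1) << 30)   /* assumed 1 GB cap */
#define SKETCH_BUFFER_ID_SIZE UINT64_C(4)           /* one int per buffer */

/* Bytes needed for an array holding one buffer ID per shared buffer. */
static uint64_t
sketch_id_array_bytes(uint64_t shared_buffers_bytes)
{
    return (shared_buffers_bytes / SKETCH_BLOCK_SIZE) * SKETCH_BUFFER_ID_SIZE;
}

/* shared_buffers size (in bytes) at which the ID array reaches the cap. */
static uint64_t
sketch_shared_buffers_at_limit(void)
{
    return (SKETCH_MAX_ALLOC / SKETCH_BUFFER_ID_SIZE) * SKETCH_BLOCK_SIZE;
}
```

With these assumptions the cap is reached at 2^28 buffers of 8 kB each, 
that is 2^41 bytes or 2 TB of shared_buffers.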

> I ripped out the "flushing" part, keeping only the sorting. I refactored 
> the logic in BufferSync() a bit. There's now a separate function,
> nextCheckpointBuffer(), that returns the next buffer ID from the sorted 
> list. The tablespace-parallelization behaviour is encapsulated there,

I do not understand the new tablespace-parallelization logic: there is no 
test on the tablespace of the buffer in the selection process... Note 
that I did write a proof for the one I put in, and also did some detailed 
testing on the side because I'm always wary of proofs, especially mine :-)

I notice that you assume that tablespace numbers are always small and 
contiguous. Is that a fact? I felt more at ease relying on a hash table 
to avoid such an assumption.
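
For what it is worth, tablespace identifiers are OIDs: in a stock 
installation pg_default and pg_global are 1663 and 1664, and user-created 
tablespaces draw from the general OID counter, so they are neither small 
nor contiguous. A minimal sketch of per-tablespace bookkeeping keyed by 
OID follows; it is a toy fixed-size open-addressing table, not the hash 
table used in the patch, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Power of two; must exceed the number of distinct tablespaces. */
#define SKETCH_SLOTS 64

typedef struct SketchSpaceMap
{
    uint32_t oid[SKETCH_SLOTS];     /* 0 marks an empty slot */
    int      count[SKETCH_SLOTS];   /* e.g. buffers left to write */
} SketchSpaceMap;

static void
sketch_map_init(SketchSpaceMap *m)
{
    memset(m, 0, sizeof(*m));
}

/*
 * Return a pointer to the counter for the given tablespace OID,
 * creating a zeroed entry if the OID is not present yet.
 * Returns NULL if the table is full.  The OID must be non-zero.
 */
static int *
sketch_map_get(SketchSpaceMap *m, uint32_t oid)
{
    uint32_t h, probe;

    assert(oid != 0);
    h = (oid * 2654435761u) & (SKETCH_SLOTS - 1);   /* multiplicative hash */

    for (probe = 0; probe < SKETCH_SLOTS; probe++)
    {
        uint32_t i = (h + probe) & (SKETCH_SLOTS - 1);  /* linear probing */

        if (m->oid[i] == oid)
            return &m->count[i];
        if (m->oid[i] == 0)
        {
            m->oid[i] = oid;
            return &m->count[i];
        }
    }
    return NULL;
}
```

An array indexed directly by tablespace number would need as many slots as 
the largest OID in use, whereas the table above only needs one slot per 
tablespace actually seen.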

> keeping the code in BufferSync() much simpler. See attached. Needs some 
> minor cleanup and commenting still before committing, and I haven't done 
> any testing besides a simple "make check".

Hmmm..., just another detail: the patch as posted never sorts, because 
the condition is forced to false:
  + if (checkpoint_sort && num_to_write > 1 && false)


I'll resubmit a patch with only the sorting part, and do the kind of 
restructuring you suggest, which is a good thing.

-- 
Fabien.


