Hello Andres,
>>> [...] I do think that this whole writeback logic really does make sense
>>> *per table space*,
>>
>> Leads to less regular IO, because if your tablespaces are evenly sized
>> (somewhat common) you'll sometimes end up issuing sync_file_range's
>> shortly after each other. For latency outside checkpoints it's
>> important to control the total amount of dirty buffers, and that's
>> obviously independent of tablespaces.
>
> I do not understand/buy this argument.
>
> The underlying IO queue is per device, and table spaces should be per device
> as well (otherwise what's the point?), so you should want to coalesce and
> "writeback" pages per device as well. Calls to sync_file_range on distinct
> devices will probably be issued more or less randomly, and should not
> interfere with one another.
>
> If you use just one context, the more table spaces there are, the smaller
> the performance gain, because there is less and less aggregation and thus
> fewer sequential writes per device.
>
> So for me there should really be one context per tablespace. That would
> suggest a hashtable or some other structure to keep and retrieve them, which
> would not be that bad, and I think that it is what is needed.
Note: I think that an easy way to do that in the "checkpoint sort" patch
is simply to keep a WritebackContext in the CkptTsStatus structure, which
is per tablespace in the checkpointer.

For bgwriter & backends it can wait: there is little "writeback"
coalescing there because their IO should be pretty random, so it does not
matter much.
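
To make the idea concrete, here is a minimal standalone sketch, not the
actual patch code. Only the names WritebackContext and CkptTsStatus come
from the discussion; their fields, WB_MAX_PENDING, schedule_writeback()
and issue_pending_writebacks() are simplified assumptions for
illustration, and the flush is only printed instead of being passed to
sync_file_range().

```c
#include <assert.h>
#include <stdio.h>

#define WB_MAX_PENDING 4        /* hypothetical flush threshold */

typedef unsigned int Oid;

/* Simplified stand-in: remembers the block numbers awaiting writeback. */
typedef struct WritebackContext
{
    int         nr_pending;
    int         pending[WB_MAX_PENDING];
    int         nr_flushes;     /* number of flush requests issued so far */
} WritebackContext;

/* Per-tablespace checkpoint status, reduced to what the sketch needs. */
typedef struct CkptTsStatus
{
    Oid         tsId;
    WritebackContext wb;        /* one writeback context per tablespace */
} CkptTsStatus;

/*
 * Flush the pending blocks of one tablespace.  Real code would sort the
 * blocks, merge neighbours and call sync_file_range() on each range;
 * here the request is only reported.
 */
static void
issue_pending_writebacks(CkptTsStatus *ts)
{
    if (ts->wb.nr_pending == 0)
        return;
    printf("tablespace %u: flushing %d blocks\n",
           ts->tsId, ts->wb.nr_pending);
    ts->wb.nr_flushes++;
    ts->wb.nr_pending = 0;
}

/* Queue a written block in its own tablespace's context only. */
static void
schedule_writeback(CkptTsStatus *ts, int blocknum)
{
    assert(ts->wb.nr_pending < WB_MAX_PENDING);
    ts->wb.pending[ts->wb.nr_pending++] = blocknum;
    if (ts->wb.nr_pending == WB_MAX_PENDING)
        issue_pending_writebacks(ts);
}
```

With this layout, a checkpoint that alternates writes between two
tablespaces still accumulates a full batch per tablespace before each
flush, whereas a single shared context would mix blocks from both
devices in every batch.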
--
Fabien.