On 07.08.2020 00:33, Tomas Vondra wrote:
>
> Unfortunately Konstantin did not share any details about what workloads
> he tested, what config etc. But I find the "no regression" hypothesis
> rather hard to believe, because we're adding non-trivial amount of code
> to a place that can be quite hot.
Sorry that I did not explain my test scenarios.
Since Postgres is a pgbench-oriented database :) I also used pgbench:
the read-only case and the skip-some-updates case.
For this patch the most critical factor is the number of buffer
allocations, so I used a fairly small database (scale=100), but shared
buffers was set to 1GB.
As a result, all data is cached in memory (in the file system cache), but
there is intensive swapping at the Postgres buffer manager level.
I tested it with both a relatively small (100) and a large (1000)
number of clients.
I repeated these tests on my notebook (quad-core, 16GB RAM, SSD) and on
an IBM Power2 server with about 380 virtual cores and about 1TB of memory.
In the latter case the results vary a lot (I think because of the NUMA
architecture), but I failed to find any noticeable regression in the
patched version.
But I have to agree that adding a parallel hash (in addition to the
existing buffer manager hash) is not such a good idea.
This cache really does become a bottleneck quite frequently.
My explanation for why I did not observe any noticeable regression is
that this patch uses almost the same lock partitioning scheme as the one
already in use, so it does not add many new conflicts. Maybe in the case
of the Power2 server the overhead of NUMA is much higher than other
factors (although a shared hash is one of the main things that suffers
from a NUMA architecture).
But in principle I agree that having two independent caches may reduce
speed by up to a factor of two (or even more).
I hope everybody will agree that this problem is really critical.
It is certainly not the most common case to have hundreds of relations
that are frequently truncated. But having quadratic complexity in the
drop function is not acceptable from my point of view.
And it is not only a recovery-specific problem, which is why a solution
with a local cache is not enough.
I do not know a good solution to the problem. Just some thoughts.
- We can somehow combine the locking used for the main buffer manager
cache (by relid/blockno) and for the cache by relid. It would eliminate
the double locking overhead.
- We can use something like a sorted tree (like std::map) instead of a
hash: it would allow locating blocks both by relid/blockno and by relid
only.
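
To illustrate the sorted-tree idea, here is a minimal single-process
sketch in C++. It is only an illustration, not a proposal for the actual
implementation: the type names (RelId, BlockNo, BufferId) and the class
SortedBufferIndex are hypothetical, the real buffer tag has more fields,
locking is omitted entirely, and std::map itself could not simply be
placed in shared memory. It only shows that one container ordered by
(relid, blockno) can serve both point lookups and per-relation scans.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical simplified identifiers (assumptions for this sketch only).
using RelId = std::uint32_t;
using BlockNo = std::uint32_t;
using BufferId = int;
// std::pair compares lexicographically: by relid first, then by blockno.
using BufferKey = std::pair<RelId, BlockNo>;

class SortedBufferIndex {
public:
    void insert(RelId rel, BlockNo blk, BufferId buf) {
        assert(buf >= 0);  // -1 is reserved as the "not found" result
        index_[BufferKey{rel, blk}] = buf;
    }

    // Point lookup by (relid, blockno); returns -1 if the block is not cached.
    BufferId lookup(RelId rel, BlockNo blk) const {
        auto it = index_.find(BufferKey{rel, blk});
        return it == index_.end() ? -1 : it->second;
    }

    // Lookup by relid only: start at the smallest key of this relation
    // and walk forward until the relid changes.
    std::vector<BufferId> buffers_of(RelId rel) const {
        std::vector<BufferId> out;
        for (auto it = index_.lower_bound(BufferKey{rel, 0});
             it != index_.end() && it->first.first == rel; ++it) {
            out.push_back(it->second);
        }
        return out;
    }

    // Remove all entries of one relation; the cost depends on the number of
    // its cached blocks plus a logarithmic search, not on the total size.
    std::size_t drop_relation(RelId rel) {
        auto first = index_.lower_bound(BufferKey{rel, 0});
        auto last = first;
        std::size_t removed = 0;
        while (last != index_.end() && last->first.first == rel) {
            ++last;
            ++removed;
        }
        index_.erase(first, last);
        return removed;
    }

private:
    std::map<BufferKey, BufferId> index_;
};
```

The price is a logarithmic instead of constant-time point lookup, and a
tree is harder to partition for locking than a hash, so whether this is
acceptable on the hot path would have to be measured.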