On Wed, Aug 25, 2021 at 10:58 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Makes sense.
I'm glad that the big picture stuff makes sense to you.
> I think one of the big implementation challenges here is
> coping with the scenario where there's not enough shared memory
> available ... or else somehow making that impossible without reserving
> an unreasonable amount of shared memory.
Yes, it'll definitely be necessary to nail that down.
> If you allowed space for
> every buffer to belong to a different relation and have the maximum
> number of leases and whatever, you'd probably have no possibility of
> OOM, but you'd probably be pre-reserving too much memory.
I hope that we can control the shared memory space overhead by making
it a function of max_connections, plus some configurable number of
relations that get modified within a single transaction. This approach
must behave in the same way when the number of tables that each
transaction actually modifies is high -- perhaps a transaction that
does this then pays a penalty in WAL logging within the FSM. I think
that that can be made manageable, especially if we can pretty much
impose the cost directly on those transactions that need to modify
lots of relations all at once. (If we can reuse the shared memory over
time it'll help too.)
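
As a rough illustration of that sizing idea (a minimal sketch only, not
taken from the prototype patch): every name below, including
FreeSpaceRelEntry, freespace_shmem_size, freespace_needs_fallback, and
the rels_per_xact setting, is hypothetical. It assumes one fixed-size
array of per-relation entries per connection, reserved up front, with
transactions that exceed their slots taking a slower fallback path.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-relation tracking entry; the fields are placeholders. */
typedef struct FreeSpaceRelEntry
{
    unsigned int relid;      /* relation being modified */
    unsigned int owner_xid;  /* transaction that owns this entry */
    unsigned int nleases;    /* block leases currently held */
} FreeSpaceRelEntry;

/*
 * Upper bound on the shared memory reserved at startup, assuming each
 * connection gets rels_per_xact preallocated entries.
 * Returns 0 if the computation would overflow size_t.
 */
size_t
freespace_shmem_size(size_t max_connections, size_t rels_per_xact)
{
    size_t nentries;

    if (rels_per_xact != 0 && max_connections > SIZE_MAX / rels_per_xact)
        return 0;
    nentries = max_connections * rels_per_xact;

    if (nentries > SIZE_MAX / sizeof(FreeSpaceRelEntry))
        return 0;
    return nentries * sizeof(FreeSpaceRelEntry);
}

/*
 * Returns 1 when a transaction that already tracks nrels_tracked
 * relations has no preallocated slot left for one more, so it (and
 * only it) would pay the extra cost of the fallback path.
 */
int
freespace_needs_fallback(size_t nrels_tracked, size_t rels_per_xact)
{
    return nrels_tracked >= rels_per_xact;
}
```

With this shape the reservation grows linearly with max_connections and
the per-transaction setting, rather than with the number of buffers or
relations in the system.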
> I also think
> there are some implementation challenges around locking.
That seems likely.
> You probably
> need some, because the data structure is shared, but because it's
> complex, it's not easy to create locking that allows for good
> concurrency. Or so I think.
My hope is that this design more than makes up for it by relieving
contention in other areas, such as buffer lock contention or relation
extension lock contention.
> Andres has been working -- I think for years now -- on replacing the
> buffer mapping table with a radix tree of some kind. That strikes me
> as very similar to what you're doing here. The per-relation data can
> then include not only the kind of stuff you're talking about but very
> fundamental things like how long it is and where its buffers are in
> the buffer pool. Hopefully we don't end up with dueling patches.
I agree that there is definitely some overlap. I see no risk of a real
conflict, though. I have mostly been approaching this project as an
effort to fix the locality problems, mostly by looking for fixes to
the BenchmarkSQL workload's problems. I have to admit that the big
picture stuff about exploiting transactional semantics with free space
management is still pretty aspirational. The resource management parts
of my prototype patch are by far the kludgiest parts.
I hope that I can benefit from whatever work Andres has already done
on this, particularly when it comes to managing per-relation metadata
in shared memory.
--
Peter Geoghegan