Re: The Free Space Map: Problems and Opportunities - Mailing list pgsql-hackers

From Robert Haas
Subject Re: The Free Space Map: Problems and Opportunities
Date
Msg-id CA+TgmoYP=fpXXozj+LUM4VMxgF3VKjTAc=W80vwBPfjJinzboQ@mail.gmail.com
In response to Re: The Free Space Map: Problems and Opportunities  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: The Free Space Map: Problems and Opportunities
List pgsql-hackers
On Mon, Aug 23, 2021 at 5:55 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Right now my prototype has a centralized table in shared memory, with
> a hash table. One entry per relation, generally multiple freelists per
> relation. And with per-freelist metadata such as owner and original
> leader backend XID values. Plus of course the lists of free blocks
> themselves. The prototype already clearly fixes the worst problems
> with BenchmarkSQL, but that's only one of my goals. That's just the
> starting point.
>
> I appreciate your input on this. And not just on the implementation
> details -- the actual requirements themselves are still in flux. This
> isn't so much a project to replace the FSM as it is a project that
> adds a new rich abstraction layer that goes between access methods and
> smgr.c -- free space management is only one of the responsibilities.

Makes sense. I think one of the big implementation challenges here is
coping with the scenario where there's not enough shared memory
available ... or else somehow making that impossible without reserving
an unreasonable amount of shared memory. If you reserved enough space
for every buffer to belong to a different relation, each with the
maximum number of leases and so on, you'd probably have no possibility
of OOM, but you'd probably be pre-reserving too much memory. I also think
there are some implementation challenges around locking. You probably
need some, because the data structure is shared, but because it's
complex, it's not easy to create locking that allows for good
concurrency. Or so I think.
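
To put a rough number on the pre-reservation concern, here is a minimal
sketch in C. Everything in it is an assumption made for illustration: the
struct layouts, the field names, and the two caps (MAX_LEASES_PER_REL and
BLOCKS_PER_LEASE) are hypothetical and not taken from the prototype. It
only shows how the worst case scales when every buffer is allowed to
belong to a different relation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical caps; the real prototype may use very different limits. */
#define MAX_LEASES_PER_REL 8    /* assumed maximum freelists per relation */
#define BLOCKS_PER_LEASE   32   /* assumed free blocks tracked per freelist */

/* Hypothetical per-freelist metadata plus the list of free blocks. */
typedef struct FreeLease
{
    uint32_t owner_backend;             /* backend that owns the freelist */
    uint32_t leader_xid;                /* XID of the original leader backend */
    uint32_t nblocks;                   /* number of valid entries in blocks[] */
    uint32_t blocks[BLOCKS_PER_LEASE];  /* free block numbers */
} FreeLease;

/* Hypothetical per-relation hash table entry. */
typedef struct RelEntry
{
    uint32_t  relid;                    /* relation identifier (hash key) */
    uint32_t  nleases;                  /* number of freelists in use */
    FreeLease leases[MAX_LEASES_PER_REL];
} RelEntry;

/*
 * Bytes needed if every one of nbuffers buffers belongs to a different
 * relation and each relation holds the maximum number of freelists.
 */
static size_t
worst_case_bytes(size_t nbuffers)
{
    /* Guard against overflow in the multiplication below. */
    assert(nbuffers <= SIZE_MAX / sizeof(RelEntry));
    return nbuffers * sizeof(RelEntry);
}
```

Under these made-up limits, each entry is a bit over 1kB, so the
reservation grows linearly with the number of buffers and is paid whether
or not most relations ever use more than one freelist. Shrinking the caps
lowers the bill but brings back the possibility of running out of space.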

Andres has been working -- I think for years now -- on replacing the
buffer mapping table with a radix tree of some kind. That strikes me
as very similar to what you're doing here. The per-relation data can
then include not only the kind of stuff you're talking about but also
very fundamental things like how long the relation is and where its
buffers are in the buffer pool. Hopefully we don't end up with dueling
patches.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


