Re: The Free Space Map: Problems and Opportunities - Mailing list pgsql-hackers

From John Naylor
Subject Re: The Free Space Map: Problems and Opportunities
Msg-id CAFBsxsGH-yy-_xAPK8McLrwzZNo1r=JrshgLLZdmsX2hgtXL4g@mail.gmail.com
In response to The Free Space Map: Problems and Opportunities  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: The Free Space Map: Problems and Opportunities  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Mon, Aug 16, 2021 at 1:36 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> Open and closed pages
> ---------------------

> This stickiness concept is called "hysteresis" by some DB researchers,
> often when discussing UNDO stuff [8]. Having *far less* granularity
> than FSM_CATEGORIES/255 seems essential to make that work as intended.
> Pages need to be able to settle without being disturbed by noise-level
> differences. That's all that super fine grained values buy you: more
> noise, more confusion.

I'm not sure it's essential to have "far" fewer categories, if the closed-to-open transition is made less granular through some other mechanism. We can certainly get by with fewer categories, which would free up bits; it seems we'd need at least one of those bits to track a block's open/closed state.
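To make the bit-budget point concrete, here is a minimal sketch, not a proposal for the actual on-disk format. It assumes a one-byte entry per block where the top bit is the closed flag and the low seven bits hold a coarser category. All names (sketch_*, SKETCH_*) and the 10%/50% thresholds are hypothetical and only there to illustrate the hysteresis: a page closes when it drops below the low mark and reopens only after it rises above the high mark, so small fluctuations in between change nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry layout: bit 7 = closed flag, bits 0-6 = category. */
#define SKETCH_CLOSED_BIT   0x80u
#define SKETCH_CAT_MASK     0x7Fu
#define SKETCH_CATEGORIES   128
#define SKETCH_BLCKSZ       8192

/* Hypothetical thresholds: close below ~10% free, reopen above ~50% free. */
#define SKETCH_CLOSE_BELOW  ((SKETCH_CATEGORIES * 10) / 100)
#define SKETCH_REOPEN_ABOVE ((SKETCH_CATEGORIES * 50) / 100)

/* Map free bytes to one of 128 categories (64 bytes per step here). */
static uint8_t
sketch_space_to_cat(uint32_t free_bytes)
{
    assert(free_bytes <= SKETCH_BLCKSZ);
    uint32_t cat = free_bytes / (SKETCH_BLCKSZ / SKETCH_CATEGORIES);

    if (cat > SKETCH_CAT_MASK)
        cat = SKETCH_CAT_MASK;  /* a completely empty page gets the top category */
    return (uint8_t) cat;
}

static bool
sketch_is_closed(uint8_t entry)
{
    return (entry & SKETCH_CLOSED_BIT) != 0;
}

static uint8_t
sketch_category(uint8_t entry)
{
    return (uint8_t) (entry & SKETCH_CAT_MASK);
}

/*
 * Compute the new entry for a block. The closed flag only flips when the
 * category crosses the far threshold, which is what makes the state sticky.
 */
static uint8_t
sketch_update(uint8_t old_entry, uint32_t free_bytes)
{
    uint8_t cat = sketch_space_to_cat(free_bytes);
    bool    closed = sketch_is_closed(old_entry);

    if (!closed && cat < SKETCH_CLOSE_BELOW)
        closed = true;
    else if (closed && cat > SKETCH_REOPEN_ABOVE)
        closed = false;

    return (uint8_t) (cat | (closed ? SKETCH_CLOSED_BIT : 0u));
}
```

With this shape the category can stay fairly fine-grained while the open/closed transition is coarse, which is the "some other mechanism" I had in mind.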

> Visibility map
> --------------
>
> If the logical database and natural locality are important to the FSM,
> then what about the visibility map? And what about the relationship
> between the FSM and the visibility map, in light of all this?
>
> Currently VACUUM doesn't care about how its FSM behavior interacts
> with how it sets all-frozen/all-visible bits for the same heap page.
> To me this seems completely unreasonable -- they're *obviously*
> related! We're probably only gaining a minimal amount of free space on
> one occasion by ignoring the VM/FSM relationship, for which we pay a
> high cost. Worst of all we're *perpetuating the cycle* of dirtying and
> redirtying the same pages over time. Maybe we should go as far as
> merging the FSM and VM, even -- that seems like a natural consequence
> of logical-ish/qualitative definitions of "page is full".

> [...]

> I now wonder if the FSM is fundamentally doing the wrong thing by
> keeping track of all "free space" in every page over time. Why
> wouldn't we just have a free space management strategy that is
> generally only concerned with recent events? If we find a way to make
> almost all newly allocated heap pages become "closed" quickly (maybe
> they're marked all-frozen quickly too), and make sure that that
> condition is sticky, then this can work out well. We may not even need
> to store explicit freespace information for most heap pages in the
> database -- a page being "closed" can be made implicit by the FSM
> and/or VM. Making heap pages age-out like this (and mostly stay that
> way over time) has obvious benefits in many different areas.

The second paragraph here is an interesting idea and makes a great deal of sense. It would lead to smaller FSMs that are navigated more quickly and locked for shorter durations.

Implicit "closure" seems riskier in my view if you want to bring VM qualities into it, however. Currently, setting an all-visible or all-frozen flag must be correct and crash-safe, but clearing those is just a lost optimization. If either of those qualities is implied by the absence of an entry, that seems more vulnerable to bugs.
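For illustration, a rough sketch of what "closed by lack of reference" could look like for free space alone. It assumes a small fixed-size set of explicitly tracked open pages; every name here (SketchOpenSet, sketch_*) is hypothetical and none of it reflects existing PostgreSQL code. Any block that is not in the set is treated as closed, so forgetting an entry only costs some reusable space. That is the harmless direction; the same trick applied to all-visible or all-frozen would make a missing entry assert something that has to be correct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_OPEN 8   /* hypothetical cap on explicitly tracked pages */

typedef struct
{
    uint32_t blkno;
    uint16_t free_bytes;
} SketchOpenPage;

typedef struct
{
    SketchOpenPage pages[SKETCH_MAX_OPEN];
    size_t         npages;
} SketchOpenSet;

static void
sketch_init(SketchOpenSet *set)
{
    set->npages = 0;
}

/* Return the slot index of blkno, or -1 if the block is not tracked. */
static int
sketch_find(const SketchOpenSet *set, uint32_t blkno)
{
    for (size_t i = 0; i < set->npages; i++)
        if (set->pages[i].blkno == blkno)
            return (int) i;
    return -1;
}

/* A block is open only if it has an explicit entry; absence means closed. */
static bool
sketch_is_open(const SketchOpenSet *set, uint32_t blkno)
{
    return sketch_find(set, blkno) >= 0;
}

/* Track (or refresh) an open page. Returns false if the set is full. */
static bool
sketch_track_open(SketchOpenSet *set, uint32_t blkno, uint16_t free_bytes)
{
    int slot = sketch_find(set, blkno);

    if (slot < 0)
    {
        if (set->npages == SKETCH_MAX_OPEN)
            return false;   /* not recorded, so the page stays implicitly closed */
        slot = (int) set->npages++;
        set->pages[slot].blkno = blkno;
    }
    set->pages[slot].free_bytes = free_bytes;
    return true;
}

/* Closing a page just drops its entry; nothing is stored for closed pages. */
static void
sketch_close(SketchOpenSet *set, uint32_t blkno)
{
    int slot = sketch_find(set, blkno);

    if (slot >= 0)
        set->pages[slot] = set->pages[--set->npages];
}

/* Find any open page with at least `needed` free bytes. */
static bool
sketch_find_page_with_space(const SketchOpenSet *set, uint16_t needed,
                            uint32_t *blkno_out)
{
    assert(blkno_out != NULL);
    for (size_t i = 0; i < set->npages; i++)
    {
        if (set->pages[i].free_bytes >= needed)
        {
            *blkno_out = set->pages[i].blkno;
            return true;
        }
    }
    return false;
}
```

The search only ever touches the handful of open pages, which is where the "smaller FSMs, navigated more quickly" benefit would come from.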

On Tue, Aug 17, 2021 at 12:48 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Aug 17, 2021 at 9:18 AM Andres Freund <andres@anarazel.de> wrote:
>
> > To me this seems like it'd be better addressed by a shared, per-relfilenode,
> > in-memory datastructure. Thomas Munro has been working on keeping accurate
> > per-relfilenode relation size information. ISTM that that'd be a better
> > place to hook in for this.
>
> +1. I had this same thought reading Peter's email. I'm not sure if it
> makes sense to hook that into Thomas's work, but I think it makes a
> ton of sense to use shared memory to coordinate transient state like
> "hey, right now I'm inserting into this block, you guys leave it
> alone" while using the disk for durable state like "here's how much
> space this block has left."

This makes sense as well: shared memory for more recent / highly contended state, and disk for less recent / less contended / stickier state. It might also have the advantage of splitting the work into smaller, more focused projects from a coding standpoint.
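A toy sketch of that split, under loudly stated assumptions: the "shared memory" part is a plain array of C11 atomics standing in for a real shared-memory structure, the "disk" part is an ordinary array standing in for the on-disk FSM, and all names (SketchSharedClaims, SketchDurableFsm, sketch_*) are hypothetical. A backend claims a block with a compare-and-swap ("I'm inserting into this block, leave it alone") and writes the block's remaining space to the durable side only when it releases the claim.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NBLOCKS  4
#define SKETCH_NO_OWNER 0u  /* backend ids are assumed to be nonzero */

/* Transient state: which backend, if any, is inserting into each block. */
typedef struct
{
    _Atomic uint32_t owner[SKETCH_NBLOCKS];
} SketchSharedClaims;

/* Durable state (stand-in for the on-disk FSM): free bytes per block. */
typedef struct
{
    uint16_t free_bytes[SKETCH_NBLOCKS];
} SketchDurableFsm;

static void
sketch_claims_init(SketchSharedClaims *claims)
{
    for (uint32_t b = 0; b < SKETCH_NBLOCKS; b++)
        atomic_init(&claims->owner[b], SKETCH_NO_OWNER);
}

/* Try to claim a block; succeeds only if nobody currently owns it. */
static bool
sketch_try_claim(SketchSharedClaims *claims, uint32_t blkno, uint32_t backend_id)
{
    uint32_t expected = SKETCH_NO_OWNER;

    assert(blkno < SKETCH_NBLOCKS && backend_id != SKETCH_NO_OWNER);
    return atomic_compare_exchange_strong(&claims->owner[blkno],
                                          &expected, backend_id);
}

/* Pick the first unclaimed block with enough durable free space, or -1. */
static int64_t
sketch_pick_block(SketchSharedClaims *claims, const SketchDurableFsm *fsm,
                  uint32_t backend_id, uint16_t needed)
{
    for (uint32_t b = 0; b < SKETCH_NBLOCKS; b++)
    {
        if (fsm->free_bytes[b] >= needed &&
            sketch_try_claim(claims, b, backend_id))
            return (int64_t) b;
    }
    return -1;
}

/* Record the remaining space durably, then drop the transient claim. */
static void
sketch_release(SketchSharedClaims *claims, SketchDurableFsm *fsm,
               uint32_t blkno, uint32_t backend_id, uint16_t free_bytes_now)
{
    (void) backend_id;  /* only used by the assertion below */
    assert(blkno < SKETCH_NBLOCKS);
    assert(atomic_load(&claims->owner[blkno]) == backend_id);

    fsm->free_bytes[blkno] = free_bytes_now;
    atomic_store(&claims->owner[blkno], SKETCH_NO_OWNER);
}
```

Nothing here survives a crash on the claims side, and nothing needs to: losing a claim just means another backend may pick the same block again.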

--
John Naylor
EDB: http://www.enterprisedb.com
