Re: Cache relation sizes? - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Cache relation sizes?
Date
Msg-id 20201117.172916.1231082071484485725.horikyota.ntt@gmail.com
In response to Re: Cache relation sizes?  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
At Mon, 16 Nov 2020 20:11:52 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in 
> After recent discussions about the limitations of relying on SEEK_END
> in a nearby thread[1], I decided to try to prototype a system for
> tracking relation sizes properly in shared memory.  Earlier in this
> thread I was talking about invalidation schemes for backend-local
> caches, because I only cared about performance.  In contrast, this new
> system has SMgrRelation objects that point to SMgrSharedRelation
> objects (better names welcome) that live in a pool in shared memory,
> so that all backends agree on the size.  The scheme is described in
> the commit message and comments.  The short version is that smgr.c
> tracks the "authoritative" size of any relation that has recently been
> extended or truncated, until it has been fsync'd.  By authoritative, I
> mean that there may be dirty buffers in that range in our buffer pool,
> even if the filesystem has vaporised the allocation of disk blocks and
> shrunk the file.
> 
> That is, it's not really a "cache".  It's also not like a shared
> catalog, which Konstantin was talking about... it's more like the pool
> of inodes in a kernel's memory.  It holds all currently dirty SRs
> (SMgrSharedRelations), plus as many clean ones as it can fit, with
> some kind of reclamation scheme, much like buffers.  Here, "dirty"
> means the size changed.
> 
> Attached is an early sketch, not debugged much yet (check under
> contrib/postgres_fdw fails right now for a reason I didn't look into),
> and there are clearly many architectural choices one could make
> differently, and more things to be done... but it seemed like enough
> of a prototype to demonstrate the concept and fuel some discussion
> about this and whatever better ideas people might have...
> 
> Thoughts?
> 
> [1] https://www.postgresql.org/message-id/flat/OSBPR01MB3207DCA7EC725FDD661B3EDAEF660%40OSBPR01MB3207.jpnprd01.prod.outlook.com
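To make the scheme quoted above concrete, here is a minimal C sketch of a fixed-size pool of shared size entries. It is only an illustration under stated assumptions: `RelNodeKey`, `SharedRelSize` and the `sr_*` functions are hypothetical names (not the ones in the patch), locking is omitted, and the size that `lseek(SEEK_END)` would report is simulated by a `file_nblocks` argument. An extend or truncate marks the entry dirty; while a relation has an entry, its `nblocks` is returned instead of whatever the file reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a relfilenode: tablespace, database, relation OIDs. */
typedef struct RelNodeKey
{
    uint32_t spc;
    uint32_t db;
    uint32_t rel;
} RelNodeKey;

/* One pool slot; the names are illustrative, not the patch's. */
typedef struct SharedRelSize
{
    RelNodeKey key;
    bool       in_use;
    bool       dirty;      /* size changed and not yet fsync'd */
    uint32_t   nblocks;    /* authoritative size while dirty */
} SharedRelSize;

#define POOL_SIZE 8

/* In PostgreSQL this array would live in shared memory, protected by locks. */
static SharedRelSize pool[POOL_SIZE];

static bool
key_equal(const RelNodeKey *a, const RelNodeKey *b)
{
    return a->spc == b->spc && a->db == b->db && a->rel == b->rel;
}

/* Find the slot for key, or NULL if the relation is not in the pool. */
static SharedRelSize *
sr_lookup(const RelNodeKey *key)
{
    for (int i = 0; i < POOL_SIZE; i++)
        if (pool[i].in_use && key_equal(&pool[i].key, key))
            return &pool[i];
    return NULL;
}

/* Record an extend or truncate; returns false if no slot is free. */
static bool
sr_set_size(const RelNodeKey *key, uint32_t nblocks)
{
    SharedRelSize *sr = sr_lookup(key);

    if (sr == NULL)
    {
        for (int i = 0; i < POOL_SIZE && sr == NULL; i++)
            if (!pool[i].in_use)
                sr = &pool[i];
        if (sr == NULL)
            return false;   /* a real pool would reclaim a clean entry here */
        sr->key = *key;
        sr->in_use = true;
    }
    assert(key_equal(&sr->key, key));
    sr->nblocks = nblocks;
    sr->dirty = true;       /* authoritative until the file is fsync'd */
    return true;
}

/* After fsync the on-disk size is durable, so the entry becomes clean. */
static void
sr_mark_synced(const RelNodeKey *key)
{
    SharedRelSize *sr = sr_lookup(key);

    if (sr != NULL)
        sr->dirty = false;
}

/* The size all backends agree on; file_nblocks simulates lseek(SEEK_END). */
static uint32_t
sr_nblocks(const RelNodeKey *key, uint32_t file_nblocks)
{
    const SharedRelSize *sr = sr_lookup(key);

    return sr != NULL ? sr->nblocks : file_nblocks;
}

static bool
sr_is_dirty(const RelNodeKey *key)
{
    const SharedRelSize *sr = sr_lookup(key);

    return sr != NULL && sr->dirty;
}
```

For example, after `sr_set_size(&k, 10)`, `sr_nblocks(&k, 8)` still answers 10 even if the file currently looks shorter, and the entry only becomes clean (and thus reclaimable) after `sr_mark_synced(&k)`.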

I was naively thinking that we could just remember the size of every
database file in a shared array, but I realized that this needs a hash
table to translate rnodes into array indexes, which could grow huge.
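A minimal sketch of such a translation table, assuming open addressing with linear probing over a fixed-capacity array. `RelNodeKey`, `map_put()`, `map_get()` and the hash function are hypothetical, and deletion is omitted (with linear probing it would need tombstones). Because the capacity is fixed, the table cannot grow; that only works if it tracks a bounded set of relations rather than every database file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical relfilenode stand-in: tablespace, database, relation OIDs. */
typedef struct RelNodeKey
{
    uint32_t spc;
    uint32_t db;
    uint32_t rel;
} RelNodeKey;

/* Fixed capacity (a power of two), so the table never grows. */
#define MAP_CAPACITY 16

typedef struct MapEntry
{
    RelNodeKey key;
    int        slot;    /* index into a separate array of sizes */
    bool       used;
} MapEntry;

static MapEntry map[MAP_CAPACITY];

static bool
key_equal(const RelNodeKey *a, const RelNodeKey *b)
{
    return a->spc == b->spc && a->db == b->db && a->rel == b->rel;
}

/* FNV-style mixing over the three OIDs; illustrative, not PostgreSQL's hash. */
static uint32_t
key_hash(const RelNodeKey *k)
{
    uint32_t h = 2166136261u;

    h = (h ^ k->spc) * 16777619u;
    h = (h ^ k->db) * 16777619u;
    h = (h ^ k->rel) * 16777619u;
    return h;
}

/* Insert or update key -> slot; returns false when the table is full. */
static bool
map_put(const RelNodeKey *k, int slot)
{
    uint32_t i = key_hash(k) & (MAP_CAPACITY - 1);

    assert(slot >= 0);
    for (int probes = 0; probes < MAP_CAPACITY; probes++)
    {
        MapEntry *e = &map[i];

        if (!e->used || key_equal(&e->key, k))
        {
            e->key = *k;
            e->slot = slot;
            e->used = true;
            return true;
        }
        i = (i + 1) & (MAP_CAPACITY - 1);   /* linear probing */
    }
    return false;
}

/* Return the array index for k, or -1 if k is not mapped. */
static int
map_get(const RelNodeKey *k)
{
    uint32_t i = key_hash(k) & (MAP_CAPACITY - 1);

    for (int probes = 0; probes < MAP_CAPACITY; probes++)
    {
        const MapEntry *e = &map[i];

        if (!e->used)
            return -1;      /* no deletions, so an empty slot ends the chain */
        if (key_equal(&e->key, k))
            return e->slot;
        i = (i + 1) & (MAP_CAPACITY - 1);
    }
    return -1;
}
```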

The proposed approach tries to make sure that the next lseek(SEEK_END)
call tells the truth before forgetting cached values.  On the other
hand, a shared smgrrel allows us to defer fsyncing a (local) smgrrel
until someone evicts the entry.  That seems to me to be the minimal
mechanism that lets us always know the correct file size, without
needing to remember the size of every database file in shared memory.
I'm afraid it might cause fsync storms on very large systems, though.
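The fsync-on-eviction rule can be sketched with a clock sweep like the one PostgreSQL uses for shared buffers. This is an assumption about how reclamation might work, not the patch's code: the hypothetical `sr_evict()` reuses a free slot first, then a clean one, and only when every entry is dirty does it call the hypothetical `fsync_relation()` before forgetting a size. The `fsync_count` counter shows where the cost lands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POOL_SIZE 4
#define MAX_USAGE 3     /* illustrative cap on the clock-sweep usage count */

/* Illustrative pool entry; "rel" stands in for a full relfilenode key. */
typedef struct SharedRelSize
{
    uint32_t rel;
    bool     in_use;
    bool     dirty;     /* size changed since the last fsync */
    uint8_t  usage;     /* clock-sweep usage count */
    uint32_t nblocks;
} SharedRelSize;

static SharedRelSize pool[POOL_SIZE];
static int clock_hand = 0;
static int fsync_count = 0;     /* how many evictions had to fsync first */

/* Hypothetical stand-in for fsyncing the relation's files. */
static void
fsync_relation(SharedRelSize *sr)
{
    fsync_count++;
    sr->dirty = false;  /* the on-disk size can now be trusted */
}

static int
sr_lookup(uint32_t rel)
{
    for (int i = 0; i < POOL_SIZE; i++)
        if (pool[i].in_use && pool[i].rel == rel)
            return i;
    return -1;
}

/* Pick a slot to reuse: free first, then clean, fsync only as a last resort. */
static int
sr_evict(void)
{
    for (int i = 0; i < POOL_SIZE; i++)
        if (!pool[i].in_use)
            return i;

    /* Enough ticks for every usage count to drain to zero. */
    for (int ticks = 0; ticks < POOL_SIZE * (MAX_USAGE + 1); ticks++)
    {
        int victim = clock_hand;

        clock_hand = (clock_hand + 1) % POOL_SIZE;
        if (pool[victim].usage > 0)
            pool[victim].usage--;
        else if (!pool[victim].dirty)
            return victim;  /* clean: forgetting it loses nothing */
    }

    /* Every entry is dirty: make one durable before forgetting its size. */
    int victim = clock_hand;

    clock_hand = (clock_hand + 1) % POOL_SIZE;
    fsync_relation(&pool[victim]);
    return victim;
}

/* Record an extend/truncate; the new size is authoritative until fsync. */
static void
sr_set_size(uint32_t rel, uint32_t nblocks)
{
    int i = sr_lookup(rel);

    if (i < 0)
    {
        i = sr_evict();
        assert(i >= 0 && i < POOL_SIZE);
        pool[i] = (SharedRelSize) {.rel = rel, .in_use = true};
    }
    pool[i].nblocks = nblocks;
    pool[i].dirty = true;
    if (pool[i].usage < MAX_USAGE)
        pool[i].usage++;
}

static void
sr_mark_synced(uint32_t rel)
{
    int i = sr_lookup(rel);

    if (i >= 0)
        pool[i].dirty = false;
}
```

In this sketch, once the pool is full of dirty entries, every newly extended relation forces an fsync on the eviction path; that is the kind of fsync storm worried about above for very large systems.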

However, if we try to do something similar in any other way, it seems
likely to grow into something more like the pg_smgr catalog, which
would eat more memory and cause invalidation storms.

Sorry for the rambling thoughts, but I think this is basically heading
in the right direction.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


