Re: Cache relation sizes? - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Cache relation sizes?
Date
Msg-id CA+hUKGLfYgBz7_w=oOMDYvReuNr1u-xQdizTbTiHpVKwbAyvOQ@mail.gmail.com
Whole thread Raw
In response to Re: Cache relation sizes?  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses RE: Cache relation sizes?  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Re: Cache relation sizes?  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Nov 16, 2020 at 11:01 PM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> I will look at your implementation more precisely latter.

Thanks!  Warning:  I thought about making a thing like this for a
while, but the patch itself is only a one-day prototype, so I am sure
you can find many bugs...  Hmm, I guess the smgrnblocks_fast() code
really needs a nice big comment to explain the theory about why this
value is fresh enough, based on earlier ideas about implicit memory
barriers.  (Something like: if you're extending, you must have
acquired the extension lock; if you're scanning you must have acquired
a snapshot that only needs to see blocks that existed at that time and
concurrent truncations are impossible; I'm wondering about special
snapshots like VACUUM, and whether this is too relaxed for that, or if
it's covered by existing horizon-based interlocking...).

> But now I just to ask one question: is it reasonable to spent
> substantial amount of efforts trying to solve
> the problem of redundant calculations of relation size, which seems to
> be just part of more general problem: shared relation cache?

It's a good question.  I don't know exactly how to make a shared
relation cache (I have only some vague ideas and read some of
Idehira-san's work) but I agree that it would be very nice.  However,
I think that's an independent and higher level thing, because Relation
!= SMgrRelation.

What is an SMgrRelation?  Not much!  It holds file descriptors, which
are different in every backend so you can't share them, and then some
information about sizes and a target block for insertions.  With this
patch, the sizes become shared and more reliable, so there's *almost*
nothing left at all except for the file descriptors and the pointer to
the shared object.  I haven't looked into smgr_targblock, the last
remaining non-shared thing, but there might be an opportunity to do
something clever for parallel inserters (the proposed feature) by
coordinating target blocks in SMgrSharedRelation; I don't know.

> It is well known problem: private caches of system catalog can cause
> unacceptable memory consumption for large number of backends.

Yeah.

> Also private caches requires sophisticated and error prone invalidation
> mechanism.
>
> If we will have such shared cache, then keeping shared relation size
> seems to be trivial task, isn't it?
> So may be we before "diving" into the problem of maintaining shared pool
> of dirtied "inodes" we can better discuss and conerntrate on solving
> more generic problem?

Sure, we can talk about how to make a shared relation cache (and
syscache etc).  I think it'll be much harder than this shared SMGR
concept.  Even if we figure out how to share palloc'd trees of nodes
with correct invalidation and interlocking (ie like Ideriha-san's work
in [1]), including good performance and space management/replacement,
I guess we'll still need per-backend Relation objects to point to the
per-backend SMgrRelation objects to hold the per-backend file
descriptors.  (Until we have threads...).  But... do you think sharing
SMGR relations to the maximum extent possible conflicts with that?
Are you thinking there should be some general component here, perhaps
a generic system for pools of shmem objects with mapping tables and a
replacement policy?

[1] https://www.postgresql.org/message-id/flat/4E72940DA2BF16479384A86D54D0988A567B9245%40G01JPEXMBKW04



pgsql-hackers by date:

Previous
From: luis.roberto@siscobra.com.br
Date:
Subject: Re: enable_incremental_sort changes query behavior
Next
From: Masahiko Sawada
Date:
Subject: Re: VACUUM (DISABLE_PAGE_SKIPPING on)