RE: [HACKERS] mdnblocks is an amazing time sink in huge relations - Mailing list pgsql-hackers

From Hiroshi Inoue
Subject RE: [HACKERS] mdnblocks is an amazing time sink in huge relations
Date
Msg-id 000801bf1a19$2d88ae20$2801007e@cadzone.tpf.co.jp
In response to Re: [HACKERS] mdnblocks is an amazing time sink in huge relations  (Vadim Mikheev <vadim@krs.ru>)
Responses Re: [HACKERS] mdnblocks is an amazing time sink in huge relations  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> 
> Tom Lane wrote:
> > 
> > >> a shared cache for system catalog tuples, which might be a win but I'm
> > >> not sure (I'm worried about contention for the cache, especially if it's
> > >> protected by just one or a few spinlocks).  Anyway, if we did have one
> 
> Commercial DBMSes have this... Isn't it a good reason? -:)
> 
> > > But there would be a problem if we use shared catalog cache.
> > > Being updated system tuples are only visible to an updating backend
> > > and other backends should see committed tuples.
> > > On the other hand, an accurate block count should be visible to all
> > > backends.
> > > Which tuple of a row should we load to catalog cache and update ?
> > 
> > Good point --- rolling back a transaction would cancel changes to the
> > pg_class row, but it mustn't cause the relation's file to get truncated
> > (since there could be tuples of other uncommitted transactions in the
> > newly added block(s)).
> > 
> > This says that having a block count column in pg_class is the Wrong
> > Thing; we should get rid of relpages entirely.  The Right Thing is a
> > separate data structure in shared memory that stores the current
> > physical block count for each active relation.  The first backend to
> > touch a given relation would insert an entry, and then subsequent
> > extensions/truncations/deletions would need to update it.  We already
> > obtain a special lock when extending a relation, so seems like there'd
> > be no extra locking cost to have a table like this.
> 
> I supposed that each backend will still use own catalog 
> cache (after reading entries from shared one) and synchronize 
> shared/private caches on commit - e.g. update reltuples!
> relpages will be updated immediately after physical changes -
> what's problem with this?
>

Does this mean the following?

1. shared cache holds committed system tuples.
2. private cache holds uncommitted system tuples.
3. relpages in the shared cache is updated immediately on a physical
   change, and the corresponding buffer pages are marked dirty.
4. on commit, the contents of the uncommitted tuples, except relpages,
   reltuples, ..., are copied to the corresponding tuples in the shared
   cache, and the combined contents are committed.
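
The four steps above can be sketched roughly in C as follows. This is only
an illustration under assumptions, not existing backend code: ClassTuple,
SharedEntry, PrivateEntry and all function names are hypothetical, the
"..." fields of step 4 are left out, buffer handling from step 3 is not
modelled, and the locking that would make step 4 safe is omitted entirely.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a pg_class row. */
typedef struct {
    char     relname[32];  /* ordinary transactional field */
    unsigned relpages;     /* physical block count, not transactional */
    double   reltuples;    /* excluded from the commit-time copy (step 4) */
} ClassTuple;

/* Step 1: the shared cache holds the committed tuple. */
typedef struct { ClassTuple committed; } SharedEntry;

/* Step 2: each backend's private cache holds its uncommitted tuple. */
typedef struct { ClassTuple pending; bool dirty; } PrivateEntry;

/* Start from the committed version when a backend first touches the row. */
static void private_begin(PrivateEntry *priv, const SharedEntry *shared)
{
    priv->pending = shared->committed;
    priv->dirty = false;
}

/* An uncommitted catalog update: visible only to the updating backend. */
static void private_rename(PrivateEntry *priv, const char *newname)
{
    strncpy(priv->pending.relname, newname, sizeof priv->pending.relname - 1);
    priv->pending.relname[sizeof priv->pending.relname - 1] = '\0';
    priv->dirty = true;
}

/* Step 3: a physical change updates relpages in the shared cache at once,
 * regardless of whether the extending transaction later commits. */
static void shared_extend(SharedEntry *shared, unsigned nblocks)
{
    shared->committed.relpages += nblocks;
}

/* Step 4: on commit, copy everything except relpages and reltuples from
 * the private tuple into the shared one. A real version would need a lock
 * held across this merge; that is the hard part. */
static void commit_merge(SharedEntry *shared, PrivateEntry *priv)
{
    if (!priv->dirty)
        return;
    unsigned pages  = shared->committed.relpages;
    double   tuples = shared->committed.reltuples;
    shared->committed = priv->pending;
    shared->committed.relpages  = pages;
    shared->committed.reltuples = tuples;
    priv->dirty = false;
}

/* On abort, drop the private changes; the shared relpages is untouched. */
static void abort_discard(PrivateEntry *priv, const SharedEntry *shared)
{
    private_begin(priv, shared);
}
```

In this sketch an abort never shrinks relpages, because the block count
lives only in the shared entry and is never overwritten from a private
tuple.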
 

If so, catalog cache invalidation would no longer be needed.
But synchronizing step 4 may be difficult.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

