RE: Protect syscache from bloating with negative cache entries - Mailing list pgsql-hackers

From Tsunakawa, Takayuki
Subject RE: Protect syscache from bloating with negative cache entries
Date
Msg-id 0A3221C70F24FB45833433255569204D1FB684A6@G01JPEXMBYT05
In response to Re: Protect syscache from bloating with negative cache entries  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Certainly, what I've done here doesn't preclude adding some wider solution
> to
> the issue of extremely large catcaches.

I'm relieved to hear that.

> I think it takes the pressure off
> for one rather narrow problem case, and the mechanism could be used to fix
> other ones.  But if you've got an application that just plain accesses a
> huge number of objects, this isn't going to make your life better.

I understand you're trying to solve the problem caused by negative cache entries as soon as possible, because the user is really suffering from it.  I sympathize with that attitude, because you seem to always address issues that others are reluctant to take on.  That's one of the reasons I respect you.
 


> Well, we *had* an LRU mechanism for the catcaches way back when.  We got
> rid of it --- see commit 8b9bc234a --- because (a) maintaining the LRU
> info was expensive and (b) performance fell off a cliff in scenarios where
> the cache size limit was exceeded.  You could probably find some more info
> about that by scanning the mail list archives from around the time of that
> commit, but I'm too lazy to do so right now.

Oh, in 2006...  I'll examine the patch and the discussion to see how the LRU management was done.


> That was a dozen years ago, and it's possible that machine performance
> has moved so much since then that the problems are gone or mitigated.

I really, really hope so.  Even if we see some visible impact from the LRU management, I think that's a debt PostgreSQL should have paid long ago.  Even single-process MySQL, which doesn't suffer from cache bloat across many server processes, has the ability to limit its caches.  And PostgreSQL has many parameters for various memory components such as shared_buffers, wal_buffers, work_mem, etc., so it would be reasonable to also have a limit for the catalog caches.  That said, we can avoid the penalty and retain the current performance by disabling the limit (some_size_param = 0).
 

I think we'll evaluate the impact of LRU management by adding prev and next members to the catcache and relcache structures, and putting an entry at the front (or back) of the LRU chain every time it is accessed.  I think pgbench's select-only mode is enough for the evaluation.  I'd like to hear if any other workload is more appropriate for observing the CPU cache effect.
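The move-to-front bookkeeping above could look something like the following minimal sketch.  All names here (LRUList, lru_touch, etc.) are hypothetical, just to illustrate the O(1) prev/next manipulation; real PostgreSQL code would hang such links off the catcache/relcache entry structs and likely reuse the existing dlist utilities:

```c
#include <assert.h>
#include <stddef.h>

/* Intrusive doubly-linked LRU node, embedded in each cache entry. */
typedef struct LRUNode
{
    struct LRUNode *prev;
    struct LRUNode *next;
} LRUNode;

/* Circular list with a sentinel: head.next is most recently used. */
typedef struct LRUList
{
    LRUNode head;
} LRUList;

static void
lru_init(LRUList *list)
{
    list->head.prev = &list->head;
    list->head.next = &list->head;
}

static void
lru_unlink(LRUNode *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
}

static void
lru_push_front(LRUList *list, LRUNode *node)
{
    node->next = list->head.next;
    node->prev = &list->head;
    list->head.next->prev = node;
    list->head.next = node;
}

/* On every cache hit: move the entry to the front.  O(1), no search --
 * this is the per-access cost whose CPU-cache impact would be measured. */
static void
lru_touch(LRUList *list, LRUNode *node)
{
    lru_unlink(node);
    lru_push_front(list, node);
}

/* When over the size limit: evict from the back (least recently used). */
static LRUNode *
lru_evict(LRUList *list)
{
    LRUNode *victim = list->head.prev;

    if (victim == &list->head)
        return NULL;            /* list is empty */
    lru_unlink(victim);
    return victim;
}
```

The concern Tom raises applies directly to lru_touch: it writes to three nodes scattered through memory on every access, which is where the CPU cache misses would come from under a pgbench select-only run.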
 


> In particular I'm sure that any limit we would want to impose today will
> be far more than the 5000-entries-across-all-caches limit that was in use
> back then.  But I'm not convinced that a workload that would create 100K
> cache entries in the first place wouldn't have severe problems if you
> tried to constrain it to use only 80K entries.  I fear it's just wishful
> thinking to imagine that the behavior of a larger cache won't be just
> like a smaller one.  Also, IIRC some of the problem with the LRU code
> was that it resulted in lots of touches of unrelated data, leading to
> CPU cache miss problems.  It's hard to see how that doesn't get even
> worse with a bigger cache.
>
> As far as the relcache goes, we've never had a limit on that, but there
> are enough routine causes of relcache flushes --- autovacuum for instance
> --- that I'm not really convinced relcache bloat can be a big problem in
> production.

As Andres and Robert mentioned, we want to free less frequently used cache entries.  Otherwise, we keep suffering from cache bloat that grows to TBs of memory.  This is a real issue, not a hypothetical one...
 



> The plancache has never had a limit either, which is a design choice that
> was strongly influenced by our experience with catcaches.  Again, I'm
> concerned about the costs of adding a management layer, and the likelihood
> that cache flushes will simply remove entries we'll soon have to rebuild.

Fortunately, we're not bothered by the plan cache.  But I remember you said you were annoyed by PL/pgSQL's plan cache use at Salesforce.  Were you able to overcome that somehow?
 



> Oracle seldom impresses me as having designs we ought to follow.
> They have a well-earned reputation for requiring a lot of expertise to
> operate, which is not the direction this project should be going in.
> In particular, I don't want to "solve" cache size issues by exposing
> a bunch of knobs that most users won't know how to twiddle.


Oracle certainly seems difficult to use, but they appear to be studying other DBMSs to make their product simpler.  I'm sure they also have a lot we should learn from, and the cache limit is one of those things (although MySQL's per-cache tuning may be better).
 

And having limits for various components would be the first step toward an autonomous database: tunable -> auto tuning -> autonomous.
 



Regards
Takayuki Tsunakawa




