Thread: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Tomas Vondra
Date:
Hi all,

a few days ago I set up a buildfarm animal markhor, running the tests with CLOBBER_CACHE_RECURSIVELY. As the tests run very long, reporting the results back to the server fails because of a safeguard limit in the buildfarm server. Anyway, that's being discussed in a different thread - here it serves merely as a 'don't bother looking for addax on the buildfarm website' warning.

I've been checking the progress of the recursive tests today, and I found it actually failed in the 'make check' step. The logs are available here:

  buildfarm logs: http://www.fuzzy.cz/tmp/buildfarm/recursive-oom.tgz
  kernel logs:    http://www.fuzzy.cz/tmp/buildfarm/messages

The tests are running within an LXC container (operated through libvirt), so whenever I say 'VM' I actually mean an LXC container. It might be some VM/LXC misconfiguration, but as this happens only to a single VM (the one running tests with recursive clobber), I find that unlikely.

================== An example of the failure ==================

parallel group (20 tests): pg_lsn regproc oid name char money float4 txid text int2 varchar int4 float8 boolean int8 uuid rangetypes bit numeric enum
...
     float4  ... ok
     float8  ... ok
     bit     ... FAILED (test process exited with exit code 2)
     numeric ... FAILED (test process exited with exit code 2)
     txid    ... ok
...

===============================================================

and then of course the usual 'terminating connection because of crash of another server process' warning. Apparently, it's getting killed by the OOM killer, because it exhausts all the memory assigned to that VM (2GB):

May 15 19:44:53 postgres invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
May 15 19:44:53 cspug kernel: postgres cpuset=recursive-builds mems_allowed=0
May 15 19:44:53 cspug kernel: Pid: 17159, comm: postgres Not tainted 2.6.32-431.17.1.el6.centos.plus.x86_64 #1

AFAIK 2GB is more than enough for a buildfarm machine (after all, chipmunk has just 512MB). Also, this only happens on this VM (cpuset=recursive-builds); the other two VMs, with exactly the same limits, running other buildfarm animals (regular or with CLOBBER_CACHE_ALWAYS), are perfectly happy - see magpie or markhor for example. And I don't see any reason why a build with recursive clobber should require more memory than a regular build. So this seems like a memory leak somewhere in the cache invalidation code.

I thought it might be fixed by commit b23b0f5588 (Code review for recent changes in relcache.c), but mite is currently working on 7894ac5004 and yet it already failed on OOM.

The failures apparently happen within a few hours of the test start. For example on addax (gcc), the build started at 02:50 and the first OOM failure happened at 05:19; on mite (clang), it's 03:20 vs. 06:50. So it's roughly 3-4 hours after the tests start.

regards
Tomas
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Tom Lane
Date:
Tomas Vondra <tv@fuzzy.cz> writes:
> ... then of course the usual 'terminating connection because of crash of
> another server process' warning. Apparently, it's getting killed by the
> OOM killer, because it exhausts all the memory assigned to that VM (2GB).

Can you fix things so it runs into its process ulimit before the OOM killer triggers? Then we'd get a memory map dumped to stderr, which would be helpful in localizing the problem.

> ... So this seems like a
> memory leak somewhere in the cache invalidation code.

Smells that way to me too, but let's get some more evidence.

			regards, tom lane
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Tomas Vondra
Date:
On 17.5.2014 19:55, Tom Lane wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> ... then of course the usual 'terminating connection because of crash of
>> another server process' warning. Apparently, it's getting killed by the
>> OOM killer, because it exhausts all the memory assigned to that VM (2GB).
>
> Can you fix things so it runs into its process ulimit before the OOM killer
> triggers? Then we'd get a memory map dumped to stderr, which would be
> helpful in localizing the problem.

I did this in /etc/security/limits.d/80-pgbuild.conf:

    pgbuild hard as 1835008

so the user the buildfarm runs under will have up to ~1.75GB of RAM (of the 2GB available to the container).

>> ... So this seems like a
>> memory leak somewhere in the cache invalidation code.
>
> Smells that way to me too, but let's get some more evidence.

The tests are already running, and there are a few postgres processes:

  PID   VIRT   RES  %CPU     TIME+ COMMAND
11478   449m  240m 100.0 112:53.57 postgres: pgbuild regression [local] CREATE VIEW
11423   219m   19m   0.0   0:00.17 postgres: checkpointer process
11424   219m  2880   0.0   0:00.05 postgres: writer process
11425   219m  5920   0.0   0:00.12 postgres: wal writer process
11426   219m  2708   0.0   0:00.05 postgres: autovacuum launcher process
11427  79544  1836   0.0   0:00.17 postgres: stats collector process
11479  1198m  1.0g   0.0  91:09.99 postgres: pgbuild regression [local] CREATE INDEX waiting

Attached is 'pmap -x' output for the two interesting processes (11478, 11479).

Tomas
Attachment
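As an aside, the reason the hard 'as' (address space) limit helps: once the process hits RLIMIT_AS, malloc() starts failing with NULL/ENOMEM inside the backend, which is the path on which the backend can report "out of memory" and dump its memory-context statistics to stderr, instead of being killed from outside by the kernel's OOM killer. Below is a minimal, self-contained sketch of that behaviour (plain C, not buildfarm or PostgreSQL code; the 256MB cap is an arbitrary value chosen for the demo):

/*
 * Minimal sketch: under a hard RLIMIT_AS (address space) limit, malloc()
 * fails with NULL/ENOMEM instead of the process being OOM-killed.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/resource.h>

int
main(void)
{
    /* cap the address space at 256 MB (the limits.conf above uses ~1.75 GB) */
    struct rlimit rl = { 256UL * 1024 * 1024, 256UL * 1024 * 1024 };

    if (setrlimit(RLIMIT_AS, &rl) != 0)
    {
        perror("setrlimit");
        return 1;
    }

    size_t total = 0;

    for (;;)
    {
        void *p = malloc(8 * 1024 * 1024);

        if (p == NULL)
        {
            /* roughly the point where a backend would print its
             * memory-context statistics before reporting "out of memory" */
            fprintf(stderr, "malloc failed after ~%zu MB: %s\n",
                    total / (1024 * 1024), strerror(errno));
            return 0;
        }
        memset(p, 0, 8 * 1024 * 1024);  /* touch the pages */
        total += 8 * 1024 * 1024;
    }
}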
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Andres Freund
Date:
On 2014-05-17 20:41:37 +0200, Tomas Vondra wrote:
> On 17.5.2014 19:55, Tom Lane wrote:
> > Tomas Vondra <tv@fuzzy.cz> writes:
> The tests are already running, and there are a few postgres processes:
>
>   PID   VIRT   RES  %CPU     TIME+ COMMAND
> 11478   449m  240m 100.0 112:53.57 postgres: pgbuild regression [local] CREATE VIEW
> 11423   219m   19m   0.0   0:00.17 postgres: checkpointer process
> 11424   219m  2880   0.0   0:00.05 postgres: writer process
> 11425   219m  5920   0.0   0:00.12 postgres: wal writer process
> 11426   219m  2708   0.0   0:00.05 postgres: autovacuum launcher process
> 11427  79544  1836   0.0   0:00.17 postgres: stats collector process
> 11479  1198m  1.0g   0.0  91:09.99 postgres: pgbuild regression [local] CREATE INDEX waiting
>
> Attached is 'pmap -x' output for the two interesting processes (11478,
> 11479).

Could you gdb -p 11479 into the process and issue 'p MemoryContextStats(TopMemoryContext)'. That should print information about the server's allocation to its stderr.

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Tomas Vondra
Date:
On 17.5.2014 21:31, Andres Freund wrote:
> On 2014-05-17 20:41:37 +0200, Tomas Vondra wrote:
>> On 17.5.2014 19:55, Tom Lane wrote:
>>> Tomas Vondra <tv@fuzzy.cz> writes:
>> The tests are already running, and there are a few postgres processes:
>>
>>   PID   VIRT   RES  %CPU     TIME+ COMMAND
>> 11478   449m  240m 100.0 112:53.57 postgres: pgbuild regression [local] CREATE VIEW
>> 11423   219m   19m   0.0   0:00.17 postgres: checkpointer process
>> 11424   219m  2880   0.0   0:00.05 postgres: writer process
>> 11425   219m  5920   0.0   0:00.12 postgres: wal writer process
>> 11426   219m  2708   0.0   0:00.05 postgres: autovacuum launcher process
>> 11427  79544  1836   0.0   0:00.17 postgres: stats collector process
>> 11479  1198m  1.0g   0.0  91:09.99 postgres: pgbuild regression [local] CREATE INDEX waiting
>>
>> Attached is 'pmap -x' output for the two interesting processes (11478,
>> 11479).
>
> Could you gdb -p 11479 into the process and issue 'p
> MemoryContextStats(TopMemoryContext)'. That should print information
> about the server's allocation to its stderr.

That process already finished, but I've done that for another process (one that had ~400MB allocated and was growing steadily - about 1MB / 10 seconds). It was running a SELECT query, which had already completed. Then it executed ANALYZE, and I took several snapshots of it - I'm not sure how much memory it had at the beginning (maybe ~250MB), then further snapshots when it had ~350MB, 400MB, 500MB and 600MB. It's still running; I'm not sure how much more it will grow.

Anyway, the main difference between the ANALYZE snapshots seems to be this:

  init:  CacheMemoryContext: 67100672 total in 17 blocks; ...
  350MB: CacheMemoryContext: 134209536 total in 25 blocks; ...
  400MB: CacheMemoryContext: 192929792 total in 32 blocks; ...
  500MB: CacheMemoryContext: 293593088 total in 44 blocks; ...
  600MB: CacheMemoryContext: 411033600 total in 58 blocks; ...

Not sure if there's something wrong with the SELECT memory context. It has ~1500 nested nodes like these:

  SQL function data: 24576 total in 2 blocks; ...
  ExecutorState: 24576 total in 2 blocks; ...
  SQL function data: 24576 total in 2 blocks; ...
  ExprContext: 8192 total in 1 blocks; ...

But maybe it's expected / OK.

regards
Tomas
Attachment
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Tomas Vondra
Date:
On 17.5.2014 22:33, Tomas Vondra wrote:
> On 17.5.2014 21:31, Andres Freund wrote:
>> On 2014-05-17 20:41:37 +0200, Tomas Vondra wrote:
>>> On 17.5.2014 19:55, Tom Lane wrote:
>>>> Tomas Vondra <tv@fuzzy.cz> writes:
>>> The tests are already running, and there are a few postgres processes:
>>>
>>>   PID   VIRT   RES  %CPU     TIME+ COMMAND
>>> 11478   449m  240m 100.0 112:53.57 postgres: pgbuild regression [local] CREATE VIEW
>>> 11423   219m   19m   0.0   0:00.17 postgres: checkpointer process
>>> 11424   219m  2880   0.0   0:00.05 postgres: writer process
>>> 11425   219m  5920   0.0   0:00.12 postgres: wal writer process
>>> 11426   219m  2708   0.0   0:00.05 postgres: autovacuum launcher process
>>> 11427  79544  1836   0.0   0:00.17 postgres: stats collector process
>>> 11479  1198m  1.0g   0.0  91:09.99 postgres: pgbuild regression [local] CREATE INDEX waiting
>>>
>>> Attached is 'pmap -x' output for the two interesting processes (11478,
>>> 11479).
>>
>> Could you gdb -p 11479 into the process and issue 'p
>> MemoryContextStats(TopMemoryContext)'. That should print information
>> about the server's allocation to its stderr.

And here are memory context stats for another session, this one executing CREATE INDEX while having a large amount of memory allocated. The interesting thing is there are ~11k lines that look exactly like this:

pg_namespace_oid_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used

Seems a bit strange to me.

Tomas
Attachment
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Andres Freund
Date:
Hi,

On 2014-05-17 22:33:31 +0200, Tomas Vondra wrote:
> Anyway, the main difference between the ANALYZE snapshots seems to be this:
>
>   init:  CacheMemoryContext: 67100672 total in 17 blocks; ...
>   350MB: CacheMemoryContext: 134209536 total in 25 blocks; ...
>   400MB: CacheMemoryContext: 192929792 total in 32 blocks; ...
>   500MB: CacheMemoryContext: 293593088 total in 44 blocks; ...
>   600MB: CacheMemoryContext: 411033600 total in 58 blocks; ...

Hm, so something is definitely leaking memory inside CacheMemoryContext itself. Is that happening constantly or just with individual tests?

> Not sure if there's something wrong with the SELECT memory context. It
> has ~1500 nested nodes like these:
>
>   SQL function data: 24576 total in 2 blocks; ...
>   ExecutorState: 24576 total in 2 blocks; ...
>   SQL function data: 24576 total in 2 blocks; ...
>   ExprContext: 8192 total in 1 blocks; ...
>
> But maybe it's expected / OK.

I'd guess that's a recursive function call. Any chance you know what's been executing at that point? I'd bet it's been the 'errors' check. That has:

-- Check that stack depth detection mechanism works and
-- max_stack_depth is not set too high
create function infinite_recurse() returns int as
'select infinite_recurse()' language sql;
\set VERBOSITY terse
select infinite_recurse();
ERROR:  stack depth limit exceeded

which'd pretty much produce a tree of executions like yours.

Greetings,

Andres Freund
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Tomas Vondra
Date:
On 17.5.2014 23:09, Andres Freund wrote:
> Hi,
>
> On 2014-05-17 22:33:31 +0200, Tomas Vondra wrote:
>> Anyway, the main difference between the ANALYZE snapshots seems to be this:
>>
>>   init:  CacheMemoryContext: 67100672 total in 17 blocks; ...
>>   350MB: CacheMemoryContext: 134209536 total in 25 blocks; ...
>>   400MB: CacheMemoryContext: 192929792 total in 32 blocks; ...
>>   500MB: CacheMemoryContext: 293593088 total in 44 blocks; ...
>>   600MB: CacheMemoryContext: 411033600 total in 58 blocks; ...
>
> Hm, so something is definitely leaking memory inside CacheMemoryContext
> itself. Is that happening constantly or just with individual tests?

I think it's happening constantly - I've been eyeballing the VM for some time, and the backends usually hover around ~220 MB, executing SELECTs etc. Then one executes CREATE INDEX and it starts growing.

Attached is a log for a CREATE INDEX session at the moment it had ~1.3GB allocated (create-index-5.log.gz). It has ~217k copies of

  pg_attrdef_adrelid_adnum_index: 3072 total in 2 blocks; 1968 free (0 chunks); 1104 used

And also another log (AFAIK from another CREATE INDEX session), with ~11k copies of

  pg_namespace_oid_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used

>> Not sure if there's something wrong with the SELECT memory context. It
>> has ~1500 nested nodes like these:
>>
>>   SQL function data: 24576 total in 2 blocks; ...
>>   ExecutorState: 24576 total in 2 blocks; ...
>>   SQL function data: 24576 total in 2 blocks; ...
>>   ExprContext: 8192 total in 1 blocks; ...
>>
>> But maybe it's expected / OK.
>
> I'd guess that's a recursive function call. Any chance you know what's
> been executing at that point? I'd bet it's been the 'errors' check. That
> has:
>
> -- Check that stack depth detection mechanism works and
> -- max_stack_depth is not set too high
> create function infinite_recurse() returns int as
> 'select infinite_recurse()' language sql;
> \set VERBOSITY terse
> select infinite_recurse();
> ERROR:  stack depth limit exceeded
>
> which'd pretty much produce a tree of executions like yours.

Yeah, I was thinking it's something like that. I'm not sure what tests were running at that particular moment, so maybe it really was the 'errors' check.

Tomas
Attachment
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Andres Freund
Date:
On 2014-05-17 22:55:14 +0200, Tomas Vondra wrote:
> And here are memory context stats for another session, this one executing
> CREATE INDEX while having a large amount of memory allocated. The
> interesting thing is there are ~11k lines that look exactly like this:
>
> pg_namespace_oid_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used

Heh. That certainly doesn't look right. I bet it's the contexts from RelationInitIndexAccessInfo().

Looks like there generally are recursion 'troubles' with system indexes. RelationClearRelation() will mark them as invalid, causing any further lookup to do RelationBuildDesc() calls. Which will again rebuild the same relation if it's used somewhere inside RelationBuildDesc() or RelationInitIndexAccessInfo(). From a quick look it looks like it should resolve itself after some time, even freeing the superfluous memory contexts. But I am not sure if there are scenarios where that won't happen...

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-05-17 22:55:14 +0200, Tomas Vondra wrote:
>> And here are memory context stats for another session, this one executing
>> CREATE INDEX while having a large amount of memory allocated. The
>> interesting thing is there are ~11k lines that look exactly like this:
>>
>> pg_namespace_oid_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used

> Heh. That certainly doesn't look right. I bet it's the contexts from
> RelationInitIndexAccessInfo().

> Looks like there generally are recursion 'troubles' with system
> indexes. RelationClearRelation() will mark them as invalid, causing any
> further lookup to do RelationBuildDesc() calls. Which will again rebuild
> the same relation if it's used somewhere inside RelationBuildDesc() or
> RelationInitIndexAccessInfo(). From a quick look it looks like it should
> resolve itself after some time, even freeing the superfluous memory
> contexts. But I am not sure if there are scenarios where that won't
> happen...

I've identified a couple of memory leakage scenarios in relcache.c:

1. The RelationCacheInsert macro just blithely ignores the possibility that hash_search(HASH_ENTER) returns found = true. There's a comment

	/* used to give notice if found -- now just keep quiet */

which AFAICT was there when we got the code from Berkeley. Well, it turns out that that's a can-happen scenario, at least for a few system catalogs and indexes that are referenced during relcache build. The scenario is that we come in through RelationIdGetRelation, don't find a cache entry for, say, pg_index, and set about building one in RelationBuildDesc. Somewhere in the guts of that we have need to read pg_index, whereupon there's a recursive call of RelationIdGetRelation, which still doesn't find the entry, so we again call RelationBuildDesc, which builds an entirely separate Relation structure. The recursion does terminate thanks to the recursion-stopping provisions in relcache.c, so the inner call finishes and enters its completed Relation structure into the RelationIdCache hashtable. Control returns out to the outer invocation of RelationBuildDesc, which finishes, and enters its completed Relation structure into the RelationIdCache hashtable --- overwriting the link to the Relation made by the inner invocation. That Relation and subsidiary data are now unreferenced and permanently leaked in CacheMemoryContext. If it's an index you can see its leaked subsidiary rd_indexcxt in a memory dump, which is what we're looking at above.

AFAICT, the inner invocation's Relation should always have zero reference count by the time we get back to the outer invocation. Therefore it should be possible for RelationCacheInsert to just delete the about-to-be-unreachable Relation struct. I'm experimenting with a patch that adds logic like this to RelationCacheInsert:

	if (found)
	{
		Relation	oldrel = idhentry->reldesc;

		idhentry->reldesc = RELATION;
		if (RelationHasReferenceCountZero(oldrel))
			RelationDestroyRelation(oldrel, false);
		else
			elog(WARNING, "leaking still-referenced duplicate relation");
	}

and so far it looks good.

2. There's a much smaller leak in AttrDefaultFetch: it doesn't bother to pfree the result of TextDatumGetCString(). This leakage occurs in the caller's context not CacheMemoryContext, so it's only query lifespan not session lifespan, and it would not ordinarily be significant --- but with the CLOBBER_CACHE logic enabled, we rebuild some relcache entries a damn lot of times within some queries, so the leak adds up.

With both of these things fixed, I'm not seeing any significant memory bloat during the first parallel group of the regression tests. I don't think I'll have the patience to let it run much further than that (the uuid and enum tests are still running after an hour :-().

BTW, it strikes me that we could probably improve the runtime of the CLOBBER tests noticeably if we were to nail AttrDefaultIndexId, IndexIndrelidIndexId, and ConstraintRelidIndexId into cache. I see no very good reason not to do that; it should help performance a bit in normal cases too.

While I'm at it: I could not help noticing RememberToFreeTupleDescAtEOX, which was not there last time I looked at this code. Isn't that broken by design? It's basically a deliberately induced transaction-lifespan memory leak, and AFAICS if it does anything at all, it's supporting incorrect calling code. There should *never* be any situation where it's not okay to free a tupledesc with zero refcount. And the comment justifying it is insane on its face:

	 * If we Rebuilt a relcache entry during a transaction then its
	 * possible we did that because the TupDesc changed as the result of
	 * an ALTER TABLE that ran at less than AccessExclusiveLock. It's
	 * possible someone copied that TupDesc, in which case the copy would
	 * point to free'd memory. So if we rebuild an entry we keep the

If someone copied the tupdesc, how would freeing the original result in the copy being damaged? And even more to the point, if anyone is doing anything that involves inspecting a table's rowtype, how could it possibly be safe for someone else to be altering the rowtype with less than exclusive lock?

Unless I hear an adequate defense of that code PDQ, it's going away.

			regards, tom lane
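To make the failure mode in point 1 above concrete, here is a minimal, self-contained toy model of the duplicate insert (all names are invented for illustration; this is not the real relcache.c or dynahash code). A one-slot 'cache' stands in for the RelationIdCache hashtable: the outer build's insert overwrites the link to the inner build's entry, and the insert routine either destroys the old zero-refcount entry or warns, mirroring the patch sketched above.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct CacheEntry
{
    unsigned int key;       /* stands in for the relation's OID */
    int          refcount;  /* stands in for rd_refcnt */
    char        *payload;   /* stands in for the rebuilt descriptor */
} CacheEntry;

/* a one-slot "hashtable" is enough to model the overwrite */
static CacheEntry *slot = NULL;

static CacheEntry *
build_entry(unsigned int key, const char *payload)
{
    CacheEntry *e = malloc(sizeof(CacheEntry));

    e->key = key;
    e->refcount = 0;
    e->payload = strdup(payload);
    return e;
}

static void
destroy_entry(CacheEntry *e)
{
    free(e->payload);
    free(e);
}

/* analogous to the patched RelationCacheInsert */
static void
cache_insert(CacheEntry *e)
{
    int found = (slot != NULL && slot->key == e->key);

    if (found)
    {
        CacheEntry *old = slot;

        slot = e;
        if (old->refcount == 0)
            destroy_entry(old);     /* instead of silently leaking it */
        else
            fprintf(stderr, "WARNING: leaking still-referenced duplicate entry\n");
    }
    else
        slot = e;
}

int
main(void)
{
    /* inner (recursive) build finishes first and inserts its entry */
    cache_insert(build_entry(1234, "inner build of a system catalog"));

    /* outer build finishes later and inserts a second, independent entry;
     * without the refcount check the inner one would stay allocated forever */
    cache_insert(build_entry(1234, "outer build of the same catalog"));

    printf("cached payload: %s\n", slot->payload);
    destroy_entry(slot);
    return 0;
}

Without the refcount check in cache_insert(), the first entry would simply become unreachable and stay allocated for the life of the session, which is the kind of session-lifespan CacheMemoryContext growth seen in the context dumps upthread.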
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Andres Freund
Date:
Hi,

On 2014-05-18 14:49:10 -0400, Tom Lane wrote:
> AFAICT, the inner invocation's Relation should always have zero reference
> count by the time we get back to the outer invocation. Therefore it
> should be possible for RelationCacheInsert to just delete the
> about-to-be-unreachable Relation struct. I'm experimenting with a patch
> that adds logic like this to RelationCacheInsert:
>
>	if (found)
>	{
>		Relation	oldrel = idhentry->reldesc;
>
>		idhentry->reldesc = RELATION;
>		if (RelationHasReferenceCountZero(oldrel))
>			RelationDestroyRelation(oldrel, false);
>		else
>			elog(WARNING, "leaking still-referenced duplicate relation");
>	}
>
> and so far it looks good.

If that happens we'd essentially have a too low reference count on the entry remaining in the relcache. I'd consider putting an Assert() in that branch. Perhaps it should also be only allowed for system relations? If it somehow happens for user-defined ones we'd lose rd_newRelfilenodeSubid and such, which we treat as reliable these days.

> With both of these things fixed, I'm not seeing any significant memory
> bloat during the first parallel group of the regression tests. I don't
> think I'll have the patience to let it run much further than that
> (the uuid and enum tests are still running after an hour :-().

Yea, it's really unbearable. A *single* UPDATE took close to 5 minutes here when testing the decoding stuff.

> BTW, it strikes me that we could probably improve the runtime of the
> CLOBBER tests noticeably if we were to nail AttrDefaultIndexId,
> IndexIndrelidIndexId, and ConstraintRelidIndexId into cache. I see
> no very good reason not to do that; it should help performance a bit
> in normal cases too.

Sounds like a good plan. From what I've seen here, the majority of the time is spent iterating over various hashes, primarily in relcache.c and catcache.c. Removing some syscache lookups will reduce the repetitiveness of that a bit.

> While I'm at it: I could not help noticing RememberToFreeTupleDescAtEOX,
> which was not there last time I looked at this code. Isn't that broken
> by design? It's basically a deliberately induced transaction-lifespan
> memory leak, and AFAICS if it does anything at all, it's supporting
> incorrect calling code. There should *never* be any situation where it's
> not okay to free a tupledesc with zero refcount.

I am not sure I understand that code. The whole keep_tupdesc/keep_rules logic isn't new and is a bit underdocumented as well :(. Seems to come from 491dd4a9.

If I understand correctly, the issue is that callers have accessed the tupledesc with RelationGetDescr(), which doesn't increase the refcount. If there's then a rebuild of that relation's relcache entry, they'll still be looking at the old tupledesc - which will now have a reference count of 0. The existing code tries to handle that by not invalidating the old tupledesc if it's equal to the rebuilt one, thereby leaving the uncounted reference from the outside in a valid state.

The reason that mostly works < 9.4 is that the tupledesc normally won't change without an AEL lock on the relation, which can't happen while there are still open references. But I am wondering, is that actually true, even before today? ... No, doesn't look that way:

CREATE TABLE a(id serial primary key);
BEGIN;
DECLARE foo CURSOR FOR SELECT * FROM a;
ALTER TABLE a RENAME COLUMN id TO id2;

So, renaming a column doesn't enforce CheckTableNotInUse(), which means that it's possible to access a RelationGetDescr() before calling an external function which might do an ALTER TABLE .. RENAME COLUMN, thereby triggering !keep_tupdesc and freeing the existing tupledesc. Now, it will likely work in this case because the DECLARE has probably properly incremented the refcount, but nonetheless... I am pretty sure I could construct a crasher with that, even < 9.4. This is a mess...

I think the code really only triggers when keep_tupdesc comes out false, which should be a fairly small number of cases, so it's probably not too bad performance-wise.

> * If we Rebuilt a relcache entry during a transaction then its
> * possible we did that because the TupDesc changed as the result of
> * an ALTER TABLE that ran at less than AccessExclusiveLock. It's
> * possible someone copied that TupDesc, in which case the copy would
> * point to free'd memory. So if we rebuild an entry we keep the
>
> If someone copied the tupdesc, how would freeing the original result
> in the copy being damaged?

The sentence only seems to make sense with s/copy/accessed without increasing the refcount, e.g. with RelationGetDescr,/.

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
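The hazard described here - a caller that keeps the result of RelationGetDescr() across a rebuild of the relcache entry - boils down to holding an uncounted reference to an object its owner may free. Below is a minimal, self-contained toy model of that (all names invented for illustration, not the PostgreSQL API): a 'borrowed' pointer dangles once the owner rebuilds the descriptor, while a pinned, reference-counted one stays valid until it is released.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Desc
{
    int   refcount;     /* counted references, including the cache's own */
    char *definition;
} Desc;

static Desc *current = NULL;    /* the cache's current descriptor */

static Desc *
make_desc(const char *definition)
{
    Desc *d = malloc(sizeof(Desc));

    d->refcount = 1;            /* the cache itself holds one reference */
    d->definition = strdup(definition);
    return d;
}

static void
release_desc(Desc *d)
{
    if (--d->refcount == 0)
    {
        free(d->definition);
        free(d);
    }
}

/* borrow: hand out the pointer without counting it (an uncounted reference) */
static Desc *
borrow_desc(void)
{
    return current;
}

/* pin: take a counted reference that survives a rebuild of 'current' */
static Desc *
pin_desc(void)
{
    current->refcount++;
    return current;
}

/* rebuild: the cache swaps in a new descriptor and drops its own reference */
static void
rebuild_desc(const char *new_definition)
{
    Desc *old = current;

    current = make_desc(new_definition);
    release_desc(old);          /* freed right here if nobody pinned it */
}

int
main(void)
{
    current = make_desc("a(id int)");

    /* uncounted reference, kept across a rebuild */
    Desc *borrowed = borrow_desc();

    rebuild_desc("a(id2 int)"); /* e.g. ALTER TABLE a RENAME COLUMN id TO id2 */

    /* 'borrowed' now points at freed memory; dereferencing it would be the
     * crash scenario discussed above, so we just drop it */
    borrowed = NULL;

    /* the safe pattern: pin before use, release afterwards */
    Desc *pinned = pin_desc();

    rebuild_desc("a(id3 int)");
    printf("pinned descriptor still reads: %s\n", pinned->definition);
    release_desc(pinned);

    release_desc(current);      /* let the cache drop its last descriptor */
    return 0;
}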
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-05-18 14:49:10 -0400, Tom Lane wrote:
>> if (RelationHasReferenceCountZero(oldrel))
>>     RelationDestroyRelation(oldrel, false);
>> else
>>     elog(WARNING, "leaking still-referenced duplicate relation");

> If that happens we'd essentially have a too low reference count on the
> entry remaining in the relcache.

No, we'd have two independent entries, each with its own correct refcount. When the refcount on the no-longer-linked-in-the-hashtable entry goes to zero, it'd be leaked, same as it's always been. (The refcount presumably corresponds to someone holding a direct pointer to the Relation struct, which is what they'd use to decrement the refcount.)

> I'd consider putting an Assert() in that branch.

I'm a bit afraid to do that for a condition that the system's never tested for at all up to now; in any case, a WARNING would be visible in production whereas an Assert would probably do nothing. If we see no reports of this WARNING for a release cycle or so, maybe asserting would be appropriate.

I did put in an assert to prevent this path from being taken in cases where we don't expect that a previous entry could possibly exist.

> Perhaps it should also be only allowed for system
> relations?

One would hope those are the only ones that get opened during relcache load ;-)

>> BTW, it strikes me that we could probably improve the runtime of the
>> CLOBBER tests noticeably if we were to nail AttrDefaultIndexId,
>> IndexIndrelidIndexId, and ConstraintRelidIndexId into cache. I see
>> no very good reason not to do that; it should help performance a bit
>> in normal cases too.

> Sounds like a good plan.

I experimented with that and didn't really see any marked improvement in the CLOBBER_CACHE_ALWAYS runtime, so I'm less excited about it than I was yesterday.

Still concerned about RememberToFreeTupleDescAtEOX, but that's an independent issue really.

			regards, tom lane
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Andres Freund
Date:
On 2014-05-19 11:25:04 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-05-18 14:49:10 -0400, Tom Lane wrote:
> >> if (RelationHasReferenceCountZero(oldrel))
> >>     RelationDestroyRelation(oldrel, false);
> >> else
> >>     elog(WARNING, "leaking still-referenced duplicate relation");
>
> > If that happens we'd essentially have a too low reference count on the
> > entry remaining in the relcache.
>
> No, we'd have two independent entries, each with its own correct refcount.
> When the refcount on the no-longer-linked-in-the-hashtable entry goes to
> zero, it'd be leaked, same as it's always been. (The refcount presumably
> corresponds to someone holding a direct pointer to the Relation struct,
> which is what they'd use to decrement the refcount.)

The problem is that only one of these entries will get properly handled by cache invalidation. I wonder if the correct thing wouldn't be to return the entry already in the cache. But that'd not be trivial to do either, without the potential to return an invalid entry :(

> > I'd consider putting an Assert() in that branch.
>
> I'm a bit afraid to do that for a condition that the system's never tested
> for at all up to now; in any case, a WARNING would be visible in production
> whereas an Assert would probably do nothing. If we see no reports of this
> WARNING for a release cycle or so, maybe asserting would be appropriate.

Fair point.

> > Perhaps it should also be only allowed for system
> > relations?
>
> One would hope those are the only ones that get opened during relcache
> load ;-)

I thought about it for a while and I wonder if that's necessarily correct. If somebody registers a relcache invalidation callback, that could happen when invalidations are processed somewhere while rebuilding an entry?

> Still concerned about RememberToFreeTupleDescAtEOX, but that's an
> independent issue really.

Me too, and yes.

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-05-19 11:25:04 -0400, Tom Lane wrote:
>> No, we'd have two independent entries, each with its own correct refcount.
>> When the refcount on the no-longer-linked-in-the-hashtable entry goes to
>> zero, it'd be leaked, same as it's always been. (The refcount presumably
>> corresponds to someone holding a direct pointer to the Relation struct,
>> which is what they'd use to decrement the refcount.)

> The problem is that only one of these entries will get properly handled
> by cache invalidation.

I don't think that's a problem; if the old entry is still referenced, presumably the holder of the reference is also holding a lock sufficient to prevent any problematic schema changes.

> I wonder if the correct thing wouldn't be to
> return the entry already in the cache. But that'd not be trivial to do
> either, without the potential to return an invalid entry :(

The one we just finished computing is presumably "newer" in some sense, otherwise I would've done it like that. In particular, if the older entry is no longer rd_isvalid, it'd definitely be unsafe to return.

>> One would hope those are the only ones that get opened during relcache
>> load ;-)

> I thought about it for a while and I wonder if that's necessarily
> correct. If somebody registers a relcache invalidation callback, that
> could happen when invalidations are processed somewhere while rebuilding
> an entry?

Cache invalidation callbacks shouldn't do anything that would involve opening non-system relations, IMO.

			regards, tom lane
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Tomas Vondra
Date:
On 18.5.2014 20:49, Tom Lane wrote:
> With both of these things fixed, I'm not seeing any significant memory
> bloat during the first parallel group of the regression tests. I don't
> think I'll have the patience to let it run much further than that
> (the uuid and enum tests are still running after an hour :-().

I've been watching barnacle running the tests with this commit applied, and so far it seems fine - the memory consumption is hovering around 240MB, and it's been running for ~8 hours now without a single OOM. Previously the first OOM happened within ~3 hours.

I've also included a simple python script that checks the amount of memory used by the postgres processes every minute, and if it's more than 512MB it uses gdb to flush memory context statistics into the postmaster log. So far there was no such case.

Sadly, with 'mite' this resulted in a 900+ MB postmaster.log, which then apparently caused the buildfarm client to be killed by the OOM killer (it seems the client loads the log into memory, not expecting it to be this large). That's why there are no results from mite available.

BTW, I see this was applied to HEAD only. Shouldn't this be applied to the other branches too?

regards
Tomas
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Tom Lane
Date:
Tomas Vondra <tv@fuzzy.cz> writes:
> On 18.5.2014 20:49, Tom Lane wrote:
>> With both of these things fixed, I'm not seeing any significant memory
>> bloat during the first parallel group of the regression tests. I don't
>> think I'll have the patience to let it run much further than that
>> (the uuid and enum tests are still running after an hour :-().

> BTW, I see this was applied to HEAD only. Shouldn't this be applied to
> the other branches too?

I intentionally didn't do that, first because I have only a limited amount of confidence in the patch, and second because I don't think it matters for anything except CLOBBER_CACHE_RECURSIVELY testing. Which surely you're not going to do in the back branches?

			regards, tom lane
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Tomas Vondra
Date:
On 19.5.2014 22:11, Tom Lane wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> On 18.5.2014 20:49, Tom Lane wrote:
>>> With both of these things fixed, I'm not seeing any significant memory
>>> bloat during the first parallel group of the regression tests. I don't
>>> think I'll have the patience to let it run much further than that
>>> (the uuid and enum tests are still running after an hour :-().
>
>> BTW, I see this was applied to HEAD only. Shouldn't this be applied to
>> the other branches too?
>
> I intentionally didn't do that, first because I have only a limited
> amount of confidence in the patch, and second because I don't think
> it matters for anything except CLOBBER_CACHE_RECURSIVELY testing.
> Which surely you're not going to do in the back branches?

Yes, I'm running this on HEAD only.

I was however wondering if this might be related to OOM errors a few local users reported to us. IIRC they've been using temporary tables quite heavily - not sure if that could be related.

regards
Tomas
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Andres Freund
Date:
On 2014-05-19 23:40:32 +0200, Tomas Vondra wrote:
> On 19.5.2014 22:11, Tom Lane wrote:
> > Tomas Vondra <tv@fuzzy.cz> writes:
> > I intentionally didn't do that, first because I have only a limited
> > amount of confidence in the patch, and second because I don't think
> > it matters for anything except CLOBBER_CACHE_RECURSIVELY testing.
> > Which surely you're not going to do in the back branches?
>
> Yes, I'm running this on HEAD only.
>
> I was however wondering if this might be related to OOM errors a few
> local users reported to us. IIRC they've been using temporary tables
> quite heavily - not sure if that could be related.

I've significant doubts that it is. I suggest you disable memory overcommit - then you'll get nice memory dumps on stderr instead of OOM kills.

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-05-19 23:40:32 +0200, Tomas Vondra wrote:
>> I was however wondering if this might be related to OOM errors a few
>> local users reported to us. IIRC they've been using temporary tables
>> quite heavily - not sure if that could be related.

> I've significant doubts that it is.

The main leak (the whole-Relation one) only applies to entries for certain system catalogs. The smaller leak would apply to any relation having DEFAULT column values --- but it's fairly hard to see how any table would get invalidated so many times in one query as to make that leak an issue, unless you were doing CLOBBER_CACHE or similar stress testing.

> I suggest you disable memory overcommit - then you'll get nice memory
> dumps on stderr instead of OOM kills.

Yeah, a context stats dump would help a lot here.

			regards, tom lane
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Andres Freund
Date:
On 2014-05-19 13:45:15 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-05-19 11:25:04 -0400, Tom Lane wrote:
> >> No, we'd have two independent entries, each with its own correct refcount.
> >> When the refcount on the no-longer-linked-in-the-hashtable entry goes to
> >> zero, it'd be leaked, same as it's always been. (The refcount presumably
> >> corresponds to someone holding a direct pointer to the Relation struct,
> >> which is what they'd use to decrement the refcount.)
>
> > The problem is that only one of these entries will get properly handled
> > by cache invalidation.
>
> I don't think that's a problem; if the old entry is still referenced,
> presumably the holder of the reference is also holding a lock sufficient
> to prevent any problematic schema changes.

Except if it's during DDL itself... But we'd probably have seen problems by now if there actually was a problem.

> > I thought about it for a while and I wonder if that's necessarily
> > correct. If somebody registers a relcache invalidation callback, that
> > could happen when invalidations are processed somewhere while rebuilding
> > an entry?
>
> Cache invalidation callbacks shouldn't do anything that would involve
> opening non-system relations, IMO.

I think that's unnecessarily restrictive. There are good reasons to build caches on top of the relcache that also have to visit user-defined relations - imagine the configuration tables of a replication solution, for example. Since there's no builtin, integrated way to add data to relcache entries, that's pretty much the only choice.

It's probably best to avoid these problems by simply setting an ->invalid flag or something, but I can imagine cases where that's not the best strategy.

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: buildfarm: strange OOM failures on markhor (running CLOBBER_CACHE_RECURSIVELY)
From
Noah Misch
Date:
On Mon, May 19, 2014 at 05:08:04PM +0200, Andres Freund wrote:
> On 2014-05-18 14:49:10 -0400, Tom Lane wrote:
> > While I'm at it: I could not help noticing RememberToFreeTupleDescAtEOX,
> > which was not there last time I looked at this code. Isn't that broken
> > by design? It's basically a deliberately induced transaction-lifespan
> > memory leak, and AFAICS if it does anything at all, it's supporting
> > incorrect calling code. There should *never* be any situation where it's
> > not okay to free a tupledesc with zero refcount.
>
> I am not sure I understand that code. The whole keep_tupdesc/keep_rules
> logic isn't new and is a bit underdocumented as well :(. Seems to come from
> 491dd4a9.

> If I understand correctly, the issue is that callers have accessed the
> tupledesc with RelationGetDescr(), which doesn't increase the
> refcount.

Right. See here for background:
http://www.postgresql.org/message-id/flat/20130801005351.GA325106@tornado.leadboat.com

> > * If we Rebuilt a relcache entry during a transaction then its
> > * possible we did that because the TupDesc changed as the result of
> > * an ALTER TABLE that ran at less than AccessExclusiveLock. It's
> > * possible someone copied that TupDesc, in which case the copy would
> > * point to free'd memory. So if we rebuild an entry we keep the
> >
> > If someone copied the tupdesc, how would freeing the original result
> > in the copy being damaged?
>
> The sentence only seems to make sense with s/copy/accessed without
> increasing the refcount, e.g. with RelationGetDescr,/.

Agreed. The hazards arise when copying the rd_att pointer, not when deep-copying the whole TupleDesc.

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com
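The distinction in the last point - copying the rd_att pointer versus deep-copying the whole TupleDesc - can be illustrated with a small, self-contained C sketch (invented names, not the PostgreSQL TupleDesc API): after the original is freed, a pointer copy dangles while a deep copy remains usable.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* a stand-in for a row descriptor; not the real TupleDesc */
typedef struct RowDesc
{
    int    natts;
    char **attnames;
} RowDesc;

static RowDesc *
make_desc(int natts, char **names)
{
    RowDesc *d = malloc(sizeof(RowDesc));

    d->natts = natts;
    d->attnames = malloc(natts * sizeof(char *));
    for (int i = 0; i < natts; i++)
        d->attnames[i] = strdup(names[i]);
    return d;
}

/* deep copy: safe to keep even if the original goes away */
static RowDesc *
copy_desc(RowDesc *src)
{
    return make_desc(src->natts, src->attnames);
}

static void
free_desc(RowDesc *d)
{
    for (int i = 0; i < d->natts; i++)
        free(d->attnames[i]);
    free(d->attnames);
    free(d);
}

int
main(void)
{
    char    *names[] = {"id", "val"};
    RowDesc *orig = make_desc(2, names);

    RowDesc *pointer_copy = orig;           /* like copying the bare pointer */
    RowDesc *deep_copy = copy_desc(orig);   /* like a full descriptor copy */

    free_desc(orig);            /* e.g. the owning cache entry gets rebuilt */
    pointer_copy = NULL;        /* dangling now; must not be dereferenced */

    /* the deep copy is unaffected by freeing the original */
    printf("deep copy: %d attributes, first is %s\n",
           deep_copy->natts, deep_copy->attnames[0]);

    free_desc(deep_copy);
    return 0;
}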