Thread: [HACKERS] check failure with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY
[HACKERS] check failure with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY
From
Andrew Dunstan
Date:
I have been setting up a buildfarm member with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY, settings which Alvaro suggested to me. I got core dumps with these stack traces. The platform is Amazon Linux.

================== stack trace: pgsql.build/src/test/regress/tmp_check/data/core.4149 ==================
[New LWP 4149]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: ec2-user regression [local] VACUUM '.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000005916bf in rebuild_relation (verbose=0 '\000', indexOid=0, OldHeap=0x1dd7ae0) at cluster.c:576
576 OIDNewHeap = make_new_heap(tableOid, tableSpace,
#0 0x00000000005916bf in rebuild_relation (verbose=0 '\000', indexOid=0, OldHeap=0x1dd7ae0) at cluster.c:576
#1 cluster_rel (tableOid=tableOid@entry=28441, indexOid=indexOid@entry=0, recheck=recheck@entry=0 '\000', verbose=verbose@entry=0 '\000') at cluster.c:404
#2 0x00000000005ef228 in vacuum_rel (relid=relid@entry=28441, relation=relation@entry=0x1dab408, options=options@entry=17, params=params@entry=0x7ffdd87d72a0) at vacuum.c:1441
#3 0x00000000005f0542 in vacuum (options=17, relation=0x1dab408, relid=relid@entry=0, params=params@entry=0x7ffdd87d72a0, va_cols=0x0, bstrategy=<optimized out>, bstrategy@entry=0x0, isTopLevel=1 '\001') at vacuum.c:304
#4 0x00000000005f093e in ExecVacuum (vacstmt=vacstmt@entry=0x1dab460, isTopLevel=isTopLevel@entry=1 '\001') at vacuum.c:122
#5 0x0000000000728925 in standard_ProcessUtility (pstmt=0x1dab7c0, queryString=0x1daa9a8 "VACUUM FULL concur_heap;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x1dab8b8, completionTag=0x7ffdd87d76a0 "") at utility.c:670
#6 0x0000000000725d82 in PortalRunUtility (portal=0x1d48a68, pstmt=0x1dab7c0, isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>, dest=<optimized out>, completionTag=0x7ffdd87d76a0 "") at pquery.c:1165
#7 0x0000000000726819 in PortalRunMulti (portal=portal@entry=0x1d48a68, isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x1dab8b8, altdest=altdest@entry=0x1dab8b8, completionTag=completionTag@entry=0x7ffdd87d76a0 "") at pquery.c:1315
#8 0x0000000000727488 in PortalRun (portal=portal@entry=0x1d48a68, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x1dab8b8, altdest=altdest@entry=0x1dab8b8, completionTag=completionTag@entry=0x7ffdd87d76a0 "") at pquery.c:788
#9 0x000000000072500a in exec_simple_query (query_string=0x1daa9a8 "VACUUM FULL concur_heap;") at postgres.c:1101
#10 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1d561e0, dbname=0x1d55f30 "regression", username=<optimized out>) at postgres.c:4066
#11 0x00000000004765b4 in BackendRun (port=0x1d51420) at postmaster.c:4317
#12 BackendStartup (port=0x1d51420) at postmaster.c:3989
#13 ServerLoop () at postmaster.c:1729
#14 0x00000000006b9a0a in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x1d2a260) at postmaster.c:1337
#15 0x00000000004775c2 in main (argc=8, argv=0x1d2a260) at main.c:228

================== stack trace: pgsql.build/src/test/regress/tmp_check/data/core.4180 ==================
[New LWP 4180]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: ec2-user regression [local] VACUUM '.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000005916bf in rebuild_relation (verbose=0 '\000', indexOid=0, OldHeap=0x7f460d159930) at cluster.c:576
576 OIDNewHeap = make_new_heap(tableOid, tableSpace,
#0 0x00000000005916bf in rebuild_relation (verbose=0 '\000', indexOid=0, OldHeap=0x7f460d159930) at cluster.c:576
#1 cluster_rel (tableOid=tableOid@entry=28479, indexOid=indexOid@entry=0, recheck=recheck@entry=0 '\000', verbose=verbose@entry=0 '\000') at cluster.c:404
#2 0x00000000005ef228 in vacuum_rel (relid=relid@entry=28479, relation=relation@entry=0x1dab400, options=options@entry=17, params=params@entry=0x7ffdd87d72a0) at vacuum.c:1441
#3 0x00000000005f0542 in vacuum (options=17, relation=0x1dab400, relid=relid@entry=0, params=params@entry=0x7ffdd87d72a0, va_cols=0x0, bstrategy=<optimized out>, bstrategy@entry=0x0, isTopLevel=1 '\001') at vacuum.c:304
#4 0x00000000005f093e in ExecVacuum (vacstmt=vacstmt@entry=0x1dab458, isTopLevel=isTopLevel@entry=1 '\001') at vacuum.c:122
#5 0x0000000000728925 in standard_ProcessUtility (pstmt=0x1dab7b8, queryString=0x1daa9a8 "VACUUM FULL vactst;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x1dab8b0, completionTag=0x7ffdd87d76a0 "") at utility.c:670
#6 0x0000000000725d82 in PortalRunUtility (portal=0x1d48a68, pstmt=0x1dab7b8, isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>, dest=<optimized out>, completionTag=0x7ffdd87d76a0 "") at pquery.c:1165
#7 0x0000000000726819 in PortalRunMulti (portal=portal@entry=0x1d48a68, isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x1dab8b0, altdest=altdest@entry=0x1dab8b0, completionTag=completionTag@entry=0x7ffdd87d76a0 "") at pquery.c:1315
#8 0x0000000000727488 in PortalRun (portal=portal@entry=0x1d48a68, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x1dab8b0, altdest=altdest@entry=0x1dab8b0, completionTag=completionTag@entry=0x7ffdd87d76a0 "") at pquery.c:788
#9 0x000000000072500a in exec_simple_query (query_string=0x1daa9a8 "VACUUM FULL vactst;") at postgres.c:1101
#10 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1d561e0, dbname=0x1d55f30 "regression", username=<optimized out>) at postgres.c:4066
#11 0x00000000004765b4 in BackendRun (port=0x1d51420) at postmaster.c:4317
#12 BackendStartup (port=0x1d51420) at postmaster.c:3989
#13 ServerLoop () at postmaster.c:1729
#14 0x00000000006b9a0a in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x1d2a260) at postmaster.c:1337
#15 0x00000000004775c2 in main (argc=8, argv=0x1d2a260) at main.c:228

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] Re: check failure with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY
From
Andrew Dunstan
Date:
On 03/03/2017 02:24 PM, Andrew Dunstan wrote:
> I have been setting up a buildfarm member with -DRELCACHE_FORCE_RELEASE
> -DCLOBBER_FREED_MEMORY, settings which Alvaro suggested to me. I got core
> dumps with these stack traces. The platform is Amazon Linux.
>

I have replicated this on a couple of other platforms (Fedora, FreeBSD) and back to 9.5. The same failure doesn't happen with buildfarm runs on earlier branches, although possibly they don't have the same set of tests.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: check failure with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY
From
Tom Lane
Date:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 03/03/2017 02:24 PM, Andrew Dunstan wrote:
>> I have been setting up a buildfarm member with -DRELCACHE_FORCE_RELEASE
>> -DCLOBBER_FREED_MEMORY, settings which Alvaro suggested to me. I got core
>> dumps with these stack traces. The platform is Amazon Linux.

> I have replicated this on a couple of other platforms (Fedora, FreeBSD)
> and back to 9.5. The same failure doesn't happen with buildfarm runs on
> earlier branches, although possibly they don't have the same set of tests.

Well, the problem in rebuild_relation() seems pretty blatant:

    /* Close relcache entry, but keep lock until transaction commit */
    heap_close(OldHeap, NoLock);

    /* Create the transient table that will receive the re-ordered data */
    OIDNewHeap = make_new_heap(tableOid, tableSpace,
                               OldHeap->rd_rel->relpersistence,
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                               AccessExclusiveLock);

There are two such references after the heap_close. I don't know that those are the only bugs, but this reference is certainly the proximate cause of the crash I'm seeing. Will push a fix in a little bit.

regards, tom lane
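For readers following along, the pattern Tom describes (reading a field of a relcache entry after closing it) and one common way to avoid it can be sketched in standalone C. This is only an illustration under stated assumptions: FakeRelation, fake_open, fake_close and rebuild_safe are hypothetical stand-ins, not PostgreSQL code, and the sketch does not claim to be the fix that was actually committed. fake_close overwrites the entry with 0x7F bytes before freeing it, roughly imitating what happens when both build flags are set, so any later read through the pointer would see garbage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a relcache entry; NOT the real RelationData. */
typedef struct FakeRelation
{
    char relpersistence;    /* e.g. 'p' = permanent, 'u' = unlogged */
} FakeRelation;

/* Hypothetical "open": allocate an entry and fill it in. */
static FakeRelation *
fake_open(char persistence)
{
    FakeRelation *rel = malloc(sizeof(FakeRelation));

    if (rel == NULL)
        abort();
    rel->relpersistence = persistence;
    return rel;
}

/*
 * Hypothetical "close": scribble over the entry, then free it at once.
 * This imitates (as an assumption) a build where the entry is released on
 * close and freed memory is clobbered. After this call, rel must not be
 * dereferenced.
 */
static void
fake_close(FakeRelation *rel)
{
    assert(rel != NULL);
    memset(rel, 0x7F, sizeof(FakeRelation));
    free(rel);
}

/*
 * Safe ordering: copy the needed field into a local variable BEFORE the
 * close, and use only the local copy afterwards. The return value stands
 * in for the argument that would be passed on to the next step.
 */
static char
rebuild_safe(char persistence)
{
    FakeRelation *old_heap = fake_open(persistence);
    char          saved_persistence = old_heap->relpersistence;

    fake_close(old_heap);

    /* old_heap is dangling here; only saved_persistence is used. */
    return saved_persistence;
}
```

Calling rebuild_safe('p') returns 'p' regardless of what fake_close does to the freed entry, because the value was copied out first. The unsafe variant would read old_heap->relpersistence after fake_close, which is undefined behaviour and is the kind of access these build flags are meant to expose.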
Re: [HACKERS] Re: check failure with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY
From
Tom Lane
Date:
I wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> On 03/03/2017 02:24 PM, Andrew Dunstan wrote:
>>> I have been setting up a buildfarm member with -DRELCACHE_FORCE_RELEASE
>>> -DCLOBBER_FREED_MEMORY, settings which Alvaro suggested to me. I got core
>>> dumps with these stack traces. The platform is Amazon Linux.

>> I have replicated this on a couple of other platforms (Fedora, FreeBSD)
>> and back to 9.5. The same failure doesn't happen with buildfarm runs on
>> earlier branches, although possibly they don't have the same set of tests.

> well, the problem in rebuild_relation() seems pretty blatant:

I fixed that, and the basic regression tests no longer crash outright with these settings, but I do see half a dozen errors that all seem to be in RLS-related tests. They all look like something is trying to access an already-closed relcache entry, much like the problem in rebuild_relation(). But I have no time to look closer for the next several days. Stephen, I think this is your turf anyway.

regards, tom lane
Re: [HACKERS] Re: check failure with -DRELCACHE_FORCE_RELEASE -DCLOBBER_FREED_MEMORY
From
Tom Lane
Date:
I wrote:
> I fixed that, and the basic regression tests no longer crash outright with
> these settings, but I do see half a dozen errors that all seem to be in
> RLS-related tests.

Those turned out to all be the same bug in DoCopy. "make check-world" passes for me now with -DRELCACHE_FORCE_RELEASE, but I've only tried HEAD, not the back branches.

regards, tom lane
Re: [HACKERS] Re: check failure with -DRELCACHE_FORCE_RELEASE-DCLOBBER_FREED_MEMORY
From
Andrew Dunstan
Date:
On 03/06/2017 05:14 PM, Tom Lane wrote:
> I wrote:
>> I fixed that, and the basic regression tests no longer crash outright with
>> these settings, but I do see half a dozen errors that all seem to be in
>> RLS-related tests.
> Those turned out to all be the same bug in DoCopy. "make check-world"
> passes for me now with -DRELCACHE_FORCE_RELEASE, but I've only tried
> HEAD not the back branches.
>

I have tried the back branches. They are good.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services