Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint - Mailing list pgsql-hackers
From | Ashutosh Sharma |
---|---|
Subject | Re: [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint |
Date | |
Msg-id | CAE9k0P=V2LhtyeMXd295fhisp=NWUhRVJ9EZQCDowWiY9rSohQ@mail.gmail.com |
In response to | [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint (Andreas Seltenreich <seltenreich@gmx.de>) |
List | pgsql-hackers |
Hi,

> testing with master as of cf366e97ff, sqlsmith occasionally triggers the
> following assertion:
>
> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) (&(bufHdr)->content_lock))))", File: "bufmgr.c", Line: 3397)
>
> Backtraces always look like the one below. It is reproducible on a
> cluster once it happens. I could provide a tarball if needed.
>
> regards,
> Andreas
>
> #2 0x00000000008324b1 in ExceptionalCondition (conditionName=conditionName@entry=0x9e4e28 "!(LWLockHeldByMe(((LWLock*)(&(bufHdr)->content_lock))))", errorType=errorType@entry=0x87b03d "FailedAssertion", fileName=fileName@entry=0x9e5856 "bufmgr.c", lineNumber=lineNumber@entry=3397) at assert.c:54
> #3 0x0000000000706971 in MarkBufferDirtyHint (buffer=2844, buffer_std=buffer_std@entry=1 '\001') at bufmgr.c:3397
> #4 0x00000000004b3ecd in _hash_kill_items (scan=scan@entry=0x66dcf70) at hashutil.c:514
> #5 0x00000000004a9c1b in hashendscan (scan=0x66dcf70) at hash.c:512
> #6 0x00000000004cf17a in index_endscan (scan=0x66dcf70) at indexam.c:353
> #7 0x000000000061fa51 in ExecEndIndexScan (node=0x3093f30) at nodeIndexscan.c:852
> #8 0x0000000000608e59 in ExecEndNode (node=<optimized out>) at execProcnode.c:715
> #9 0x00000000006045b8 in ExecEndPlan (estate=0x3064000, planstate=<optimized out>) at execMain.c:1540
> #10 standard_ExecutorEnd (queryDesc=0x30cb880) at execMain.c:487
> #11 0x00000000005c87b0 in PortalCleanup (portal=0x1a60060) at portalcmds.c:302
> #12 0x000000000085cbb3 in PortalDrop (portal=0x1a60060, isTopCommit=<optimized out>) at portalmem.c:489
> #13 0x0000000000736ed2 in exec_simple_query (query_string=0x315b7a0 "...") at postgres.c:1111
> #14 0x0000000000738b51 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x1a6c6c8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4071
> #15 0x0000000000475fef in BackendRun (port=0x1a65b90) at postmaster.c:4317
> #16 BackendStartup (port=0x1a65b90) at postmaster.c:3989
> #17 ServerLoop () at postmaster.c:1729
> #18 0x00000000006c8662 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x1a3f540) at postmaster.c:1337
> #19 0x000000000047729d in main (argc=4, argv=0x1a3f540) at main.c:228
>

Hi,

Thanks for reporting this problem. Could you please let me know how long
you ran sqlsmith before hitting this crash?

I have found the reason for the crash. It happens when tuples are
retrieved through a cursor. The current hash index scan works
tuple-at-a-time: once it finds a tuple on a page, it releases the lock on
that page but keeps the pin, and then returns the tuple. So when the
requested number of tuples has been processed, there is no lock on the
page being scanned, only a pin. When the cursor is closed at the end of
the scan, if any killed tuples have been identified we first try to mark
those items as dead with _hash_kill_items(). But since we hold only a pin
on the page, the LWLockHeldByMe() assertion in MarkBufferDirtyHint()
fails.

When scanning tuples with a normal SELECT * statement, we deal with all
the killed items before moving to the next page in a bucket, and we do
this without releasing the lock and pin on the current page. Hence the
crash is not visible with SELECT queries.

The attached patch fixes this. Please note, however, that all these
changes will be removed by the patch for page scan mode [1].

[1] - https://www.postgresql.org/message-id/CA%2BTgmobYTvexcjqMhXoNCyEUHChzmdC_2xVGgj7eqaYVgoJA%2Bg%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
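
The pin-versus-lock situation described in the message can be illustrated
with a small standalone C model. This is only a sketch under stated
assumptions: ToyBuffer and all toy_* functions are hypothetical stand-ins
and not PostgreSQL's buffer manager API, and retaking the lock before
marking items dead is just one plausible way to satisfy the assertion, not
necessarily what the attached patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy buffer; NOT PostgreSQL's real buffer descriptor. */
typedef struct ToyBuffer
{
    int  pin_count;       /* a pin only keeps the page in memory */
    bool content_locked;  /* the content lock protects page contents */
    bool dirty_hint;      /* set when hint changes were recorded */
    int  dead_items;      /* number of index items marked dead */
} ToyBuffer;

static void toy_pin(ToyBuffer *buf)
{
    buf->pin_count++;
}

static void toy_lock(ToyBuffer *buf)
{
    assert(buf->pin_count > 0);      /* must be pinned before locking */
    assert(!buf->content_locked);
    buf->content_locked = true;
}

static void toy_unlock(ToyBuffer *buf)
{
    assert(buf->content_locked);
    buf->content_locked = false;
}

/*
 * Models the precondition behind the reported assertion: the caller must
 * hold the content lock. Returns false where the real code would trap.
 */
static bool toy_mark_dirty_hint(ToyBuffer *buf)
{
    if (buf->pin_count == 0 || !buf->content_locked)
        return false;
    buf->dirty_hint = true;
    return true;
}

/* Simplified stand-in for marking killed items dead on the page. */
static bool toy_kill_items(ToyBuffer *buf, int nkilled)
{
    if (nkilled == 0)
        return true;                 /* nothing to do */
    if (!toy_mark_dirty_hint(buf))
        return false;                /* lock not held: the failure case */
    buf->dead_items += nkilled;
    return true;
}

/* Cursor-close path as described: only a pin is held at end of scan. */
static bool toy_end_scan_pin_only(ToyBuffer *buf, int nkilled)
{
    return toy_kill_items(buf, nkilled);
}

/* Assumed remedy: retake the lock around the kill step, keep the pin. */
static bool toy_end_scan_relock(ToyBuffer *buf, int nkilled)
{
    bool ok;

    toy_lock(buf);
    ok = toy_kill_items(buf, nkilled);
    toy_unlock(buf);
    return ok;
}
```

With a buffer that is pinned but not locked, toy_end_scan_pin_only(&buf, 2)
returns false, which is the point where the real assertion would fire,
while toy_end_scan_relock(&buf, 2) returns true and leaves the buffer
pinned and unlocked again.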
Attachment