Re: Bug in new buffer freelist code - Mailing list pgsql-hackers
From | Jan Wieck |
---|---|
Subject | Re: Bug in new buffer freelist code |
Date | |
Msg-id | 3FE8A88D.10309@Yahoo.com |
In response to | Bug in new buffer freelist code (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Bug in new buffer freelist code
Re: Bug in new buffer freelist code |
List | pgsql-hackers |
Tom Lane wrote:

> I just had the parallel regression tests hang up due to what appears to
> be a bug in the new ARC code. The CLUSTER test gets into an infinite
> loop trying to do "CLUSTER clstr_1;". The loop is in
> StrategyInvalidateBuffer's check that the buffer is already in the
> freelist; it isn't, and the freelist is circular.

It seems to me that buffers that are thrown away via
StrategyInvalidateBuffer() do not get their relnode and blocknum cleaned
out. That causes FlushRelationBuffers(), while doing a full scan of the
whole buffer pool, to find again buffers that once contained the block.

If buffer 839 once contained that block, and it was given up that way,
and later on buffer 850 contains it, there is a CDB for it. If now
FlushRelationBuffers() scans the buffer pool, it will find buffer 839
first and call StrategyInvalidateBuffer() for it. That finds the CDB for
buffer 850, and adds buffer 839 to the list again. Later on FlushRB()
calls StrategyIB() for buffer 850 and we have the situation at hand.

Does that make sense?

Jan

>
> (gdb) bt
> #0 0x1fe8a8 in StrategyInvalidateBuffer (buf=0xc3a56f60) at freelist.c:733
> #1 0x1fbf08 in FlushRelationBuffers (rel=0x400fa298, firstDelBlock=0)
> at bufmgr.c:1596
> #2 0x1479fc in swap_relfilenodes (r1=143786, r2=143915) at cluster.c:736
> #3 0x147458 in rebuild_relation (OldHeap=0x2322b, indexOid=143788)
> at cluster.c:455
> #4 0x1473b0 in cluster_rel (rvtc=0x7b03bed8, recheck=0 '\000')
> at cluster.c:395
> #5 0x146ff4 in cluster (stmt=0x400b88a8) at cluster.c:232
> #6 0x21c60c in ProcessUtility (parsetree=0x400b88a8, dest=0x400b88e8,
> completionTag=0x7b03bbe8 "") at utility.c:1033
> ... etc ...
>
> (gdb) p *buf
> $5 = {bufNext = -1, data = 7211904, tag = {rnode = {tblNode = 17142,
> relNode = 143906}, blockNum = 0}, buf_id = 850, flags = 14,
> refcount = 0, io_in_progress_lock = 1721, cntx_lock = 1722,
> cntxDirty = 0 '\000', wait_backend_id = 0}
> (gdb) p *StrategyControl
> $1 = {target_T1_size = 423, listUnusedCDB = 249, listHead = {464, 967, 1692,
> 1227}, listTail = {968, 645, 1528, 1694}, listSize = {364, 413, 584, 636},
> listFreeBuffers = 839, num_lookup = 546939, num_hit = {1378, 246896, 282639,
> 3935}, stat_report = 0, cdb = {{prev = 386, next = 23, list = 3,
> buf_tag = {rnode = {tblNode = 17142, relNode = 19080}, blockNum = 30},
> buf_id = -1, t1_xid = 3402}}}
> (gdb) p BufferDescriptors[839]
> $2 = {bufNext = 839, data = 7121792, tag = {rnode = {tblNode = 17142,
> relNode = 143906}, blockNum = 0}, buf_id = 839, flags = 14,
> refcount = 0, io_in_progress_lock = 1699, cntx_lock = 1700,
> cntxDirty = 0 '\000', wait_backend_id = 0}
>
> So we've got a couple of problems here: buffers 839 and 850 both claim
> to contain block 0 of rel 143906 (which is clstr_1), and the freelist
> is circular.
>
> This doesn't seem to be super reproducible, but there's definitely a
> problem in there somewhere.
>
> regards, tom lane

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
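
The self-referencing freelist entry in the gdb output (bufNext = 839 on buffer 839, with listFreeBuffers = 839) is what one would expect if the same buffer is pushed onto a singly linked freelist twice. The following is only a toy model of that effect, not the code in freelist.c: ToyBuf, ToyPool and the toy_* functions are hypothetical names made up for illustration, and the real ARC strategy code is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS     4
#define FREELIST_END (-1)

/* Hypothetical, stripped-down stand-in for a buffer descriptor. */
typedef struct
{
    int bufNext;            /* index of next free buffer, or FREELIST_END */
} ToyBuf;

/* Hypothetical stand-in for the shared pool plus strategy control data. */
typedef struct
{
    ToyBuf bufs[NBUFFERS];
    int    listFreeBuffers; /* head of the freelist */
} ToyPool;

void
toy_init(ToyPool *pool)
{
    for (int i = 0; i < NBUFFERS; i++)
        pool->bufs[i].bufNext = FREELIST_END;
    pool->listFreeBuffers = FREELIST_END;
}

/* Push onto the freelist without checking whether buf_id is already on it. */
void
toy_push_unchecked(ToyPool *pool, int buf_id)
{
    assert(buf_id >= 0 && buf_id < NBUFFERS);
    pool->bufs[buf_id].bufNext = pool->listFreeBuffers;
    pool->listFreeBuffers = buf_id;
}

/*
 * Bounded walk: a well-formed list has at most NBUFFERS entries, so
 * following more links than that means the list loops back on itself.
 */
bool
toy_freelist_is_circular(const ToyPool *pool)
{
    int steps = 0;

    for (int i = pool->listFreeBuffers; i != FREELIST_END;
         i = pool->bufs[i].bufNext)
    {
        if (++steps > NBUFFERS)
            return true;
    }
    return false;
}
```

In this model, pushing buffer 1 once leaves a normal one-element list. Pushing it a second time sets its bufNext to the current head, which is buffer 1 itself, so any unbounded "is this buffer already in the freelist" walk for a different buffer never reaches the end marker.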