Thread: Questions on extending a relation
Hi Hackers,

I have two questions about extending a relation:

1. When we want a new page, we do something like this:

    LockPage(relation, 0, ExclusiveLock);
    blockNum = smgrnblocks(reln->rd_smgr);
    /* Try to locate this blockNum in buffer pool, but definitely can't? */
    smgrextend(blockNum);
    UnlockPage(relation, 0, ExclusiveLock);

So if 10 backends reach this point concurrently, we will have 10 new pages? Suppose they each insert one new tuple, commit, then quit. The next time they connect, they only reuse the last page, so we have almost lost 9 pages?

2. Suppose an insert on a relation with an index is performed in this sequence:

    begin transaction;
    extend relation for a new page A;
    insert a heap tuple T on page A;
    insert an index tuple I on another page B;
    page B gets written out by bgwriter;
    System crashed.
    System recovered.

At this time, page A is empty since we won't replay xlog. Now we insert a heap tuple on page A again, which will use the same slot as tuple T. So now the index tuple I points to T?

Thanks a lot,
Qingqing
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes: > 1. When we want a new page, we will do something like this: > LockPage(relation, 0, ExclusiveLock); > blockNum = smgrnblocks(reln->rd_smgr); > /* Try to locate this blockNum in buffer pool, but definitely can't? */ > smgrextend(blockNum); > LockPage(relation, 0, ExclusiveLock); You should be using ReadBuffer with P_NEW, not calling smgr yourself. > So if I have concurrently 10 backends reach here, we will have 10 new pages? Yes. That's intentional --- otherwise they'd all block each other. > 2. Suppose an insert on a relation with index is performed in this sequence: > begin transation; > extend relation for a new page A; > insert a heap tuple T on page A; > insert an index tuple I on another page B; > page B get written out by bgwriter; > System crashed. > System recovered. > At this time, page A is empty since we won't replay xlog. Why wouldn't we replay xlog? Note in particular that the bgwriter is not allowed to push page B to disk until the xlog entry describing the index change has been flushed to disk. Since that will come after the xlog entry about the heap change, both changes are necessarily on-disk in the xlog, and both will be remade during replay. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes > > Yes. That's intentional --- otherwise they'd all block each other. > So if I saw the last two pages on a disk relation are half full, that's nothing wrong? > > Why wouldn't we replay xlog? Note in particular that the bgwriter is > not allowed to push page B to disk until the xlog entry describing the > index change has been flushed to disk. Since that will come after the > xlog entry about the heap change, both changes are necessarily on-disk > in the xlog, and both will be remade during replay. > Yes, I made a mistake. We reply xlog in any ways (no matter the transaction commits or not). Thanks, Qingqing