Thread: Performance problem partially identified
I've been looking into Mergl's "update" performance problem. With current sources, on a sequential-scan update of about 10,000 out of 1,000,000 records, I observe 33712 read() calls and 34107 write() calls. The table occupies 33334 disk blocks, so the number of reads looks about right -- but the number of writes is at least a factor of 3 higher than it should be! It looks to me like something is broken such that bufmgr.c *always* thinks that a buffer is dirty (and needs written out) when it is released. Poking around for the cause, I find that heapgettup() calls SetBufferCommitInfoNeedsSave() for every single tuple read from the table: 7.14 42.15 1000055/1000055 heap_getnext [9] [10] 18.8 7.14 42.15 1000055 heapgettup [10] 1.53 30.10 1000020/1000020 HeapTupleSatisfiesSnapshot[11] 1.68 3.27 1000055/1000055 RelationGetBufferWithBuffer [50] 4.31 0.00 2066832/4129472 LockBuffer [45] 0.25 0.56 33361/33698 ReleaseAndReadBuffer[76] 0.44 0.00 1000000/1000000 SetBufferCommitInfoNeedsSave [92] 0.01 0.00 33371/33371 nextpage [240] 0.00 0.00 10/2033992 ReleaseBuffer [46] 0.00 0.00 45/201 HeapTupleSatisfiesNow [647] 0.00 0.00 5/9 nocachegetattr[730] This could only be from the call to SetBufferCommitInfoNeedsSave in the HeapTupleSatisfies macro. If I'm reading the code correctly, that means that HeapTupleSatisfiesSnapshot() always changes the t_infomask field of the tuple. I don't understand this code well enough to fix it, but I assert that it's broken. Most of these tuples are *not* being modified, and there is no reason to have to rewrite the buffer. regards, tom lane
Tom Lane wrote: > > It looks to me like something is broken such that bufmgr.c *always* > thinks that a buffer is dirty (and needs written out) when it is > released. That could also explain why the performance increases quite noticeably even for _select_ queries when you specify "no fsync" for backend. (I have'nt checked it lately, but it was the case about a year ago) > Poking around for the cause, I find that heapgettup() calls > SetBufferCommitInfoNeedsSave() for every single tuple read from the > table: ... > I don't understand this code well enough to fix it, but I assert that > it's broken. More likely this is a "quick fix - will look at it later" for something else, praobably an execution path that fails to call SetBufferCommitInfoNeedsSave() when needed. Or it can be just code to check if this fixes it which has been forgotten in. > Most of these tuples are *not* being modified, and there > is no reason to have to rewrite the buffer. It can be quite a lot of work to find all the places that can modify the tuple, even with some special tools. Is there any tool that can report writes to areas set read only (similar to malloc/free debuggers ?) If there is then we can replace SetBufferCommitInfoNeedsSave() with a macro that does both SetBufferCommitInfoNeedsSave() and allows writing, so that we can automate the checking. ----------------- Hannu
Further info: I think this behavior may be state-dependent. The first update after loading the table rewrites all the blocks, but subsequent ones do not. Is it just a matter of marking recently-written tuples as confirmed good? If so, then there is not a problem (or at least, not as big a problem as I thought). regards, tom lane
Tom Lane wrote: > > Poking around for the cause, I find that heapgettup() calls > SetBufferCommitInfoNeedsSave() for every single tuple read from the > table: > > 7.14 42.15 1000055/1000055 heap_getnext [9] > [10] 18.8 7.14 42.15 1000055 heapgettup [10] > 1.53 30.10 1000020/1000020 HeapTupleSatisfiesSnapshot [11] > 1.68 3.27 1000055/1000055 RelationGetBufferWithBuffer [50] > 4.31 0.00 2066832/4129472 LockBuffer [45] > 0.25 0.56 33361/33698 ReleaseAndReadBuffer [76] > 0.44 0.00 1000000/1000000 SetBufferCommitInfoNeedsSave [92] > 0.01 0.00 33371/33371 nextpage [240] > 0.00 0.00 10/2033992 ReleaseBuffer [46] > 0.00 0.00 45/201 HeapTupleSatisfiesNow [647] > 0.00 0.00 5/9 nocachegetattr [730] > > This could only be from the call to SetBufferCommitInfoNeedsSave in > the HeapTupleSatisfies macro. If I'm reading the code correctly, > that means that HeapTupleSatisfiesSnapshot() always changes the ^^^^^^^^^^^^^^ > t_infomask field of the tuple. Not always but only if HEAP_XMIN_COMMITTED/HEAP_XMAX_COMMITTED are not setted. Run vacuum before update and SetBufferCommitInfoNeedSave will not be called. This func is just to avoid pg_log lookup without vacuum. Vadim
Hannu Krosing wrote: > > Tom Lane wrote: > > > > It looks to me like something is broken such that bufmgr.c *always* > > thinks that a buffer is dirty (and needs written out) when it is > > released. > > That could also explain why the performance increases quite noticeably > even for _select_ queries when you specify "no fsync" for backend. > (I have'nt checked it lately, but it was the case about a year ago) Enev selects try to update t_infomask to avoid pg_log lookup for other queries. Vadim
Tom Lane wrote: > > Further info: > > I think this behavior may be state-dependent. The first update after > loading the table rewrites all the blocks, but subsequent ones do not. > Is it just a matter of marking recently-written tuples as confirmed > good? If so, then there is not a problem (or at least, not as big > a problem as I thought). Exactly! One have to run vacuum after loading a big amount of data. Vadim
> > This could only be from the call to SetBufferCommitInfoNeedsSave in > > the HeapTupleSatisfies macro. If I'm reading the code correctly, > > that means that HeapTupleSatisfiesSnapshot() always changes the > ^^^^^^^^^^^^^^ > > t_infomask field of the tuple. > > Not always but only if HEAP_XMIN_COMMITTED/HEAP_XMAX_COMMITTED > are not setted. Run vacuum before update and SetBufferCommitInfoNeedSave > will not be called. This func is just to avoid pg_log lookup > without vacuum. > Yes, we store the transaction status on first in the tuple so we don't have to look it up. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026