Thread: Q about heap_getattr
I've been doing some more backend profiling, and observe that in a large SELECT from a table with lots of columns, nocachegetattr (the guts of heap_getattr) is at the top of the list, accounting for about 15% of runtime. The percentage would be lower in a table with fewer columns or no null columns, but it still seems worth working on. (Besides, this case right here is a real-world case for me.) What's drawing my eye is that printtup() is calling heap_getattr twice for each attribute of each tuple --- once in the first scan that prepares the null-fields bitmap, and then again to actually output the field value. So, what I want to do is call heap_getattr only once per attribute and save the returned value for use in the second loop. That should halve the time spent in nocachegetattr and thus knock 7 or so percent off the runtime of SELECT. The question for the list: how long is the Datum value returned by heap_getattr valid? In particular, could it be invalidated by calling heap_getattr for another field of the same tuple? If there are any cases like that, then this optimization won't work. I don't know the backend well enough to guess whether this is safe. regards, tom lane
Tom Lane wrote: > > I've been doing some more backend profiling, and observe that in a large > SELECT from a table with lots of columns, nocachegetattr (the guts of > heap_getattr) is at the top of the list, accounting for about 15% of > runtime. > > The percentage would be lower in a table with fewer columns or no null > columns, but it still seems worth working on. (Besides, this case right > here is a real-world case for me.) > > What's drawing my eye is that printtup() is calling heap_getattr twice > for each attribute of each tuple --- once in the first scan that > prepares the null-fields bitmap, and then again to actually output the > field value. So, what I want to do is call heap_getattr only once per > attribute and save the returned value for use in the second loop. > That should halve the time spent in nocachegetattr and thus knock > 7 or so percent off the runtime of SELECT. Try to use heap_attisnull in first scan! This func just tests nulls bitmap array of tuple... Vadim P.S. Tom, I forgot to attach new allocation code in my prev letter, but now I want to reimplement them.
Vadim Mikheev <vadim@krs.ru> writes: > Tom Lane wrote: >> So, what I want to do is call heap_getattr only once per >> attribute and save the returned value for use in the second loop. > Try to use heap_attisnull in first scan! Ah, that looks like a much better idea. Consider it done... thanks, tom lane
> Tom Lane wrote: > > > > I've been doing some more backend profiling, and observe that in a large > > SELECT from a table with lots of columns, nocachegetattr (the guts of > > heap_getattr) is at the top of the list, accounting for about 15% of > > runtime. > > > > The percentage would be lower in a table with fewer columns or no null > > columns, but it still seems worth working on. (Besides, this case right > > here is a real-world case for me.) > > > > What's drawing my eye is that printtup() is calling heap_getattr twice > > for each attribute of each tuple --- once in the first scan that > > prepares the null-fields bitmap, and then again to actually output the > > field value. So, what I want to do is call heap_getattr only once per > > attribute and save the returned value for use in the second loop. > > That should halve the time spent in nocachegetattr and thus knock > > 7 or so percent off the runtime of SELECT. > > Try to use heap_attisnull in first scan! > This func just tests nulls bitmap array of tuple... > > Vadim > P.S. Tom, I forgot to attach new allocation code in my prev letter, > but now I want to reimplement them. > > Good idea. Hadn't thought of that. To answer Tom's question, it doesn't matter how many times you call heap_getattr(). You can cache the values, as long as the tuple doesn't change. nocachegetattr() computes all offsets, even offsets after the column you are requesting, to prevent future calls. You must have nulls or varlena's that is causing nocachegetattr to be called so many times. Is this true? heap_getattr() certainly is called many times, and needs any optimization we can give it. I have done as much as I could. Perhaps there are more opportunities I missed. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: >> Tom Lane wrote: >>>> I've been doing some more backend profiling, and observe that in a large >>>> SELECT from a table with lots of columns, nocachegetattr (the guts of >>>> heap_getattr) is at the top of the list, accounting for about 15% of >>>> runtime. >>>> >>>> The percentage would be lower in a table with fewer columns or no null >>>> columns, but it still seems worth working on. (Besides, this case right >>>> here is a real-world case for me.) > nocachegetattr() computes all offsets, even offsets after the column you > are requesting, to prevent future calls. You must have nulls or > varlena's that is causing nocachegetattr to be called so many times. > Is this true? Right, this table has 38 columns, many of which can be NULL and several of which are variable-size. So it's probably the worst-case scenario as far as the cost of nocachegetattr is concerned. It looked to me like the pre-computation aspect of nocachegetattr only works for tables where all the tuples have the same physical layout, ie, no varlenas or nulls; is that right? > heap_getattr() certainly is called many times, and needs any > optimization we can give it. I have done as much as I could. Perhaps > there are more opportunities I missed. I thought I had spotted a couple of possibilities for small improvements of the code inside nocachegetattr, but it was awfully late by then so I didn't try changing anything. I'll take another look. regards, tom lane
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > >> Tom Lane wrote: > >>>> I've been doing some more backend profiling, and observe that in a large > >>>> SELECT from a table with lots of columns, nocachegetattr (the guts of > >>>> heap_getattr) is at the top of the list, accounting for about 15% of > >>>> runtime. > >>>> > >>>> The percentage would be lower in a table with fewer columns or no null > >>>> columns, but it still seems worth working on. (Besides, this case right > >>>> here is a real-world case for me.) > > > nocachegetattr() computes all offsets, even offsets after the column you > > are requesting, to prevent future calls. You must have nulls or > > varlena's that is causing nocachegetattr to be called so many times. > > Is this true? > > Right, this table has 38 columns, many of which can be NULL and several > of which are variable-size. So it's probably the worst-case scenario as > far as the cost of nocachegetattr is concerned. It looked to me like > the pre-computation aspect of nocachegetattr only works for tables where > all the tuples have the same physical layout, ie, no varlenas or nulls; > is that right? > > > heap_getattr() certainly is called many times, and needs any > > optimization we can give it. I have done as much as I could. Perhaps > > there are more opportunities I missed. > > I thought I had spotted a couple of possibilities for small improvements > of the code inside nocachegetattr, but it was awfully late by then so > I didn't try changing anything. I'll take another look. Also, I see a few places where heap_getattr is called, just looking for a null. You can use mkid(see developers faq) to find them. If you don't modify them, I can. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026