Thread: Q about heap_getattr

Q about heap_getattr

From
Tom Lane
Date:
I've been doing some more backend profiling, and observe that in a large
SELECT from a table with lots of columns, nocachegetattr (the guts of
heap_getattr) is at the top of the list, accounting for about 15% of
runtime.

The percentage would be lower in a table with fewer columns or no null
columns, but it still seems worth working on.  (Besides, this case right
here is a real-world case for me.)

What's drawing my eye is that printtup() is calling heap_getattr twice
for each attribute of each tuple --- once in the first scan that
prepares the null-fields bitmap, and then again to actually output the
field value.  So, what I want to do is call heap_getattr only once per
attribute and save the returned value for use in the second loop.
That should halve the time spent in nocachegetattr and thus knock
7 or so percent off the runtime of SELECT.

The question for the list: how long is the Datum value returned by
heap_getattr valid?  In particular, could it be invalidated by calling
heap_getattr for another field of the same tuple?  If there are any
cases like that, then this optimization won't work.  I don't know the
backend well enough to guess whether this is safe.
        regards, tom lane


Re: [HACKERS] Q about heap_getattr

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> I've been doing some more backend profiling, and observe that in a large
> SELECT from a table with lots of columns, nocachegetattr (the guts of
> heap_getattr) is at the top of the list, accounting for about 15% of
> runtime.
> 
> The percentage would be lower in a table with fewer columns or no null
> columns, but it still seems worth working on.  (Besides, this case right
> here is a real-world case for me.)
> 
> What's drawing my eye is that printtup() is calling heap_getattr twice
> for each attribute of each tuple --- once in the first scan that
> prepares the null-fields bitmap, and then again to actually output the
> field value.  So, what I want to do is call heap_getattr only once per
> attribute and save the returned value for use in the second loop.
> That should halve the time spent in nocachegetattr and thus knock
> 7 or so percent off the runtime of SELECT.

Try to use heap_attisnull in first scan!
This func just tests nulls bitmap array of tuple...

Vadim
P.S. Tom, I forgot to attach new allocation code in my prev letter,
but now I want to reimplement them.


Re: [HACKERS] Q about heap_getattr

From
Tom Lane
Date:
Vadim Mikheev <vadim@krs.ru> writes:
> Tom Lane wrote:
>> So, what I want to do is call heap_getattr only once per
>> attribute and save the returned value for use in the second loop.

> Try to use heap_attisnull in first scan!

Ah, that looks like a much better idea.  Consider it done...
    thanks, tom lane


Re: [HACKERS] Q about heap_getattr

From
Bruce Momjian
Date:
> Tom Lane wrote:
> > 
> > I've been doing some more backend profiling, and observe that in a large
> > SELECT from a table with lots of columns, nocachegetattr (the guts of
> > heap_getattr) is at the top of the list, accounting for about 15% of
> > runtime.
> > 
> > The percentage would be lower in a table with fewer columns or no null
> > columns, but it still seems worth working on.  (Besides, this case right
> > here is a real-world case for me.)
> > 
> > What's drawing my eye is that printtup() is calling heap_getattr twice
> > for each attribute of each tuple --- once in the first scan that
> > prepares the null-fields bitmap, and then again to actually output the
> > field value.  So, what I want to do is call heap_getattr only once per
> > attribute and save the returned value for use in the second loop.
> > That should halve the time spent in nocachegetattr and thus knock
> > 7 or so percent off the runtime of SELECT.
> 
> Try to use heap_attisnull in first scan!
> This func just tests nulls bitmap array of tuple...
> 
> Vadim
> P.S. Tom, I forgot to attach new allocation code in my prev letter,
> but now I want to reimplement them.
> 
> 

Good idea.  Hadn't thought of that.  To answer Tom's question, it
doesn't matter how many times you call heap_getattr().  You can cache
the values, as long as the tuple doesn't change.

nocachegetattr() computes all offsets, even offsets after the column you
are requesting, to prevent future calls.  You must have nulls or
varlena's that is causing nocachegetattr to be called so many times.
Is this true? 

heap_getattr() certainly is called many times, and needs any
optimization we can give it.  I have done as much as I could.  Perhaps
there are more opportunities I missed.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Q about heap_getattr

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> Tom Lane wrote:
>>>> I've been doing some more backend profiling, and observe that in a large
>>>> SELECT from a table with lots of columns, nocachegetattr (the guts of
>>>> heap_getattr) is at the top of the list, accounting for about 15% of
>>>> runtime.
>>>> 
>>>> The percentage would be lower in a table with fewer columns or no null
>>>> columns, but it still seems worth working on.  (Besides, this case right
>>>> here is a real-world case for me.)

> nocachegetattr() computes all offsets, even offsets after the column you
> are requesting, to prevent future calls.  You must have nulls or
> varlena's that is causing nocachegetattr to be called so many times.
> Is this true? 

Right, this table has 38 columns, many of which can be NULL and several
of which are variable-size.  So it's probably the worst-case scenario as
far as the cost of nocachegetattr is concerned.  It looked to me like
the pre-computation aspect of nocachegetattr only works for tables where
all the tuples have the same physical layout, ie, no varlenas or nulls;
is that right?

> heap_getattr() certainly is called many times, and needs any
> optimization we can give it.  I have done as much as I could.  Perhaps
> there are more opportunities I missed.

I thought I had spotted a couple of possibilities for small improvements
of the code inside nocachegetattr, but it was awfully late by then so
I didn't try changing anything.  I'll take another look.
        regards, tom lane


Re: [HACKERS] Q about heap_getattr

From
Bruce Momjian
Date:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> >> Tom Lane wrote:
> >>>> I've been doing some more backend profiling, and observe that in a large
> >>>> SELECT from a table with lots of columns, nocachegetattr (the guts of
> >>>> heap_getattr) is at the top of the list, accounting for about 15% of
> >>>> runtime.
> >>>> 
> >>>> The percentage would be lower in a table with fewer columns or no null
> >>>> columns, but it still seems worth working on.  (Besides, this case right
> >>>> here is a real-world case for me.)
> 
> > nocachegetattr() computes all offsets, even offsets after the column you
> > are requesting, to prevent future calls.  You must have nulls or
> > varlena's that is causing nocachegetattr to be called so many times.
> > Is this true? 
> 
> Right, this table has 38 columns, many of which can be NULL and several
> of which are variable-size.  So it's probably the worst-case scenario as
> far as the cost of nocachegetattr is concerned.  It looked to me like
> the pre-computation aspect of nocachegetattr only works for tables where
> all the tuples have the same physical layout, ie, no varlenas or nulls;
> is that right?
> 
> > heap_getattr() certainly is called many times, and needs any
> > optimization we can give it.  I have done as much as I could.  Perhaps
> > there are more opportunities I missed.
> 
> I thought I had spotted a couple of possibilities for small improvements
> of the code inside nocachegetattr, but it was awfully late by then so
> I didn't try changing anything.  I'll take another look.

Also, I see a few places where heap_getattr is called, just looking for
a null.  You can use mkid(see developers faq) to find them.  If you
don't modify them, I can.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026