Thread: [HACKERS] Does having a NULL column automatically exclude the table from thetupleDesc cache?

Hi all,

I was looking through some of the implementation details of the heap/tuples, specifically src/include/access/htup_details.h - and I came across the big macro fastgetattr, and had a question about it.  I've included the code here for clarity and convenience:

    #define fastgetattr(tup, attnum, tupleDesc, isnull)                    \
    (                                                                    \
        AssertMacro((attnum) > 0),                                        \
        (*(isnull) = false),                                            \
        HeapTupleNoNulls(tup) ?                                            \
        (                                                                \
            (tupleDesc)->attrs[(attnum)-1]->attcacheoff >= 0 ?            \
            (                                                            \
                fetchatt((tupleDesc)->attrs[(attnum)-1],                \
                    (char *) (tup)->t_data + (tup)->t_data->t_hoff +    \
                        (tupleDesc)->attrs[(attnum)-1]->attcacheoff)    \
            )                                                            \
            :                                                            \
                nocachegetattr((tup), (attnum), (tupleDesc))            \
        )                                                                \
        :                                                                \
        (                                                                \
            att_isnull((attnum)-1, (tup)->t_data->t_bits) ?                \
            (                                                            \
                (*(isnull) = true),                                        \
                (Datum)NULL                                                \
            )                                                            \
            :                                                            \
            (                                                            \
                nocachegetattr((tup), (attnum), (tupleDesc))            \
            )                                                            \
        )                                                                \
    )


My question is this:  HeapTupleNoNulls() is run first to see if there are any columns that can be NULL.  It looks like the fetchatt() that uses the cache in the tupleDesc can only be used if there are no NULLable columns in the table.  Is my understanding correct?  Does this mean that there is significant performance gain by never allowing any column to be null in your table?

Thanks!
Ryan

Ryan Murphy <ryanfmurphy@gmail.com> writes:
> My question is this:  HeapTupleNoNulls() is run first to see if there are
> any columns that can be NULL.  It looks like the fetchatt() that uses the
> cache in the tupleDesc can only be used if there are no NULLable columns in
> the table.  Is my understanding correct?

No, that tests whether the particular tuple contains any null fields, not
whether the whole table is declared NOT NULL.
        regards, tom lane




No, that tests whether the particular tuple contains any null fields, not
whether the whole table is declared NOT NULL.

                        regards, tom lane

Ok, thanks, that makes sense.

My question kind of remains though - does that mean that having any nulls in the tuple loses the ability to use the tupleDesc cache? and how much of a performance impact is this?  I'm sure there's a good reason why you can't really use the cache if you have a null column, just was curious of the implications.  Thanks again!
Ryan Murphy wrote:

> My question kind of remains though - does that mean that having any nulls
> in the tuple loses the ability to use the tupleDesc cache? and how much of
> a performance impact is this?  I'm sure there's a good reason why you can't
> really use the cache if you have a null column, just was curious of the
> implications.

attcacheoff can only be set positive for fields preceding any varlena
(typlen<0, but including the first such) or nullable values.  I don't
know how much faster it is with the cache; you can measure it if your
curiosity is strong enough -- just set the first column to nullable.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




attcacheoff can only be set positive for fields preceding any varlena
(typlen<0, but including the first such) or nullable values.  I don't
know how much faster it is with the cache; you can measure it if your
curiosity is strong enough -- just set the first column to nullable.


Thanks!  Maybe I'll do some benchmarks.

On 2/15/17 1:37 PM, Ryan Murphy wrote:
>
>     attcacheoff can only be set positive for fields preceding any varlena
>     (typlen<0, but including the first such) or nullable values.  I don't
>     know how much faster it is with the cache; you can measure it if your
>     curiosity is strong enough -- just set the first column to nullable.
>
>
> Thanks!  Maybe I'll do some benchmarks.

You'll probably want to do those at a C level, bypassing the executor. I 
would guess that executor overhead will completely swamp the effect of 
the cache in most cases.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



On Sat, Feb 18, 2017 at 12:33 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 2/15/17 1:37 PM, Ryan Murphy wrote:
>>     attcacheoff can only be set positive for fields preceding any varlena
>>     (typlen<0, but including the first such) or nullable values.  I don't
>>     know how much faster it is with the cache; you can measure it if your
>>     curiosity is strong enough -- just set the first column to nullable.
>>
>> Thanks!  Maybe I'll do some benchmarks.
>
> You'll probably want to do those at a C level, bypassing the executor. I
> would guess that executor overhead will completely swamp the effect of the
> cache in most cases.

That seems like it's kind of missing the point.  If the tupleDesc
cache saves so little that it's irrelevant when tested through the
executor, it's not a very useful cache.  I bet that's not the case,
though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




> You'll probably want to do those at a C level, bypassing the executor. I
> would guess that executor overhead will completely swamp the effect of the
> cache in most cases.

That seems like it's kind of missing the point.  If the tupleDesc
cache saves so little that it's irrelevant when tested through the
executor, it's not a very useful cache.  I bet that's not the case,
though.

Thank you both for your insight.  I'll probably hold off on the benchmarks for right now.  I didn't have a production reason to worry about the cache, just getting more familiar with the codebase.  Thanks again!