Thread: PageGetMaxOffsetNumber on uninitialized pages

PageGetMaxOffsetNumber on uninitialized pages

From

Tom Lane

Date:

03 June 2004, 12:29:48

I was just looking at this macro:

/** PageGetMaxOffsetNumber*        Returns the maximum offset number used by the given page.*        Since offset
numbersare 1-based, this is also the number*        of items on the page.**        NOTE: to ensure sane behavior if the
pageis not initialized*        (pd_lower == 0), cast the unsigned values to int before dividing.*        That way we
get-1 or so, not a huge positive number...*/
 
#define PageGetMaxOffsetNumber(page) \   (((int) (((PageHeader) (page))->pd_lower - SizeOfPageHeaderData)) \    /
((int)sizeof(ItemIdData)))
 

The macro does the right thing on its own terms when applied to a zeroed
page, but in some places it's used like this:
   OffsetNumber n;   OffsetNumber maxoff;
   maxoff = PageGetMaxOffsetNumber(p);   for (n = FirstOffsetNumber; n <= maxoff; n++)

and OffsetNumber is uint16 not int.  So a loop like this would go nuts
instead of treating the zeroed page as if it were empty.  This is not
good (see the comments for PageHeaderIsValid in bufpage.c if you
disremember why).

We could fix this by changing the declarations of the "maxoff" variables
to int, but I think it's probably cleaner to recode
PageGetMaxOffsetNumber like so:

#define PageGetMaxOffsetNumber(page) \   (((PageHeader) (page))->pd_lower <= SizeOfPageHeaderData ? 0 : \
((((PageHeader)(page))->pd_lower - SizeOfPageHeaderData) \     / sizeof(ItemIdData)))
 

This means evaluating the macro argument twice which is not real good,
but in all existing uses the argument is just a simple variable, so
no harm done.

Anyone see a better way?
        regards, tom lane

Re: PageGetMaxOffsetNumber on uninitialized pages

From

Gaetano Mendola

Date:

04 June 2004, 21:17:46

Tom Lane wrote:

> I was just looking at this macro:
> 
> /*
>  * PageGetMaxOffsetNumber
>  *        Returns the maximum offset number used by the given page.
>  *        Since offset numbers are 1-based, this is also the number
>  *        of items on the page.
>  *
>  *        NOTE: to ensure sane behavior if the page is not initialized
>  *        (pd_lower == 0), cast the unsigned values to int before dividing.
>  *        That way we get -1 or so, not a huge positive number...
>  */
> #define PageGetMaxOffsetNumber(page) \
>     (((int) (((PageHeader) (page))->pd_lower - SizeOfPageHeaderData)) \
>      / ((int) sizeof(ItemIdData)))
> 
> The macro does the right thing on its own terms when applied to a zeroed
> page, but in some places it's used like this:
> 
>     OffsetNumber n;
>     OffsetNumber maxoff;
> 
>     maxoff = PageGetMaxOffsetNumber(p);
>     for (n = FirstOffsetNumber; n <= maxoff; n++)
> 
> and OffsetNumber is uint16 not int.  So a loop like this would go nuts
> instead of treating the zeroed page as if it were empty.  This is not
> good (see the comments for PageHeaderIsValid in bufpage.c if you
> disremember why).
> 
> We could fix this by changing the declarations of the "maxoff" variables
> to int, but I think it's probably cleaner to recode
> PageGetMaxOffsetNumber like so:
> 
> #define PageGetMaxOffsetNumber(page) \
>     (((PageHeader) (page))->pd_lower <= SizeOfPageHeaderData ? 0 : \
>      ((((PageHeader) (page))->pd_lower - SizeOfPageHeaderData) \
>       / sizeof(ItemIdData)))

Well I think that is safe change:


a <= b ? 0 : ( a-b ) /c

in

max( 0, (a-b)/c )



so I think (not tested) you can rewrite that macro in:

#define PageGetMaxOffsetNumber(page) \    (max(0, ((((PageHeader) (page))->pd_lower - SizeOfPageHeaderData) \       /
sizeof(ItemIdData))))



Regards
Gaeatano Mendola

Re: PageGetMaxOffsetNumber on uninitialized pages

From

Gaetano Mendola

Date:

04 June 2004, 21:17:47

Gaetano Mendola wrote:
> Tom Lane wrote:
>>
>> We could fix this by changing the declarations of the "maxoff" variables
>> to int, but I think it's probably cleaner to recode
>> PageGetMaxOffsetNumber like so:
>>
>> #define PageGetMaxOffsetNumber(page) \
>>     (((PageHeader) (page))->pd_lower <= SizeOfPageHeaderData ? 0 : \
>>      ((((PageHeader) (page))->pd_lower - SizeOfPageHeaderData) \
>>       / sizeof(ItemIdData)))
> 
> 
> Well I think that is safe change:
> 
> 
> a <= b ? 0 : ( a-b ) /c
> 
> in
> 
> max( 0, (a-b)/c )
> 
> 
> 
> so I think (not tested) you can rewrite that macro in:
> 
> #define PageGetMaxOffsetNumber(page) \
>     (max(0, ((((PageHeader) (page))->pd_lower - SizeOfPageHeaderData) \
>        / sizeof(ItemIdData))))

Hi all,
no reply yet! I did this post in a provocative way.
Let me explain.

I know that most probably the max function is written in this way:

int max(int a, int b) { return a < b ? b : a; }

so this means obtain almost the Tom's proposal.

I seen that usually postgres rpm distributed code, I think a big percentage
of postgres installation is used by an rpm, is compiled without taking
care of the architecture. Am I wrong ?

make now some benchmark using these two implementation:

(a) int max(int a, int b) { return a < b ? b : a; }

or this unusual version:

(b) int max(int a, int b) { int i = -(a > b); return (a & i)|(b & ~i); }

make an order of 10E6 maxing compiling your program without specify
the -march parameter.
Do the same specifying if you can -march=pentium3 or -march=pentium4

what I see on my pentiumIII is 100% of improvement, I didn't believe this
improvement just avoid ( I think ) dead branch,  specifying the architecture.
So, am I hand waving/red herring ? May be yes, but my conclusion (wrong as
always in this list :-) ) is: if we don't specify the architecture as we do => it's better use the nifty ( IMHO ) max
implementation=> is better write

my suggested macro ( with the opposite max implementation of course ).

Regards
Gaetano Mendola