Thread: PageGetMaxOffsetNumber on uninitialized pages
I was just looking at this macro: /** PageGetMaxOffsetNumber* Returns the maximum offset number used by the given page.* Since offset numbersare 1-based, this is also the number* of items on the page.** NOTE: to ensure sane behavior if the pageis not initialized* (pd_lower == 0), cast the unsigned values to int before dividing.* That way we get-1 or so, not a huge positive number...*/ #define PageGetMaxOffsetNumber(page) \ (((int) (((PageHeader) (page))->pd_lower - SizeOfPageHeaderData)) \ / ((int)sizeof(ItemIdData))) The macro does the right thing on its own terms when applied to a zeroed page, but in some places it's used like this: OffsetNumber n; OffsetNumber maxoff; maxoff = PageGetMaxOffsetNumber(p); for (n = FirstOffsetNumber; n <= maxoff; n++) and OffsetNumber is uint16 not int. So a loop like this would go nuts instead of treating the zeroed page as if it were empty. This is not good (see the comments for PageHeaderIsValid in bufpage.c if you disremember why). We could fix this by changing the declarations of the "maxoff" variables to int, but I think it's probably cleaner to recode PageGetMaxOffsetNumber like so: #define PageGetMaxOffsetNumber(page) \ (((PageHeader) (page))->pd_lower <= SizeOfPageHeaderData ? 0 : \ ((((PageHeader)(page))->pd_lower - SizeOfPageHeaderData) \ / sizeof(ItemIdData))) This means evaluating the macro argument twice which is not real good, but in all existing uses the argument is just a simple variable, so no harm done. Anyone see a better way? regards, tom lane
Tom Lane wrote: > I was just looking at this macro: > > /* > * PageGetMaxOffsetNumber > * Returns the maximum offset number used by the given page. > * Since offset numbers are 1-based, this is also the number > * of items on the page. > * > * NOTE: to ensure sane behavior if the page is not initialized > * (pd_lower == 0), cast the unsigned values to int before dividing. > * That way we get -1 or so, not a huge positive number... > */ > #define PageGetMaxOffsetNumber(page) \ > (((int) (((PageHeader) (page))->pd_lower - SizeOfPageHeaderData)) \ > / ((int) sizeof(ItemIdData))) > > The macro does the right thing on its own terms when applied to a zeroed > page, but in some places it's used like this: > > OffsetNumber n; > OffsetNumber maxoff; > > maxoff = PageGetMaxOffsetNumber(p); > for (n = FirstOffsetNumber; n <= maxoff; n++) > > and OffsetNumber is uint16 not int. So a loop like this would go nuts > instead of treating the zeroed page as if it were empty. This is not > good (see the comments for PageHeaderIsValid in bufpage.c if you > disremember why). > > We could fix this by changing the declarations of the "maxoff" variables > to int, but I think it's probably cleaner to recode > PageGetMaxOffsetNumber like so: > > #define PageGetMaxOffsetNumber(page) \ > (((PageHeader) (page))->pd_lower <= SizeOfPageHeaderData ? 0 : \ > ((((PageHeader) (page))->pd_lower - SizeOfPageHeaderData) \ > / sizeof(ItemIdData))) Well I think that is safe change: a <= b ? 0 : ( a-b ) /c in max( 0, (a-b)/c ) so I think (not tested) you can rewrite that macro in: #define PageGetMaxOffsetNumber(page) \ (max(0, ((((PageHeader) (page))->pd_lower - SizeOfPageHeaderData) \ / sizeof(ItemIdData)))) Regards Gaeatano Mendola
Gaetano Mendola wrote: > Tom Lane wrote: >> >> We could fix this by changing the declarations of the "maxoff" variables >> to int, but I think it's probably cleaner to recode >> PageGetMaxOffsetNumber like so: >> >> #define PageGetMaxOffsetNumber(page) \ >> (((PageHeader) (page))->pd_lower <= SizeOfPageHeaderData ? 0 : \ >> ((((PageHeader) (page))->pd_lower - SizeOfPageHeaderData) \ >> / sizeof(ItemIdData))) > > > Well I think that is safe change: > > > a <= b ? 0 : ( a-b ) /c > > in > > max( 0, (a-b)/c ) > > > > so I think (not tested) you can rewrite that macro in: > > #define PageGetMaxOffsetNumber(page) \ > (max(0, ((((PageHeader) (page))->pd_lower - SizeOfPageHeaderData) \ > / sizeof(ItemIdData)))) Hi all, no reply yet! I did this post in a provocative way. Let me explain. I know that most probably the max function is written in this way: int max(int a, int b) { return a < b ? b : a; } so this means obtain almost the Tom's proposal. I seen that usually postgres rpm distributed code, I think a big percentage of postgres installation is used by an rpm, is compiled without taking care of the architecture. Am I wrong ? make now some benchmark using these two implementation: (a) int max(int a, int b) { return a < b ? b : a; } or this unusual version: (b) int max(int a, int b) { int i = -(a > b); return (a & i)|(b & ~i); } make an order of 10E6 maxing compiling your program without specify the -march parameter. Do the same specifying if you can -march=pentium3 or -march=pentium4 what I see on my pentiumIII is 100% of improvement, I didn't believe this improvement just avoid ( I think ) dead branch, specifying the architecture. So, am I hand waving/red herring ? May be yes, but my conclusion (wrong as always in this list :-) ) is: if we don't specify the architecture as we do => it's better use the nifty ( IMHO ) max implementation=> is better write my suggested macro ( with the opposite max implementation of course ). Regards Gaetano Mendola