Thread: Re: pgstattuple: fix free space calculation

Re: pgstattuple: fix free space calculation

From
Rafia Sabih
Date:


On Thu, 22 Aug 2024 at 10:11, Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
Hello,

I think that pgstattuple should use PageGetExactFreeSpace() instead of
PageGetHeapFreeSpace() or PageGetFreeSpace(). The latter two compute the
free space minus the space of a line pointer. They are used like this in
the rest of the code (heapam.c):

pagefree = PageGetHeapFreeSpace(page);

if (newtupsize > pagefree) { we need a another page for the tuple }

... so it makes sense to take the line pointer into account in this context.

But it in the pgstattuple context, I think we want the exact free space.

I have attached a patch.

Best regards,
Frédéric
 
I agree with the approach here.
A minor comment here is to change the comments in code referring to the PageGetHeapFreeSpace.

--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -111,7 +111,7 @@ statapprox_heap(Relation rel, output_type *stat)
                 * treat them as being free space for our purposes.
                 */
                if (!PageIsNew(page))
-                       stat->free_space += PageGetHeapFreeSpace(page);
+                       stat->free_space += PageGetExactFreeSpace(page);

--
Regards,
Rafia Sabih

Re: pgstattuple: fix free space calculation

From
Rafia Sabih
Date:


On Fri, 23 Aug 2024 at 11:01, Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:


On 8/22/24 21:56, Rafia Sabih wrote:
> I agree with the approach here.
> A minor comment here is to change the comments in code referring to the
> PageGetHeapFreeSpace.

Thank you Rafia. Here is a v2 patch.

I've also added this to the commit message:

Also, PageGetHeapFreeSpace() will return zero if there are already
MaxHeapTuplesPerPage line pointers in the page and none are free. We
don't want that either, because here we want to keep track of the free
space after a page pruning operation even in the (very unlikely) case
that there are MaxHeapTuplesPerPage line pointers in the page.

On the other hand, this got me thinking about the purpose of this space information. 
If we want to understand that there's still some space for the tuples in a page, then using PageGetExactFreeSpace is not doing justice in case of heap page, because we will not be able to add any more tuples there if there are already MaxHeapTuplesPerPage tuples there.

--
Regards,
Rafia Sabih

Re: pgstattuple: fix free space calculation

From
Frédéric Yhuel
Date:

On 8/23/24 12:02, Rafia Sabih wrote:
> On the other hand, this got me thinking about the purpose of this space 
> information.
> If we want to understand that there's still some space for the tuples in 
> a page, then using PageGetExactFreeSpace is not doing justice in case of 
> heap page, because we will not be able to add any more tuples there if 
> there are already MaxHeapTuplesPerPage tuples there.

We won't be able to add, but we will be able to update a tuple in this 
page. It's hard to test, because I can't fit more than 226 tuples on a 
single page, while MaxHeapTuplesPerPage = 291 on my machine :-)

In any case, IMVHO, pgstattuple shouldn't answer to the question "can I 
add more tuples?". The goal is for educational, introspection or 
debugging purposes, and we want the exact amount of free space.

Best regards,
Frédéric



Re: pgstattuple: fix free space calculation

From
Andreas Karlsson
Date:
On 8/23/24 12:02 PM, Rafia Sabih wrote:> On the other hand, this got me 
thinking about the purpose of this space > information.
> If we want to understand that there's still some space for the tuples in 
> a page, then using PageGetExactFreeSpace is not doing justice in case of 
> heap page, because we will not be able to add any more tuples there if 
> there are already MaxHeapTuplesPerPage tuples there.

I think the new behavior is the more useful one since what if someone 
wants to know the free space since they want to insert two tuples and 
not just one? I do not think the function should assume that the only 
reason someone would want to know the size is because they want to 
insert exactly one new tuple.

I am less certain about what the right behavior on pages where we are 
out of line pointers should be but I am leaning towards that the new 
behvior is better than the old but can see a case for either.

Tested the patch and it works as advertised.

Andreas




Re: pgstattuple: fix free space calculation

From
Andreas Karlsson
Date:
On 8/29/24 4:53 PM, Frédéric Yhuel wrote:
> So I think we should just use PageGetExactFreeSpace().

I agree, I feel that is the least surprising behavior because we 
currently sum tiny amounts of free space that is unusable anyway. E.g. 
imagine one million pages with 10 free bytes each, that looks like 10 
free MB so I do not see why we should treat the max tuples per page case 
with any special logic.

Andreas



Re: pgstattuple: fix free space calculation

From
Rafia Sabih
Date:


On Thu, 29 Aug 2024 at 16:53, Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:


On 8/23/24 12:51, Frédéric Yhuel wrote:
>
>
> On 8/23/24 12:02, Rafia Sabih wrote:
>> On the other hand, this got me thinking about the purpose of this
>> space information.
>> If we want to understand that there's still some space for the tuples
>> in a page, then using PageGetExactFreeSpace is not doing justice in
>> case of heap page, because we will not be able to add any more tuples
>> there if there are already MaxHeapTuplesPerPage tuples there.
>
> We won't be able to add, but we will be able to update a tuple in this
> page.

Sorry, that's not true.

So in this marginal case we have free space that's unusable in practice.
No INSERT or UPDATE (HOT or not) is possible inside the page.

I don't know what pgstattuple should do in this case.

However, we should never encounter this case in practice (maybe on some
exotic architectures with strange alignment behavior?). As I said, I
can't fit more than 226 tuples per page on my machine, while
MaxHeapTuplesPerPage is 291. Am I missing something?

Besides, pgstattuple isn't mission critical, is it?
 
Yes, also as stated before I am not sure of the utility of this field in real-world scenarios.
So, I can not comment more on that. That was just one thought that popped into my head.
Otherwise, the idea seems fine to me.

So I think we should just use PageGetExactFreeSpace().

Here is a v3 patch. It's the same as v2, I only removed the last
paragraph in the commit message.

Thanks for the new patch. LGTM.
 

Thank you Rafia and Andreas for your review and test.
Thanks to you too.

Best regards,
Frédéric


--
Regards,
Rafia Sabih

Re: pgstattuple: fix free space calculation

From
Tom Lane
Date:
Rafia Sabih <rafia.pghackers@gmail.com> writes:
> On Thu, 29 Aug 2024 at 16:53, Frédéric Yhuel <frederic.yhuel@dalibo.com>
> wrote:
>> So I think we should just use PageGetExactFreeSpace().
>> 
>> Here is a v3 patch. It's the same as v2, I only removed the last
>> paragraph in the commit message.

> Thanks for the new patch. LGTM.

I looked at this patch.  I agree with making the change.  However,
I don't agree with the CF entry's marking of "target version: stable"
(i.e., requesting back-patch).  I think this falls somewhere in the
gray area between a bug fix and a definitional change.  Also, people
are unlikely to be happy if they suddenly get new, not-comparable
numbers after a minor version update.  So I think we should just fix
it in HEAD.

As far as the patch itself goes, the one thing that is bothering me
is this comment change

         /*
-         * It's not safe to call PageGetHeapFreeSpace() on new pages, so we
+         * It's not safe to call PageGetExactFreeSpace() on new pages, so we
          * treat them as being free space for our purposes.
          */

which looks like it wasn't made with a great deal of thought.
Now it seems to me that the comment was already bogus when written:
there isn't anything uncertain about what will happen if you call
either of these functions on a "new" page.  PageIsNew checks for

        return ((PageHeader) page)->pd_upper == 0;

If pd_upper is 0, PageGet[Exact]FreeSpace is absolutely guaranteed
to return zero, even if pd_lower contains garbage.  And then
PageGetHeapFreeSpace will likewise return zero.  Perhaps there
could be trouble if we got into the line-pointer-checking part
of PageGetHeapFreeSpace, but we can't.  So this comment is wrong,
and is even more obviously wrong after the above change.  I thought
for a moment about removing the PageIsNew test altogether, but
then I decided that it probably *is* what we want and is just
mis-explained.  I think the comment should read more like

        /*
         * PageGetExactFreeSpace() will return zero for a "new" page,
         * but it's actually usable free space, so count it that way.
         */

Now alternatively you could argue that a "new" page isn't usable free
space yet and so we should count it as zero, just as we don't count
dead tuples as usable free space.  You need VACUUM to turn either of
those things into real free space.  But that'd be a bigger definitional
change, and I'm not sure we want it.  Thoughts?

Also, do we need any documentation change for this?  I looked through
https://www.postgresql.org/docs/devel/pgstattuple.html
and didn't see anything that was being very specific about what
"free space" means, so maybe it's fine as-is.

            regards, tom lane



Re: pgstattuple: fix free space calculation

From
Tom Lane
Date:
I wrote:
> Now alternatively you could argue that a "new" page isn't usable free
> space yet and so we should count it as zero, just as we don't count
> dead tuples as usable free space.  You need VACUUM to turn either of
> those things into real free space.  But that'd be a bigger definitional
> change, and I'm not sure we want it.  Thoughts?

On the third hand: the code in question is in statapprox_heap, which
is presumably meant to deliver numbers comparable to pgstat_heap.
And pgstat_heap takes no special care for "new" pages, it just applies
PageGetHeapFreeSpace (or PageGetExactFreeSpace after this patch).
So that leaves me feeling pretty strongly that this whole stanza
is wrong and we should just do PageGetExactFreeSpace here.

            regards, tom lane



Re: pgstattuple: fix free space calculation

From
Frédéric Yhuel
Date:
Hi Tom, thanks for your review.

On 9/7/24 22:10, Tom Lane wrote:
> I looked at this patch.  I agree with making the change.  However,
> I don't agree with the CF entry's marking of "target version: stable"
> (i.e., requesting back-patch).  I think this falls somewhere in the
> gray area between a bug fix and a definitional change.  Also, people
> are unlikely to be happy if they suddenly get new, not-comparable
> numbers after a minor version update.  So I think we should just fix
> it in HEAD.
>

OK, I did the change.

> As far as the patch itself goes, the one thing that is bothering me
> is this comment change
> 
>           /*
> -         * It's not safe to call PageGetHeapFreeSpace() on new pages, so we
> +         * It's not safe to call PageGetExactFreeSpace() on new pages, so we
>            * treat them as being free space for our purposes.
>            */
> 
> which looks like it wasn't made with a great deal of thought.
> Now it seems to me that the comment was already bogus when written:
> there isn't anything uncertain about what will happen if you call
> either of these functions on a "new" page.  PageIsNew checks for
> 
>          return ((PageHeader) page)->pd_upper == 0;
> 
> If pd_upper is 0, PageGet[Exact]FreeSpace is absolutely guaranteed
> to return zero, even if pd_lower contains garbage.  And then

Indeed. I failed to notice that LocationIndex was an unsigned int, so I 
thought that pg_upper - pd_upper could be positive with garbage in pg_upper.

> PageGetHeapFreeSpace will likewise return zero.  Perhaps there
> could be trouble if we got into the line-pointer-checking part
> of PageGetHeapFreeSpace, but we can't.  So this comment is wrong,
> and is even more obviously wrong after the above change.  I thought
> for a moment about removing the PageIsNew test altogether, but
> then I decided that it probably*is*  what we want and is just
> mis-explained.  I think the comment should read more like
> 
>          /*
>           * PageGetExactFreeSpace() will return zero for a "new" page,
>           * but it's actually usable free space, so count it that way.
>           */
> 
> Now alternatively you could argue that a "new" page isn't usable free
> space yet and so we should count it as zero, just as we don't count
> dead tuples as usable free space.  You need VACUUM to turn either of
> those things into real free space.  But that'd be a bigger definitional
> change, and I'm not sure we want it.  Thoughts?
> 
> Also, do we need any documentation change for this?  I looked through
> https://www.postgresql.org/docs/devel/pgstattuple.html
> and didn't see anything that was being very specific about what
> "free space" means, so maybe it's fine as-is.

It's not easy. Maybe something like this?

"For any initialized page, free space refers to anything that isn't page 
metadata (header and special), a line pointer or a tuple pointed to by a 
valid line pointer. In particular, a dead tuple is not free space 
because there's still a valid line pointer pointer pointing to it, until 
VACUUM or some other maintenance mechanism (e.g. page pruning) cleans up 
the page. A dead line pointer is not free space either, but the tuple it 
points to has become free space. An unused line pointer could be 
considered free space, but pgstattuple doesn't take it into account."



Re: pgstattuple: fix free space calculation

From
Tom Lane
Date:
=?UTF-8?Q?Fr=C3=A9d=C3=A9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes:
> v4 patch attached.

LGTM, pushed.

            regards, tom lane