Thread: Re: pgstattuple: fix free space calculation
On Thu, 22 Aug 2024 at 10:11, Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
Hello,
I think that pgstattuple should use PageGetExactFreeSpace() instead of
PageGetHeapFreeSpace() or PageGetFreeSpace(). The latter two compute the
free space minus the space of a line pointer. They are used like this in
the rest of the code (heapam.c):
pagefree = PageGetHeapFreeSpace(page);
if (newtupsize > pagefree) { we need a another page for the tuple }
... so it makes sense to take the line pointer into account in this context.
But it in the pgstattuple context, I think we want the exact free space.
I have attached a patch.
Best regards,
Frédéric
A minor comment here is to change the comments in code referring to the PageGetHeapFreeSpace.
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -111,7 +111,7 @@ statapprox_heap(Relation rel, output_type *stat)
* treat them as being free space for our purposes.
*/
if (!PageIsNew(page))
- stat->free_space += PageGetHeapFreeSpace(page);
+ stat->free_space += PageGetExactFreeSpace(page);
Regards,
Rafia Sabih
Rafia Sabih
On Fri, 23 Aug 2024 at 11:01, Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
On 8/22/24 21:56, Rafia Sabih wrote:
> I agree with the approach here.
> A minor comment here is to change the comments in code referring to the
> PageGetHeapFreeSpace.
Thank you Rafia. Here is a v2 patch.
I've also added this to the commit message:
Also, PageGetHeapFreeSpace() will return zero if there are already
MaxHeapTuplesPerPage line pointers in the page and none are free. We
don't want that either, because here we want to keep track of the free
space after a page pruning operation even in the (very unlikely) case
that there are MaxHeapTuplesPerPage line pointers in the page.
On the other hand, this got me thinking about the purpose of this space information.
If we want to understand that there's still some space for the tuples in a page, then using PageGetExactFreeSpace is not doing justice in case of heap page, because we will not be able to add any more tuples there if there are already MaxHeapTuplesPerPage tuples there.
Regards,
Rafia Sabih
Rafia Sabih
On 8/23/24 12:02, Rafia Sabih wrote: > On the other hand, this got me thinking about the purpose of this space > information. > If we want to understand that there's still some space for the tuples in > a page, then using PageGetExactFreeSpace is not doing justice in case of > heap page, because we will not be able to add any more tuples there if > there are already MaxHeapTuplesPerPage tuples there. We won't be able to add, but we will be able to update a tuple in this page. It's hard to test, because I can't fit more than 226 tuples on a single page, while MaxHeapTuplesPerPage = 291 on my machine :-) In any case, IMVHO, pgstattuple shouldn't answer to the question "can I add more tuples?". The goal is for educational, introspection or debugging purposes, and we want the exact amount of free space. Best regards, Frédéric
On 8/23/24 12:02 PM, Rafia Sabih wrote:> On the other hand, this got me thinking about the purpose of this space > information. > If we want to understand that there's still some space for the tuples in > a page, then using PageGetExactFreeSpace is not doing justice in case of > heap page, because we will not be able to add any more tuples there if > there are already MaxHeapTuplesPerPage tuples there. I think the new behavior is the more useful one since what if someone wants to know the free space since they want to insert two tuples and not just one? I do not think the function should assume that the only reason someone would want to know the size is because they want to insert exactly one new tuple. I am less certain about what the right behavior on pages where we are out of line pointers should be but I am leaning towards that the new behvior is better than the old but can see a case for either. Tested the patch and it works as advertised. Andreas
On 8/29/24 4:53 PM, Frédéric Yhuel wrote: > So I think we should just use PageGetExactFreeSpace(). I agree, I feel that is the least surprising behavior because we currently sum tiny amounts of free space that is unusable anyway. E.g. imagine one million pages with 10 free bytes each, that looks like 10 free MB so I do not see why we should treat the max tuples per page case with any special logic. Andreas
On Thu, 29 Aug 2024 at 16:53, Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
On 8/23/24 12:51, Frédéric Yhuel wrote:
>
>
> On 8/23/24 12:02, Rafia Sabih wrote:
>> On the other hand, this got me thinking about the purpose of this
>> space information.
>> If we want to understand that there's still some space for the tuples
>> in a page, then using PageGetExactFreeSpace is not doing justice in
>> case of heap page, because we will not be able to add any more tuples
>> there if there are already MaxHeapTuplesPerPage tuples there.
>
> We won't be able to add, but we will be able to update a tuple in this
> page.
Sorry, that's not true.
So in this marginal case we have free space that's unusable in practice.
No INSERT or UPDATE (HOT or not) is possible inside the page.
I don't know what pgstattuple should do in this case.
However, we should never encounter this case in practice (maybe on some
exotic architectures with strange alignment behavior?). As I said, I
can't fit more than 226 tuples per page on my machine, while
MaxHeapTuplesPerPage is 291. Am I missing something?
Besides, pgstattuple isn't mission critical, is it?
Yes, also as stated before I am not sure of the utility of this field in real-world scenarios.
So, I can not comment more on that. That was just one thought that popped into my head.
Otherwise, the idea seems fine to me.
So I think we should just use PageGetExactFreeSpace().
Here is a v3 patch. It's the same as v2, I only removed the last
paragraph in the commit message.
Thanks for the new patch. LGTM.
Thank you Rafia and Andreas for your review and test.
Thanks to you too.
Best regards,
Frédéric
Regards,
Rafia Sabih
Rafia Sabih
Rafia Sabih <rafia.pghackers@gmail.com> writes: > On Thu, 29 Aug 2024 at 16:53, Frédéric Yhuel <frederic.yhuel@dalibo.com> > wrote: >> So I think we should just use PageGetExactFreeSpace(). >> >> Here is a v3 patch. It's the same as v2, I only removed the last >> paragraph in the commit message. > Thanks for the new patch. LGTM. I looked at this patch. I agree with making the change. However, I don't agree with the CF entry's marking of "target version: stable" (i.e., requesting back-patch). I think this falls somewhere in the gray area between a bug fix and a definitional change. Also, people are unlikely to be happy if they suddenly get new, not-comparable numbers after a minor version update. So I think we should just fix it in HEAD. As far as the patch itself goes, the one thing that is bothering me is this comment change /* - * It's not safe to call PageGetHeapFreeSpace() on new pages, so we + * It's not safe to call PageGetExactFreeSpace() on new pages, so we * treat them as being free space for our purposes. */ which looks like it wasn't made with a great deal of thought. Now it seems to me that the comment was already bogus when written: there isn't anything uncertain about what will happen if you call either of these functions on a "new" page. PageIsNew checks for return ((PageHeader) page)->pd_upper == 0; If pd_upper is 0, PageGet[Exact]FreeSpace is absolutely guaranteed to return zero, even if pd_lower contains garbage. And then PageGetHeapFreeSpace will likewise return zero. Perhaps there could be trouble if we got into the line-pointer-checking part of PageGetHeapFreeSpace, but we can't. So this comment is wrong, and is even more obviously wrong after the above change. I thought for a moment about removing the PageIsNew test altogether, but then I decided that it probably *is* what we want and is just mis-explained. I think the comment should read more like /* * PageGetExactFreeSpace() will return zero for a "new" page, * but it's actually usable free space, so count it that way. */ Now alternatively you could argue that a "new" page isn't usable free space yet and so we should count it as zero, just as we don't count dead tuples as usable free space. You need VACUUM to turn either of those things into real free space. But that'd be a bigger definitional change, and I'm not sure we want it. Thoughts? Also, do we need any documentation change for this? I looked through https://www.postgresql.org/docs/devel/pgstattuple.html and didn't see anything that was being very specific about what "free space" means, so maybe it's fine as-is. regards, tom lane
I wrote: > Now alternatively you could argue that a "new" page isn't usable free > space yet and so we should count it as zero, just as we don't count > dead tuples as usable free space. You need VACUUM to turn either of > those things into real free space. But that'd be a bigger definitional > change, and I'm not sure we want it. Thoughts? On the third hand: the code in question is in statapprox_heap, which is presumably meant to deliver numbers comparable to pgstat_heap. And pgstat_heap takes no special care for "new" pages, it just applies PageGetHeapFreeSpace (or PageGetExactFreeSpace after this patch). So that leaves me feeling pretty strongly that this whole stanza is wrong and we should just do PageGetExactFreeSpace here. regards, tom lane
Hi Tom, thanks for your review. On 9/7/24 22:10, Tom Lane wrote: > I looked at this patch. I agree with making the change. However, > I don't agree with the CF entry's marking of "target version: stable" > (i.e., requesting back-patch). I think this falls somewhere in the > gray area between a bug fix and a definitional change. Also, people > are unlikely to be happy if they suddenly get new, not-comparable > numbers after a minor version update. So I think we should just fix > it in HEAD. > OK, I did the change. > As far as the patch itself goes, the one thing that is bothering me > is this comment change > > /* > - * It's not safe to call PageGetHeapFreeSpace() on new pages, so we > + * It's not safe to call PageGetExactFreeSpace() on new pages, so we > * treat them as being free space for our purposes. > */ > > which looks like it wasn't made with a great deal of thought. > Now it seems to me that the comment was already bogus when written: > there isn't anything uncertain about what will happen if you call > either of these functions on a "new" page. PageIsNew checks for > > return ((PageHeader) page)->pd_upper == 0; > > If pd_upper is 0, PageGet[Exact]FreeSpace is absolutely guaranteed > to return zero, even if pd_lower contains garbage. And then Indeed. I failed to notice that LocationIndex was an unsigned int, so I thought that pg_upper - pd_upper could be positive with garbage in pg_upper. > PageGetHeapFreeSpace will likewise return zero. Perhaps there > could be trouble if we got into the line-pointer-checking part > of PageGetHeapFreeSpace, but we can't. So this comment is wrong, > and is even more obviously wrong after the above change. I thought > for a moment about removing the PageIsNew test altogether, but > then I decided that it probably*is* what we want and is just > mis-explained. I think the comment should read more like > > /* > * PageGetExactFreeSpace() will return zero for a "new" page, > * but it's actually usable free space, so count it that way. > */ > > Now alternatively you could argue that a "new" page isn't usable free > space yet and so we should count it as zero, just as we don't count > dead tuples as usable free space. You need VACUUM to turn either of > those things into real free space. But that'd be a bigger definitional > change, and I'm not sure we want it. Thoughts? > > Also, do we need any documentation change for this? I looked through > https://www.postgresql.org/docs/devel/pgstattuple.html > and didn't see anything that was being very specific about what > "free space" means, so maybe it's fine as-is. It's not easy. Maybe something like this? "For any initialized page, free space refers to anything that isn't page metadata (header and special), a line pointer or a tuple pointed to by a valid line pointer. In particular, a dead tuple is not free space because there's still a valid line pointer pointer pointing to it, until VACUUM or some other maintenance mechanism (e.g. page pruning) cleans up the page. A dead line pointer is not free space either, but the tuple it points to has become free space. An unused line pointer could be considered free space, but pgstattuple doesn't take it into account."
=?UTF-8?Q?Fr=C3=A9d=C3=A9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes: > v4 patch attached. LGTM, pushed. regards, tom lane