Thread: Provide a pg_truncate_freespacemap function

Provide a pg_truncate_freespacemap function

From
Ronan Dunklau
Date:
Hello,

As we are currently experiencing a FSM corruption issue [1], we need to 
rebuild FSM when we detect it. 

I noticed we have something to truncate a visibility map, but nothing for the 
freespace map, so I propose the attached (liberally copied from the VM 
counterpart) to allow to truncate a FSM without incurring downtime, as 
currently our only options are to either VACUUM FULL the table or stop the 
cluster and remove the FSM manually.

Does that seem correct ?


[1]  https://www.postgresql.org/message-id/flat/
1925490.taCxCBeP46%40aivenlaptop#7ace95c95cab17b6d92607e5362984ac

--
Ronan Dunklau




Attachment

Re: Provide a pg_truncate_freespacemap function

From
Stephen Frost
Date:
Greetings,

* Ronan Dunklau (ronan.dunklau@aiven.io) wrote:
> As we are currently experiencing a FSM corruption issue [1], we need to
> rebuild FSM when we detect it.

Ideally, we'd figure out a way to pick up on this and address it without
the user needing to intervene, however ...

> I noticed we have something to truncate a visibility map, but nothing for the
> freespace map, so I propose the attached (liberally copied from the VM
> counterpart) to allow to truncate a FSM without incurring downtime, as
> currently our only options are to either VACUUM FULL the table or stop the
> cluster and remove the FSM manually.

I agree that this would generally be a useful thing to have.

> Does that seem correct ?

Definitely needs to have a 'REVOKE ALL ON FUNCTION' at the end of the
upgrade script, similar to what you'll find at the bottom of
pg_visibility--1.1.sql in the tree today, otherwise anyone could run it.

Beyond that, I'd suggest a function-level comment above the definition
of the function itself (which is where we tend to put those- not at the
point where we declare the function).

Thanks!

Stephen

Attachment

Re: Provide a pg_truncate_freespacemap function

From
Ronan Dunklau
Date:
Le mercredi 6 mars 2024, 20:28:44 CET Stephen Frost a écrit :
> I agree that this would generally be a useful thing to have.

Thanks !

>
> > Does that seem correct ?
>
> Definitely needs to have a 'REVOKE ALL ON FUNCTION' at the end of the
> upgrade script, similar to what you'll find at the bottom of
> pg_visibility--1.1.sql in the tree today, otherwise anyone could run it.
>
> Beyond that, I'd suggest a function-level comment above the definition
> of the function itself (which is where we tend to put those- not at the
> point where we declare the function).

Thank you for the review. Here is an updated patch for both of those.


Best regards,

--
Ronan
Attachment

Re: Provide a pg_truncate_freespacemap function

From
Fujii Masao
Date:

On 2024/03/07 16:59, Ronan Dunklau wrote:
> Le mercredi 6 mars 2024, 20:28:44 CET Stephen Frost a écrit :
>> I agree that this would generally be a useful thing to have.
> 
> Thanks !
> 
>>
>>> Does that seem correct ?
>>
>> Definitely needs to have a 'REVOKE ALL ON FUNCTION' at the end of the
>> upgrade script, similar to what you'll find at the bottom of
>> pg_visibility--1.1.sql in the tree today, otherwise anyone could run it.
>>
>> Beyond that, I'd suggest a function-level comment above the definition
>> of the function itself (which is where we tend to put those- not at the
>> point where we declare the function).
> 
> Thank you for the review. Here is an updated patch for both of those.

Here are my review comments:

The documentation for pg_freespace needs updating.


A regression test for pg_truncate_freespace_map() should be added.


+    /* Only some relkinds have a freespace map */
+    if (!RELKIND_HAS_TABLE_AM(rel->rd_rel->relkind))
+        ereport(ERROR,
+                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                 errmsg("relation \"%s\" is of wrong relation kind",
+                        RelationGetRelationName(rel)),
+                 errdetail_relkind_not_supported(rel->rd_rel->relkind)));

An index can have an FSM, but this code doesn't account for that.


+        smgrtruncate(RelationGetSmgr(rel), &fork, 1, &block);

Shouldn't truncation be performed after WAL-logging due to the WAL rule?
I'm not sure if the current order might actually cause significant issues
in FSM truncation case, though.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Provide a pg_truncate_freespacemap function

From
Tom Lane
Date:
Fujii Masao <masao.fujii@oss.nttdata.com> writes:
>> Le mercredi 6 mars 2024, 20:28:44 CET Stephen Frost a écrit :
>>> I agree that this would generally be a useful thing to have.

Personally, I want to push back on whether this has any legitimate
use-case.  Even if the FSM is corrupt, it should self-heal over
time, and I'm not seeing the argument why truncating it would
speed convergence towards correct values.  Worse, in the interim
where you don't have any FSM, you will suffer table bloat because
insertions will be appended at the end of the table.  So this
looks like a foot-gun, and the patch's lack of user-visible
documentation surely does nothing to make it safer.

(The analogy to pg_truncate_visibility_map seems forced.
If you are in a situation with a trashed visibility map,
you are probably getting wrong query answers, and truncating
the map will make that better.  But a trashed FSM doesn't
result in incorrect output, and zeroing it will make things
worse not better.)

            regards, tom lane



Re: Provide a pg_truncate_freespacemap function

From
Ronan Dunklau
Date:
Le dimanche 21 juillet 2024, 07:39:13 UTC+2 Tom Lane a écrit :
> Fujii Masao <masao.fujii@oss.nttdata.com> writes:
> >> Le mercredi 6 mars 2024, 20:28:44 CET Stephen Frost a écrit :
> >>> I agree that this would generally be a useful thing to have.
>

Sorry for the late reply, as I was not available during the late summer.

> Personally, I want to push back on whether this has any legitimate
> use-case.  Even if the FSM is corrupt, it should self-heal over
> time, and I'm not seeing the argument why truncating it would
> speed convergence towards correct values.  Worse, in the interim
> where you don't have any FSM, you will suffer table bloat because
> insertions will be appended at the end of the table.  So this
> looks like a foot-gun, and the patch's lack of user-visible
> documentation surely does nothing to make it safer.

> (The analogy to pg_truncate_visibility_map seems forced.
> If you are in a situation with a trashed visibility map,
> you are probably getting wrong query answers, and truncating
> the map will make that better.  But a trashed FSM doesn't
> result in incorrect output, and zeroing it will make things
> worse not better.)

Now that the other patch for self-healing is in, I agree it may not be that
useful. I'm withdrawing the patch and will keep it in mind if we encounter
other FSM issues in the future.

Best regards,

--
Ronan Dunklau