Thread: Avoiding "wrong tuple length" errors at the end of VACUUM on pg_database update (Backpatch of 947789f to v12 and v13)
Avoiding "wrong tuple length" errors at the end of VACUUM on pg_database update (Backpatch of 947789f to v12 and v13)
From
Michael Paquier
Date:
Hi all, The problem mentioned in $subject has been discussed here: https://www.postgresql.org/message-id/DM5PR0501MB38800D9E4605BCA72DD35557CCE10@DM5PR0501MB3880.namprd05.prod.outlook.com Ths issue has been fixed by 947789f, without a backpatch to v12 (as per 96cdeae) as the risk seemed rather limited seen from here, back when the problem was discussed. Unfortunately, I have seen customer deployments on v12 and v13 playing with pg_database entries large enough that they would have toast entries and would be able to trigger the problem fixed in v14 at the end of a vacuum. Any objections about getting 947789f applied to REL_13_STABLE and REL_12_STABLE and see this issue completely gone from all the versions supported? Thanks, -- Michael
Attachment
Re: Avoiding "wrong tuple length" errors at the end of VACUUM on pg_database update (Backpatch of 947789f to v12 and v13)
From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes: > Any objections about getting 947789f applied to REL_13_STABLE and > REL_12_STABLE and see this issue completely gone from all the versions > supported? No objections to back-patching the fix, but I wonder if we can find some mechanical way to prevent this sort of error in future. It's surely far from obvious that we need to apply heap_inplace_update to a raw tuple rather than a syscache entry. A partial fix perhaps could be to verify that the supplied tuple is the same length as what we see on-disk? It's partial because it'd only trigger if there had actually been a toasted-field expansion, so it'd most likely not catch such coding errors during developer testing. regards, tom lane
Re: Avoiding "wrong tuple length" errors at the end of VACUUM on pg_database update (Backpatch of 947789f to v12 and v13)
From
Nathan Bossart
Date:
On Tue, Jan 10, 2023 at 02:57:43AM -0500, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: >> Any objections about getting 947789f applied to REL_13_STABLE and >> REL_12_STABLE and see this issue completely gone from all the versions >> supported? > > No objections to back-patching the fix... +1 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoiding "wrong tuple length" errors at the end of VACUUM on pg_database update (Backpatch of 947789f to v12 and v13)
From
Andres Freund
Date:
Hi, On 2023-01-10 02:57:43 -0500, Tom Lane wrote: > No objections to back-patching the fix, but I wonder if we can find > some mechanical way to prevent this sort of error in future. What about a define that forces external toasting very aggressively for catalog tables, iff they have a toast table? I suspect doing so for non-catalog tables as well would trigger test changes. Running a buildfarm animal with that would at least make issues like this much easier to discover. Greetings, Andres Freund
Re: Avoiding "wrong tuple length" errors at the end of VACUUM on pg_database update (Backpatch of 947789f to v12 and v13)
From
Michael Paquier
Date:
On Tue, Jan 10, 2023 at 09:54:31AM -0800, Nathan Bossart wrote: > +1 Okay, thanks. Done this part as of c0ee694 and 72b6098, then. -- Michael
Attachment
Re: Avoiding "wrong tuple length" errors at the end of VACUUM on pg_database update (Backpatch of 947789f to v12 and v13)
From
Michael Paquier
Date:
On Tue, Jan 10, 2023 at 11:05:04AM -0800, Andres Freund wrote: > What about a define that forces external toasting very aggressively for > catalog tables, iff they have a toast table? I suspect doing so for > non-catalog tables as well would trigger test changes. Running a buildfarm > animal with that would at least make issues like this much easier to discover. Hmm. That could work. I guess that you mean to do something like that in SearchSysCacheCopy() when we build the tuple copy. There is an access to the where the cacheId, meaning that we know the catalog involved. Still, we would need a lookup at its pg_class entry to check after a reltoastrelid, meaning an extra relation opened, which would be fine under a specific #define, anyway.. -- Michael