Re: Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute. - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute.
Date
Msg-id 3f705d5f-4085-458a-a47f-dee3472f7beb@vondra.me
Whole thread Raw
In response to Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute.  (Ayush Tiwari <ayushtiwari.slg01@gmail.com>)
Responses Re: Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute.
List pgsql-hackers
Hi Ayush,

On 8/13/24 07:37, Ayush Tiwari wrote:
> Hi hackers,
> 
> We encountered an issue lately, that if the database grants too many
> roles `datacl` is toasted, following which, the drop database command
> will fail with error "wrong tuple length".
> 
> To reproduce the issue, please follow below steps:
> 
> CREATE DATABASE test;
> 
> -- create helper function
> CREATE OR REPLACE FUNCTION data_tuple() returns text as $body$
> declare
>           mycounter int;
> begin
>           for mycounter in select i from generate_series(1,2000) i loop
>                     execute 'CREATE
> ROLE aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbb ' || mycounter;
>                     execute 'GRANT ALL ON DATABASE test to
> aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbb ' || mycounter;
>           end loop;
>           return 'ok';
> end;
> $body$ language plpgsql volatile strict; 
> 
> -- create roles and grant on the database.
> SELECT data_tuple(); 
> 
> -- drop database command, this will result in "wrong tuple length" error.
> DROP DATABASE test;
> 
> The root cause of this behaviour is that the HeapTuple in dropdb
> function fetches a copy of pg_database tuple from system cache.
> But the system cache flattens any toast attributes, which cause the
> length check to fail in heap_inplace_update.
> 
> A patch for this issue is attached to the mail, the solution is to
> change the logic to fetch the tuple by directly scanning pg_database
> rather than using the catcache.
> 

Thanks for the report. I can reproduce the issue following your
instructions, and the fix seems reasonable ...


But there's also one thing I don't quite understand. I did look for
other places that might have a similar issue, that is places that

  1) lookup tuple using SearchSysCacheCopy1

  2) call on the tuple heap_inplace_update


And I found about four places doing that:

  - index_update_stats (src/backend/catalog/index.c)

  - create_toast_table (src/backend/catalog/toasting.c)

  - vac_update_relstats / vac_update_datfrozenxid (commands/vacuum.c)


But I haven't managed to trigger the same kind of failure for any of
those places, despite trying. AFAIK that's because those places update
pg_class, and that doesn't have TOAST, so the tuple length can't change.


So this fix seems reasonable.

-- 
Tomas Vondra



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: refactor the CopyOneRowTo
Next
From: Michail Nikolaev
Date:
Subject: Re: Conflict detection and logging in logical replication