Hi Ayush,
On 8/13/24 07:37, Ayush Tiwari wrote:
> Hi hackers,
>
> We encountered an issue lately, that if the database grants too many
> roles `datacl` is toasted, following which, the drop database command
> will fail with error "wrong tuple length".
>
> To reproduce the issue, please follow below steps:
>
> CREATE DATABASE test;
>
> -- create helper function
> CREATE OR REPLACE FUNCTION data_tuple() returns text as $body$
> declare
> mycounter int;
> begin
> for mycounter in select i from generate_series(1,2000) i loop
> execute 'CREATE
> ROLE aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbb ' || mycounter;
> execute 'GRANT ALL ON DATABASE test to
> aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbb ' || mycounter;
> end loop;
> return 'ok';
> end;
> $body$ language plpgsql volatile strict;
>
> -- create roles and grant on the database.
> SELECT data_tuple();
>
> -- drop database command, this will result in "wrong tuple length" error.
> DROP DATABASE test;
>
> The root cause of this behaviour is that the HeapTuple in dropdb
> function fetches a copy of pg_database tuple from system cache.
> But the system cache flattens any toast attributes, which cause the
> length check to fail in heap_inplace_update.
>
> A patch for this issue is attached to the mail, the solution is to
> change the logic to fetch the tuple by directly scanning pg_database
> rather than using the catcache.
>
Thanks for the report. I can reproduce the issue following your
instructions, and the fix seems reasonable ...
But there's also one thing I don't quite understand. I did look for
other places that might have a similar issue, that is places that
1) lookup tuple using SearchSysCacheCopy1
2) call on the tuple heap_inplace_update
And I found about four places doing that:
- index_update_stats (src/backend/catalog/index.c)
- create_toast_table (src/backend/catalog/toasting.c)
- vac_update_relstats / vac_update_datfrozenxid (commands/vacuum.c)
But I haven't managed to trigger the same kind of failure for any of
those places, despite trying. AFAIK that's because those places update
pg_class, and that doesn't have TOAST, so the tuple length can't change.
So this fix seems reasonable.
--
Tomas Vondra